quanteda is an R package for managing and analyzing textual data, developed by Kenneth Benoit and other contributors. Its initial development was supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

The package is designed for R users who need to apply natural language processing to texts, from the original documents through to final analysis. Its capabilities match or exceed those of many end-user software applications, which are often expensive and not open source. The package is therefore especially valuable to researchers, students, and other analysts with limited financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. Furthermore, by emphasizing consistent design, quanteda lowers the barriers to learning and using NLP and quantitative text analysis, even for proficient R programmers.

How to Install

The normal way is to install from CRAN, using either your R GUI's package installer or the install.packages() function at the R console.
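From the R console, installing from CRAN takes a single call to base R's install.packages(). The minimal sketch below installs quanteda only if it is not already available, then loads it for the current session.

```r
# Install quanteda from CRAN only if it is not already installed,
# then attach it for the current session.
if (!requireNamespace("quanteda", quietly = TRUE)) {
  install.packages("quanteda")
}
library("quanteda")
```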

Alternatively, you can install the latest development version from GitHub.
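The sketch below installs the development version. It makes two assumptions: that you use the remotes package (devtools::install_github() works the same way), and that the development sources live in the quanteda/quanteda repository on GitHub.

```r
# Assumption: the development version is hosted at github.com/quanteda/quanteda.
# install_github() builds the package from source, so the compilers
# described in this section must be available.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("quanteda/quanteda")
```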

Because installing the development version compiles some C++ and Fortran source code, you will need the appropriate compilers installed.

On Windows, this means you will also need to install the Rtools software, available from CRAN.

On macOS, install the macOS development tools: the Clang 6.x compiler and the GNU Fortran compiler (gfortran), which quanteda requires to build. If you still get errors related to gfortran, follow the fixes here.
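Before building from source, you can use base R's Sys.which() to check quickly whether compilers are on your PATH. The sketch below assumes the conventional executable names: g++ or clang++ for C++, and gfortran for Fortran. R may be configured to use different names, so treat an empty result as a hint rather than a definitive check.

```r
# Look up conventional compiler executables on the PATH.
# Sys.which() returns "" for any program it cannot find.
compilers <- Sys.which(c("g++", "clang++", "gfortran"))
print(compilers)

# A C++ compiler is available if either candidate was found.
has_cxx <- nzchar(compilers[["g++"]]) || nzchar(compilers[["clang++"]])
has_fortran <- nzchar(compilers[["gfortran"]])

if (!has_cxx) message("No C++ compiler (g++ or clang++) found on the PATH.")
if (!has_fortran) message("gfortran not found on the PATH.")
```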

System Requirements

quanteda is cross-platform, but we recommend macOS or Linux because they handle Unicode better. RAM requirements depend on the size and structure of the texts being analyzed. Typically, a 100 MB text file on disk occupies 500 MB to 1 GB of memory as a tokens object. For the same total number of words, many short texts require more memory than fewer long texts.
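To gauge memory use on your own data, you can tokenize a corpus and inspect the result with base R's object.size(). The sketch below uses data_corpus_inaugural, a small corpus of US presidential inaugural addresses bundled with quanteda. It uses the raw text's byte count as a stand-in for its size on disk. For a corpus this small, fixed overhead can push the ratio away from the rule of thumb above, and the exact figures depend on your quanteda version.

```r
library("quanteda")

# Tokenize the inaugural-address corpus bundled with quanteda.
toks <- tokens(data_corpus_inaugural)

# Raw text size in bytes, as a stand-in for the size on disk.
text_bytes <- sum(nchar(as.character(data_corpus_inaugural), type = "bytes"))

# In-memory size of the tokens object and its ratio to the raw text.
toks_size <- object.size(toks)
ratio <- as.numeric(toks_size) / text_bytes

print(toks_size, units = "Mb")
cat("Tokens object is", round(ratio, 1), "times the size of the raw text\n")
```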

Requirement   Minimum               Recommended
OS            Windows/macOS/Linux   macOS/Linux
CPU           1 core                4 cores or more
RAM           2 GB                  8 GB or more
IDE           -                     RStudio

How to Use

See the quick start guide to learn how to use quanteda.
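The sketch below gives a rough illustration of the minimal-steps workflow; it is no substitute for the quick start guide. It builds a corpus from two toy documents, tokenizes it, removes punctuation and English stopwords, and lists the most frequent features. It assumes quanteda version 3 or later, in which dfm() takes a tokens object.

```r
library("quanteda")

# Two toy documents; any named character vector works.
txt <- c(doc1 = "quanteda makes text analysis in R fast and consistent.",
         doc2 = "Text analysis in R starts with a corpus and its tokens.")
corp <- corpus(txt)

# Tokenize, dropping punctuation, then remove English stopwords.
toks <- tokens(corp, remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("en"))

# Build a document-feature matrix (lowercased by default)
# and list the most frequent features.
dfmat <- dfm(toks)
print(dfmat)
topfeatures(dfmat, 5)
```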

How to Cite

Benoit, Kenneth, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. (2018) “quanteda: An R package for the quantitative analysis of textual data”. Journal of Open Source Software. 3(30), 774. https://doi.org/10.21105/joss.00774.

For a BibTeX entry, use the output from citation(package = "quanteda").
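For example, utils::toBibtex() has a method for citation objects, so you can print the BibTeX form directly:

```r
# Retrieve the citation information shipped with the installed package
# and print it in BibTeX format.
cit <- citation(package = "quanteda")
toBibtex(cit)
```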

Leaving Feedback

If you like quanteda, please consider leaving feedback or a testimonial here.

Contributing

Contributions in the form of feedback, comments, code, and bug reports are most welcome. How to contribute: