Package-level

get or set package options for quanteda

An R package for the quantitative analysis of textual data

Data

Built-in data objects.

a paragraph of text for testing various text-based functions

immigration-related sections of 2010 UK party manifestos

US presidential inaugural address texts

Irish budget speeches from 2010

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

Corpus functions

Functions for constructing and manipulating corpus class objects.

coerce a compressed corpus to a standard corpus

recast the document units of a corpus

randomly sample documents from a corpus

segment texts into component elements

extract a subset of a corpus

remove sentences based on their token lengths or a pattern match

base method extensions for corpus objects

construct a corpus object

get or set document-level variables

get or set corpus metadata

get or set document-level metadata

get or assign corpus texts

Tokens functions

Functions for constructing and manipulating tokens class objects.

coercion, checking, and combining functions for tokens objects

convert token sequences into compound tokens

apply a dictionary to a tokens object

create ngrams and skipgrams from tokens

select or remove tokens from a tokens object

convert the case of tokens

stem the terms in an object

tokenize a set of texts

Character functions

Functions for constructing and manipulating character objects.

convert the case of character objects

segment texts into component elements

create ngrams and skipgrams from tokens

stem the terms in an object

Text matrix functions

Functions for constructing and manipulating document-feature matrix (dfm) or feature co-occurrence matrix (fcm) objects.

coerce a dfm to a matrix or data.frame

recombine a dfm or fcm by combining identical dimension elements

apply a dictionary to a dfm

randomly sample documents or features from a dfm

select features from a dfm or fcm

sort a dfm by frequency of one or more margins

convert the case of the features of a dfm and combine the features that become identical

trim a dfm using frequency threshold-based feature selection

weight the feature frequencies in a dfm

create a document-feature matrix

sort an fcm in alphabetical order of the features

create a feature co-occurrence matrix

return the first or last part of a dfm

coercion and checking functions for dfm objects

stem the terms in an object

Text statistics

Functions for computing statistics from texts and dfm objects.

compute the sparsity of a document-feature matrix

calculate collocation statistics

calculate keyness statistics

calculate lexical diversity

calculate readability

compute similarity and distance between documents or features

list the most frequent features

Dictionary functions

Constructor and utility functions for working with dictionaries.

convert quanteda dictionary objects to the YAML format

create a dictionary

check if an object is a dictionary

Phrase discovery functions

Functions for exploring and detecting keywords and phrases.

detect collocations from text

locate keywords-in-context

find variable-length collocations with filtering

calculate collocation statistics

Text plot functions

Plot functions for representing text and the analysis of texts.

plot a fitted scaling model

plot features as a wordcloud

plot the dispersion of keyword(s)

Text model functions

Functions for fitting analytic models from text matrices.

extract text model coefficients

correspondence analysis of a document-feature matrix

Naive Bayes classifier for texts

wordfish text model

Wordscores text model

wordshoal text model

Utility functions

R-like functions to return counts and object information.

get or set document names

get the feature labels from a dfm

count the number of documents or features

count the Scrabble letter values of text

count the number of sentences

count syllables in a text

count the number of tokens or types

access built-in stopwords

Miscellaneous functions

coerce a dist object into a list

bootstrap a dfm

convert a dfm to a non-quanteda format