Package-level

quanteda-package

An R package for the quantitative analysis of textual data

quanteda_options

get or set package options for quanteda

Data

Built-in data objects.

data_char_sampletext

a paragraph of text for testing various text-based functions

data_char_ukimmig2010

immigration-related sections of 2010 UK party manifestos

data_corpus_inaugural

US presidential inaugural address texts

data_corpus_irishbudget2010

Irish budget speeches from 2010

data_dfm_LBGexample

dfm from data in Table 1 of Laver, Benoit, and Garry (2003)

Corpus functions

Functions for constructing and manipulating corpus class objects.

corpus_reshape

recast the document units of a corpus

corpus_sample

randomly sample documents from a corpus

corpus_segment char_segment

segment texts into component elements

corpus_subset

extract a subset of a corpus

corpus_trim char_trim

remove sentences based on their token lengths or a pattern match

corpus

construct a corpus object

metacorpus metacorpus<-

get or set corpus metadata

docvars docvars<-

get or set document-level variables

metadoc metadoc<-

get or set document-level metadata

texts texts<- as.character

get or assign corpus texts

as.corpus

coerce a compressed corpus to a standard corpus
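
A minimal sketch of how several of the corpus functions above fit together. The texts and the `party` variable are made up for illustration, and the calls assume a quanteda version in which these functions accept the arguments shown (argument names have changed between releases).

```r
library(quanteda)

# Hypothetical example texts; the vector names become document names.
txts <- c(doc1 = "This is the first text. It has two sentences.",
          doc2 = "Here is a second, shorter text.")

# Construct a corpus and attach a document-level variable.
corp <- corpus(txts)
docvars(corp, "party") <- c("A", "B")   # "party" is an illustrative name

# Keep only the documents whose docvar matches a condition.
corp_sub <- corpus_subset(corp, party == "A")

# Recast the document units from whole documents to sentences.
corp_sent <- corpus_reshape(corp, to = "sentences")

ndoc(corp)       # documents in the original corpus
ndoc(corp_sub)   # documents left after subsetting
ndoc(corp_sent)  # one document per sentence after reshaping
```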

Tokens functions

Functions for constructing and manipulating tokens class objects.

tokens

tokenize a set of texts

tokens_compound

convert token sequences into compound tokens

tokens_lookup

apply a dictionary to a tokens object

tokens_ngrams char_ngrams tokens_skipgrams

create ngrams and skipgrams from tokens

tokens_select tokens_remove

select or remove tokens from a tokens object

tokens_tolower tokens_toupper

convert the case of tokens

tokens_wordstem char_wordstem dfm_wordstem

stem the terms in an object

as.tokens

coercion, checking, and combining functions for tokens objects
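
The tokens functions above are typically chained. The following is a sketch only, using an invented sentence and assuming a quanteda version in which `tokens()` accepts `remove_punct` and the selection functions take the pattern as their second positional argument.

```r
library(quanteda)

# Tokenize a made-up sentence, dropping punctuation.
toks <- tokens("The United States is a federal republic.",
               remove_punct = TRUE)

# Normalize case, then drop English stopwords.
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("english"))

# Form bigrams from the remaining adjacent tokens.
bigrams <- tokens_ngrams(toks, n = 2)

ntoken(toks)      # tokens remaining per document
ntoken(bigrams)   # bigrams per document
```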

Character functions

Functions for constructing and manipulating character objects.

char_tolower char_toupper

convert the case of character objects

corpus_segment char_segment

segment texts into component elements

tokens_ngrams char_ngrams tokens_skipgrams

create ngrams and skipgrams from tokens

tokens_wordstem char_wordstem dfm_wordstem

stem the terms in an object

Text matrix functions

Functions for constructing and manipulating a document-feature matrix (dfm) or feature co-occurrence matrix object.

dfm

create a document-feature matrix

dfm_compress dfm_group fcm_compress

recombine a dfm or fcm by combining identical dimension elements

dfm_lookup

apply a dictionary to a dfm

dfm_sample

randomly sample documents or features from a dfm

dfm_select dfm_remove fcm_select fcm_remove

select or remove features from a dfm or fcm

dfm_sort

sort a dfm by frequency of one or more margins

dfm_tolower dfm_toupper fcm_tolower fcm_toupper

convert the case of the features of a dfm or fcm and combine features that become identical

dfm_trim

trim a dfm using frequency threshold-based feature selection

dfm_weight dfm_smooth

weight the feature frequencies in a dfm

tokens_wordstem char_wordstem dfm_wordstem

stem the terms in an object

head tail

return the first or last part of a dfm

is.dfm as.dfm

coercion and checking functions for dfm objects

as.matrix as.data.frame

coerce a dfm to a matrix or data.frame

fcm

create a feature co-occurrence matrix

fcm_sort

sort an fcm in alphabetical order of the features
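
As a rough illustration of the matrix functions above, the sketch below builds a small dfm from toy documents and removes one feature. The documents are invented, and it assumes a quanteda version in which `dfm()` accepts a tokens object.

```r
library(quanteda)

# Two toy documents made of single-letter "words".
toks <- tokens(c(d1 = "a b b c", d2 = "b c c d"))

# Build the document-feature matrix from the tokens.
mat <- dfm(toks)

featnames(mat)   # feature labels, in order of first appearance
as.matrix(mat)   # dense view of the counts

# Drop the feature "a"; the pattern is passed positionally because
# the name of this argument differs between quanteda releases.
mat_small <- dfm_remove(mat, "a")
featnames(mat_small)
```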

Text statistics

Functions for computing statistics from texts and dfm objects.

textstat_collocations is.collocations

calculate collocation statistics

textstat_keyness

calculate keyness statistics

textstat_lexdiv

calculate lexical diversity

textstat_readability

calculate readability

textstat_dist textstat_simil

compute similarity and distance between documents or features

sparsity

compute the sparsity of a document-feature matrix

topfeatures

list the most frequent features

Dictionary functions

Constructor and utility functions for working with dictionaries.

dictionary

create a dictionary

is.dictionary

check if an object is a dictionary

as.yaml

convert quanteda dictionary objects to the YAML format
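
A short sketch of creating a dictionary and applying it with `tokens_lookup` and `dfm_lookup`, both listed earlier. The dictionary keys and values here are invented for illustration, and the calls assume that `dictionary()` accepts a named list.

```r
library(quanteda)

# A hypothetical two-key dictionary.
dict <- dictionary(list(fruit = c("apple", "banana"),
                        veg   = c("carrot")))

is.dictionary(dict)

toks <- tokens("apple banana carrot apple")

# Replace matching tokens with their dictionary keys.
toks_keys <- tokens_lookup(toks, dict)

# Or count dictionary keys per document at the dfm level.
mat_keys <- dfm_lookup(dfm(toks), dict)
as.matrix(mat_keys)
```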

Phrase discovery functions

Functions for exploring and detecting keywords and phrases.

collocations

detect collocations from text

textstat_collocations is.collocations

calculate collocation statistics

sequences is.sequences

find variable-length collocations

kwic is.kwic as.tokens

locate keywords-in-context

Text plot functions

Plot functions for representing text and the analysis of texts.

textplot_scale1d

plot a fitted scaling model

textplot_wordcloud

plot features as a wordcloud

textplot_xray

plot the dispersion of key word(s)

Text model functions

Functions for fitting analytic models from text matrices.

textmodel_ca

correspondence analysis of a document-feature matrix

textmodel_NB

Naive Bayes classifier for texts

textmodel_wordfish

wordfish text model

textmodel_wordscores

Wordscores text model

textmodel_wordshoal

wordshoal text model

coef.textmodel

extract text model coefficients

Utility functions

R-like functions to return counts and object information.

ndoc nfeature

count the number of documents or features

nscrabble

count the Scrabble letter values of text

nsentence

count the number of sentences

nsyllable

count syllables in a text

ntoken ntype

count the number of tokens or types

docnames docnames<-

get or set document names

featnames

get the feature labels from a dfm

stopwords

access built-in stopwords

Miscellaneous functions

as.list

coerce a dist object into a list

convert

convert a dfm to a non-quanteda format

bootstrap_dfm

bootstrap a dfm