Package index • quanteda

Package-level

quanteda-package quanteda: An R package for the quantitative analysis of textual data
quanteda_options(): Get or set package options for quanteda

Data

Built-in data objects.

data_char_sampletext: A paragraph of text for testing various text-based functions
data_char_ukimmig2010: Immigration-related sections of 2010 UK party manifestos
data_corpus_inaugural: US presidential inaugural address texts
data_dfm_lbgexample: dfm from data in Table 1 of Laver, Benoit, and Garry (2003)
data_dictionary_LSD2015: Lexicoder Sentiment Dictionary (2015)
data-relocated data_corpus_dailnoconf1991 data_corpus_irishbudget2010: Formerly included data objects

Corpus functions

Functions for constructing and manipulating corpus class objects.

corpus(): Construct a corpus object
corpus_chunk(): Segment a corpus into chunks of a given size
corpus_group(): Combine documents in corpus by a grouping variable
corpus_reshape(): Recast the document units of a corpus
corpus_sample(): Randomly sample documents from a corpus
corpus_segment() char_segment(): Segment texts on a pattern match
corpus_subset(): Extract a subset of a corpus
corpus_trim() char_trim(): Remove sentences based on their token lengths or a pattern match
docvars() `docvars<-`() `$`(<corpus>) `$<-`(<corpus>) `$`(<tokens>) `$<-`(<tokens>) `$`(<dfm>) `$<-`(<dfm>): Get or set document-level variables
as.character(<corpus>) is.corpus() as.corpus(): Coercion and checking methods for corpus objects
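
As a rough illustration of how the corpus functions above fit together, the following sketch builds a corpus from a few made-up texts and then subsets and reshapes it. The texts, the `party` variable, and the object names are assumptions chosen for the example; they are not part of the package.

```r
library(quanteda)

# Made-up texts and a hypothetical document variable, for illustration only
txt <- c(d1 = "Immigration policy matters. Taxes matter too.",
         d2 = "The budget speech was long.",
         d3 = "A short text.")
corp <- corpus(txt, docvars = data.frame(party = c("A", "B", "A")))

ndoc(corp)               # number of documents in the corpus
docvars(corp, "party")   # read one document-level variable

# Keep only the documents whose `party` docvar equals "A"
corp_a <- corpus_subset(corp, party == "A")

# Recast the document units so that each sentence becomes a document
corp_sent <- corpus_reshape(corp, to = "sentences")
```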

Tokens functions

Functions for constructing and manipulating tokens class objects.

tokens(): Construct a tokens object
tokens_annotate(): Annotate a tokens object using a dictionary
tokens_chunk(): Segment a tokens object into chunks of a given size
tokens_compound(): Convert token sequences into compound tokens
tokens_group(): Combine documents in a tokens object by a grouping variable
tokens_lookup(): Apply a dictionary to a tokens object
tokens_match(): Match the tokens IDs with given types
tokens_ngrams() char_ngrams() tokens_skipgrams(): Create n-grams and skip-grams from tokens
tokens_replace(): Replace tokens in a tokens object
tokens_sample(): Randomly sample documents from a tokens object
tokens_segment(): Segment a tokens object by patterns
tokens_select() tokens_remove() tokens_keep(): Select or remove tokens from a tokens object
tokens_split(): Split tokens by a separator pattern
tokens_subset(): Extract a subset of a tokens object
tokens_tolower() tokens_toupper(): Convert the case of tokens
tokens_trim(): Trim tokens using frequency threshold-based feature selection
tokens_wordstem() char_wordstem() dfm_wordstem(): Stem the terms in an object
is.tokens_xptr() as.tokens_xptr(): Methods for tokens_xptr objects
types(): Get word types from a tokens object
concat() concatenator(): Return the concatenator character from an object
as.list(<tokens>) as.character(<tokens>) is.tokens() as.tensor() as.matrix(<tokens>) as.tokens(): Coercion, checking, and combining functions for tokens objects
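
A typical tokens workflow chains several of the functions listed above. The sketch below is one possible ordering on a made-up sentence: tokenise, lowercase, compound a multi-word expression, then remove a few function words. The input text and the word list passed to `tokens_remove()` are assumptions for the example.

```r
library(quanteda)

# Tokenise a made-up sentence, dropping punctuation
toks <- tokens("The United States is a federal republic.", remove_punct = TRUE)

# Lowercase all tokens
toks <- tokens_tolower(toks)

# Join the two-token sequence "united states" into one compound token
toks <- tokens_compound(toks, pattern = phrase("united states"))

# Remove a small, hand-picked set of function words
toks <- tokens_remove(toks, pattern = c("the", "is", "a"))

as.list(toks)   # inspect the remaining tokens
```

With the default concatenator, the compound token is joined by an underscore, so the remaining tokens should be `united_states`, `federal`, and `republic`.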

Character functions

Functions for constructing and manipulating character objects.

char_tolower() char_toupper(): Convert the case of character objects
corpus_segment() char_segment(): Segment texts on a pattern match
tokens_ngrams() char_ngrams() tokens_skipgrams(): Create n-grams and skip-grams from tokens
char_select() char_remove() char_keep(): Select or remove elements from a character vector
corpus_trim() char_trim(): Remove sentences based on their token lengths or a pattern match
tokens_wordstem() char_wordstem() dfm_wordstem(): Stem the terms in an object

Text matrix functions

Functions for constructing and manipulating a document-feature matrix (dfm) or feature co-occurrence matrix (fcm) object.

dfm(): Create a document-feature matrix
dfm_compress() fcm_compress(): Recombine a dfm or fcm by combining identical dimension elements
dfm_group(): Combine documents in a dfm by a grouping variable
dfm_lookup(): Apply a dictionary to a dfm
dfm_match(): Match the dfm columns with given features
dfm_replace(): Replace features in a dfm
dfm_sample(): Randomly sample documents from a dfm
dfm_select() dfm_remove() dfm_keep() fcm_select() fcm_remove() fcm_keep(): Select features from a dfm or fcm
dfm_sort(): Sort a dfm by frequency of one or more margins
dfm_subset(): Extract a subset of a dfm
dfm_tfidf(): Weight a dfm by tf-idf
dfm_tolower() dfm_toupper() fcm_tolower() fcm_toupper(): Convert the case of the features of a dfm or fcm and combine features that become identical
dfm_trim(): Trim a dfm using frequency threshold-based feature selection
dfm_weight() dfm_smooth(): Weight the feature frequencies in a dfm
tokens_wordstem() char_wordstem() dfm_wordstem(): Stem the terms in an object
docfreq(): Compute the (weighted) document frequency of a feature
featfreq(): Compute the frequencies of features
head(<dfm>) tail(<dfm>): Return the first or last part of a dfm
as.dfm() is.dfm(): Coercion and checking functions for dfm objects
as.matrix(<dfm>): Coerce a dfm to a matrix or data.frame
fcm(): Create a feature co-occurrence matrix
fcm_sort(): Sort an fcm in alphabetical order of the features
as.fcm(): Coercion and checking functions for fcm objects
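
To show how a few of the text matrix functions above relate to one another, here is a minimal sketch on two toy documents. The single-letter "words" and the threshold value are assumptions picked to keep the counts easy to follow.

```r
library(quanteda)

# Two toy documents; counts overall are a = 1, b = 3, c = 3, d = 1
toks <- tokens(c(d1 = "a b b c", d2 = "b c c d"))

# Build the document-feature matrix
dfmat <- dfm(toks)
ndoc(dfmat)    # number of documents
nfeat(dfmat)   # number of features

topfeatures(dfmat, n = 2)   # the most frequent features
docfreq(dfmat)              # number of documents each feature occurs in

# Keep only features that occur at least twice across all documents
dfmat_trimmed <- dfm_trim(dfmat, min_termfreq = 2)

# Reweight the counts by tf-idf
dfmat_tfidf <- dfm_tfidf(dfmat)

# Feature co-occurrence matrix, counting co-occurrence within documents
fcmat <- fcm(toks, context = "document")
```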

Dictionary functions

Constructor and utility functions for working with dictionaries.

dictionary(): Create a dictionary object
as.dictionary() is.dictionary(): Coercion and checking functions for dictionary objects
as.yaml(): Convert quanteda dictionary objects to the YAML format
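
Dictionaries are usually applied through `tokens_lookup()` or `dfm_lookup()`. The following sketch defines a small hypothetical dictionary (the keys `economy` and `migration` and their glob patterns are invented for the example) and applies it to a made-up sentence.

```r
library(quanteda)

# Hypothetical two-key dictionary using glob patterns
dict <- dictionary(list(economy = c("tax*", "budget"),
                        migration = "immigra*"))

toks <- tokens("Taxes and the budget dominate; immigration less so.")

# Replace matching tokens with their dictionary keys; by default,
# tokens that match no key are dropped
looked <- tokens_lookup(toks, dictionary = dict)

# Tabulate key counts per document
dfmat_keys <- dfm(looked)
```

Matching is case-insensitive by default, so "Taxes" should match `tax*` here.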

Phrase discovery functions

Functions for exploring and detecting keywords and phrases.

is.collocations(): Check if an object is a collocations object
kwic() is.kwic() as.data.frame(<kwic>): Locate keywords-in-context
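
As a small illustration of keyword-in-context search, the sketch below looks up one word in a made-up sentence and converts the result to a plain data frame. The sentence and the window size are assumptions for the example.

```r
library(quanteda)

toks <- tokens("The cat sat on the mat while the dog slept.")

# Find every occurrence of "the" with two tokens of context on each side;
# matching is case-insensitive by default
kw <- kwic(toks, pattern = "the", window = 2)

is.kwic(kw)                   # check the class of the result
kw_df <- as.data.frame(kw)    # one row per match
```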

Utility functions

R-like functions to return counts and object information.

index() is.index(): Locate a pattern in a tokens object
ndoc() nfeat(): Count the number of documents or features
nsentence() (deprecated): Count the number of sentences
ntoken() ntype(): Count the number of tokens or types
print(<corpus>) print(<dfm>) print(<dictionary2>) print(<fcm>) print(<kwic>) print(<tokens>): Print methods for quanteda core objects
docnames() `docnames<-`() docid() segid(): Get or set document names
featnames(): Get the feature labels from a dfm

Miscellaneous functions

phrase() as.phrase() is.phrase(): Declare a pattern to be a sequence of separate patterns
convert(): Convert quanteda objects to non-quanteda formats
bootstrap_dfm(): Bootstrap a dfm
meta() `meta<-`(): Get or set object metadata
spacyr-methods: Extensions for and from spacy_parse objects

Statistics, models, and plots

Functions for computing statistics, fitting models, and producing visualisations from text.

sparsity(): Compute the sparsity of a document-feature matrix
topfeatures(): Identify the most frequent features in a dfm
textmodels: Models for scaling and classification of textual data
textplots: Plots for textual data
textstats: Statistics for textual data