List the most (or least) frequently occurring features in a dfm, either as a whole or separated by document.

topfeatures(x, n = 10, decreasing = TRUE, scheme = c("count", "docfreq"),
  groups = NULL)

Arguments

x

the object whose features will be returned

n

how many top features should be returned

decreasing

If TRUE, return the n most frequent features; otherwise return the n least frequent features

scheme

one of "count" for total feature frequency (within group if applicable), or "docfreq" for the document frequencies of features

groups

either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents. See groups for details.
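
As a rough illustration of what the two scheme values measure, the sketch below works on a plain base R matrix rather than a real dfm. The matrix m and its feature names are made up for the example, and the code is not quanteda's implementation; it only mirrors the definitions given above.

```r
# Toy document-feature matrix: rows are documents, columns are features.
# This is a plain base R matrix (hypothetical data), not a quanteda dfm.
m <- matrix(
  c(3, 0, 1,
    0, 2, 1,
    4, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("tax", "peace", "people"))
)

# scheme = "count": total occurrences of each feature across all documents
total_count <- colSums(m)

# scheme = "docfreq": number of documents in which each feature appears
doc_freq <- colSums(m > 0)

# Keep the n largest values, as decreasing = TRUE would
n <- 2
top_count   <- head(sort(total_count, decreasing = TRUE), n)
top_docfreq <- head(sort(doc_freq, decreasing = TRUE), n)
```

In this toy data the two schemes rank features differently: "tax" has the highest total count, while "people" appears in the most documents.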

Value

A named numeric vector of feature counts, where the names are the feature labels, or a list of these if groups is given.
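
The shape of the grouped return value can be sketched in base R as follows. The matrix m, the grouping factor groups, and the helper name top_by_group are all hypothetical, and the aggregation shown (summing counts within each group before ranking) is an assumption based on the description of scheme = "count" above, not a copy of the package's code.

```r
# Toy document-feature matrix (hypothetical data, not a quanteda dfm)
m <- matrix(
  c(3, 0, 1,
    0, 2, 1,
    4, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("tax", "peace", "people"))
)

# One group label per document, equal in length to the number of documents
groups <- factor(c("A", "B", "A"))

# Sum feature counts within each group; the result has one row per group
grouped <- rowsum(m, group = groups)

# For each group, return a named numeric vector of its top n features
n <- 2
top_by_group <- lapply(
  setNames(rownames(grouped), rownames(grouped)),
  function(g) head(sort(grouped[g, ], decreasing = TRUE), n)
)
```

Here top_by_group is a list with one element per group, each a named numeric vector, which matches the structure described for the case where groups is given.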

Examples

mydfm <- dfm(corpus_subset(data_corpus_inaugural, Year > 1980), remove_punct = TRUE)
mydfm_nostopw <- dfm_remove(mydfm, stopwords("english"))

# most frequent features
topfeatures(mydfm)
topfeatures(mydfm_nostopw)

# least frequent features
topfeatures(mydfm_nostopw, decreasing = FALSE)

# top features of individual documents
topfeatures(mydfm_nostopw, n = 5, groups = docnames(mydfm_nostopw))

# grouping by president last name
topfeatures(mydfm_nostopw, n = 5, groups = "President")

# features by document frequencies
tail(topfeatures(mydfm, scheme = "docfreq", n = 200))
#>   congress       said throughout       came      heart       find
#>          7          7          7          7          7          7