List the most (or least) frequently occurring features in a dfm, either as a whole or separated by document.

topfeatures(
  x,
  n = 10,
  decreasing = TRUE,
  scheme = c("count", "docfreq"),
  groups = NULL
)

Arguments

x

the object whose features will be returned

n

how many top features should be returned

decreasing

If TRUE, return the n most frequent features; otherwise return the n least frequent features

scheme

one of count for total feature frequency (within group if applicable), or docfreq for the document frequencies of features

groups

grouping variable for sampling, equal in length to the number of documents. This will be evaluated in the docvars data.frame, so that docvars may be referred to by name without quoting. This also changes previous behaviours for groups. See news(Version >= "3.0", package = "quanteda") for details.

Value

A named numeric vector of feature counts, where the names are the feature labels, or a list of these if groups is given.

Examples

dfmat1 <- corpus_subset(data_corpus_inaugural, Year > 1980) %>% tokens(remove_punct = TRUE) %>% dfm() dfmat2 <- dfm_remove(dfmat1, stopwords("en")) # most frequent features topfeatures(dfmat1)
#> the and of to we our a in is that #> 1201 1023 838 649 627 608 472 379 329 317
topfeatures(dfmat2)
#> us america must people new world nation can time one #> 192 116 109 99 96 95 91 88 77 76
# least frequent features topfeatures(dfmat2, decreasing = FALSE)
#> hatfield mondale baker moomaw momentous occurrence #> 1 1 1 1 1 1 #> routinely unique really every-4-year #> 1 1 1 1
# top features of individual documents topfeatures(dfmat2, n = 5, groups = docnames(dfmat2))
#> Error in docnames(dfmat2): object 'dfmat2' not found
# grouping by president last name topfeatures(dfmat2, n = 5, groups = President)
#> $Biden #> us america can one nation #> 27 18 16 15 12 #> #> $Bush #> freedom us nation america can #> 36 27 27 27 24 #> #> $Clinton #> us new world must america #> 40 38 28 28 26 #> #> $Obama #> us must can nation people #> 44 25 20 18 18 #> #> $Reagan #> us government people world one #> 52 29 25 23 22 #> #> $Trump #> america american people country one #> 18 11 10 9 8 #>
# features by document frequencies tail(topfeatures(dfmat1, scheme = "docfreq", n = 200))
#> unity like cannot even challenges schools #> 8 8 8 8 8 8