Select rows of textstat objects by glob, regex or fixed patterns

This method subsets the output object of textstat_collocations, textstat_keyness, or textstat_frequency, keeping or removing the rows that match "glob", "regex", or "fixed" patterns.

textstat_select(
  x,
  pattern = NULL,
  selection = c("keep", "remove"),
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE
)

Arguments

x	a `textstat` object
pattern	a character vector, list of character vectors, dictionary, or collocations object. See pattern for details.
selection	whether to `"keep"` or `"remove"` the rows that match the pattern
valuetype	the type of pattern matching: `"glob"` for "glob"-style wildcard expressions; `"regex"` for regular expressions; or `"fixed"` for exact matching. See valuetype for details.
case_insensitive	logical; if `TRUE`, ignore case when matching a `pattern` or dictionary values
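
The difference between the three `valuetype` settings, and the effect of `selection`, can be sketched in base R. This is only an illustration under stated assumptions, not quanteda's implementation: `match_feature()` is a hypothetical helper, the `feature` vector is made-up sample data, and the sketch handles a single pattern string rather than the vectors, lists, or dictionaries that `pattern` accepts.

```r
# Hypothetical helper, for illustration only (not part of quanteda).
# Returns a logical vector: TRUE where `feature` matches `pattern`.
match_feature <- function(feature, pattern,
                          valuetype = c("glob", "regex", "fixed"),
                          case_insensitive = TRUE) {
  valuetype <- match.arg(valuetype)
  if (valuetype == "fixed") {
    # "fixed": the whole feature must equal the pattern
    if (case_insensitive) {
      return(tolower(feature) %in% tolower(pattern))
    }
    return(feature %in% pattern)
  }
  if (valuetype == "glob") {
    # "glob": translate wildcards (* and ?) into an anchored regex
    pattern <- utils::glob2rx(pattern)
  }
  # "regex" (or a translated glob): ordinary regular-expression match
  grepl(pattern, feature, ignore.case = case_insensitive)
}

# Made-up sample features
feature <- c("america", "americans", "american", "americanism",
             "freedom", "war")

# selection = "keep" with a glob pattern
kept_glob <- feature[match_feature(feature, "america*")]

# selection = "keep" with exact matching
kept_fixed <- feature[match_feature(feature, "america", valuetype = "fixed")]

# selection = "remove" with a regular expression: negate the match
removed_regex <- feature[!match_feature(feature, "^america", valuetype = "regex")]

print(kept_glob)
print(kept_fixed)
print(removed_regex)
```

In this sketch the glob `america*` and the regex `^america` select the same features, while `"fixed"` keeps only the feature that equals the pattern exactly.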

Examples

period <- ifelse(docvars(data_corpus_inaugural, "Year") < 1945, "pre-war", "post-war")
dfmat <- dfm(data_corpus_inaugural, groups = period)
tstat <- textstat_keyness(dfmat)
textstat_select(tstat, 'america*')
#>          feature        chi2            p n_target n_reference
#> 7        america 177.5686921 0.000000e+00      130          54
#> 9      americans 151.2940052 0.000000e+00       67           7
#> 16     america's  94.4420979 0.000000e+00       35           0
#> 107     american  19.3289745 1.100241e-05       69          94
#> 1038    americas   0.8013128 3.707012e-01        2           1
#> 1624  american's   0.2671007 6.052833e-01        1           0
#> 5294 americanism  -0.3706871 5.426300e-01        0           1