Substitute features based on vectorized one-to-one matching for lemmatization or user-defined stemming.

dfm_replace(x, pattern, replacement, case_insensitive = TRUE,
  verbose = quanteda_options("verbose"))

Arguments

x

dfm whose features will be replaced

pattern

a character vector of features to match. See the pattern documentation for more details.

replacement

if pattern is a character vector, then replacement must be a character vector of equal length, so that each element of pattern maps to exactly one element of replacement (a 1:1 match).

case_insensitive

ignore case when matching, if TRUE

verbose

print status messages if TRUE

Examples

mydfm <- dfm(data_corpus_irishbudget2010)

# lemmatization
infle <- c("foci", "focus", "focused", "focuses", "focusing", "focussed", "focusses")
lemma <- rep("focus", length(infle))
mydfm2 <- dfm_replace(mydfm, pattern = infle, replacement = lemma)
featnames(dfm_select(mydfm2, pattern = infle))
#> [1] "focus"

# stemming
feat <- featnames(mydfm)
stem <- char_wordstem(feat, "porter")
mydfm3 <- dfm_replace(mydfm, pattern = feat, replacement = stem, case_insensitive = FALSE)
identical(mydfm3, dfm_wordstem(mydfm, "porter"))
#> [1] TRUE
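
The effect of case_insensitive can be illustrated with a small sketch. The texts and object names below (txt, toy_dfm, toy_dfm_ci, toy_dfm_cs) are hypothetical and not part of the package. The sketch also assumes the installed quanteda version supports building a dfm from tokens() with tolower = FALSE. Treat it as an illustration under those assumptions, not as reference behaviour.

```r
library(quanteda)

# Toy texts (hypothetical data, not shipped with quanteda)
txt <- c(d1 = "The Cats chase a cat", d2 = "A cat sleeps")

# Keep the original case so the effect of case_insensitive is visible
toy_dfm <- dfm(tokens(txt), tolower = FALSE)

# Map "cats" to "cat"; with case_insensitive = TRUE the feature "Cats"
# is expected to match as well and be merged into "cat"
toy_dfm_ci <- dfm_replace(toy_dfm, pattern = "cats", replacement = "cat",
                          case_insensitive = TRUE)
featnames(toy_dfm_ci)

# With case_insensitive = FALSE the pattern "cats" should not match "Cats",
# so that feature is expected to remain unchanged
toy_dfm_cs <- dfm_replace(toy_dfm, pattern = "cats", replacement = "cat",
                          case_insensitive = FALSE)
featnames(toy_dfm_cs)
```

Comparing the two featnames() results should show whether the capitalized feature was folded into its replacement.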