These function select or discard elements from a character object. For convenience, the functions char_remove and char_keep are defined as shortcuts for char_select(x, pattern, selection = "remove") and char_select(x, pattern, selection = "keep"), respectively.

These functions make it easy to change, for instance, stopwords based on pattern matching.

char_select(
  x,
  pattern,
  selection = c("keep", "remove"),
  valuetype = c("glob", "fixed", "regex"),
  case_insensitive = TRUE
)

char_remove(x, ...)

char_keep(x, ...)

Arguments

x

an input character vector

pattern

a character vector, list of character vectors, dictionary, or collocations object. See pattern for details.

selection

whether to "keep" or "remove" the tokens matching pattern

valuetype

the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.

case_insensitive

logical; if TRUE, ignore case when matching a pattern or dictionary values

...

additional arguments passed by char_remove and char_keep to char_select. Cannot include selection.

Value

a modified character vector

Examples

# character selection
mykeywords <- c("natural", "national", "denatured", "other")
char_select(mykeywords, "nat*", valuetype = "glob")
#> [1] "natural"  "national"
char_select(mykeywords, "nat", valuetype = "regex")
#> [1] "natural"   "national"  "denatured"
char_select(mykeywords, c("natur*", "other"))
#> [1] "natural" "other"  
char_select(mykeywords, c("natur*", "other"), selection = "remove")
#> [1] "national"  "denatured"

# character removal
char_remove(letters[1:5], c("a", "c", "x"))
#> [1] "b" "d" "e"
words <- c("any", "and", "Anna", "as", "announce", "but")
char_remove(words, "an*")
#> [1] "as"  "but"
char_remove(words, "an*", case_insensitive = FALSE)
#> [1] "Anna" "as"   "but" 
char_remove(words, "^.n.+$", valuetype = "regex")
#> [1] "as"  "but"

# remove some of the system stopwords
stopwords("en", source = "snowball")[1:6]
#> [1] "i"      "me"     "my"     "myself" "we"     "our"   
stopwords("en", source = "snowball")[1:6] |>
  char_remove(c("me", "my*"))
#> [1] "i"   "we"  "our"
  
# character keep
char_keep(letters[1:5], c("a", "c", "x"))
#> [1] "a" "c"