These functions select or discard elements from a character object. For convenience, the functions char_remove and char_keep are defined as shortcuts for char_select(x, pattern, selection = "remove") and char_select(x, pattern, selection = "keep"), respectively.
These functions make it easy to modify, for instance, a list of stopwords using pattern matching.
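The shortcut relationship described above can be sketched in base R. This is only an illustration under stated assumptions: select_sketch, remove_sketch, and keep_sketch are hypothetical stand-ins rather than quanteda's source code, and they handle only case-insensitive glob patterns, translated with utils::glob2rx.

```r
# Hypothetical stand-ins for illustration only; not quanteda's implementation.
select_sketch <- function(x, pattern, selection = c("keep", "remove")) {
  selection <- match.arg(selection)
  # An element counts as matched if any glob pattern matches it
  # (case-insensitively), starting from "nothing matched".
  matched <- Reduce(
    `|`,
    lapply(pattern, function(p) {
      grepl(utils::glob2rx(p), x, ignore.case = TRUE)
    }),
    rep(FALSE, length(x))
  )
  if (selection == "keep") x[matched] else x[!matched]
}

# The shortcuts forward everything in ... and fix the selection argument.
remove_sketch <- function(x, ...) select_sketch(x, ..., selection = "remove")
keep_sketch <- function(x, ...) select_sketch(x, ..., selection = "keep")

remove_sketch(letters[1:5], c("a", "c", "x"))
#> [1] "b" "d" "e"
keep_sketch(letters[1:5], c("a", "c", "x"))
#> [1] "a" "c"
```

In this sketch, passing selection through a shortcut would supply that argument twice and raise an error, which is one plausible reason a wrapper of this kind cannot accept it.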
x: an input character vector
pattern: a character vector, list of character vectors, dictionary, or collocations object. See pattern for details.
selection: whether to "keep" or "remove" the elements matching pattern
valuetype: the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.
case_insensitive: logical; if TRUE, ignore case when matching a pattern or dictionary values
...: additional arguments passed by char_remove and char_keep to char_select. Cannot include selection.
a modified character vector
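To make the difference between the three matching types concrete, here is a rough base-R approximation. It is a sketch under assumptions, not quanteda's matching code: match_by_valuetype is a hypothetical helper, glob patterns are translated with utils::glob2rx, and case-insensitivity is assumed to apply to all three types.

```r
# Hypothetical helper, not part of quanteda: approximates the three
# matching types with base R only and returns a logical vector.
match_by_valuetype <- function(x, pattern,
                               valuetype = c("glob", "regex", "fixed"),
                               case_insensitive = TRUE) {
  valuetype <- match.arg(valuetype)
  hits <- rep(FALSE, length(x))
  for (p in pattern) {
    hit <- switch(valuetype,
      # glob: translate wildcards (*, ?) into an anchored regular expression
      glob = grepl(utils::glob2rx(p), x, ignore.case = case_insensitive),
      # regex: use the pattern as is; it may match anywhere in the element
      regex = grepl(p, x, ignore.case = case_insensitive),
      # fixed: the whole element must equal the pattern
      fixed = if (case_insensitive) tolower(x) == tolower(p) else x == p
    )
    # an element is selected if any of the patterns matches it
    hits <- hits | hit
  }
  hits
}

words <- c("any", "and", "Anna", "as", "announce", "but")
words[match_by_valuetype(words, "an*")]
#> [1] "any"      "and"      "Anna"     "announce"
words[match_by_valuetype(words, "an*", case_insensitive = FALSE)]
#> [1] "any"      "and"      "announce"
words[match_by_valuetype(words, "and", valuetype = "fixed")]
#> [1] "and"
```

The point of the sketch is that a glob pattern is anchored to the whole element, while a regular expression can match anywhere inside it unless it is anchored explicitly with ^ and $.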
# character selection
mykeywords <- c("natural", "national", "denatured", "other")
char_select(mykeywords, "nat*", valuetype = "glob")
#> [1] "natural" "national"
char_select(mykeywords, "nat", valuetype = "regex")
#> [1] "natural" "national" "denatured"
char_select(mykeywords, c("natur*", "other"))
#> [1] "natural" "other"
char_select(mykeywords, c("natur*", "other"), selection = "remove")
#> [1] "national" "denatured"
# character removal
char_remove(letters[1:5], c("a", "c", "x"))
#> [1] "b" "d" "e"
words <- c("any", "and", "Anna", "as", "announce", "but")
char_remove(words, "an*")
#> [1] "as" "but"
char_remove(words, "an*", case_insensitive = FALSE)
#> [1] "Anna" "as" "but"
char_remove(words, "^.n.+$", valuetype = "regex")
#> [1] "as" "but"
# remove some of the system stopwords
stopwords("en", source = "snowball")[1:6]
#> [1] "i" "me" "my" "myself" "we" "our"
stopwords("en", source = "snowball")[1:6] |>
  char_remove(c("me", "my*"))
#> [1] "i" "we" "our"
# character keep
char_keep(letters[1:5], c("a", "c", "x"))
#> [1] "a" "c"