Get the concatenator character from a tokens object.
concat(x)
concatenator(x)
x: a tokens object
a character vector of length 1
The concatenator character is a special delimiter used to link
separate tokens in multi-token phrases. It is embedded in the meta-data of
tokens objects and used in downstream operations, such as tokens_compound()
or tokens_lookup(). It can be extracted using concat() and set using
tokens(x, concatenator = ...) when x is a tokens object.
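As a minimal sketch of the extract-and-set workflow described above, the following assumes a quanteda version that provides concat() and accepts a concatenator argument in tokens() for tokens objects, as this page states. The sample sentence, the "+" character, and the variable names are illustrative choices only. The concatenator is passed to tokens_compound() explicitly so the example does not depend on that function's default.

```r
library(quanteda)

# Tokenize a short illustrative sentence (not from the package data)
toks <- tokens("The United States of America")

# Extract the concatenator stored in the object's metadata
concat(toks)

# Set a different concatenator on the existing tokens object
toks_plus <- tokens(toks, concatenator = "+")
concat(toks_plus)

# Use the stored concatenator when joining a multi-token phrase
toks_comp <- tokens_compound(
  toks_plus,
  pattern = phrase("United States"),
  concatenator = concat(toks_plus)
)
print(toks_comp)
```

Passing concat(toks_plus) explicitly keeps the compounding step consistent with whatever character is stored in the object, rather than hard-coding it a second time.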
The default _ is recommended since it will not be removed during normal
cleaning and tokenization, whereas nearly all other punctuation characters
(at least those in the Unicode punctuation class [P]) will be removed.
toks <- tokens(data_corpus_inaugural[1:5])
concat(toks)
#> [1] "_"