Get the concatenator character from a tokens object.

concat(x)

concatenator(x)

Arguments

x

a tokens object

Value

a character vector of length 1

Details

The concatenator character is a special delimiter used to link separate tokens in multi-token phrases. It is embedded in the metadata of tokens objects and used in downstream operations, such as tokens_compound() or tokens_lookup(). It can be extracted using concat() or concatenator(), and set using tokens(x, concatenator = ...) when x is a tokens object.

The default _ is recommended because it will not be removed during normal cleaning and tokenization, whereas nearly all other punctuation characters (at least those in the Unicode punctuation class [P]) will be removed.
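The extract-and-set workflow described above might look like the following minimal sketch. It assumes the quanteda package is installed and that tokens_compound() accepts a concatenator argument; the sample sentence, the "+" character, and the object names toks and toks_plus are illustrative choices only.

```r
library("quanteda")

# Tokenize a short illustrative sentence
toks <- tokens("The United States of America")

# Both accessors read the concatenator stored in the object's metadata
concat(toks)
concatenator(toks)

# Re-run tokens() on the existing tokens object to set a different concatenator
toks_plus <- tokens(toks, concatenator = "+")
concat(toks_plus)

# Pass the stored concatenator explicitly to a downstream operation
tokens_compound(toks_plus, pattern = phrase("United States"),
                concatenator = concat(toks_plus))
```

Passing concat(toks_plus) explicitly keeps the compounding step tied to whatever character is stored in the object, rather than relying on a default.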

Examples

toks <- tokens(data_corpus_inaugural[1:5])
concat(toks)
#> [1] "_"