Declares that a character expression consists of multiple patterns, separated by an element such as whitespace. This is typically used as a wrapper around `pattern()` to make it explicit that the pattern elements are to be used for matches to multi-word sequences, rather than individual, unordered matches to single words.
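The difference between sequence matching and individual matching can be illustrated with a small base-R sketch. This is not quanteda's matching code. The helper names `match_sequence()` and `match_individual()` are hypothetical, and the sketch assumes exact, case-sensitive token equality with no glob or regex patterns.

```r
# Conceptual sketch (base R only, NOT quanteda's implementation).
# match_sequence() and match_individual() are hypothetical helpers.

# Start positions where all words of `pattern` occur contiguously, in order
match_sequence <- function(tokens, pattern) {
  n <- length(pattern)
  if (n == 0 || n > length(tokens)) return(integer(0))
  starts <- seq_len(length(tokens) - n + 1)
  hits <- vapply(starts, function(i) {
    identical(tokens[i:(i + n - 1)], pattern)
  }, logical(1))
  starts[hits]
}

# Positions of tokens equal to ANY single word of `pattern`, ignoring order
match_individual <- function(tokens, pattern) {
  which(tokens %in% pattern)
}

toks <- c("language", "processing", "of", "natural", "language", "processing")
pattern <- strsplit("natural language processing", " ", fixed = TRUE)[[1]]

match_sequence(toks, pattern)    # returns 4 (one multi-word match)
match_individual(toks, pattern)  # returns c(1, 2, 4, 5, 6)
```

Treating the pattern as a phrase yields a single match at position 4, whereas word-by-word matching also hits the unrelated "language processing" at the start.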
```r
phrase(x, separator = " ")

as.phrase(x)

is.phrase(x)
```
Argument | Description
---|---
`x` | character, dictionary, list, collocations, or tokens object; the compound patterns to be treated as a sequence separated by `separator`
`separator` | character; the character in between the patterns. Defaults to `" "`. Used by `phrase()` only.
`phrase()` and `as.phrase()` return a specially classed list whose elements have been split into separate `character` (pattern) elements. `is.phrase()` returns `TRUE` if the object was created by `phrase()`, and `FALSE` otherwise.
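To make the shape of that return value concrete, here is a base-R sketch that splits each pattern on a literal separator and tags the resulting list with a class. It is only an approximation of the documented behaviour, not quanteda's code: `phrase_sketch()`, `is_phrase_sketch()`, and the `"phrase_sketch"` class are hypothetical names, and quanteda's actual class name and input handling (dictionaries, tokens, collocations) may differ.

```r
# Conceptual sketch of the return shape (NOT quanteda's implementation).
# phrase_sketch(), is_phrase_sketch() and the "phrase_sketch" class are hypothetical.
phrase_sketch <- function(x, separator = " ") {
  # fixed = TRUE treats the separator literally, not as a regular expression
  out <- strsplit(x, separator, fixed = TRUE)
  class(out) <- c("phrase_sketch", "list")
  out
}

# Class check analogous in spirit to is.phrase()
is_phrase_sketch <- function(x) inherits(x, "phrase_sketch")

p <- phrase_sketch(c("natural_language_processing", "text_analysis"), separator = "_")
unclass(p)                                  # a plain list of character vectors
is_phrase_sketch(p)                         # TRUE
is_phrase_sketch(list("natural language"))  # FALSE
```

Using `fixed = TRUE` avoids surprises when the separator is a regex metacharacter such as `"."` or `"|"`.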
See also: `as.phrase()`.
```r
library("quanteda")

# make phrases from characters
phrase(c("natural language processing"))
#> [[1]]
#> [1] "natural" "language" "processing"
#>
phrase(c("natural_language_processing", "text_analysis"), separator = "_")
#> [[1]]
#> [1] "natural" "language" "processing"
#>
#> [[2]]
#> [1] "text" "analysis"
#>

# from a dictionary
phrase(dictionary(list(catone = c("a b"), cattwo = "c d e", catthree = "f")))
#> [[1]]
#> [1] "a" "b"
#>
#> [[2]]
#> [1] "c" "d" "e"
#>
#> [[3]]
#> [1] "f"
#>

# from a list
as.phrase(list(c("natural", "language", "processing")))
#> [[1]]
#> [1] "natural" "language" "processing"
#>

# from tokens
as.phrase(tokens("natural language processing"))
#> [[1]]
#> [1] "natural" "language" "processing"
#>
```