Returns the subset of documents in a tokens object that meet certain conditions, including direct logical operations on docvars (document-level variables). tokens_subset() functions identically to subset.data.frame(), using non-standard evaluation to evaluate conditions based on the docvars in the tokens object.

tokens_subset(x, subset, ...)

Arguments

x

tokens object to be subsetted

subset

logical expression indicating the documents to keep: missing values are taken as false

...

not used

Value

tokens object, with a subset of documents (and docvars) selected according to arguments

Examples

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e", d3 = "b b c e",
                 d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

# selecting on a docvars condition
tokens_subset(toks, grp > 1)
#> Tokens consisting of 2 documents and 1 docvar.
#> d3 :
#> [1] "b" "b" "c" "e"
#>
#> d4 :
#> [1] "e" "e" "f" "a" "b"
#>

# selecting on a supplied vector
tokens_subset(toks, c(TRUE, FALSE, TRUE, FALSE))
#> Tokens consisting of 2 documents and 1 docvar.
#> d1 :
#> [1] "a" "b" "c" "d"
#>
#> d3 :
#> [1] "b" "b" "c" "e"
#>
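
Because tokens_subset() is described as behaving like subset.data.frame(), the rule that missing values in the condition are taken as false can be illustrated with base R alone. The following is a minimal sketch, not part of the package examples: the data frame `dv` and its `doc` column are hypothetical stand-ins for a docvars table, and one `grp` value is set to NA for illustration.

```r
# Hypothetical stand-in for a docvars table (assumption: not taken from
# the package; grp for d2 is deliberately set to NA)
dv <- data.frame(doc = c("d1", "d2", "d3", "d4"),
                 grp = c(1, NA, 2, 3),
                 stringsAsFactors = FALSE)

# Non-standard evaluation: `grp` is resolved as a column of dv,
# so it does not need to be written as dv$grp
kept <- subset(dv, grp > 1)

# The condition evaluates to FALSE, NA, TRUE, TRUE; the NA row is
# treated as FALSE and dropped rather than returned as a missing row
kept$doc
```

Under the same rule, a document whose docvar is missing would presumably not be kept by a condition such as `grp > 1` in tokens_subset().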