Returns document subsets of a dfm that meet certain conditions,
including direct logical operations on docvars (document-level variables).
dfm_subset functions identically to subset.data.frame(), using non-standard evaluation to evaluate conditions based on the docvars in the dfm.
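dfm_subset is described as working like subset.data.frame(), so the base R call below may help show what non-standard evaluation means here. It is a sketch on a plain data.frame named docvars_df, a hypothetical stand-in for a dfm's docvars. It is not quanteda code and runs without quanteda installed.

```r
# Hypothetical document-level variables for four documents, mirroring the
# docvars used in the example further down (a plain data.frame, not a dfm).
docvars_df <- data.frame(grp = c(1, 1, 2, 3),
                         row.names = c("d1", "d2", "d3", "d4"))

# subset.data.frame() evaluates `grp > 1` with the columns of docvars_df in
# scope, so `grp` is written bare, without quotes or a `docvars_df$` prefix.
kept <- subset(docvars_df, grp > 1)
print(rownames(kept))
```

The condition names the variable directly, the same way `grp > 1` is passed to dfm_subset in the example below.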
```r
dfm_subset(x, subset, drop_docid = TRUE, ...)
```
Argument | Description |
---|---|
x | dfm object to be subsetted |
subset | logical expression indicating the documents to keep: missing values are taken as false |
drop_docid | if TRUE, docid for documents are removed as the result of subsetting |
... | not used |
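The rule that missing values are taken as false matters because ordinary logical indexing in R does not behave that way. The sketch below uses a hypothetical count matrix, counts, and a hypothetical grp vector in place of a real dfm and its docvars. It illustrates the rule in base R and is not quanteda's implementation.

```r
# Hypothetical document-by-feature count matrix standing in for a dfm.
counts <- matrix(c(1, 1, 0,
                   2, 0, 1,
                   0, 3, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("d1", "d2", "d3"), c("a", "b", "c")))
# Hypothetical document-level variable with a missing value for d2.
grp <- c(1, NA, 2)

cond <- grp > 0               # TRUE NA TRUE
# Indexing with `cond` directly would return a row of NAs for d2.
# Treating missing values as FALSE keeps only d1 and d3.
keep <- cond & !is.na(cond)   # TRUE FALSE TRUE
subsetted <- counts[keep, , drop = FALSE]
print(rownames(subsetted))
```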
A dfm object containing the subset of documents (and their docvars) selected according to the arguments.
To select or subset features, see dfm_select() instead.
When select is a dfm, the returned dfm will be equal in document dimension and order to the dfm used for selection. This is the document-level version of using dfm_select() where pattern is a dfm: that function matches features, while dfm_subset matches documents.
```r
library(quanteda)

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
dfmat <- dfm(tokens(corp))

# selecting on a docvars condition
dfm_subset(dfmat, grp > 1)
#> Document-feature matrix of: 2 documents, 6 features (41.67% sparse) and 1 docvar.
#>     features
#> docs a b c d e f
#>   d3 0 2 1 0 1 0
#>   d4 1 1 0 0 2 1

# selecting on a supplied vector
dfm_subset(dfmat, c(TRUE, FALSE, TRUE, FALSE))
#> Document-feature matrix of: 2 documents, 6 features (41.67% sparse) and 1 docvar.
#>     features
#> docs a b c d e f
#>   d1 1 1 1 1 0 0
#>   d3 0 2 1 0 1 0
```