Combine documents in a dfm by a grouping variable, which can also be
one of the docvars attached to the dfm. This is identical in
functionality to using the "groups" argument in dfm().
dfm_group(x, groups = NULL, fill = FALSE, force = FALSE)
x | a dfm |
---|---|
groups | either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents. |
fill | logical; if |
force | logical; if |
dfm_group returns a dfm whose documents are the unique group combinations
and whose cell values are the original feature counts summed within each
group. Document-level variables that have no variation within groups are
saved in docvars. Document-level variables that are lists are dropped from
grouping, even when these exhibit no variation within groups.
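The behaviour described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the docvars `party` and `speech_no` are hypothetical, and the grouping values are passed explicitly as a factor (an assumption made here so that the sketch does not depend on how a given quanteda version resolves docvar names).

```r
library(quanteda)

corp <- corpus(
  c("a a b", "a b c c", "a c d d", "a c c d"),
  docvars = data.frame(
    grp       = c("grp1", "grp1", "grp2", "grp2"),
    party     = c("X", "X", "Y", "Y"),  # hypothetical: constant within each group
    speech_no = 1:4                     # hypothetical: varies within each group
  )
)
dfmat <- dfm(tokens(corp))

# Group on the values of "grp", passed as an explicit factor (assumption)
grouped <- dfm_group(dfmat, groups = factor(docvars(dfmat, "grp")))

# Cell values are the per-group sums of the original counts
as.matrix(grouped)

# Docvars with no variation within groups (such as party) are kept
names(docvars(grouped))
```

By the description above, `speech_no` would not be expected among the retained docvars, since it differs between documents of the same group.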
Setting fill = TRUE offers a way to "pad" a dfm with document groups that
were not observed in the data but for which an empty document is still
needed. If groups is a factor of dates, for instance, then using
fill = TRUE ensures that the new dfm contains one row for each date among
the factor's levels, regardless of whether any documents previously existed
with that date.
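A small sketch of the padding behaviour, assuming a hypothetical factor `day` whose middle level has no documents. The dates, document names, and variable names are invented for illustration.

```r
library(quanteda)

toks  <- tokens(c(d1 = "a a b", d2 = "a b c c", d3 = "a c d d"))
dfmat <- dfm(toks)

# Hypothetical grouping factor: no document falls on "2021-01-02"
day <- factor(
  c("2021-01-01", "2021-01-01", "2021-01-03"),
  levels = c("2021-01-01", "2021-01-02", "2021-01-03")
)

observed <- dfm_group(dfmat, groups = day)               # observed groups only
padded   <- dfm_group(dfmat, groups = day, fill = TRUE)  # every factor level

ndoc(observed)
ndoc(padded)

# Token totals per grouped document; the padded level has no tokens
ntoken(padded)
```

Because the padding follows the levels of the factor, the set of dates must be declared in `levels` up front; a plain character vector carries no information about unobserved groups.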
corp <- corpus(c("a a b", "a b c c", "a c d d", "a c c d"),
               docvars = data.frame(grp = c("grp1", "grp1", "grp2", "grp2")))
dfmat <- dfm(corp)
dfm_group(dfmat, groups = "grp")
#> Document-feature matrix of: 2 documents, 4 features (25.0% sparse) and 1 docvar.
#>       features
#> docs   a b c d
#>   grp1 3 2 2 0
#>   grp2 2 0 3 3
#> Document-feature matrix of: 2 documents, 4 features (25.0% sparse) and 1 docvar.
#>     features
#> docs a b c d
#>    1 3 2 2 0
#>    2 2 0 3 3