Randomly sample documents or features from a dfm object.

dfm_sample(x, size = ifelse(margin == "documents", ndoc(x), nfeat(x)),
  replace = FALSE, prob = NULL, margin = c("documents", "features"))

Arguments

x

the dfm object whose documents or features will be sampled

size

a positive number, the number of documents or features to select. The default is the number of documents or the number of features, for margin = "documents" and margin = "features" respectively.

replace

logical; should sampling be with replacement?

prob

a vector of probability weights, one for each document or feature (depending on margin), used when drawing the sample.

margin

the dimension of the dfm to sample: either "documents" (the default) or "features"

Value

A dfm object whose number of documents or features equals size, sampled from the dfm x.

Examples

set.seed(10)
dfmat <- dfm(c("a b c c d", "a a c c d d d"))
head(dfmat)
#> Document-feature matrix of: 2 documents, 4 features (12.5% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>        features
#> docs    a b c d
#>   text1 1 1 2 1
#>   text2 2 0 2 3

head(dfm_sample(dfmat))
#> Document-feature matrix of: 2 documents, 4 features (12.5% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>        features
#> docs    a b c d
#>   text1 1 1 2 1
#>   text2 2 0 2 3

head(dfm_sample(dfmat, replace = TRUE))
#> Document-feature matrix of: 2 documents, 4 features (25.0% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>        features
#> docs    a b c d
#>   text2 2 0 2 3
#>   text2 2 0 2 3

head(dfm_sample(dfmat, margin = "features"))
#> Document-feature matrix of: 2 documents, 4 features (12.5% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>        features
#> docs    d c b a
#>   text1 1 2 1 1
#>   text2 3 2 0 2

head(dfm_sample(dfmat, margin = "features", replace = TRUE))
#> Document-feature matrix of: 2 documents, 4 features (0.0% sparse).
#> 2 x 4 sparse Matrix of class "dfm"
#>        features
#> docs    c c c d
#>   text1 2 2 2 1
#>   text2 2 2 2 3
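
The examples above do not exercise size or prob. The following base R sketch illustrates how these arguments could interact with margin. It assumes that sampling works by drawing row or column indices in the same way as base R's sample(); it is not the package's implementation. The helper sample_margin() and the plain matrix m are hypothetical stand-ins for dfm_sample() and a dfm object.

```r
# Plain matrix standing in for the dfm from the examples (not a real dfm object).
m <- matrix(c(1, 1, 2, 1,
              2, 0, 2, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(docs = c("text1", "text2"),
                            features = c("a", "b", "c", "d")))

# Hypothetical helper: sample rows ("documents") or columns ("features").
# Assumption: indices are drawn as base R's sample.int() would draw them.
sample_margin <- function(x, size = NULL, replace = FALSE, prob = NULL,
                          margin = c("documents", "features")) {
  margin <- match.arg(margin)
  n <- if (margin == "documents") nrow(x) else ncol(x)
  if (is.null(size)) size <- n          # default: keep every row or column
  idx <- sample.int(n, size = size, replace = replace, prob = prob)
  if (margin == "documents") x[idx, , drop = FALSE] else x[, idx, drop = FALSE]
}

set.seed(10)

# Two distinct features, with weights favouring "c" and "d".
sub <- sample_margin(m, size = 2, prob = c(0.1, 0.1, 0.4, 0.4),
                     margin = "features")

# Five documents drawn with replacement (a bootstrap-style resample).
boot <- sample_margin(m, size = 5, replace = TRUE)

dim(sub)   # 2 documents by 2 features
dim(boot)  # 5 documents by 4 features
```

With replace = FALSE, size cannot exceed the number of documents or features on the chosen margin. With replace = TRUE it can, which is why the resample above has more rows than the original matrix.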