"Compresses" or groups a dfm or fcm whose dimension names are the same, for either documents or features. This may happen, for instance, if features are made equivalent through application of a thesaurus. It could also be needed after a cbind.dfm or rbind.dfm operation. In most cases, you will not need to call `dfm_compress`, since it is called automatically by functions that change the dimensions of the dfm, e.g. dfm_tolower.

dfm_compress(x, margin = c("both", "documents", "features"))

fcm_compress(x)

Arguments

x

input object, a dfm or fcm

margin

character indicating on which margin to compress a dfm, either "documents", "features", or "both" (default). For fcm objects, "documents" has no effect.

...

additional arguments passed from generic to specific methods

Value

dfm_compress returns a dfm whose dimensions have been recombined by summing the cells across identical dimension names (docnames or featnames). The docvars are preserved when combining by features, but not when documents are combined. fcm_compress returns an fcm whose features have been recombined by summing the counts of identical features.
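The summing described above can be illustrated on an ordinary dense matrix using only base R. This is a minimal sketch, not quanteda's implementation: `compress_dense` is a hypothetical helper name, it relies on `rowsum()` from base R, and it ignores sparsity and docvars entirely.

```r
# Base-R sketch (an assumption for illustration, not quanteda's code):
# sum the cells of a dense matrix that share a row name and/or a column name.
compress_dense <- function(m, margin = c("both", "documents", "features")) {
  margin <- match.arg(margin)
  if (margin %in% c("both", "documents")) {
    # rowsum() adds up all rows that carry the same group label
    m <- rowsum(m, group = rownames(m), reorder = FALSE)
  }
  if (margin %in% c("both", "features")) {
    # transpose so that duplicated column names can be grouped the same way
    m <- t(rowsum(t(m), group = colnames(m), reorder = FALSE))
  }
  m
}

# Same counts as the dfm used in the Examples section
m <- matrix(c(1, 2, 0, 0, 0,
              1, 0, 2, 1, 1,
              0, 1, 5, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("text1", "text2", "text1"),
                            c("b", "a", "c", "a", "b")))

compress_dense(m)                      # 2 rows, 3 columns
compress_dense(m, margin = "features") # 3 rows, 3 columns
```

With `reorder = FALSE`, names keep their order of first appearance, which is why the sketch returns the columns as b, a, c rather than sorted alphabetically.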

Note

fcm_compress works only when the fcm was created with a document context.

Examples

# dfm_compress examples
mat <- rbind(dfm(c("b A A", "C C a b B"), tolower = FALSE),
             dfm("A C C C C C", tolower = FALSE))
colnames(mat) <- char_tolower(featnames(mat))
mat
#> Document-feature matrix of: 3 documents, 5 features (46.7% sparse).
#> 3 x 5 sparse Matrix of class "dfmSparse"
#>        features
#> docs    b a c a b
#>   text1 1 2 0 0 0
#>   text2 1 0 2 1 1
#>   text1 0 1 5 0 0
dfm_compress(mat, margin = "documents")
#> Document-feature matrix of: 2 documents, 5 features (30% sparse).
#> 2 x 5 sparse Matrix of class "dfmSparse"
#>        features
#> docs    b a c a b
#>   text1 1 3 5 0 0
#>   text2 1 0 2 1 1
dfm_compress(mat, margin = "features")
#> Document-feature matrix of: 3 documents, 3 features (22.2% sparse).
#> 3 x 3 sparse Matrix of class "dfmSparse"
#>        features
#> docs    b a c
#>   text1 1 2 0
#>   text2 2 1 2
#>   text1 0 1 5
dfm_compress(mat)
#> Document-feature matrix of: 2 documents, 3 features (0% sparse).
#> 2 x 3 sparse Matrix of class "dfmSparse"
#>        features
#> docs    b a c
#>   text1 1 3 5
#>   text2 2 1 2
# no effect if no compression needed
compactdfm <- dfm(data_corpus_inaugural[1:5])
dim(compactdfm)
#> [1] 5 1948
dim(dfm_compress(compactdfm))
#> [1] 5 1948
# compress an fcm
myfcm <- fcm(tokens("A D a C E a d F e B A C E D"),
             context = "window", window = 3)
## this will produce an error:
# fcm_compress(myfcm)

txt <- c("The fox JUMPED over the dog.", "The dog jumped over the fox.")
toks <- tokens(txt, remove_punct = TRUE)
myfcm <- fcm(toks, context = "document")
#> Error in get(".SigLength", envir = env): object '.SigLength' not found
colnames(myfcm) <- rownames(myfcm) <- tolower(colnames(myfcm))
colnames(myfcm)[5] <- rownames(myfcm)[5] <- "fox"
myfcm
#> Feature co-occurrence matrix of: 9 by 9 features.
#> 9 x 9 sparse Matrix of class "fcm"
#>         features
#> features a d a c fox d f e b
#>      a   . 2 1 2   1 . 1 1 1
#>      d   . . 1 2   2 . . . .
#>      a   . . 1 2   2 1 1 1 .
#>      c   . . . .   2 1 . 1 1
#>      fox . . . .   . 1 1 . 1
#>      d   . . . .   . . 1 1 1
#>      f   . . . .   . . . 1 1
#>      e   . . . .   . . . . 1
#>      b   . . . .   . . . . .
fcm_compress(myfcm)
#> Error in fcm_compress.fcm(myfcm): compress_fcm invalid if fcm was created with a window context