Correspondence analysis of a document-feature matrix

textmodel_ca implements correspondence analysis scaling on a dfm. The method is a fast, sparse-capable version of the function ca from the ca package.

textmodel_ca(x, smooth = 0, nd = NA, sparse = FALSE, residual_floor = 0.1)

Arguments

x	the dfm on which the model will be fit
smooth	a smoothing parameter for word counts; defaults to zero.
nd	Number of dimensions to be included in output; if `NA` (the default) then the maximum possible dimensions are included.
sparse	if `TRUE`, retains the sparsity of `x`; use this when `x` (the dfm) is too large to be allocated in memory after conversion to a dense matrix
residual_floor	specifies the threshold for the residual matrix used in calculating the truncated SVD. Larger values reduce memory and time costs but might reduce accuracy; only applicable when `sparse = TRUE`
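
To illustrate why a threshold on the residual matrix can save memory, the sketch below builds standardized residuals from a toy count matrix in base R, zeroes out the small entries, and stores the result in sparse form. The toy data, the variable names, and the value of `threshold` are all hypothetical, and the exact way textmodel_ca scales or applies residual_floor internally may differ; this only shows the general idea.

```r
library(Matrix)

set.seed(1)
# Toy count matrix: 10 "documents" by 20 "features" (hypothetical data)
counts <- matrix(rpois(200, lambda = 3) + 1, nrow = 10, ncol = 20)

P        <- counts / sum(counts)            # correspondence matrix
row_mass <- rowSums(P)
col_mass <- colSums(P)
eP       <- tcrossprod(row_mass, col_mass)  # expected proportions under independence
S        <- (P - eP) / sqrt(eP)             # standardized residuals (dense)

# Assumption: small residuals are treated as zero so the matrix stays sparse
threshold <- 0.01
S_floor <- S
S_floor[abs(S_floor) < threshold] <- 0
S_sparse <- Matrix(S_floor, sparse = TRUE)

nnzero(S_sparse)     # number of residuals that survive the threshold
mean(S_floor == 0)   # share of entries dropped
```

Raising `threshold` drops more entries, which is the trade-off between memory cost and accuracy described for residual_floor.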

Value

textmodel_ca() returns a fitted CA textmodel that is a special class of ca object.

Details

The SVD is computed with svds from the RSpectra package, which enables fast computation of a truncated decomposition.
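
As a rough illustration of the computation, the sketch below applies RSpectra::svds to the standardized residuals of a toy count matrix and derives standard row coordinates from the result. It follows the textbook formulation of correspondence analysis rather than the package's own source code; the toy data and variable names are assumptions made for the example.

```r
library(RSpectra)

set.seed(42)
# Toy count matrix: 10 "documents" by 20 "features" (hypothetical data)
counts <- matrix(rpois(200, lambda = 5) + 1, nrow = 10, ncol = 20)

P        <- counts / sum(counts)            # correspondence matrix
row_mass <- rowSums(P)
col_mass <- colSums(P)
eP       <- tcrossprod(row_mass, col_mass)  # expected proportions under independence
S        <- (P - eP) / sqrt(eP)             # standardized residuals

# Truncated SVD: only the 3 largest singular triplets are computed
dec <- svds(S, k = 3)

dec$d                                  # leading singular values
row_coord <- dec$u / sqrt(row_mass)    # standard row coordinates
col_coord <- dec$v / sqrt(col_mass)    # standard column coordinates
```

Because only `k` singular triplets are requested, the cost stays well below that of a full SVD when the matrix is large.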

Note

When you have a very large dfm, you may need to set sparse = TRUE and increase the value of residual_floor, which ignores less important information and so reduces the memory cost. If your attempt to fit the model fails because the matrix is too large, this is probably due to the memory demands of computing the \(V \times V\) residual matrix. To avoid this, consider increasing the value of residual_floor in steps of 0.1 until the model can be fit.
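
The retry strategy described above could be automated along the following lines. The helper `fit_ca_raising_floor` and its arguments `fit_fun`, `start`, `step`, and `max_floor` are hypothetical names introduced for this sketch; they are not part of the package. The fitting function is passed in as an argument so the helper can be tried without a large dfm.

```r
# Hypothetical helper: retry a fit with a progressively larger residual_floor.
# fit_fun is expected to accept (x, sparse, residual_floor), as textmodel_ca does.
fit_ca_raising_floor <- function(x, fit_fun, start = 0.1, step = 0.1,
                                 max_floor = 1) {
    floor_value <- start
    while (floor_value <= max_floor) {
        fit <- tryCatch(
            fit_fun(x, sparse = TRUE, residual_floor = floor_value),
            error = function(e) NULL   # treat any error as "could not fit"
        )
        if (!is.null(fit)) {
            return(list(model = fit, residual_floor = floor_value))
        }
        floor_value <- floor_value + step
    }
    stop("model could not be fit with residual_floor <= ", max_floor)
}

# Usage, assuming textmodel_ca and a dfm named dfmat exist in the session:
# res <- fit_ca_raising_floor(dfmat, textmodel_ca)
# res$residual_floor
```

Note that this sketch swallows every error, not only memory failures, so in practice it may be worth inspecting the error message before retrying.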

References

Nenadic, O. & Greenacre, M. (2007). Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca package. Journal of Statistical Software, 20(3).

Examples

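# Build a dfm from the Irish budget speeches corpus
dfmat <- dfm(data_corpus_irishbudget2010)
# Fit the correspondence analysis model with default settings
tmod <- textmodel_ca(dfmat)
# List the components of the fitted model
summary(tmod)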
#>            Length Class  Mode     
#> sv             7  -none- numeric  
#> nd             1  -none- numeric  
#> rownames      14  -none- character
#> rowmass       14  -none- numeric  
#> rowdist       14  -none- numeric  
#> rowinertia    14  -none- numeric  
#> rowcoord      98  -none- numeric  
#> rowsup         0  -none- logical  
#> colnames    5140  -none- character
#> colmass     5140  -none- numeric  
#> coldist     5140  -none- numeric  
#> colinertia  5140  -none- numeric  
#> colcoord   35980  -none- numeric  
#> colsup         0  -none- logical  
#> call           2  -none- call