The dfm class is a type of Matrix-class object with additional slots, described below. quanteda stores a dfm in one of two forms, depending on whether it can be represented by a sparse matrix: a sparse object is a dfm class object, while a dense one is a dfmDense object. See Details.

# S4 method for dfm
t(x)

# S4 method for dfm
colSums(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm
rowSums(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm
colMeans(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm
rowMeans(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm,numeric
Arith(e1, e2)

# S4 method for numeric,dfm
Arith(e1, e2)

# S4 method for dfm,index,index,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,index,index,logical
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,missing,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,missing,logical
x[i, j, ..., drop = TRUE]

# S4 method for dfm,index,missing,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,index,missing,logical
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,index,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,index,logical
x[i, j, ..., drop = TRUE]
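The summary, transpose and arithmetic methods listed above behave like their Matrix counterparts but keep document and feature labels. The following is a minimal sketch, assuming a recent quanteda release is installed; it reuses the toy texts from the Examples section, and the variable names (`feat_totals`, `doc_totals`, and so on) are illustrative only.

```r
library(quanteda)

# Toy texts reused from the Examples section
x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))

# colSums(): total count of each feature across all documents
feat_totals <- colSums(x)
# rowSums(): total number of (non-punctuation) tokens in each document
doc_totals <- rowSums(x)
# rowMeans(): average count per feature within each document
doc_means <- rowMeans(x)

# t(): swap the document and feature dimensions
xt <- t(x)

# Arith methods: dfm op numeric and numeric op dfm
doubled <- x * 2
doubled2 <- 2 * x
halved <- x / 2

print(feat_totals[c("lots", "and", "it")])
print(doc_totals)
```

Because the Arith methods exist for both argument orders, `x * 2` and `2 * x` give the same cell values.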

Arguments

x

the dfm object

na.rm

if TRUE, omit missing values (including NaN) from the calculations

dims

ignored

...

additional arguments not used here

e1

first operand of an arithmetic operation (+, -, *, /, ^, %%, %/%) involving a dfm

e2

second operand of an arithmetic operation (+, -, *, /, ^, %%, %/%) involving a dfm

i

index for documents

j

index for features

drop

ignored: subsetting always behaves as if drop = FALSE, so the result is always a dfm
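Because drop is always treated as FALSE, selecting a single document or a single feature returns a dfm rather than a plain vector, as a base matrix would. A minimal sketch, assuming quanteda is installed; `is.dfm()` and `ndoc()` are quanteda helpers not documented on this page, and the variable names are illustrative.

```r
library(quanteda)

x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))

# One document: stays a 1 x nfeat dfm instead of dropping to a vector
first_doc <- x[1, ]
# One feature: stays an ndoc x 1 dfm
third_feat <- x[, 3]
# Passing drop = TRUE explicitly does not change the result type
still_dfm <- x[1, , drop = TRUE]

print(dim(first_doc))
print(dim(third_feat))
```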

Details

The dfm class is a virtual class that contains dgCMatrix-class.
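Since a sparse dfm inherits from dgCMatrix, general Matrix-package functions can be applied to it directly. A minimal sketch, assuming a recent quanteda release in which the toy dfm below is stored in sparse form; `sparsity()` is a quanteda helper and `nnzero()` comes from the Matrix package, neither of which is documented on this page.

```r
library(methods)
library(Matrix)
library(quanteda)

x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))

# The sparse dfm is also a dgCMatrix, so Matrix-package tools apply
inherits_sparse <- is(x, "dgCMatrix")
# Number of non-zero cells in the sparse representation
nz <- nnzero(x)
# Proportion of cells that are zero (the "% sparse" figure in the print header)
sp <- sparsity(x)

print(c(nonzero = nz, sparsity = sp))
```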

Slots

settings

settings that govern corpus handling and subsequent downstream operations, including the settings used to clean and tokenize the texts, and to create the dfm. See settings.

weighting

the feature weighting applied to the dfm. Default is "frequency", indicating that the values in the cells of the dfm are simple feature counts. To change this, use the dfm_weight method.

smooth

a smoothing parameter, which defaults to zero. It can be changed using the dfm_smooth method.
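The weighting and smoothing described above are changed through dfm_weight and dfm_smooth rather than by editing the slots by hand. A minimal sketch, assuming a recent quanteda release where dfm_weight() takes a `scheme` argument (here `"prop"`, within-document proportions) and dfm_smooth() takes a `smoothing` argument; the sketch deliberately avoids reading the slots directly, because the slot layout may differ between quanteda releases.

```r
library(quanteda)

x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))

# dfm_weight(): replace raw counts by within-document proportions
x_prop <- dfm_weight(x, scheme = "prop")
# dfm_smooth(): add a constant (here 1) to every cell
x_smooth <- dfm_smooth(x, smoothing = 1)

# Each document's proportions should sum to 1
print(rowSums(x_prop))
```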

Dimnames

These are inherited from Matrix-class, but the row and column dimensions are named docs and features, respectively.
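The named dimensions can be inspected with base R's dimnames() or with quanteda's accessor functions. A minimal sketch, assuming quanteda is installed; `docnames()` and `featnames()` are quanteda accessors not documented on this page, and the default document names `text1`, `text2`, ... are those quanteda assigns to an unnamed character vector.

```r
library(quanteda)

x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))

# The two dimensions carry the names "docs" and "features"
dim_names <- names(dimnames(x))
# docnames() and featnames() return the row and column labels
docs <- docnames(x)
feats <- featnames(x)

print(dim_names)
print(head(feats, 5))
```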

See also

Examples

# dfm subsetting
x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))
x[1:2, ]
#> Document-feature matrix of: 2 documents, 16 features (59.4% sparse).
#> 2 x 16 sparse Matrix of class "dfm"
#>        features
#> docs    this contains lots of stopwords no if and or but about it a third
#>   text1    1        1    1  1         1  0  0   0  0   0     0  0 0     0
#>   text2    0        0    1  0         0  1  1   1  1   1     1  1 0     0
#>        features
#> docs    document is
#>   text1        0  0
#>   text2        0  0
x[1:2, 1:5]
#> Document-feature matrix of: 2 documents, 5 features (40% sparse).
#> 2 x 5 sparse Matrix of class "dfm"
#>        features
#> docs    this contains lots of stopwords
#>   text1    1        1    1  1         1
#>   text2    0        0    1  0         0
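The index arguments i and j also accept character and logical vectors, not only integer positions. A minimal sketch, assuming quanteda is installed; the same toy dfm is rebuilt so the block runs on its own, and the variable names are illustrative.

```r
library(quanteda)

x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))

# Select features by name (character index on j)
by_name <- x[, c("lots", "it")]
# Select documents with a logical index on i: those with more than 5 tokens
long_docs <- x[rowSums(x) > 5, ]

print(by_name)
print(docnames(long_docs))
```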