Source: `R/dfm-classes.R`, `R/dfm-subsetting.R` (`dfm-class.Rd`)

The dfm class is a type of Matrix-class object with additional slots, described below. quanteda uses two subclasses of the `dfm` class, depending on whether the object can be represented by a sparse matrix, in which case it is a `dfm` class object, or if dense, then a `dfmDense` object. See Details.

```r
# S4 method for dfm
t(x)

# S4 method for dfm
colSums(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm
rowSums(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm
colMeans(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm
rowMeans(x, na.rm = FALSE, dims = 1, ...)

# S4 method for dfm,numeric
Arith(e1, e2)

# S4 method for numeric,dfm
Arith(e1, e2)

# S4 method for dfm,index,index,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,index,index,logical
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,missing,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,missing,logical
x[i, j, ..., drop = TRUE]

# S4 method for dfm,index,missing,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,index,missing,logical
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,index,missing
x[i, j, ..., drop = TRUE]

# S4 method for dfm,missing,index,logical
x[i, j, ..., drop = TRUE]
```

Argument | Description
---|---
x | the dfm object
na.rm | if `TRUE`, omit missing values (including `NaN`) from the calculations
dims | ignored
... | additional arguments not used here
e1 | first quantity in "+" operation for dfm
e2 | second quantity in "+" operation for dfm
i | document names or indices for documents to extract
j | feature names or indices for features to extract
drop_docid | if
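The summary and arithmetic methods in the usage above can be exercised with a short sketch. It assumes quanteda 3 or later is installed (where `dfm()` takes a tokens object) and reuses the three documents from the Examples; the variable names are illustrative only.

```r
library(quanteda)

# Build a small dfm from the same three documents used in the Examples
dfmat <- dfm(tokens(c("this contains lots of stopwords",
                      "no if, and, or but about it: lots",
                      "and a third document is it"),
                    remove_punct = TRUE))

# rowSums(): total token count per document (one value per row)
doc_totals <- rowSums(dfmat)

# colSums(): total count of each feature across all documents (one value per column)
feat_totals <- colSums(dfmat)

# colMeans(): average count of each feature per document
feat_means <- colMeans(dfmat)

# t(): transpose, so features become rows and documents become columns
dfmat_t <- t(dfmat)

# Arith with a numeric scalar: every cell is multiplied by 2
dfmat_doubled <- dfmat * 2

print(doc_totals)
print(head(feat_totals))
```

The results of `rowSums()` and `colSums()` are named numeric vectors, so a single feature's total can be looked up by name, e.g. `feat_totals["lots"]`.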

The `dfm` class is a virtual class that will contain dgCMatrix-class.
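Because the class builds on `dgCMatrix`, a dfm object can be tested with S4 `is()` and passed to functions from the Matrix package. A minimal sketch, assuming quanteda 3 or later and Matrix are installed; it checks class membership and counts stored (non-zero) cells.

```r
library(quanteda)
library(Matrix)

dfmat <- dfm(tokens(c("this contains lots of stopwords",
                      "no if, and, or but about it: lots",
                      "and a third document is it"),
                    remove_punct = TRUE))

# The object is recognised both as a dfm and as a sparse dgCMatrix
is_dfm <- is.dfm(dfmat)
is_sparse <- is(dfmat, "dgCMatrix")

# Only non-zero cells are stored; nnzero() comes from the Matrix package
n_stored <- nnzero(dfmat)

# sparsity(): proportion of cells that are zero
prop_zero <- sparsity(dfmat)

cat("dfm:", is_dfm, " dgCMatrix:", is_sparse,
    " non-zero cells:", n_stored, " sparsity:", round(prop_zero, 4), "\n")
```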

The class has the following slots:

- `weightTf`: the type of term frequency weighting applied to the dfm. Default is `"frequency"`, indicating that the values in the cells of the dfm are simple feature counts. To change this, use the `dfm_weight()` method.
- `weightDf`: the type of document frequency weighting applied to the dfm. See `docfreq()`.
- `smooth`: a smoothing parameter, defaults to zero. Can be changed using the `dfm_smooth()` method.
- `Dimnames`: inherited from Matrix-class, but named `docs` and `features` respectively.
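The weighting and smoothing settings above are normally changed through functions rather than by assigning slots directly. A minimal sketch, assuming quanteda 3 or later is installed; `scheme = "prop"` is just one of the available term-frequency schemes, chosen here because its effect is easy to check. The sketch avoids reading slots directly, since slot layout may differ between quanteda versions.

```r
library(quanteda)

dfmat <- dfm(tokens(c("this contains lots of stopwords",
                      "no if, and, or but about it: lots",
                      "and a third document is it"),
                    remove_punct = TRUE))

# Term-frequency weighting: "prop" divides each count by its document's total,
# so every row of the weighted dfm sums to 1
dfmat_prop <- dfm_weight(dfmat, scheme = "prop")

# Document frequency: the number of documents in which each feature occurs
doc_freq <- docfreq(dfmat)

# Smoothing: add 1 to every cell, including the zero cells
dfmat_smooth <- dfm_smooth(dfmat, smoothing = 1)

print(rowSums(dfmat_prop))
print(doc_freq[c("lots", "and", "third")])
```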

```r
# dfm subsetting
dfmat <- dfm(tokens(c("this contains lots of stopwords",
                      "no if, and, or but about it: lots",
                      "and a third document is it"),
                    remove_punct = TRUE))
dfmat[1:2, ]
#> Document-feature matrix of: 2 documents, 16 features (59.38% sparse) and 0 docvars.
#>        features
#> docs    this contains lots of stopwords no if and or but
#>   text1    1        1    1  1         1  0  0   0  0   0
#>   text2    0        0    1  0         0  1  1   1  1   1
#> [ reached max_nfeat ... 6 more features ]
dfmat[1:2, 1:5]
#> Document-feature matrix of: 2 documents, 5 features (40.00% sparse) and 0 docvars.
#>        features
#> docs    this contains lots of stopwords
#>   text1    1        1    1  1         1
#>   text2    0        0    1  0         0
```
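Besides numeric positions, `i` and `j` accept document or feature names and logical vectors, matching the `index` and `logical` signatures in the usage. A minimal sketch under the same assumptions as the example above; the logical filter (keep features that occur in at least two documents) is an illustrative choice.

```r
library(quanteda)

dfmat <- dfm(tokens(c("this contains lots of stopwords",
                      "no if, and, or but about it: lots",
                      "and a third document is it"),
                    remove_punct = TRUE))

# Select documents by name rather than by position
dfmat_docs <- dfmat[c("text1", "text3"), ]

# Select features by name
dfmat_feats <- dfmat[, c("lots", "and")]

# Select features with a logical vector: keep features found in 2+ documents
dfmat_common <- dfmat[, docfreq(dfmat) >= 2]

print(dfmat_common)
```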