Match the feature set of a dfm to a specified vector of feature names. Features in x that exactly match an element of features are kept. Features in x that are not in features are discarded, and feature names specified in features but not found in x are added with all zero counts.

dfm_match(x, features)

Arguments

x

a dfm

features

character; the feature names to be matched in the output dfm

Value

A dfm whose features are identical to those specified in features.
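
The matching rule can be illustrated without quanteda. The following is a minimal base R sketch that uses an ordinary named matrix in place of a dfm. The helper match_features() is hypothetical and written only for illustration; it is not how dfm_match is implemented.

```r
# Hypothetical helper: mimic the matching rule on a plain named matrix.
match_features <- function(m, features) {
  # Start from an all-zero matrix with exactly the requested columns, in order
  out <- matrix(0, nrow = nrow(m), ncol = length(features),
                dimnames = list(rownames(m), features))
  # Copy over counts for the features that exist in m (exact name match)
  shared <- intersect(features, colnames(m))
  if (length(shared) > 0) {
    out[, shared] <- m[, shared, drop = FALSE]
  }
  out
}

m <- matrix(c(2, 0, 3, 0, 0, 5), nrow = 2,
            dimnames = list(c("d1", "d2"), c("A", "B", "C")))

# "C" is dropped, "Z" is added with zero counts, column order follows `features`
res <- match_features(m, c("B", "Z", "A"))
print(res)
```

The columns of the result follow the order of the requested names rather than the order in the input, as in the dfm_match examples further down.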

Details

Matching on another dfm's featnames is useful when you have trained a model on one dfm and need to apply it to a test set whose features must be identical to those used in training. It is also used in bootstrap_dfm.
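
The train/test use case might look like the sketch below. The texts and object names (txt_train, txt_test, dfmat_train, dfmat_test) are invented for illustration, and the dfms are built with dfm(tokens(...)) on the assumption that this form works in the installed quanteda version.

```r
library(quanteda)

# Invented toy texts, for illustration only
txt_train <- c(d1 = "good great film", d2 = "bad awful film")
txt_test  <- c(d3 = "great acting but awful plot")

dfmat_train <- dfm(tokens(txt_train))
dfmat_test  <- dfm(tokens(txt_test))

# Give the test dfm exactly the features seen at training time:
# unseen test features are dropped, missing training features become zero columns
dfmat_test_matched <- dfm_match(dfmat_test, featnames(dfmat_train))

# Check that the feature sets now agree, including their order
same_features <- identical(featnames(dfmat_test_matched), featnames(dfmat_train))
print(same_features)
```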

Note

Unlike dfm_select, this function will add feature names not already present in x. It also provides only fixed, case-sensitive matches. For more flexible feature selection, see dfm_select.
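
Because matching is case-sensitive, a lowercase name does not match an uppercase feature. The sketch below assumes the built-in data_dfm_lbgexample object (whose feature names are uppercase, as in the Examples) and uses dfm_tolower as one possible way to normalise case before matching.

```r
library(quanteda)

# "a" does not match the uppercase feature "A", so an all-zero column is added
dfmat_lower_name <- dfm_match(data_dfm_lbgexample, "a")
total_unmatched <- sum(as.matrix(dfmat_lower_name))

# Lowercasing the features first lets "a" match the former "A"
dfmat_normalised <- dfm_match(dfm_tolower(data_dfm_lbgexample), "a")
total_matched <- sum(as.matrix(dfmat_normalised))

print(c(unmatched = total_unmatched, matched = total_matched))
```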

See also

dfm_select

Examples

# matching a dfm to a feature vector
dfm_match(dfm(""), letters[1:5])
#> Document-feature matrix of: 1 document, 5 features (100.0% sparse).
#> 1 x 5 sparse Matrix of class "dfm"
#>        features
#> docs    a b c d e
#>   text1 0 0 0 0 0

dfm_match(data_dfm_lbgexample, c("A", "B", "Z"))
#> Document-feature matrix of: 6 documents, 3 features (72.2% sparse).
#> 6 x 3 sparse Matrix of class "dfm"
#>     features
#> docs A B   Z
#>   R1 2 3   0
#>   R2 0 0   0
#>   R3 0 0   3
#>   R4 0 0 115
#>   R5 0 0  78
#>   V1 0 0   0

dfm_match(data_dfm_lbgexample, c("B", "newfeat1", "A", "newfeat2"))
#> Document-feature matrix of: 6 documents, 4 features (91.7% sparse).
#> 6 x 4 sparse Matrix of class "dfm"
#>     features
#> docs B newfeat1 A newfeat2
#>   R1 3        0 2        0
#>   R2 0        0 0        0
#>   R3 0        0 0        0
#>   R4 0        0 0        0
#>   R5 0        0 0        0
#>   V1 0        0 0        0

# matching one dfm to another
txt <- c("This is text one", "The second text", "This is text three")
(dfmat1 <- dfm(txt[1:2]))
#> Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
#> 2 x 6 sparse Matrix of class "dfm"
#>        features
#> docs    this is text one the second
#>   text1    1  1    1   1   0      0
#>   text2    0  0    1   0   1      1

(dfmat2 <- dfm(txt[2:3]))
#> Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
#> 2 x 6 sparse Matrix of class "dfm"
#>        features
#> docs    the second text this is three
#>   text1   1      1    1    0  0     0
#>   text2   0      0    1    1  1     1

(dfmat3 <- dfm_match(dfmat1, featnames(dfmat2)))
#> Document-feature matrix of: 2 documents, 6 features (50.0% sparse).
#> 2 x 6 sparse Matrix of class "dfm"
#>        features
#> docs    the second text this is three
#>   text1   0      0    1    1  1     0
#>   text2   1      1    1    0  0     0

setequal(featnames(dfmat2), featnames(dfmat3))
#> [1] TRUE