Split a dfm's hyphenated features into constituent parts — dfm_split_hyphenated_features

Takes a dfm that contains hyphenated features, such as "split-second", and splits each one into its constituent parts, in the same way that tokens(x, remove_hyphens = TRUE) would have done.

dfm_split_hyphenated_features(x)

Arguments

x	input dfm

Examples

(dfmat <- dfm("One-two one two three."))
#> Document-feature matrix of: 1 document, 5 features (0.0% sparse).
#>        features
#> docs    one-two one two three .
#>   text1       1   1   1     1 1
quanteda:::dfm_split_hyphenated_features(dfmat)
#> Document-feature matrix of: 1 document, 5 features (0.0% sparse).
#>        features
#> docs    one two three . -
#>   text1   2   2     1 1 1
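
The counts in the output above can be reproduced by hand with a small base R sketch. This is only an illustration of the idea on a named count vector, not quanteda's actual implementation: the helper name split_hyphenated_counts is hypothetical, and the approach (splitting names with a regular expression, then summing counts by piece) is an assumption. The feature order may also differ from the order quanteda returns.

```r
# Hypothetical helper (not part of quanteda): splits hyphenated feature
# names in a named count vector and sums the counts of identical pieces.
split_hyphenated_counts <- function(counts) {
  # Break each name into runs of non-hyphen characters and single hyphens,
  # e.g. "one-two" becomes c("one", "-", "two")
  pieces <- regmatches(names(counts), gregexpr("[^-]+|-", names(counts)))
  all_pieces <- unlist(pieces)
  # Each piece inherits the count of the feature it came from
  expanded <- rep(unname(counts), lengths(pieces))
  # Sum the counts of identical pieces, keeping first-seen order
  features <- unique(all_pieces)
  vapply(features, function(f) sum(expanded[all_pieces == f]), numeric(1))
}

# Counts for the single document in the example above
counts <- c("one-two" = 1, one = 1, two = 1, three = 1, "." = 1)
split_counts <- split_hyphenated_counts(counts)
split_counts
# one = 2, "-" = 1, two = 2, three = 1, "." = 1
```

As in the example above, the hyphen itself survives as a separate feature, and the counts for "one" and "two" each rise by one because the pieces of "one-two" are merged into the existing features.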