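Takes a dfm whose features include hyphenated words, such as "split-second", and splits those features into their component parts, in the same way as tokens(x, remove_hyphens = TRUE) would have done. Counts from the split parts are added to any matching features already in the dfm, and the hyphen itself becomes a feature.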

dfm_split_hyphenated_features(x)

Arguments

x

input dfm

Examples

(dfmat <- dfm("One-two one two three."))
#> Document-feature matrix of: 1 document, 5 features (0.0% sparse).
#>        features
#> docs    one-two one two three .
#>   text1       1   1   1     1 1
quanteda:::dfm_split_hyphenated_features(dfmat)
#> Document-feature matrix of: 1 document, 5 features (0.0% sparse).
#>        features
#> docs    one two three . -
#>   text1   2   2     1 1 1
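
To make the counting in the example above easier to follow, here is a minimal base R sketch of the same idea applied to a plain named vector of counts for a single document. It is an illustration only and not quanteda's implementation: the helper name split_hyphenated_counts is hypothetical, and it assumes features are split on the literal "-" character, with each hyphen counted as a feature of its own.

```r
# Hypothetical helper (not part of quanteda): split hyphenated feature names
# in a named vector of counts and sum the counts of the resulting parts.
split_hyphenated_counts <- function(counts) {
  feats <- names(counts)
  counts <- unname(counts)

  # Word parts of each feature, e.g. "one-two" -> c("one", "two")
  parts <- strsplit(feats, "-", fixed = TRUE)
  # Number of hyphens in each feature name
  n_hyphens <- lengths(regmatches(feats, gregexpr("-", feats, fixed = TRUE)))

  # One row per part, carrying the count of the feature it came from
  part_names <- unlist(parts)
  part_counts <- rep(counts, lengths(parts))

  # Drop empty parts (produced by names that start with a hyphen)
  keep <- nzchar(part_names)
  part_names <- part_names[keep]
  part_counts <- part_counts[keep]

  # Sum counts per part, keeping the order of first appearance
  groups <- factor(part_names, levels = unique(part_names))
  totals <- vapply(split(part_counts, groups), sum, numeric(1))

  # Each hyphen contributes the count of the feature that contained it
  hyphen_total <- sum(counts * n_hyphens)
  if (hyphen_total > 0) totals <- c(totals, "-" = hyphen_total)
  totals
}

# The single document from the example above, as a named vector of counts
counts <- c("one-two" = 1, one = 1, two = 1, three = 1, "." = 1)
result <- split_hyphenated_counts(counts)
```

Here result holds the counts 2, 2, 1, 1, 1 under the names one, two, three, ., and -, matching the last row of the output above. A real dfm has one such row per document, so the same summation would apply to each row.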