- Fixed a bug in dfm_compress() and dfm_group() that changed or deleted docvars attributes of dfm objects (#1506).
- Fixed a bug in textplot_xray() that caused incorrect facet labels when a pattern contained multiple list elements or values (#1514).
- kwic() now correctly returns the pattern associated with each match as the "keywords" attribute, for all pattern types (#1515).
- textstat_simil() and textstat_dist().
- textstat_lexdiv() now works on tokens objects, not just dfm objects. New measures of lexical diversity include MATTR (the Moving-Average Type-Token Ratio, Covington & McFall 2010) and MSTTR (Mean Segmental Type-Token Ratio).
- tokens_split() allows splitting a single token into multiple tokens based on a pattern match (#1500).
- tokens_chunk() allows splitting tokens into new documents of equally sized “chunks” (#1520).
- textstat_entropy() now computes entropy for a dfm across feature or document margins.
- The documentation for textstat_readability() is vastly improved, now detailing all formulas and providing full references.
- dfm_match() allows a user to specify the features in a dfm according to a fixed vector of feature names, including those of another dfm. Replaces dfm_select(x, pattern) where pattern was a dfm.
- vertex_labelsize added to textplot_network() to allow more precise control of label sizes, either globally or individually.
- tokens.tokens(x, remove_hyphens = TRUE), where x was generated with remove_hyphens = FALSE, now behaves similarly to how the same tokens would be handled had this option been called on character input as tokens.character(x, remove_hyphens = TRUE) (#1498).
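
For readers unfamiliar with the two lexical diversity measures named in the textstat_lexdiv() entry, the following base R sketch shows the general idea behind MATTR and MSTTR. It is not the package's implementation: the helper names ttr(), mattr() and msttr() are hypothetical, and discarding leftover tokens after the last full MSTTR segment is an assumption of this sketch.

```r
# Type-token ratio of a character vector of tokens
ttr <- function(tokens) length(unique(tokens)) / length(tokens)

# MATTR: mean TTR over every overlapping window of `window` tokens
mattr <- function(tokens, window) {
  stopifnot(window >= 1, window <= length(tokens))
  starts <- seq_len(length(tokens) - window + 1)
  mean(vapply(starts, function(i) ttr(tokens[i:(i + window - 1)]), numeric(1)))
}

# MSTTR: mean TTR over consecutive, non-overlapping segments of `segment` tokens.
# Assumption: tokens left over after the last full segment are discarded.
msttr <- function(tokens, segment) {
  stopifnot(segment >= 1, segment <= length(tokens))
  n_seg <- length(tokens) %/% segment
  starts <- (seq_len(n_seg) - 1) * segment + 1
  mean(vapply(starts, function(i) ttr(tokens[i:(i + segment - 1)]), numeric(1)))
}

toks <- c("a", "b", "a", "c", "a", "b", "d", "a")
mattr_value <- mattr(toks, window = 4)   # five overlapping windows
msttr_value <- msttr(toks, segment = 4)  # two non-overlapping segments
print(c(MATTR = mattr_value, MSTTR = msttr_value))
```

Both measures average the type-token ratio over fixed-length spans, which makes them less sensitive to document length than a single TTR over the whole text.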
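
The chunking described in the tokens_chunk() entry can be pictured with a small base R sketch. The helper chunk_tokens() is hypothetical and works on a plain character vector rather than a tokens object; letting the final chunk be shorter when the length is not a multiple of the size is an assumption of this sketch.

```r
# Split a character vector of tokens into consecutive chunks of `size` tokens.
# Assumption: the final chunk keeps whatever tokens remain, so it may be shorter.
chunk_tokens <- function(tokens, size) {
  stopifnot(size >= 1)
  unname(split(tokens, ceiling(seq_along(tokens) / size)))
}

chunks <- chunk_tokens(letters[1:7], size = 3)
print(chunks)
```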
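
To illustrate what "entropy across feature or document margins" means in the textstat_entropy() entry, here is a base R sketch on an ordinary count matrix. The helper entropy_margin() is hypothetical, and the use of Shannon entropy with base 2 logarithms is an assumption of this sketch rather than a statement about the package's defaults.

```r
# Shannon entropy of the count distribution along one margin of a matrix.
# margin = 1 gives one value per row (document), margin = 2 one per column (feature).
entropy_margin <- function(m, margin = 1, base = 2) {
  apply(m, margin, function(counts) {
    p <- counts[counts > 0] / sum(counts)  # drop zeros so that 0 * log(0) never occurs
    -sum(p * log(p, base = base))
  })
}

m <- matrix(c(1, 1, 0,
              2, 0, 2),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

by_document <- entropy_margin(m, margin = 1)
by_feature  <- entropy_margin(m, margin = 2)
print(by_document)
print(by_feature)
```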
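
The idea behind the dfm_match() entry, conforming a matrix to a fixed vector of feature names, can be sketched in base R as follows. The helper match_features() is hypothetical and operates on a dense matrix, not a dfm; padding absent features with zeros and dropping unlisted ones is how this sketch interprets "specify the features according to a fixed vector".

```r
# Conform the columns of a count matrix to a fixed vector of feature names.
# Features missing from `m` become all-zero columns; columns of `m` that are
# not listed in `features` are dropped. Column order follows `features`.
match_features <- function(m, features) {
  out <- matrix(0, nrow = nrow(m), ncol = length(features),
                dimnames = list(rownames(m), features))
  shared <- intersect(features, colnames(m))
  out[, shared] <- m[, shared, drop = FALSE]
  out
}

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("d1", "d2"), c("a", "b", "c")))
matched <- match_features(m, c("c", "x", "a"))
print(matched)
```

A typical use of this kind of operation is making a test-set matrix share exactly the feature set of a training-set matrix before prediction.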