Fixed a bug in `predict.textmodel_wordscores()` when training and test feature sets are different (#1380).
`corpus_segment()` is now more robust to whitespace characters preceding a pattern (#1394).
`tokens_ngrams()` is more robust to handling large numbers of documents (#1395).
`corpus.data.frame()` is now robust to handling data.frame inputs with improper or missing variable names (#1388).
Fixed the behaviour of `fcm(x, ordered = TRUE)` (#1413). Also relaxed the condition so that `window` can be of size 1 (formerly the limit was 2 or greater).
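A minimal sketch of the relaxed window setting, assuming the `fcm()` signature from quanteda (`context`, `window`, and `ordered` arguments); the exact output is not shown here:

```r
library(quanteda)

toks <- tokens("a b c a b")
# window = 1 is now accepted (formerly the minimum was 2):
# counts only immediately adjacent co-occurrences, preserving order
fcm(toks, context = "window", window = 1, ordered = TRUE)
```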
Fixed `tokens(x, what = "fasterword", remove_separators = TRUE)` so that it correctly splits words separated by whitespace characters.
`textstat_readability()`: fixed a bug in the Dale-Chall-based measures and in the Spache word-list measure. These were caused by an incorrect lookup mechanism and by an incomplete implementation of the word lists. The new word lists include all of the variations called for in the original measures, while using fast fixed matching. (#1410)
Fixed an error in `colSums()` caused by not having access to the Matrix package methods (#1428).
Fixed a bug in `textplot_scale1d()` when passed a predicted wordscores object with `se.fit = TRUE` (#1440).
Added an `intermediate` argument to `textstat_readability(x, measure, intermediate = FALSE)`, which if `TRUE` returns the intermediate quantities used in computing the readability statistics. This is useful for verification, or for direct use of those quantities.
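A short usage sketch of the new argument; the built-in `data_corpus_inaugural` corpus is used for illustration, and the exact set of intermediate columns returned (e.g. character, syllable, word, and sentence counts) is an assumption:

```r
library(quanteda)

# with intermediate = TRUE, the returned data frame also carries the
# intermediate quantities, so a Flesch score can be verified by hand
textstat_readability(data_corpus_inaugural[1:2],
                     measure = "Flesch",
                     intermediate = TRUE)
```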
Added an argument to `kwic()` allowing the user to define which characters are inserted between the tokens returned from a keywords-in-context search (#1449).
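A hypothetical usage sketch, assuming the new argument is named `separator` (the name is not stated above):

```r
library(quanteda)

toks <- tokens("The quick brown fox jumps over the lazy dog")
# join the context tokens with "_" rather than the default space,
# e.g. "quick_brown" | fox | "jumps_over"
kwic(toks, pattern = "fox", window = 2, separator = "_")
```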
Rewrote `textstat_simil()` in C++ for enhanced performance (#1210).
Added new measures to `textstat_lexdiv()`: Yule's K, Simpson's D, and Herdan's Vm.
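A brief sketch of selecting the new measures, assuming they follow quanteda's short naming convention (`"K"`, `"D"`, `"Vm"`):

```r
library(quanteda)

dfmat <- dfm(tokens(c(d1 = "a b c d e",
                      d2 = "a a b b c")))
# request only the three new lexical diversity measures
textstat_lexdiv(dfmat, measure = c("K", "D", "Vm"))
```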