*  `corpus.kwic()`: added new arguments `split_context` and `extract_keyword`.
*  `dfm_remove(x, selection = anydfm)` is now equivalent to `dfm_remove(x, selection = featnames(anydfm))`. (#1320)
*  `predict.textmodel_nb()`: changed what it returns, and added a `type =` argument. (#1329)
*  Fixed a bug in `textmodel_affinity()` that caused failure when the input dfm had been compiled with `tolower = FALSE`. (#1338)
*  Fixed a bug in `tokens_lookup()` and `dfm_lookup()` when `nomatch` is used. (#1347)
*  `"NA"` (#1372)
*  Fixed a bug in `predict.textmodel_wordscores()` when training and test feature sets are different (#1380).
*  `char_segment()` and `corpus_segment()` are more robust to whitespace characters preceding a pattern (#1394).
*  `tokens_ngrams()` is more robust to handling large numbers of documents (#1395).
*  `corpus.data.frame()` is now robust to handling data.frame inputs with improper or missing variable names (#1388).
*  Added an `as.igraph.fcm()` method for converting an fcm object into an igraph graph object.
*  Added a `case_insensitive` argument to `char_segment()` and `corpus_segment()`.
*  Added support for `fcm(x, ordered = TRUE)`. (#1413) Also set the condition that `window` can be of size 1 (formerly the limit was 2 or greater).
*  Fixed `tokens(x, what = "fasterword", remove_separators = TRUE)` so that it correctly splits words separated by `\n` and `\t` characters. (#1420)
*  In `textstat_readability()`, fixed a bug in the Dale-Chall-based measures and in the Spache word-list measure. These were caused by an incorrect lookup mechanism and by a limited implementation of the word lists. The new word lists include all of the variations called for in the original measures, using fast fixed matching. (#1410)
*  Fixed failures in `rowMeans()`, `rowSums()`, `colMeans()`, and `colSums()` caused by not having access to the Matrix package methods. (#1428)
*  Fixed a bug in `textplot_scale1d()` when passed a predicted wordscores object with `se.fit = TRUE` (#1440).
*  `textplot_network()`. (#1460)
*  Added `intermediate` to `textstat_readability(x, measure, intermediate = FALSE)`, which if `TRUE` returns the intermediate quantities used in the computation of readability statistics. This is useful for verification, or for direct use of the intermediate quantities.
*  Added a `separator` argument to `kwic()` to allow a user to define which characters will be added between tokens returned from a keyword-in-context search. (#1449)
*  Reimplemented `textstat_dist()` and `textstat_simil()` in C++ for enhanced performance. (#1210)
*  Added a `tokens_sample()` function (#1478).
*  `textstat_dist()` (#1443), based on the reasoning in #1442.
*  `textstat_simil()`. (#1442)
*  `textstat_keyness()` (#1482).
*  Diagonal values of a `textstat_simil()` return object coerced to matrix now default to 1.0, rather than 0.0 (#1494).
*  Added new measures to `textstat_lexdiv()`: Yule's K, Simpson's D, and Herdan's Vm.
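
Several of the new arguments above can be tried together in a short sketch. This is illustrative only: the toy texts and variable names are invented here, and it assumes a quanteda version that includes the changes listed.

```r
library(quanteda)

# a toy corpus to exercise the new arguments (illustrative only)
txt <- c(doc1 = "Stocks rose sharply today.",
         doc2 = "Stocks fell back again today.")
crp <- corpus(txt)
toks <- tokens(crp)

# new `separator` argument to kwic(): controls the string used to join
# the tokens returned in the pre- and post-keyword contexts
kwic(toks, "stocks", window = 2, separator = " | ")

# new `intermediate` argument to textstat_readability(): if TRUE, the
# intermediate quantities used in computing the statistic are returned
textstat_readability(crp, measure = "Flesch", intermediate = TRUE)

# new lexical diversity measures: Yule's K ("K"), Simpson's D ("D"),
# and Herdan's Vm ("Vm")
textstat_lexdiv(dfm(toks), measure = c("K", "D", "Vm"))
```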
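
Likewise, the `dfm_remove()` equivalence, the relaxed `window` limit in `fcm()`, and the new `tokens_sample()` function can be sketched as follows (again illustrative; the toy objects are assumptions, not part of the release):

```r
library(quanteda)

dfmat <- dfm(tokens(c("a b c d", "b c d e")))
other <- dfm(tokens("c d"))

# passing a dfm as the selection now behaves the same as passing
# that dfm's feature names directly
f1 <- dfm_remove(dfmat, other)
f2 <- dfm_remove(dfmat, featnames(other))
featnames(f1)  # features "c" and "d" removed in both cases
featnames(f2)

# window = 1 is now permitted in fcm() (formerly the minimum was 2)
fcm(tokens("a b c a b"), context = "window", window = 1)

# the new tokens_sample() function samples documents from a tokens object
tokens_sample(tokens(c(d1 = "a b c", d2 = "d e f")), size = 1)
```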