tokens_segment(), which works on tokens objects in the same way as corpus_segment() does on corpus objects (#902).
%>% can now be used with quanteda without needing to attach magrittr (or, as many users apparently believe, the entire tidyverse).
corpus_segment() now behaves more logically and flexibly, and is clearly differentiated from corpus_reshape() in terms of its functionality. Its documentation is also vastly improved. (#908)
data_dictionary_LSD2015, the Lexicoder Sentiment 2015 dictionary (#963).
tail.corpus() provide fast subsetting of the first or last documents in a corpus. (#952)
regex2fixed() and associated functions.
textstat_collocations.tokens() caused by “documents” containing only "" as tokens. (#940)
cbind.dfm() when features shared a name starting with
summary.corpus() now generates a special data.frame, which has its own print method, rather than requiring verbose = FALSE to suppress output (#926).
textstat_collocations() is now multi-threaded.
tail.dfm() now behave consistently with the base R methods for matrices, with the added argument nfeature. Previously, these methods printed the subset and returned it invisibly; now they simply return the subset. (#952)
textmodel_lsa() for Latent Semantic Analysis.
tokens_segment() has a new window argument, permitting selection within an asymmetric window around the pattern of selection. (#521)
tokens_replace() now allows token types to be substituted directly and quickly.
textmodel_affinity() now adds functionality to fit the Perry and Benoit (2017) class affinity model.
spacy_parse method for corpus objects. Also restored quanteda methods for spacyr
textmodel_nb() (#1010), and made output quantities from the fitted NB model regular matrix objects instead of Matrix classes.
tokens_group() is now significantly faster.
The tokenize() function and all methods associated with the tokenizedTexts object types have been removed.
textmodel_NB() has been replaced by