tokens_segment(), which works on tokens objects in the same way as corpus_segment() does on corpus objects (#902).
%>% can now be used with quanteda without needing to attach magrittr (or, as many users apparently believe, the entire tidyverse).
corpus_segment() now behaves more logically and flexibly, and is clearly differentiated from corpus_reshape() in terms of its functionality. Its documentation is also vastly improved. (#908)
data_dictionary_LSD2015, the Lexicoder Sentiment 2015 dictionary (#963).
tokens_lookup() and dfm_lookup() (#960).
head.corpus(), tail.corpus() provide fast subsetting of the first or last documents in a corpus. (#952)
purrr::map() to dfm() (#928).
regex2fixed() and associated functions.
textstat_collocations.tokens() caused by “documents” containing only "" as tokens. (#940)
cbind.dfm() when features shared a name starting with quanteda_options("base_featname") (#946)
quanteda_options(). (#966)
summary.corpus() now generates a special data.frame, which has its own print method, rather than requiring verbose = FALSE to suppress output (#926).
textstat_collocations() is now multi-threaded.
head.dfm(), tail.dfm() now behave consistently with base R methods for matrix, with the added argument nfeature. Previously, these methods printed the subset and invisibly returned it. Now, they simply return the subset. (#952)
textmodel_lsa() for Latent Semantic Analysis.
tokens_segment() has a new window argument, permitting selection within an asymmetric window around the pattern of selection. (#521)
tokens_replace() now allows token types to be substituted directly and quickly.
textmodel_affinity() now adds functionality to fit the Perry and Benoit (2017) class affinity model.
spacy_parse method for corpus objects. Also restored quanteda methods for spacyr spacy_parsed objects.
textmodel_nb() (#1010), and made output quantities from the fitted NB model regular matrix objects instead of Matrix classes.
tokens_group() is now significantly faster.
tokenize() function and all methods associated with the tokenizedTexts object types have been removed.
tokens_keep(), dfm_keep(), and fcm_keep(). (#1037)
textmodel_NB() has been replaced by textmodel_nb().