Added an option to quanteda_options() to control the number of documents processed per block in blocked tokenization.
Added an argument to print.dictionary2() to control the printing of nested levels with
Added textstat_summary() to provide detailed information about dfm, tokens, and corpus objects. It will replace summary() in future versions.
Fixed a problem in tokenization (what = "word") of corpora with large numbers of documents containing social media tags and URLs that need to be preserved (such as a large corpus of Tweets).
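Improved the preservation of hashtags in tokenization (see quanteda_options()). The following are now preserved: “#政治” (“politics”) as well as Weibo-style hashtags such as “#英国首相#” (“British prime minister”).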
convert(x, to = "data.frame") now outputs the first column as “doc_id” rather than “document”, since “document” is a commonly occurring term in many texts. (#1918)
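A minimal sketch of the renamed column, assuming a quanteda version that includes this change is installed; the two toy documents are made up for illustration.

```r
library(quanteda)

# Two small named documents (toy data, for illustration only)
toks <- tokens(c(d1 = "a b b", d2 = "b c"))
dfmat <- dfm(toks)

# Convert the dfm to a data.frame; the document names go in the first column
df <- convert(dfmat, to = "data.frame")
print(names(df))  # the first column is named "doc_id", not "document"
```

Because the column is no longer called “document”, a text containing the word “document” can no longer produce a feature column that collides with the document-name column.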
Added char_remove() for easy manipulation of character vectors.
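A short sketch of char_remove() on a plain character vector. It assumes that the pattern is matched as a glob by default, so "b*" drops every element starting with “b”. The fruit names are illustrative only.

```r
library(quanteda)

fruits <- c("apple", "banana", "blueberry", "cherry")

# Drop elements matching the glob pattern "b*" (glob matching assumed as the default)
kept <- char_remove(fruits, "b*")
print(kept)
```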
Added dictionary_edit() for easy, interactive editing of dictionaries, plus functions such as list_edit() for editing character vectors and lists of character vectors.
Added a method to textplot_wordcloud() that plots objects from textstat_keyness(), to visualize keywords either by comparison or for the target category only.
Added a textstat_summary() method, which returns summary information about the tokens, types, features, etc. in an object. It also caches this information so that subsequent calls retrieve it instead of recomputing it.
Fixed NA values returned for non-existent features when calling textstat_frequency(x, n). (#1929)
Fixed a bug in tokens_lookup() that caused an error when no dictionary key returned a single match. (#1946)
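A minimal sketch of the case this fix covers: a lookup in which no dictionary key matches anything in any document. It assumes a quanteda version that includes the fix. The dictionary and texts are hypothetical toy data. The lookup should return empty documents rather than raise an error.

```r
library(quanteda)

# A dictionary whose values appear nowhere in the texts (toy data)
dict <- dictionary(list(positive = c("good", "great")))
toks <- tokens(c(d1 = "a bad day", d2 = "rainy weather"))

# No key matches at all; each document should come back empty
res <- tokens_lookup(toks, dictionary = dict)
print(ntoken(res))
```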
Fixed a bug causing a textstat_simil or textstat_dist object converted to a data.frame to drop its
Fixed a bug causing dfm_match() to fail on a dfm that included “pads”.
Updated the data_dfm_lbgexample object to use more modern dfm internals.
Updated nscrabble() so that empty texts are not dropped from the result. (#1976)
textstat_keyness() is now faster, thanks to a (multi-threaded) C++ implementation.