- Added flatten and levels arguments to as.list.dictionary2() to enable more flexible conversion of dictionary objects. (#1661)
- In corpus_sample(), the size argument now works with the by argument, to control the size of units sampled from each group.
- textstat_dist() and textstat_simil(), see below.
- tokens(). (#1713)
- textstat_dist() and textstat_simil() now return sparse symmetric matrix objects using classes from the Matrix package. This replaces the former structure based on the dist class. Computation of these classes is now also based on the fast implementation in the proxyC package. When computing similarities, the new min_simil argument allows a user to ignore values below a specified similarity threshold. A new coercion method, as.data.frame.textstat_simildist(), now exists for converting these returns into a data.frame of pairwise comparisons. Existing methods such as as.matrix(), as.dist(), and as.list() work as they did before.

  Some methods were removed from textstat_dist() and textstat_simil() because these were either not symmetric or not invariant to document or feature ordering. Finally, the selection argument has been deprecated in favour of a new y argument.
- textstat_readability() now defaults to measure = "Flesch" if no measure is supplied. This makes it consistent with textstat_lexdiv(), which also takes a default measure ("TTR") if none is supplied. (#1715)
- The defaults for max_nchar and min_nchar in tokens_select() are now NULL, meaning they are not applied if the user does not supply values. Fixes #1713.
- kwic.corpus() and kwic.tokens() behaviour is now aligned, meaning that dictionaries are correctly faceted by key instead of by value. (#1684)
- tokens() verbose output. (#1683)
- textstat_readability(). (#1701)
- textstat_dist() and textstat_simil(). (#1730)
- textstat_dist() and textstat_simil() class symmetric matrices.
- textstat_lexdiv().
- Added featfreq() to compute the overall feature frequencies from a dfm.
- tokens_lookup() when exclusive = FALSE and the tokens object has paddings. (#1743)
- tokens_replace() (#1765).
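
The textstat_simil() changes noted above can be illustrated with a minimal sketch. It assumes a quanteda version (1.5.0 or a close successor) in which textstat_simil() is exported by quanteda itself; the three toy documents and the 0.5 threshold are invented for illustration and do not come from the release notes.

```r
# Assumption: textstat_simil() is available from quanteda directly,
# as in the release described here.
library(quanteda)

# Three made-up documents, purely illustrative.
txt <- c(d1 = "a b c d", d2 = "a b c e", d3 = "x y z")
dfmat <- dfm(tokens(txt))

# Cosine similarity between documents; min_simil asks the function to
# ignore similarities below the given threshold (0.5 is arbitrary).
sim <- textstat_simil(dfmat, method = "cosine", margin = "documents",
                      min_simil = 0.5)

# New coercion: one row per retained document pair.
sim_df <- as.data.frame(sim)
print(sim_df)

# Existing coercion methods such as as.matrix() remain available.
sim_mat <- as.matrix(sim)
```

Here d1 and d2 share three of their four tokens, so their cosine similarity is 3 / (2 * 2) = 0.75 and the pair clears the threshold, while d3 shares nothing with the other two documents.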