- dfm() returns a dfm with identical column order even if tokens_compound() or tokens_ngrams() is used upstream (#2100).
- dfm_group() now drops documents with NA values in the grouping variable, similar to the behaviour of tokens_group() and corpus_group() (#2134).
- char_wordstem() now has a new argument check_whitespace, which will not throw an error when lower-casing text containing a whitespace character.
- dfm_remove() now has a new argument padding = FALSE that, when TRUE, collects the counts of the removed features in the first column. This produces results consistent with a dfm built from tokens where some have been removed with padding = TRUE (#2152).
- dfm_lookup() ignores matches of multiple dictionary values in the same key, in a similar way to tokens_lookup() (#2159).
- fcm() computes the marginal frequency of upper-case tokens correctly (#2176).
- tokens_chunk() keeps all the docids of the original object, including those of empty documents.
- tokens_select() recycles values when the length of startpos or endpos is less than ndoc(x).
- tokens_lookup() and dfm_lookup() can apply very large dictionaries (more than 100,000 keys).
- segid() is added to extract document serial numbers from corpus, tokens or dfm objects.
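
The padding behaviour of dfm_remove() described above can be illustrated with a minimal sketch. It assumes a quanteda version that includes this change; the toy documents and the object names (toks, dfmat, dfmat_pad, dfmat_toks) are made up for illustration only.

```r
library(quanteda)

# Toy data (hypothetical): two short documents
toks <- tokens(c(d1 = "a b c d", d2 = "a c e"))
dfmat <- dfm(toks)

# Remove the features "b" and "d" from the dfm, collecting their
# counts in a padding column instead of discarding them
dfmat_pad <- dfm_remove(dfmat, pattern = c("b", "d"), padding = TRUE)

# Equivalent route: remove the same tokens with padding first,
# then build the dfm from the padded tokens
dfmat_toks <- dfm(tokens_remove(toks, pattern = c("b", "d"), padding = TRUE))

# Inspect both results; per the entry above they should be consistent
print(dfmat_pad)
print(dfmat_toks)
```

Comparing the two printed objects is a quick way to check that removing features at the dfm stage and at the tokens stage gives matching counts.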