"word4" tokeniser that is based on new RBBI (RuleBasedBreakIterator) rules, implemented in a new .yml file that users can edit, but whose defaults significantly improve pattern handling for words, sentences, and other forms of patterns. These rules are customised from the ICU break rules. Both the standard and the customised rules are now found in the breakrules/ system folder, so that users could, in principle, modify them.
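As a minimal sketch of how the tokeniser described above might be tried out, the following assumes that quanteda version 4 or later is installed, that the tokeniser is selected through the `what` argument of `tokens()`, and that the breakrules/ folder can be located with `system.file()`. The sample text is invented for illustration, and the names of the rule files are not assumed here; the code only lists whatever the folder contains.

```r
# Sketch only: assumes quanteda (>= 4.0) is installed.
library("quanteda")

# Invented sample text containing patterns that word tokenisers often split.
txt <- "Visit https://quanteda.io or email info@example.com about #rstats!"

# Request the "word4" tokeniser explicitly (assumed to be selected via `what`).
toks <- tokens(txt, what = "word4")
print(toks)

# Locate the installed breakrules/ folder and list the rule files it holds.
# system.file() returns "" if the folder does not exist in this installation.
rules_dir <- system.file("breakrules", package = "quanteda")
print(list.files(rules_dir))
```

Inspecting the installed files this way avoids relying on any particular file name, which may differ between versions.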
Other minor changes:
preserve_special() that rejoined splits created by the default stringi tokeniser machinery.
Updated for compatibility with (forthcoming) Matrix 1.5.5 handling of dimnames() for empty dimensions.
readtext object class method extensions, to work better with the readtext package.
Removes some unused internal methods, such as docvars.kwic(), that were not exported despite matching exported generics.