Get or replace the texts in a corpus, with grouping options. Works for plain character vectors too, if groups is a factor.

texts(x, groups = NULL, spacer = "  ")

texts(x) <- value

# S3 method for corpus
as.character(x, ...)

Arguments

x

a corpus or character object

groups

either a character vector containing the names of document variables to be used for grouping, or a factor (or an object that can be coerced to a factor) whose length or number of rows equals the number of documents. See groups for details.
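The following is a minimal base-R sketch of how the two accepted forms of groups might be resolved into one factor value per document. The helper resolve_groups() and the data frame standing in for the document variables are hypothetical, and this is not quanteda's internal code. The sketch also assumes a rule the text does not state: a character vector is treated as variable names only when every element matches a column name.

```r
# Hypothetical helper (not part of quanteda): turn a `groups` argument
# into a factor with exactly one value per document.
resolve_groups <- function(groups, docvars) {
  ndoc <- nrow(docvars)
  if (is.character(groups) && all(groups %in% names(docvars))) {
    # Assumed rule: names of document variables are combined into one factor
    f <- interaction(docvars[groups], drop = TRUE)
  } else {
    # Otherwise coerce whatever was supplied to a factor
    f <- as.factor(groups)
  }
  if (length(f) != ndoc) {
    stop("groups must have one value per document")
  }
  f
}

# Hypothetical document variables for three documents
dv <- data.frame(President = c("Washington", "Washington", "Adams"),
                 Party = c("none", "none", "Federalist"))

resolve_groups("President", dv)       # grouping by a document variable
resolve_groups(c("a", "b", "a"), dv)  # grouping by an explicit factor
```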

spacer

the string inserted between texts when they are concatenated using groups (default is two spaces).

value

character vector of the new texts

...

unused

Value

For texts, a character vector of the texts in the corpus. For texts <-, the corpus with its texts replaced by value.
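The texts(x) <- value form relies on R's replacement-function mechanism. A minimal sketch with a hypothetical toy class (toy_corpus, toy_texts, and toy_texts<- are illustrations, not quanteda's implementation) shows why both whole-vector replacement and indexed replacement such as texts(x)[2] <- ... work.

```r
# Hypothetical toy class, for illustration only (not quanteda's corpus)
toy_corpus <- function(x) structure(list(texts = x), class = "toy_corpus")

# Getter: return the stored texts
toy_texts <- function(x) x$texts

# Replacement function: R rewrites `toy_texts(x) <- value` as
# `x <- "toy_texts<-"(x, value = value)`
`toy_texts<-` <- function(x, value) {
  stopifnot(is.character(value), length(value) == length(x$texts))
  names(value) <- names(x$texts)  # keep the original document names
  x$texts <- value
  x
}

crp <- toy_corpus(c(text1 = "Frist text.", text2 = "Second text."))

# Replace all texts at once
toy_texts(crp) <- sub("Frist", "First", toy_texts(crp))

# Replace a single text: R evaluates this roughly as
#   tmp <- toy_texts(crp); tmp[2] <- "New text number 2."; toy_texts(crp) <- tmp
toy_texts(crp)[2] <- "New text number 2."
toy_texts(crp)
```

Because the getter and the replacement function share a name, R can combine them with the ordinary [<- operator, which is the pattern the last example on this page uses.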

Details

For a corpus x, as.character(x) is equivalent to calling texts(x).
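The "S3 method for corpus" in the usage section means as.character() dispatches on the object's class. A minimal sketch of that delegation, using a hypothetical toy_corpus class rather than quanteda's own code, looks like this:

```r
# Hypothetical toy class standing in for a corpus
toy_corpus <- function(x) structure(list(texts = x), class = "toy_corpus")
toy_texts <- function(x) x$texts

# S3 method: as.character() dispatches here for objects of class "toy_corpus"
# and simply delegates to the getter
as.character.toy_corpus <- function(x, ...) {
  toy_texts(x)
}

crp <- toy_corpus(c(text1 = "A short text.", text2 = "Another one."))
identical(as.character(crp), toy_texts(crp))
```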

Note

Texts that share a value of groups are concatenated into a single text, and the order in which texts are aggregated within a group is not specified.
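A minimal base-R sketch of grouped concatenation, assuming simple made-up texts and a factor of groups (this is an illustration, not quanteda's code): split the texts by group, then paste each group's texts together using the spacer. Here split() happens to keep the original document order within each group, but as noted above that order should not be relied on.

```r
txts   <- c(d1 = "aa", d2 = "bbb", d3 = "c", d4 = "dddd")
grp    <- factor(c("x", "y", "x", "x"))
spacer <- "  "  # the default: two spaces

# One concatenated text per group level
grouped <- vapply(split(txts, grp), paste, character(1), collapse = spacer)
grouped

# Each group's length is the sum of its members' lengths plus one spacer per join
nchar(grouped)
```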

As a matter of good text-analysis practice, you are strongly encouraged not to modify the substance of the texts in a corpus; perform this sort of processing in downstream operations instead. For instance, do not lowercase the texts in a corpus, or you will never be able to recover the original case. Instead, apply tokens_tolower after calling tokens on the corpus, or use the option tolower = TRUE in dfm.
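The recommended workflow keeps the original texts intact and lowercases only a derived copy. The sketch below uses base R's strsplit() and tolower() as stand-ins for quanteda's tokens and tokens_tolower, so it shows the principle rather than quanteda's actual tokenizer.

```r
# Original texts: never modified
original <- c(doc1 = "The US and the EU.", doc2 = "Apple sells apples.")

# Downstream: tokenize a copy on whitespace, then lowercase the tokens
toks <- lapply(strsplit(original, "\\s+"), tolower)

# "US" and "us" are now indistinguishable in toks,
# but the case is still recoverable from original
original
toks
```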

Examples

nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806)))
#> 1789-Washington 1793-Washington      1797-Adams  1801-Jefferson  1805-Jefferson
#>            8618             790           13876           10136           12907
# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))
#>      Adams  Jefferson Washington
#>      13876      23045       9410
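The grouped counts follow directly from the ungrouped ones: each group's count is the sum of its members' counts plus two characters (the default spacer) for every join. The check below reproduces the grouped output from the numbers printed in the first example, using only base R.

```r
# Character counts from the ungrouped example output
ungrouped <- c("1789-Washington" = 8618, "1793-Washington" = 790,
               "1797-Adams" = 13876, "1801-Jefferson" = 10136,
               "1805-Jefferson" = 12907)
president <- c("Washington", "Washington", "Adams", "Jefferson", "Jefferson")
spacer_width <- 2  # default spacer is two spaces

# Sum per group, plus one spacer for each of the (n - 1) joins
expected <- tapply(ungrouped, president, sum) +
  spacer_width * (tapply(ungrouped, president, length) - 1)
expected
```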
# grouping a character vector using a factor
nchar(data_char_ukimmig2010[1:5])
#>          BNP    Coalition Conservative       Greens       Labour
#>        18567         1471         2692         3841         3854
nchar(texts(data_corpus_inaugural[1:5],
            groups = as.factor(data_corpus_inaugural[1:5, "President"])))
#>      Adams  Jefferson Washington
#>      13876      23045       9410
BritCorpus <- corpus(c("We must prioritise honour in our neighbourhood.",
                       "Aluminium is a valourous metal."))
texts(BritCorpus) <-
  stringi::stri_replace_all_regex(texts(BritCorpus),
                                  c("ise", "([nlb])our", "nium"),
                                  c("ize", "$1or", "num"),
                                  vectorize_all = FALSE)
texts(BritCorpus)
#>                                           text1
#> "We must prioritize honor in our neighborhood."
#>                                           text2
#>                 "Aluminum is a valorous metal."
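With vectorize_all = FALSE, stringi applies each pattern/replacement pair in turn to every string. The same sequential substitution can be sketched in base R with Reduce() and gsub(). This is an illustrative alternative, not what quanteda does internally. Note that base R writes the back-reference as \\1 where stringi uses $1.

```r
txt <- c(text1 = "We must prioritise honour in our neighbourhood.",
         text2 = "Aluminium is a valourous metal.")
patterns     <- c("ise", "([nlb])our", "nium")
replacements <- c("ize", "\\1or", "num")

# Apply pattern i with replacement i to all texts, one pair after another;
# gsub() keeps the names of its input
americanized <- Reduce(function(s, i) gsub(patterns[i], replacements[i], s),
                       seq_along(patterns), txt)
americanized
```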
texts(BritCorpus)[2] <- "New text number 2."
texts(BritCorpus)
#>                                           text1
#> "We must prioritize honor in our neighborhood."
#>                                           text2
#>                            "New text number 2."