Extensions of base R functions for corpus objects.

# S3 method for corpus
print(x, ...)

is.corpus(x)

is.corpuszip(x)

# S3 method for summary.corpus
print(x, ...)

# S3 method for corpus
+(c1, c2)

# S3 method for corpus
c(..., recursive = FALSE)

# S3 method for corpus
[(x, i, j = NULL, ..., drop = TRUE)

# S3 method for corpus
[[(x, i, ...)

# S3 method for corpus
[[(x, i) <- value

# S3 method for corpus
str(object, ...)

## Arguments

| Argument | Description |
|---|---|
| x | a corpus object |
| ... | not used |
| c1 | corpus one to be added |
| c2 | corpus two to be added |
| recursive | logical used by c() method, always set to FALSE |
| i | index for documents or rows of document variables |
| j | index for column of document variables |
| drop | if TRUE, return a vector if extracting a single document variable; if FALSE, return it as a single-column data.frame. See drop for further details. |
| value | a vector that will form a new docvar |
| object | the corpus about which you want structural information |

## Value

is.corpus returns TRUE if the object is a corpus

is.corpuszip returns TRUE if the object is a compressed corpus
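
As a minimal sketch of the return values described above, the following checks one corpus and one plain character vector with is.corpus. The text and the document name d1 are made up for illustration, and the sketch assumes quanteda is installed.

```r
library(quanteda)

# A one-document corpus built from a made-up, named character vector
corp <- corpus(c(d1 = "A short text."))

res_corpus <- is.corpus(corp)              # TRUE: corp is a corpus object
res_char   <- is.corpus("A short text.")   # FALSE: a character vector is not a corpus

res_corpus
res_char
```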

## Details

The + operator combines two corpus objects. Any docvars or metadoc fields that are present in only one corpus are filled with NA values for the documents of the corpus lacking that field. Corpus-level metadata is concatenated, except for source and notes, which are stamped with information about the creation of the new joined corpus.
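
The handling of non-matching docvars can be tried on two small corpora. This is a sketch only: the texts, document names, and the docvars party and year are invented for illustration, and the exact layout of the result may differ between quanteda versions.

```r
library(quanteda)

# Two small corpora whose document variables do not match
# (all texts, names, and docvars here are hypothetical)
corp_a <- corpus(c(a1 = "First text.", a2 = "Second text."),
                 docvars = data.frame(party = c("X", "Y"),
                                      stringsAsFactors = FALSE))
corp_b <- corpus(c(b1 = "Third text."),
                 docvars = data.frame(year = 2010))

# Join them with the + method documented on this page
combined <- corp_a + corp_b

ndoc(combined)     # number of documents in the joined corpus
docvars(combined)  # fields missing from one corpus should appear as NA
```

Distinct document names are used in the two corpora so that the joined corpus has no duplicated names.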

The c() method is also defined for corpus class objects, and provides an easy way to combine multiple corpus objects in a single call.

Future revisions of quanteda need to address some issues concerning the use of factors to store document variables and metadata. Currently most or all of these are not stored as factors, because the data.frame calls that create and store the document-level information use stringsAsFactors=FALSE. That setting is used because the texts should always be stored as character vectors and never as factors.
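
The effect of stringsAsFactors can be seen in base R alone. The sketch below is not quanteda's internal code; it only shows, with made-up texts and a hypothetical party column, how the setting decides whether character columns in a data.frame stay character or become factors.

```r
# Made-up texts and a hypothetical document variable
texts <- c(d1 = "Some text.", d2 = "More text.")
party <- c("FF", "FG")

# With stringsAsFactors = FALSE, character columns stay character
df_chr <- data.frame(texts = texts, party = party, stringsAsFactors = FALSE)

# With stringsAsFactors = TRUE, the same columns are converted to factors
df_fct <- data.frame(texts = texts, party = party, stringsAsFactors = TRUE)

class(df_chr$texts)  # "character"
class(df_fct$texts)  # "factor"
```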

## Examples


# concatenate corpus objects
corpus1 <- corpus(data_char_ukimmig2010[1:2])
corpus2 <- corpus(data_char_ukimmig2010[3:4])
corpus3 <- corpus(data_char_ukimmig2010[5:6])
summary(c(corpus1, corpus2, corpus3))
#> Corpus consisting of 6 documents:
#>
#>          Text Types Tokens Sentences
#>           BNP  1125   3280        88
#>     Coalition   142    260         4
#>  Conservative   251    499        15
#>        Greens   322    679        21
#>        Labour   298    683        29
#>        LibDem   251    483        14
#>
#> Source: Concatenation by c.corpus()
#> Created: Fri Jul 26 17:43:06 2019
#> Notes:
# ways to index corpus elements
data_corpus_inaugural["1793-Washington"]    # 2nd Washington inaugural speech
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "
data_corpus_inaugural[2]                    # same
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "

# access the docvars from data_corpus_irishbudget2010
data_corpus_irishbudget2010[, "year"]
#>  [1] "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010"
#> [11] "2010" "2010" "2010" "2010"
# same
# data_corpus_irishbudget2010[["year"]]

# create a new document variable
# data_corpus_irishbudget2010[["govtopp"]] <-
#   ifelse(data_corpus_irishbudget2010[["party"]] %in% c("FF", "Greens"),
#             "Government", "Opposition")
# docvars(data_corpus_irishbudget2010)