Extensions of base R functions for corpus objects.

# S3 method for corpus
print(x, ...)

is.corpus(x)

is.corpuszip(x)

# S3 method for summary.corpus
print(x, ...)

# S3 method for corpus
c1 + c2

# S3 method for corpus
c(..., recursive = FALSE)

# S3 method for corpus
x[i, j = NULL, ..., drop = TRUE]

# S3 method for corpus
x[[i, ...]]

# S3 method for corpus
x[[i]] <- value

# S3 method for corpus
str(object, ...)

Arguments

x

a corpus object

...

not used

c1

corpus one to be added

c2

corpus two to be added

recursive

logical used by `c()` method, always set to `FALSE`

i

index for documents or rows of document variables

j

index for column of document variables

drop

if `TRUE`, return a vector when extracting a single document variable; if `FALSE`, return it as a single-column data.frame. See `drop` for further details.

value

a vector that will form a new docvar

object

the corpus about which you want structural information

Value

is.corpus returns TRUE if the object is a corpus

is.corpuszip returns TRUE if the object is a compressed corpus

Details

The `+` operator combines two corpus objects. Any docvars or metadoc fields present in only one corpus are filled with NA values for the documents of the corpus lacking that field. Corpus-level metadata is concatenated, except for source and notes, which are stamped with information about the creation of the new joined corpus.
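
The NA-filling behavior described above can be illustrated with plain data frames. This is a minimal base R sketch under stated assumptions, not quanteda's implementation: the helper `fill_and_bind()`, the document names, and the variables are all hypothetical.

```r
# Hypothetical helper (not part of quanteda): row-bind two data frames of
# document variables, adding an NA column for any variable one of them lacks.
fill_and_bind <- function(d1, d2) {
  all_vars <- union(names(d1), names(d2))
  for (v in setdiff(all_vars, names(d1))) d1[[v]] <- NA
  for (v in setdiff(all_vars, names(d2))) d2[[v]] <- NA
  rbind(d1[, all_vars, drop = FALSE], d2[, all_vars, drop = FALSE])
}

# Made-up document variables: only the first set has `year`,
# only the second has `speaker`.
docvars1 <- data.frame(party = c("FF", "FG"), year = c(2010, 2010),
                       row.names = c("doc1", "doc2"),
                       stringsAsFactors = FALSE)
docvars2 <- data.frame(party = "LAB", speaker = "Burton",
                       row.names = "doc3",
                       stringsAsFactors = FALSE)

combined <- fill_and_bind(docvars1, docvars2)
print(combined)
```

In the combined data frame, `year` is NA for `doc3` and `speaker` is NA for `doc1` and `doc2`, which mirrors how fields missing from one corpus are treated.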

The `c()` method is also defined for corpus class objects and provides an easy way to combine multiple corpus objects.

Future revisions of quanteda need to address some issues concerning the use of factors to store document variables and metadata. Currently, most or all of these are not stored as factors, because the data.frame calls that create and store the document-level information use stringsAsFactors = FALSE. That setting is used because the texts should always be stored as character vectors, never as factors.
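
The effect of `stringsAsFactors` can be seen in a small base R sketch. The texts and the `party` column below are made up for illustration and do not reproduce quanteda's internal storage.

```r
# Made-up texts and a made-up document variable.
texts <- c(doc1 = "First text.", doc2 = "Second text.")

# With stringsAsFactors = FALSE, character columns stay character vectors.
df_chr <- data.frame(texts = texts, party = c("FF", "FG"),
                     stringsAsFactors = FALSE)

# With stringsAsFactors = TRUE, the same columns are converted to factors.
df_fct <- data.frame(texts = texts, party = c("FF", "FG"),
                     stringsAsFactors = TRUE)

class(df_chr$texts)  # "character"
class(df_fct$texts)  # "factor"
```

Because the setting applies to the whole `data.frame()` call, keeping the texts as character vectors also keeps character document variables from becoming factors.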

Examples

# concatenate corpus objects
corpus1 <- corpus(data_char_ukimmig2010[1:2])
corpus2 <- corpus(data_char_ukimmig2010[3:4])
corpus3 <- corpus(data_char_ukimmig2010[5:6])
summary(c(corpus1, corpus2, corpus3))
#> Corpus consisting of 6 documents:
#>
#>          Text Types Tokens Sentences
#>           BNP  1125   3280        88
#>     Coalition   142    260         4
#>  Conservative   251    499        15
#>        Greens   322    679        21
#>        Labour   298    683        29
#>        LibDem   251    483        14
#>
#> Source: Concatenation by c.corpus()
#> Created: Fri Dec 7 08:57:59 2018
#> Notes:

# ways to index corpus elements
data_corpus_inaugural["1793-Washington"]  # 2nd Washington inaugural speech
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "
data_corpus_inaugural[2]  # same
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "

# access the docvars from data_corpus_irishbudget2010
data_corpus_irishbudget2010[, "year"]
#>  [1] "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010"
#> [11] "2010" "2010" "2010" "2010"
# same
data_corpus_irishbudget2010[["year"]]
#>                           year
#> Lenihan, Brian (FF)       2010
#> Bruton, Richard (FG)      2010
#> Burton, Joan (LAB)        2010
#> Morgan, Arthur (SF)       2010
#> Cowen, Brian (FF)         2010
#> Kenny, Enda (FG)          2010
#> ODonnell, Kieran (FG)     2010
#> Gilmore, Eamon (LAB)      2010
#> Higgins, Michael (LAB)    2010
#> Quinn, Ruairi (LAB)       2010
#> Gormley, John (Green)     2010
#> Ryan, Eamon (Green)       2010
#> Cuffe, Ciaran (Green)     2010
#> OCaolain, Caoimhghin (SF) 2010

# create a new document variable
# `[, "party"]` returns a plain vector, which `%in%` needs;
# the party label in these data is "Green", not "Greens"
data_corpus_irishbudget2010[["govtopp"]] <-
    ifelse(data_corpus_irishbudget2010[, "party"] %in% c("FF", "Green"),
           "Government", "Opposition")
docvars(data_corpus_irishbudget2010)
#>                           year debate number      foren     name party
#> Lenihan, Brian (FF)       2010 BUDGET     01      Brian  Lenihan    FF
#> Bruton, Richard (FG)      2010 BUDGET     02    Richard   Bruton    FG
#> Burton, Joan (LAB)        2010 BUDGET     03       Joan   Burton   LAB
#> Morgan, Arthur (SF)       2010 BUDGET     04     Arthur   Morgan    SF
#> Cowen, Brian (FF)         2010 BUDGET     05      Brian    Cowen    FF
#> Kenny, Enda (FG)          2010 BUDGET     06       Enda    Kenny    FG
#> ODonnell, Kieran (FG)     2010 BUDGET     07     Kieran ODonnell    FG
#> Gilmore, Eamon (LAB)      2010 BUDGET     08      Eamon  Gilmore   LAB
#> Higgins, Michael (LAB)    2010 BUDGET     09    Michael  Higgins   LAB
#> Quinn, Ruairi (LAB)       2010 BUDGET     10     Ruairi    Quinn   LAB
#> Gormley, John (Green)     2010 BUDGET     11       John  Gormley Green
#> Ryan, Eamon (Green)       2010 BUDGET     12      Eamon     Ryan Green
#> Cuffe, Ciaran (Green)     2010 BUDGET     13     Ciaran    Cuffe Green
#> OCaolain, Caoimhghin (SF) 2010 BUDGET     14 Caoimhghin OCaolain    SF
#>                              govtopp
#> Lenihan, Brian (FF)       Government
#> Bruton, Richard (FG)      Opposition
#> Burton, Joan (LAB)        Opposition
#> Morgan, Arthur (SF)       Opposition
#> Cowen, Brian (FF)         Government
#> Kenny, Enda (FG)          Opposition
#> ODonnell, Kieran (FG)     Opposition
#> Gilmore, Eamon (LAB)      Opposition
#> Higgins, Michael (LAB)    Opposition
#> Quinn, Ruairi (LAB)       Opposition
#> Gormley, John (Green)     Government
#> Ryan, Eamon (Green)       Government
#> Cuffe, Ciaran (Green)     Government
#> OCaolain, Caoimhghin (SF) Opposition