Extensions of base R functions for corpus objects.

# S3 method for corpus
print(x, ...)

is.corpus(x)

is.corpuszip(x)

# S3 method for summary.corpus
print(x, ...)

# S3 method for corpus
c1 + c2

# S3 method for corpus
c(..., recursive = FALSE)

# S3 method for corpus
x[i, j = NULL, ..., drop = TRUE]

# S3 method for corpus
x[[i, ...]]

# S3 method for corpus
x[[i]] <- value

# S3 method for corpus
str(object, ...)

Arguments

x

a corpus object

...

not used

c1

corpus one to be added

c2

corpus two to be added

recursive

logical used by `c()` method, always set to `FALSE`

i

index for documents or rows of document variables

j

index for column of document variables

drop

if `TRUE`, return a vector when extracting a single document variable; if `FALSE`, return it as a single-column data.frame. See `drop` for further details.

value

a vector that will form a new docvar

object

the corpus about which you want structural information
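
The `drop` argument follows the convention of base R data frame indexing. As a minimal sketch in base R only, the example below uses a hypothetical data frame `dv` as a stand-in for a corpus's document variables; it is not a corpus object, and the values are invented for illustration.

```r
# Hypothetical stand-in for document variables; not a quanteda object.
dv <- data.frame(year = c("2010", "2010"),
                 party = c("FF", "FG"),
                 stringsAsFactors = FALSE)

# drop = TRUE: a single selected column collapses to a plain vector
as_vector <- dv[, "year", drop = TRUE]
is.character(as_vector)   # TRUE

# drop = FALSE: the same column is kept as a single-column data.frame
as_frame <- dv[, "year", drop = FALSE]
is.data.frame(as_frame)   # TRUE
```

Keeping the data.frame form is useful when later code expects row names or column names to survive the extraction.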

Value

is.corpus returns TRUE if the object is a corpus.

is.corpuszip returns TRUE if the object is a compressed corpus.
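
As a brief illustration of the class check, assuming quanteda is installed and attached (the object names `txt` and `crp` and the texts are made up for this sketch):

```r
library(quanteda)

# Made-up example texts
txt <- c(doc1 = "A first short text.", doc2 = "A second short text.")
crp <- corpus(txt)

is.corpus(crp)  # TRUE: crp was built by corpus()
is.corpus(txt)  # FALSE: a plain character vector is not a corpus
```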

Details

The `+` operator combines two corpus objects. Any docvars or metadoc fields that do not match are filled with NA values for the corpus lacking that field. Corpus-level metadata is concatenated, except for source and notes, which are stamped with information about the creation of the new joined corpus.

The `c()` method is also defined for corpus class objects, and provides an easy way to combine multiple corpus objects.

Some issues concerning the use of factors to store document variables and metadata remain to be addressed in future revisions of quanteda. Currently most or all of these are not recorded as factors, because the data.frame calls that create and store the document-level information use stringsAsFactors=FALSE, since the texts should always be stored as character vectors and never as factors.
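
The combining behaviour described here can be sketched as follows. This assumes quanteda is attached and that `corpus()` accepts a `docvars` data frame for character input; the texts, document names, and variable names are invented for the example, and no output is shown because it may differ between quanteda versions.

```r
library(quanteda)

# Two one-document corpora with different document variables (made-up data)
c1 <- corpus(c(doc_a = "Text from the first corpus."),
             docvars = data.frame(party = "FF", stringsAsFactors = FALSE))
c2 <- corpus(c(doc_b = "Text from the second corpus."),
             docvars = data.frame(year = 2010))

# Join the two corpora with the + method
combined <- c1 + c2

ndoc(combined)     # number of documents in the joined corpus
docvars(combined)  # fields missing from one corpus are expected to appear as NA
```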

See also

summary.corpus

Examples

# concatenate corpus objects
corpus1 <- corpus(data_char_ukimmig2010[1:2])
corpus2 <- corpus(data_char_ukimmig2010[3:4])
corpus3 <- corpus(data_char_ukimmig2010[5:6])
summary(c(corpus1, corpus2, corpus3))
#> Corpus consisting of 6 documents:
#>
#>          Text Types Tokens Sentences
#>           BNP  1125   3280        88
#>     Coalition   142    260         4
#>  Conservative   251    499        15
#>        Greens   322    679        21
#>        Labour   298    683        29
#>        LibDem   251    483        14
#>
#> Source: Concatenation by c.corpus()
#> Created: Fri Oct 6 09:35:46 2017
#> Notes:
# ways to index corpus elements
data_corpus_inaugural["1793-Washington"]    # 2nd Washington inaugural speech
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "
data_corpus_inaugural[2]                    # same
#> 1793-Washington
#> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "
# access the docvars from data_corpus_irishbudget2010
data_corpus_irishbudget2010[, "year"]
#>  [1] "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010"
#> [11] "2010" "2010" "2010" "2010"
# same
data_corpus_irishbudget2010[["year"]]
#>                                       year
#> 2010_BUDGET_01_Brian_Lenihan_FF       2010
#> 2010_BUDGET_02_Richard_Bruton_FG      2010
#> 2010_BUDGET_03_Joan_Burton_LAB        2010
#> 2010_BUDGET_04_Arthur_Morgan_SF       2010
#> 2010_BUDGET_05_Brian_Cowen_FF         2010
#> 2010_BUDGET_06_Enda_Kenny_FG          2010
#> 2010_BUDGET_07_Kieran_ODonnell_FG     2010
#> 2010_BUDGET_08_Eamon_Gilmore_LAB      2010
#> 2010_BUDGET_09_Michael_Higgins_LAB    2010
#> 2010_BUDGET_10_Ruairi_Quinn_LAB       2010
#> 2010_BUDGET_11_John_Gormley_Green     2010
#> 2010_BUDGET_12_Eamon_Ryan_Green       2010
#> 2010_BUDGET_13_Ciaran_Cuffe_Green     2010
#> 2010_BUDGET_14_Caoimhghin_OCaolain_SF 2010
# create a new document variable
data_corpus_irishbudget2010[["govtopp"]] <-
    ifelse(data_corpus_irishbudget2010[["party"]] %in% c("FF", "Greens"),
           "Government", "Opposition")
docvars(data_corpus_irishbudget2010)
#>                                       year debate number      foren     name
#> 2010_BUDGET_01_Brian_Lenihan_FF       2010 BUDGET     01      Brian  Lenihan
#> 2010_BUDGET_02_Richard_Bruton_FG      2010 BUDGET     02    Richard   Bruton
#> 2010_BUDGET_03_Joan_Burton_LAB        2010 BUDGET     03       Joan   Burton
#> 2010_BUDGET_04_Arthur_Morgan_SF       2010 BUDGET     04     Arthur   Morgan
#> 2010_BUDGET_05_Brian_Cowen_FF         2010 BUDGET     05      Brian    Cowen
#> 2010_BUDGET_06_Enda_Kenny_FG          2010 BUDGET     06       Enda    Kenny
#> 2010_BUDGET_07_Kieran_ODonnell_FG     2010 BUDGET     07     Kieran ODonnell
#> 2010_BUDGET_08_Eamon_Gilmore_LAB      2010 BUDGET     08      Eamon  Gilmore
#> 2010_BUDGET_09_Michael_Higgins_LAB    2010 BUDGET     09    Michael  Higgins
#> 2010_BUDGET_10_Ruairi_Quinn_LAB       2010 BUDGET     10     Ruairi    Quinn
#> 2010_BUDGET_11_John_Gormley_Green     2010 BUDGET     11       John  Gormley
#> 2010_BUDGET_12_Eamon_Ryan_Green       2010 BUDGET     12      Eamon     Ryan
#> 2010_BUDGET_13_Ciaran_Cuffe_Green     2010 BUDGET     13     Ciaran    Cuffe
#> 2010_BUDGET_14_Caoimhghin_OCaolain_SF 2010 BUDGET     14 Caoimhghin OCaolain
#>                                       party    govtopp
#> 2010_BUDGET_01_Brian_Lenihan_FF          FF Opposition
#> 2010_BUDGET_02_Richard_Bruton_FG         FG Opposition
#> 2010_BUDGET_03_Joan_Burton_LAB          LAB Opposition
#> 2010_BUDGET_04_Arthur_Morgan_SF          SF Opposition
#> 2010_BUDGET_05_Brian_Cowen_FF            FF Opposition
#> 2010_BUDGET_06_Enda_Kenny_FG             FG Opposition
#> 2010_BUDGET_07_Kieran_ODonnell_FG        FG Opposition
#> 2010_BUDGET_08_Eamon_Gilmore_LAB        LAB Opposition
#> 2010_BUDGET_09_Michael_Higgins_LAB      LAB Opposition
#> 2010_BUDGET_10_Ruairi_Quinn_LAB         LAB Opposition
#> 2010_BUDGET_11_John_Gormley_Green     Green Opposition
#> 2010_BUDGET_12_Eamon_Ryan_Green       Green Opposition
#> 2010_BUDGET_13_Ciaran_Cuffe_Green     Green Opposition
#> 2010_BUDGET_14_Caoimhghin_OCaolain_SF    SF Opposition
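
In the output above every document is labelled Opposition, including the FF and Green speakers. Two things in the example appear to contribute: `[[` returns a single-column data.frame (as the `[["year"]]` output shows), so `%in%` is not applied element by element, and the lookup uses "Greens" while the party label in the data is "Green". The base R sketch below reproduces the effect on a hypothetical data frame `dv` standing in for the document variables, and shows a vector-based alternative. It does not use quanteda itself, and the four party values are a made-up subset.

```r
# Hypothetical stand-in for the party docvar; not a corpus object.
dv <- data.frame(party = c("FF", "FG", "LAB", "Green"),
                 stringsAsFactors = FALSE)

# Matching on the data.frame itself yields a single FALSE,
# so ifelse() returns a single "Opposition" value.
whole_frame <- ifelse(dv["party"] %in% c("FF", "Greens"),
                      "Government", "Opposition")
length(whole_frame)   # 1

# Matching on the character vector, with the label spelled as in the data,
# gives one result per row.
by_element <- ifelse(dv[["party"]] %in% c("FF", "Green"),
                     "Government", "Opposition")
by_element            # "Government" "Opposition" "Opposition" "Government"
```

With the corpus itself, indexing with `data_corpus_irishbudget2010[, "party"]` would presumably be the analogous change, since the earlier example shows `[, "year"]` returning a character vector.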