corpus-class.Rd
Extensions of base R functions for corpus objects.
    # S3 method for corpus
    print(x, ...)

    is.corpus(x)

    is.corpuszip(x)

    # S3 method for summary.corpus
    print(x, ...)

    # S3 method for corpus
    +(c1, c2)

    # S3 method for corpus
    c(..., recursive = FALSE)

    # S3 method for corpus
    [(x, i, j = NULL, ..., drop = TRUE)

    # S3 method for corpus
    [[(x, i, ...)

    # S3 method for corpus
    [[(x, i) <- value

    # S3 method for corpus
    str(object, ...)
Argument | Description |
---|---|
x | a corpus object |
... | not used |
c1 | corpus one to be added |
c2 | corpus two to be added |
recursive | logical used by `c()` method, always set to `FALSE` |
i | index for documents or rows of document variables |
j | index for column of document variables |
drop | if |
value | a vector that will form a new docvar |
object | the corpus about which you want structural information |
`is.corpus` returns `TRUE` if the object is a corpus

`is.corpuszip` returns `TRUE` if the object is a compressed corpus
The `+` operator for a corpus object will combine two corpus objects, resolving any non-matching `docvars` or `metadoc` fields by making them into `NA` values for the corpus lacking that field. Corpus-level meta data is concatenated, except for `source` and `notes`, which are stamped with information pertaining to the creation of the new joined corpus.
The `c()` operator is also defined for corpus class objects, and provides an easy way to combine multiple corpus objects.
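The `NA`-filling behaviour described above can be illustrated with plain data frames. The following is a minimal base R sketch, not quanteda's own implementation: `combine_docvars()` is a hypothetical helper, and the document names and variables are made up for illustration.

```r
# Hypothetical helper (not part of quanteda): row-bind two data frames of
# document variables, padding any column missing from one side with NA.
combine_docvars <- function(dv1, dv2) {
  all_cols <- union(names(dv1), names(dv2))
  pad <- function(dv) {
    # add each missing column as NA, then put columns in a common order
    for (col in setdiff(all_cols, names(dv))) {
      dv[[col]] <- NA
    }
    dv[, all_cols, drop = FALSE]
  }
  rbind(pad(dv1), pad(dv2))
}

# Illustrative document variables: `year` exists only in the first set,
# `speaker` only in the second.
dv1 <- data.frame(party = c("FF", "FG"), year = c(2010, 2010),
                  row.names = c("doc1", "doc2"), stringsAsFactors = FALSE)
dv2 <- data.frame(party = "LAB", speaker = "Burton",
                  row.names = "doc3", stringsAsFactors = FALSE)

combined <- combine_docvars(dv1, dv2)
print(combined)
```

In the combined result, `doc3` has `NA` for `year`, and `doc1` and `doc2` have `NA` for `speaker`, which mirrors how non-matching fields are resolved when two corpus objects are joined.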
There are some issues that need to be addressed in future revisions of quanteda concerning the use of factors to store document variables and meta-data. Currently most or all of these are not recorded as factors: the `data.frame` calls that create and store the document-level information use `stringsAsFactors = FALSE`, since the texts should always be stored as character vectors and never as factors.
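The effect of `stringsAsFactors` can be seen in a small base R sketch. The texts and the `party` variable here are invented for illustration and do not reproduce quanteda's internal storage.

```r
# Illustrative texts and a document variable (assumed values)
texts <- c(doc1 = "First text.", doc2 = "Second text.")

# With stringsAsFactors = TRUE, every string column becomes a factor
as_factors <- data.frame(texts = texts, party = c("FF", "FG"),
                         stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE, string columns stay character vectors
as_chars <- data.frame(texts = texts, party = c("FF", "FG"),
                       stringsAsFactors = FALSE)

class(as_factors$texts)  # "factor"
class(as_chars$texts)    # "character"

# String functions such as nchar() work directly on the character column
nchar(as_chars$texts)
```

Because the setting applies to the whole `data.frame()` call, keeping the texts as character vectors also leaves the other string-valued document variables unconverted.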
    # concatenate corpus objects
    corpus1 <- corpus(data_char_ukimmig2010[1:2])
    corpus2 <- corpus(data_char_ukimmig2010[3:4])
    corpus3 <- corpus(data_char_ukimmig2010[5:6])
    summary(c(corpus1, corpus2, corpus3))
    #> Corpus consisting of 6 documents:
    #>
    #>          Text Types Tokens Sentences
    #>           BNP  1125   3280        88
    #>     Coalition   142    260         4
    #>  Conservative   251    499        15
    #>        Greens   322    679        21
    #>        Labour   298    683        29
    #>        LibDem   251    483        14
    #>
    #> Source: Concatenation by c.corpus()
    #> Created: Sat Feb 2 14:11:45 2019
    #> Notes:

    # ways to index corpus elements
    data_corpus_inaugural["1793-Washington"]    # 2nd Washington inaugural speech
    #> 1793-Washington
    #> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "
    data_corpus_inaugural[2]                    # same
    #> 1793-Washington
    #> "Fellow citizens, I am again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America.\n\nPrevious to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.\n\n "

    # access the docvars from data_corpus_irishbudget2010
    data_corpus_irishbudget2010[, "year"]
    #>  [1] "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010" "2010"
    #> [11] "2010" "2010" "2010" "2010"

    # same, but returned as a data.frame
    data_corpus_irishbudget2010[["year"]]
    #>                           year
    #> Lenihan, Brian (FF)       2010
    #> Bruton, Richard (FG)      2010
    #> Burton, Joan (LAB)        2010
    #> Morgan, Arthur (SF)       2010
    #> Cowen, Brian (FF)         2010
    #> Kenny, Enda (FG)          2010
    #> ODonnell, Kieran (FG)     2010
    #> Gilmore, Eamon (LAB)      2010
    #> Higgins, Michael (LAB)    2010
    #> Quinn, Ruairi (LAB)       2010
    #> Gormley, John (Green)     2010
    #> Ryan, Eamon (Green)       2010
    #> Cuffe, Ciaran (Green)     2010
    #> OCaolain, Caoimhghin (SF) 2010

    # create a new document variable
    # (index with [, "party"] to get a vector for %in%; the party label is "Green")
    data_corpus_irishbudget2010[["govtopp"]] <-
        ifelse(data_corpus_irishbudget2010[, "party"] %in% c("FF", "Green"),
               "Government", "Opposition")
    docvars(data_corpus_irishbudget2010)
    #>                           year debate number      foren     name party
    #> Lenihan, Brian (FF)       2010 BUDGET     01      Brian  Lenihan    FF
    #> Bruton, Richard (FG)      2010 BUDGET     02    Richard   Bruton    FG
    #> Burton, Joan (LAB)        2010 BUDGET     03       Joan   Burton   LAB
    #> Morgan, Arthur (SF)       2010 BUDGET     04     Arthur   Morgan    SF
    #> Cowen, Brian (FF)         2010 BUDGET     05      Brian    Cowen    FF
    #> Kenny, Enda (FG)          2010 BUDGET     06       Enda    Kenny    FG
    #> ODonnell, Kieran (FG)     2010 BUDGET     07     Kieran ODonnell    FG
    #> Gilmore, Eamon (LAB)      2010 BUDGET     08      Eamon  Gilmore   LAB
    #> Higgins, Michael (LAB)    2010 BUDGET     09    Michael  Higgins   LAB
    #> Quinn, Ruairi (LAB)       2010 BUDGET     10     Ruairi    Quinn   LAB
    #> Gormley, John (Green)     2010 BUDGET     11       John  Gormley Green
    #> Ryan, Eamon (Green)       2010 BUDGET     12      Eamon     Ryan Green
    #> Cuffe, Ciaran (Green)     2010 BUDGET     13     Ciaran    Cuffe Green
    #> OCaolain, Caoimhghin (SF) 2010 BUDGET     14 Caoimhghin OCaolain    SF
    #>                              govtopp
    #> Lenihan, Brian (FF)       Government
    #> Bruton, Richard (FG)      Opposition
    #> Burton, Joan (LAB)        Opposition
    #> Morgan, Arthur (SF)       Opposition
    #> Cowen, Brian (FF)         Government
    #> Kenny, Enda (FG)          Opposition
    #> ODonnell, Kieran (FG)     Opposition
    #> Gilmore, Eamon (LAB)      Opposition
    #> Higgins, Michael (LAB)    Opposition
    #> Quinn, Ruairi (LAB)       Opposition
    #> Gormley, John (Green)     Government
    #> Ryan, Eamon (Green)       Government
    #> Cuffe, Ciaran (Green)     Government
    #> OCaolain, Caoimhghin (SF) Opposition