Returns subsets of a corpus that meet certain conditions, including direct
logical operations on docvars (document-level variables). corpus_subset
functions identically to subset.data.frame(), using non-standard
evaluation to evaluate conditions based on the docvars in the corpus.
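
Since the description compares corpus_subset to subset.data.frame(), a base R sketch may help show what non-standard evaluation means here. The data frame `meta` and its values are hypothetical, made up for illustration only; they stand in for the docvars of a corpus.

```r
# Hypothetical document-level variables held in a plain data frame
# (illustrative values, not taken from any real corpus)
meta <- data.frame(
  doc_id = c("a", "b", "c", "d"),
  Year   = c(1975, 1985, 1995, 2005),
  Party  = c("Democratic", "Republican", "Democratic", "Republican"),
  stringsAsFactors = FALSE
)

# Non-standard evaluation: Year and Party are resolved as columns of `meta`,
# so they can be written bare, without the meta$ prefix
recent <- subset(meta, Year > 1980 & Party == "Republican")

# Standard-evaluation equivalent, for comparison
same <- meta[meta$Year > 1980 & meta$Party == "Republican", ]

stopifnot(identical(recent$doc_id, same$doc_id))
print(recent$doc_id)
```

The condition passed to corpus_subset is written the same way, with docvar names used bare inside the expression.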
corpus_subset(x, subset, drop_docid = TRUE, ...)
Argument | Description |
---|---|
x | corpus object to be subsetted |
subset | logical expression indicating the documents to keep: missing values are taken as false |
drop_docid | if |
... | not used |
corpus object, with a subset of documents (and docvars) selected according to arguments
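
The following is a minimal sketch of how missing values in a docvar behave under the rule stated for the subset argument. It assumes the quanteda package is installed; the toy corpus, its document names, and the `year` docvar are hypothetical and exist only for this illustration.

```r
library(quanteda)

# Toy corpus with a hypothetical docvar `year`; d2 has a missing value
corp <- corpus(
  c(d1 = "First text.", d2 = "Second text.", d3 = "Third text."),
  docvars = data.frame(year = c(1990, NA, 2010))
)

# year > 2000 evaluates to FALSE, NA, TRUE across the three documents;
# the NA is treated as FALSE, so d2 is dropped along with d1
kept <- corpus_subset(corp, year > 2000)

ndoc(kept)             # number of documents retained
docvars(kept, "year")  # docvars are carried along with the kept documents
```

To keep documents with a missing docvar as well, the condition would need to say so explicitly, for example `is.na(year) | year > 2000`.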
summary(corpus_subset(data_corpus_inaugural, Year > 1980))
#> Corpus consisting of 11 documents, showing 11 documents:
#>
#>         Text Types Tokens Sentences Year President FirstName      Party
#>  1981-Reagan   902   2780       129 1981    Reagan    Ronald Republican
#>  1985-Reagan   925   2909       123 1985    Reagan    Ronald Republican
#>    1989-Bush   795   2673       141 1989      Bush    George Republican
#> 1993-Clinton   642   1833        81 1993   Clinton      Bill Democratic
#> 1997-Clinton   773   2436       111 1997   Clinton      Bill Democratic
#>    2001-Bush   621   1806        97 2001      Bush George W. Republican
#>    2005-Bush   772   2312        99 2005      Bush George W. Republican
#>   2009-Obama   938   2689       110 2009     Obama    Barack Democratic
#>   2013-Obama   814   2317        88 2013     Obama    Barack Democratic
#>   2017-Trump   582   1660        88 2017     Trump Donald J. Republican
#>   2021-Biden   811   2766       216 2021     Biden Joseph R. Democratic
#>
summary(corpus_subset(data_corpus_inaugural, Year > 1930 & President == "Roosevelt"))
#> Corpus consisting of 4 documents, showing 4 documents:
#>
#>           Text Types Tokens Sentences Year President   FirstName      Party
#> 1933-Roosevelt   743   2057        85 1933 Roosevelt Franklin D. Democratic
#> 1937-Roosevelt   725   1989        96 1937 Roosevelt Franklin D. Democratic
#> 1941-Roosevelt   526   1519        68 1941 Roosevelt Franklin D. Democratic
#> 1945-Roosevelt   275    633        27 1945 Roosevelt Franklin D. Democratic
#>