Extract a subset of a corpus

Returns the documents of a corpus that meet a given condition, which can be a logical expression written directly on docvars (document-level variables). corpus_subset() works like subset.data.frame(): it uses non-standard evaluation to evaluate the condition against the docvars in the corpus.

corpus_subset(x, subset, drop_docid = TRUE, ...)

Arguments

x: corpus object to be subsetted.
subset: logical expression indicating the documents to keep: missing values are taken as false.
drop_docid: if TRUE, docids of documents are removed as a result of subsetting.
...: not used
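
As a small illustration of how the subset argument is evaluated, the sketch below builds a hypothetical three-document corpus (the object name toy and the docvars year and party are made up for this example, not part of quanteda). It assumes the quanteda package is installed. One document has a missing year, so its condition evaluates to NA and the document is not kept.

```r
library(quanteda)

# Hypothetical toy corpus; `year` and `party` are illustrative docvars.
toy <- corpus(
  c(d1 = "First text.", d2 = "Second text.", d3 = "Third text."),
  docvars = data.frame(
    year  = c(1990, NA, 2010),
    party = c("A", "B", "A")
  )
)

# `year` is looked up among the docvars, not in the global environment.
# d2 has a missing year, so its condition is NA and it is treated as FALSE.
recent <- corpus_subset(toy, year > 1980)
docnames(recent)
#> [1] "d1" "d3"
```

Because missing values count as false, there is no need to add an explicit !is.na(year) to the condition.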

Value

corpus object, with a subset of documents (and docvars) selected according to arguments
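
A quick way to check the return value is sketched below, assuming quanteda is installed and using its bundled data_corpus_inaugural. The object name sub is only illustrative. The result is still a corpus, and it carries the same docvar columns as the input, restricted to the documents that were kept.

```r
library(quanteda)

sub <- corpus_subset(data_corpus_inaugural, Party == "Democratic")

# The result is a corpus object, not a data frame or character vector.
is.corpus(sub)

# The docvar columns are unchanged; only the rows (documents) are filtered.
identical(names(docvars(sub)), names(docvars(data_corpus_inaugural)))

# One row of docvars per remaining document.
nrow(docvars(sub)) == ndoc(sub)
```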

Examples

summary(corpus_subset(data_corpus_inaugural, Year > 1980))
#> Corpus consisting of 12 documents, showing 12 documents:
#> 
#>            Text Types Tokens Sentences Year President FirstName      Party
#>     1981-Reagan   902   2781       129 1981    Reagan    Ronald Republican
#>     1985-Reagan   925   2909       123 1985    Reagan    Ronald Republican
#>       1989-Bush   795   2674       141 1989      Bush    George Republican
#>    1993-Clinton   642   1833        81 1993   Clinton      Bill Democratic
#>    1997-Clinton   773   2436       111 1997   Clinton      Bill Democratic
#>       2001-Bush   621   1806        97 2001      Bush George W. Republican
#>       2005-Bush   772   2312        99 2005      Bush George W. Republican
#>      2009-Obama   938   2689       110 2009     Obama    Barack Democratic
#>      2013-Obama   814   2317        88 2013     Obama    Barack Democratic
#>      2017-Trump   582   1660        88 2017     Trump Donald J. Republican
#>  2021-Biden.txt   812   2766       216 2021     Biden Joseph R. Democratic
#>      2025-Trump   812   2766       216 2025     Trump Donald J. Republican
#> 
summary(corpus_subset(data_corpus_inaugural, Year > 1930 & President == "Roosevelt"))
#> Corpus consisting of 4 documents, showing 4 documents:
#> 
#>            Text Types Tokens Sentences Year President   FirstName      Party
#>  1933-Roosevelt   743   2057        85 1933 Roosevelt Franklin D. Democratic
#>  1937-Roosevelt   725   1989        96 1937 Roosevelt Franklin D. Democratic
#>  1941-Roosevelt   526   1519        68 1941 Roosevelt Franklin D. Democratic
#>  1945-Roosevelt   275    633        27 1945 Roosevelt Franklin D. Democratic
#>
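
Two further patterns may be useful; this is a sketch assuming quanteda is installed, and the names keep and by_name are hypothetical. Since evaluation follows subset.data.frame(), a condition can mix docvars with ordinary variables from the calling environment, and set membership can be tested with %in%.

```r
library(quanteda)

# `keep` is an ordinary R variable; `President` is a docvar.
keep <- c("Clinton", "Obama")
by_name <- corpus_subset(data_corpus_inaugural, President %in% keep)
docnames(by_name)

# The same selection written with explicit comparisons on the docvar.
by_or <- corpus_subset(
  data_corpus_inaugural,
  President == "Clinton" | President == "Obama"
)
identical(docnames(by_name), docnames(by_or))
```

Keeping the lookup values in a separate variable makes the condition easier to reuse across several corpora.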
