Randomly sample documents from a tokens object, with or without replacement. Works like sample(), but operates on whole documents, and each sampled document keeps its associated document-level variables.

tokens_sample(x, size = ndoc(x), replace = FALSE, prob = NULL)

Arguments

x

the tokens object whose documents will be sampled

size

a positive number, the number of documents to select

replace

logical; should sampling be with replacement?

prob

a vector of probability weights for selecting the documents being sampled, one weight per document in x; the default NULL gives every document an equal chance of selection.

Value

A tokens object with the number of documents equal to size, drawn from the tokens object x.

Examples

set.seed(10)
toks <- tokens(data_corpus_inaugural[1:10])
head(toks)
#> Tokens consisting of 6 documents and 4 docvars.
#> 1789-Washington :
#> [1] "Fellow-Citizens" "of" "the" "Senate"
#> [5] "and" "of" "the" "House"
#> [9] "of" "Representatives" ":" "Among"
#> [ ... and 1,525 more ]
#>
#> 1793-Washington :
#> [1] "Fellow" "citizens" "," "I" "am" "again"
#> [7] "called" "upon" "by" "the" "voice" "of"
#> [ ... and 135 more ]
#>
#> 1797-Adams :
#> [1] "When" "it" "was" "first" "perceived" ","
#> [7] "in" "early" "times" "," "that" "no"
#> [ ... and 2,565 more ]
#>
#> 1801-Jefferson :
#> [1] "Friends" "and" "Fellow" "Citizens" ":" "Called"
#> [7] "upon" "to" "undertake" "the" "duties" "of"
#> [ ... and 1,911 more ]
#>
#> 1805-Jefferson :
#> [1] "Proceeding" "," "fellow" "citizens"
#> [5] "," "to" "that" "qualification"
#> [9] "which" "the" "Constitution" "requires"
#> [ ... and 2,368 more ]
#>
#> 1809-Madison :
#> [1] "Unwilling" "to" "depart" "from" "examples" "of"
#> [7] "the" "most" "revered" "authority" "," "I"
#> [ ... and 1,249 more ]
#>
head(tokens_sample(toks))
#> Tokens consisting of 6 documents and 4 docvars.
#> 1821-Monroe :
#> [1] "Fellow" "citizens" "," "I" "shall" "not"
#> [7] "attempt" "to" "describe" "the" "grateful" "emotions"
#> [ ... and 4,874 more ]
#>
#> 1813-Madison :
#> [1] "About" "to" "add" "the" "solemnity"
#> [6] "of" "an" "oath" "to" "the"
#> [11] "obligations" "imposed"
#> [ ... and 1,290 more ]
#>
#> 1817-Monroe :
#> [1] "I" "should" "be" "destitute" "of" "feeling"
#> [7] "if" "I" "was" "not" "deeply" "affected"
#> [ ... and 3,665 more ]
#>
#> 1809-Madison :
#> [1] "Unwilling" "to" "depart" "from" "examples" "of"
#> [7] "the" "most" "revered" "authority" "," "I"
#> [ ... and 1,249 more ]
#>
#> 1797-Adams :
#> [1] "When" "it" "was" "first" "perceived" ","
#> [7] "in" "early" "times" "," "that" "no"
#> [ ... and 2,565 more ]
#>
#> 1793-Washington :
#> [1] "Fellow" "citizens" "," "I" "am" "again"
#> [7] "called" "upon" "by" "the" "voice" "of"
#> [ ... and 135 more ]
#>
head(tokens_sample(toks, replace = TRUE))
#> Tokens consisting of 6 documents and 4 docvars.
#> 1817-Monroe.1 :
#> [1] "I" "should" "be" "destitute" "of" "feeling"
#> [7] "if" "I" "was" "not" "deeply" "affected"
#> [ ... and 3,665 more ]
#>
#> 1813-Madison.1 :
#> [1] "About" "to" "add" "the" "solemnity"
#> [6] "of" "an" "oath" "to" "the"
#> [11] "obligations" "imposed"
#> [ ... and 1,290 more ]
#>
#> 1809-Madison.1 :
#> [1] "Unwilling" "to" "depart" "from" "examples" "of"
#> [7] "the" "most" "revered" "authority" "," "I"
#> [ ... and 1,249 more ]
#>
#> 1813-Madison.2 :
#> [1] "About" "to" "add" "the" "solemnity"
#> [6] "of" "an" "oath" "to" "the"
#> [11] "obligations" "imposed"
#> [ ... and 1,290 more ]
#>
#> 1809-Madison.2 :
#> [1] "Unwilling" "to" "depart" "from" "examples" "of"
#> [7] "the" "most" "revered" "authority" "," "I"
#> [ ... and 1,249 more ]
#>
#> 1793-Washington.1 :
#> [1] "Fellow" "citizens" "," "I" "am" "again"
#> [7] "called" "upon" "by" "the" "voice" "of"
#> [ ... and 135 more ]
#>
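The examples above leave size and prob at their defaults. The following is a minimal sketch of how those two arguments might be used, assuming the quanteda package is installed and that prob accepts one weight per document, as sample() does. The weight vector w and the object names toks_small and toks_weighted are illustrative choices, not part of the package.

```r
library(quanteda)

set.seed(10)
toks <- tokens(data_corpus_inaugural[1:10])

# Draw 3 of the 10 documents, without replacement
toks_small <- tokens_sample(toks, size = 3)
ndoc(toks_small)

# Illustrative weights (an assumption for this sketch): the first five
# documents are each three times as likely to be drawn as the last five.
# The weights sum to 1 and there is one weight per document.
w <- rep(c(0.15, 0.05), each = 5)
toks_weighted <- tokens_sample(toks, size = 5, prob = w)
docnames(toks_weighted)
```

Because sampling here is without replacement, each document appears at most once in the result and keeps its original name, unlike the numbered suffixes shown for replace = TRUE.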