Get or set the document names of a corpus, tokens, or dfm object.

docnames(x)

docnames(x) <- value

docid(x)

Arguments

x

a corpus, tokens, or dfm object whose document names are to be retrieved or set

value

a character vector of new document names, one for each document in x

Value

docnames returns a character vector of the document names.

docnames(x) <- value assigns new document names to an object. Document names are always character, so any non-character value assigned as a docname is coerced to mode character.
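The coercion described above can be illustrated with a small sketch. The two-document corpus here is made up for the illustration and is not part of the package data; the example assumes quanteda is installed and attached.

```r
library("quanteda")

# hypothetical two-document corpus, used only for illustration
corp <- corpus(c(a = "First document.", b = "Second document."))

# assign an integer vector as the new document names
docnames(corp) <- 1:2

# the stored names should be character, not integer
docnames(corp)
is.character(docnames(corp))
```

Because the names are stored as character, later lookups by name need to use strings such as "1" rather than the number 1, which would be treated as a position.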

docid returns an internal variable denoting the original "docname" from which a document came. Unless an object has been reshaped (e.g. corpus_reshape()), split (e.g. tokens_split()), or segmented (e.g. corpus_segment()), docid(x) will return the docnames.

Note

docid is designed primarily for developers, not for end users. In most cases, you will want docnames instead. It is, however, the default for groups, so that documents that have previously been reshaped (e.g. corpus_reshape()), split (e.g. tokens_split()), or segmented (e.g. corpus_segment()) will be regrouped into their original docnames when groups = docid(x).
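As a rough sketch of the regrouping behaviour described in this note, the following reshapes a small corpus into sentences and then groups the sentences back by docid. It assumes a version of quanteda that provides corpus_group() (3.0 or later); the two-document corpus is the same kind of toy input used in the Examples.

```r
library("quanteda")

corp <- corpus(c(textone = "This is a sentence. Another sentence. Yet another.",
                 textwo = "Sentence 1. Sentence 2."))

# reshaping to sentences creates one document per sentence
corpsent <- corpus_reshape(corp, to = "sentences")
ndoc(corpsent)

# grouping by docid collapses the sentences back into their source documents
corpgrp <- corpus_group(corpsent, groups = docid(corpsent))
docnames(corpgrp)
```

Passing groups = docid(corpsent) explicitly only makes the default visible; tokens_group() and dfm_group() accept the same argument for tokens and dfm objects.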

Examples

library("quanteda")

# get and set document names of a corpus
corp <- data_corpus_inaugural
docnames(corp) <- char_tolower(docnames(corp))

# get and set document names of a tokens object
toks <- tokens(data_corpus_inaugural)
docnames(toks) <- char_tolower(docnames(toks))

# get and set document names of a dfm
# (tokenize first: calling dfm() directly on a corpus is deprecated)
dfmat <- dfm(tokens(data_corpus_inaugural[1:5]))
docnames(dfmat) <- char_tolower(docnames(dfmat))

# reassign the document names of the inaugural speech corpus
docnames(data_corpus_inaugural) <- paste("Speech", 1:ndoc(data_corpus_inaugural), sep = "")

# docid
corp <- corpus(c(textone = "This is a sentence. Another sentence. Yet another.",
                 textwo = "Sentence 1. Sentence 2."))
corpsent <- corpus_reshape(corp, to = "sentences")
docnames(corpsent)
#> [1] "textone.1" "textone.2" "textone.3" "textwo.1" "textwo.2"
docid(corpsent)
#> [1] textone textone textone textwo textwo
#> Levels: textone textwo
docid(tokens(corpsent))
#> [1] textone textone textone textwo textwo
#> Levels: textone textwo
docid(dfm(tokens(corpsent)))
#> [1] textone textone textone textwo textwo
#> Levels: textone textwo