docnames(x)
docnames(x) <- value
docid(x)
segid(x)
x: the object with docnames
value: a character vector of the same length as x
docnames returns a character vector of the document names.
docnames<- assigns new values to the document names of an object.
Document names can only be character, so any non-character value assigned as a docname is coerced to mode character.
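As a rough illustration of this assignment rule, the base-R sketch below defines a hypothetical getter docnames_sketch() and replacement function docnames_sketch<- for a plain character vector of texts (not a quanteda object). The replacement function checks that value has one entry per document and coerces it to character, as described above; it is not quanteda's implementation.

```r
# Hypothetical getter and replacement function for a character vector of texts.
# These only illustrate the rule; they are not quanteda functions.
docnames_sketch <- function(x) names(x)

`docnames_sketch<-` <- function(x, value) {
  # value must supply exactly one name per document
  if (length(value) != length(x)) {
    stop("value must have the same length as x")
  }
  # non-character values (numbers, factors, ...) are coerced to character
  names(x) <- as.character(value)
  x
}

texts <- c("First text.", "Second text.", "Third text.")
docnames_sketch(texts) <- 1:3
docnames_sketch(texts)
#> [1] "1" "2" "3"
```

Assigning a value of the wrong length, e.g. docnames_sketch(texts) <- c("a", "b"), stops with an error and leaves texts unchanged.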
docid returns an internal variable denoting the original "docname" from which a document came. If an object has been reshaped (e.g. by corpus_reshape()) or segmented (e.g. by corpus_segment()), docid(x) returns the original docnames, while segid(x) returns the serial number of each segment within its original document.
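The relationship between docid and segid can be sketched in base R. Assuming that a sentence ends at a period followed by whitespace (a deliberate simplification; quanteda's own sentence segmentation is more sophisticated), the code splits each text into segments and records, for every segment, the document it came from and its position within that document. The names doc_id and seg_id are hypothetical local stand-ins, not the internal variables quanteda keeps.

```r
# Base-R sketch only: quanteda stores docid/segid internally;
# doc_id and seg_id below are hypothetical stand-ins.
texts <- c(textone = "This is a sentence. Another sentence. Yet another.",
           textwo  = "Sentence 1. Sentence 2.")

# Simplifying assumption: a sentence ends at a period followed by whitespace
pieces <- strsplit(texts, "(?<=\\.)\\s+", perl = TRUE)
n_segments <- lengths(pieces)

# Original document of each segment, kept as a factor like docid()
doc_id <- factor(rep(names(texts), n_segments), levels = names(texts))

# Serial number of each segment within its original document, like segid()
seg_id <- unlist(lapply(n_segments, seq_len), use.names = FALSE)

# Segment names follow the "docname.segid" pattern
segments <- setNames(unlist(pieces, use.names = FALSE),
                     paste(doc_id, seg_id, sep = "."))
seg_id
#> [1] 1 2 3 1 2
```

The segment names produced this way ("textone.1", ..., "textwo.2") follow the same pattern as the reshaped corpus in the examples.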
docid and segid are designed primarily for developers, not for end users; in most cases you will want docnames instead. docid(x) is, however, the default for the groups argument, so that documents that have previously been reshaped (e.g. by corpus_reshape()) or segmented (e.g. by corpus_segment()) are regrouped into their original docnames when groups = docid(x).
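Regrouping by docid amounts to summing the rows that belong to the same original document. The base-R sketch below uses rowsum() on a small matrix of counts for three selected features across the five sentence segments of the example corpus; the counts are written out by hand for illustration and are not produced by quanteda. In quanteda itself, the corresponding operation is grouping with groups = docid(x), for example in dfm_group().

```r
# Hand-built segment-level counts (illustrative only): one row per sentence
counts <- matrix(c(1, 0, 0,
                   1, 1, 0,
                   0, 1, 1,
                   1, 0, 0,
                   1, 0, 0),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(c("textone.1", "textone.2", "textone.3",
                                   "textwo.1", "textwo.2"),
                                 c("sentence", "another", "yet")))

# Original document of each segment (what docid() would report)
doc_id <- factor(c("textone", "textone", "textone", "textwo", "textwo"))

# Sum the segment rows within each original document
regrouped <- rowsum(counts, group = doc_id)
```

Each row of regrouped now holds the totals for one original document: textone gets 2, 2 and 1, and textwo gets 2, 0 and 0.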
# get and set document names of a corpus
corp <- data_corpus_inaugural
docnames(corp) <- char_tolower(docnames(corp))
# get and set document names of a tokens object
toks <- tokens(corp)
docnames(toks) <- char_tolower(docnames(toks))
# get and set document names of a dfm
dfmat <- dfm(tokens(corp))
docnames(dfmat) <- char_tolower(docnames(dfmat))
# reassign the document names of the inaugural speech corpus
corp <- data_corpus_inaugural
docnames(corp) <- paste0("Speech", seq_len(ndoc(corp)))
corp <- corpus(c(textone = "This is a sentence. Another sentence. Yet another.",
                 textwo = "Sentence 1. Sentence 2."))
corp_sent <- corp |>
  corpus_reshape(to = "sentences")
docnames(corp_sent)
#> [1] "textone.1" "textone.2" "textone.3" "textwo.1" "textwo.2"
# docid
docid(corp_sent)
#> [1] textone textone textone textwo textwo
#> Levels: textone textwo
docid(tokens(corp_sent))
#> [1] textone textone textone textwo textwo
#> Levels: textone textwo
docid(dfm(tokens(corp_sent)))
#> [1] textone textone textone textwo textwo
#> Levels: textone textwo
# segid
segid(corp_sent)
#> [1] 1 2 3 1 2
segid(tokens(corp_sent))
#> [1] 1 2 3 1 2
segid(dfm(tokens(corp_sent)))
#> [1] 1 2 3 1 2