Functions to add or retrieve corpus summary metadata

add_summary_metadata(x, extended = FALSE, ...)

get_summary_metadata(x, ...)

summarize_texts_extended(x, stop_words = stopwords("en"), n = 100)

Arguments

x

corpus object

...

additional arguments passed to tokens() when computing the summary information

Value

add_summary_metadata() returns a corpus with summary metadata added as a data.frame, stored in the top-level list element named summary.

get_summary_metadata() returns the summary metadata as a data.frame.

summarize_texts_extended() returns extended summary information; the example below accesses its top_dfm element.

Details

These functions are provided so that a corpus object can be stored with its summary information, which avoids recomputing it every time summary.corpus() is called.

In future calls, if !is.null(meta(x, "summary", type = "system")) && !length(list(...)), then summary.corpus() simply returns get_system_meta() rather than computing the summary statistics on the fly, which requires tokenizing the text.
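
The caching rule described above can be sketched in base R without quanteda. This is only an illustration of the pattern, not the package's implementation: compute_summary(), add_cached_summary(), summarize_cached() and the min_nchar argument are hypothetical names, and the summary is stored in a plain attribute instead of the corpus system metadata.

```r
# Hypothetical stand-ins for illustration only; none of these are quanteda functions.

# Compute per-text summary statistics (the expensive step, since it tokenizes).
# min_nchar is an assumed option standing in for arguments passed via `...`.
compute_summary <- function(texts, min_nchar = 1) {
  tokens <- strsplit(tolower(texts), "[^a-z]+")
  tokens <- lapply(tokens, function(tok) tok[nchar(tok) >= min_nchar])
  data.frame(
    Text = names(texts),
    Types = vapply(tokens, function(tok) length(unique(tok)), integer(1),
                   USE.NAMES = FALSE),
    Tokens = unname(lengths(tokens)),
    stringsAsFactors = FALSE
  )
}

# Store the summary on the object so that later calls can reuse it.
add_cached_summary <- function(texts, ...) {
  attr(texts, "summary") <- compute_summary(texts, ...)
  texts
}

# Return the stored summary only when one exists and no extra arguments
# were supplied; otherwise recompute from the texts.
summarize_cached <- function(texts, ...) {
  cached <- attr(texts, "summary")
  if (!is.null(cached) && !length(list(...))) {
    return(cached)
  }
  compute_summary(texts, ...)
}

texts <- c(a = "The cat sat on the mat", b = "A dog barked")
texts <- add_cached_summary(texts)

result <- summarize_cached(texts)                  # served from the cache
recomputed <- summarize_cached(texts, min_nchar = 3)  # extra argument forces recomputation
```

Passing any extra argument bypasses the stored summary because the cached values were computed with the default settings and may no longer apply.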

Examples

corp <- corpus(data_char_ukimmig2010)
corp <- quanteda:::add_summary_metadata(corp)
quanteda:::get_summary_metadata(corp)
#> Corpus consisting of 9 documents, showing 9 documents:
#> 
#>          Text Types Tokens Sentences
#>           BNP  1125   3280        88
#>     Coalition   142    260         4
#>  Conservative   251    499        15
#>        Greens   322    679        21
#>        Labour   298    683        29
#>        LibDem   251    483        14
#>            PC    77    114         5
#>           SNP    88    134         4
#>          UKIP   346    723        26
#> 

## using extended summary

if (FALSE) {
extended_data <- quanteda:::summarize_texts_extended(data_corpus_inaugural)
topfeatures(extended_data$top_dfm)
}