Calculate "keyness", a score for features that occur differentially across categories. Here, the categories are defined by reference to a "target" document index in the dfm, with the reference group consisting of all other documents.
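Every keyness measure below starts from the same 2 x 2 table for one feature: how often it occurs in the target versus the reference group, and how many other tokens each group contains. The sketch below builds that table in base R. `keyness_table()` is a hypothetical helper for illustration, not a quanteda function, and the counts are made up.

```r
# Hypothetical helper (not part of quanteda): the 2 x 2 table behind one
# feature's keyness score. Rows are the target and reference groups; columns
# are "this feature" and "all other features".
keyness_table <- function(n_target, n_reference, total_target, total_reference) {
  matrix(c(n_target, n_reference,
           total_target - n_target, total_reference - n_reference),
         nrow = 2,
         dimnames = list(group = c("target", "reference"),
                         feature = c("this feature", "other features")))
}

# Made-up counts: 10 of 100 target tokens, 5 of 200 reference tokens
tab <- keyness_table(n_target = 10, n_reference = 5,
                     total_target = 100, total_reference = 200)
tab

# Relative frequency of the feature within each group (0.100 vs 0.025):
# the feature is over-represented in the target
prop.table(tab, margin = 1)[, "this feature"]
```

A keyness measure then asks whether a difference like 0.100 vs 0.025 is larger than chance alone would plausibly produce.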

textstat_keyness(x, target = 1L, measure = c("chi2", "exact", "lr", "pmi"),
  sort = TRUE, correction = c("default", "yates", "williams", "none"))

Arguments

x

a dfm containing the features to be examined for keyness

target

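the document index (numeric, character or logical) identifying the document(s) forming the "target" for computing keyness; if several documents are selected (for example by a logical vector), their feature frequencies are combined to form the target. All other documents' feature frequencies are combined for use as the reference.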
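The way `target` splits the documents into two groups can be sketched in base R, using a plain matrix as a stand-in for a dfm. The helpers `resolve_target()` and `split_counts()` are hypothetical and only illustrate the documented behaviour (numeric, character or logical selection; everything else becomes the reference); they are not quanteda's internals.

```r
# Toy document-feature matrix (a base R matrix standing in for a dfm)
m <- matrix(c(3, 0, 1,
              2, 4, 0,
              1, 1, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(docs = c("d1", "d2", "d3"),
                            features = c("a", "b", "c")))

# Hypothetical helper: turn a numeric, character or logical target into a
# logical index over the documents
resolve_target <- function(x, target) {
  if (is.logical(target)) {
    stopifnot(length(target) == nrow(x))
    target
  } else if (is.character(target)) {
    rownames(x) %in% target            # match document names
  } else {
    seq_len(nrow(x)) %in% target       # match document positions
  }
}

# Sum feature counts within the target and within the reference group
split_counts <- function(x, target) {
  is_target <- resolve_target(x, target)
  list(target    = colSums(x[is_target, , drop = FALSE]),
       reference = colSums(x[!is_target, , drop = FALSE]))
}

split_counts(m, "d2")                   # target = d2, reference = d1 + d3
split_counts(m, c(TRUE, FALSE, TRUE))   # target = d1 + d3, reference = d2
```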

measure

(signed) association measure to be used for computing keyness. Currently available: "chi2" (chi-squared); "exact" (Fisher's exact test); "lr" (likelihood ratio); "pmi" (pointwise mutual information).
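For intuition, the four measures can be computed from a single 2 x 2 table with textbook formulas in base R. This is a sketch under stated assumptions, not quanteda's implementation: the sign convention follows the description of "chi2" (positive when the target count exceeds its expected value), PMI uses the natural log, and the "exact" estimate is taken from `stats::fisher.test()`, whose odds ratio is the conditional maximum-likelihood estimate.

```r
# Observed 2 x 2 table (made-up counts):
# rows = target / reference, columns = this feature / all other features
obs <- matrix(c(10, 5, 90, 195), nrow = 2)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# +1 if the feature is more frequent than expected in the target, else -1
sgn <- sign(obs[1, 1] - expected[1, 1])

chi2 <- sgn * sum((obs - expected)^2 / expected)                   # uncorrected chi-squared
g2   <- sgn * 2 * sum(ifelse(obs > 0, obs * log(obs / expected), 0)) # likelihood ratio G2
pmi  <- log(obs[1, 1] / expected[1, 1])                            # natural-log PMI
or   <- unname(fisher.test(obs)$estimate)                          # odds ratio estimate

c(chi2 = chi2, G2 = g2, pmi = pmi, odds_ratio = or)
```

Here the target count (10) is twice its expected value (5), so the PMI is log(2) and all four measures point toward over-representation in the target.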

sort

logical; if TRUE, sort the scored features in descending order of the measure; otherwise leave them in their original feature order

correction

if "default", the Yates correction is applied to "chi2", the Williams correction is applied to "lr", and no correction is applied to the "exact" and "pmi" measures. Specifying a value other than "default" overrides these defaults, for instance to apply the Williams correction to the "chi2" measure. Specifying a correction for the "exact" and "pmi" measures has no effect and produces a warning.
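The defaulting rule and the two corrections can be sketched in base R. `resolve_correction()` is a hypothetical helper that mirrors the rules stated above; the exact warning text and conditions in quanteda may differ. The Yates and Williams formulas are the standard textbook versions for a 2 x 2 table and are assumed, not confirmed, to match quanteda's code.

```r
# Hypothetical helper (not quanteda's code): map correction = "default"
# to the per-measure defaults described above
resolve_correction <- function(measure = c("chi2", "exact", "lr", "pmi"),
                               correction = c("default", "yates", "williams", "none")) {
  measure <- match.arg(measure)
  correction <- match.arg(correction)
  if (measure %in% c("exact", "pmi")) {
    if (correction %in% c("yates", "williams"))
      warning("correction has no effect for measure = \"", measure, "\"")
    return("none")
  }
  if (correction == "default") {
    return(if (measure == "chi2") "yates" else "williams")
  }
  correction
}

obs <- matrix(c(10, 5, 90, 195), nrow = 2)   # made-up 2 x 2 table
n <- sum(obs)
expected <- outer(rowSums(obs), colSums(obs)) / n

# Yates continuity correction: shrink each |O - E| by 0.5 before squaring
# (R's chisq.test() caps the 0.5 at the smallest |O - E|; both agree here)
chi2_yates <- sum((abs(obs - expected) - 0.5)^2 / expected)

# Williams correction: divide G2 by q, which depends only on the margins;
# for a 2 x 2 table the denominator 6 * n * (r - 1) * (c - 1) reduces to 6 * n
g2 <- 2 * sum(obs * log(obs / expected))
q  <- 1 + (n * sum(1 / rowSums(obs)) - 1) * (n * sum(1 / colSums(obs)) - 1) / (6 * n)
g2_williams <- g2 / q

resolve_correction("chi2")               # "yates"
resolve_correction("chi2", "williams")   # "williams"
```

Both corrections shrink the statistic, which makes the test more conservative for small counts.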

Value

a data.frame with one row per scored feature (the feature names are the row names), containing the computed statistic, its associated p-value (p), and the number of occurrences of the feature in the target (n_target) and reference (n_reference) groups. The statistic depends on measure: for "chi2" it is the chi-squared value, signed positively if the observed value in the target exceeds its expected value; for "exact" it is the estimate of the odds ratio; for "lr" it is the likelihood ratio \(G^2\) statistic (column G2); for "pmi" it is the pointwise mutual information statistic.
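The shape of the returned object can be sketched in base R for the "chi2" measure. The counts are made up, and `signed_chi2()` is a hypothetical helper. Computing the p-value from a chi-squared distribution with 1 degree of freedom on the absolute statistic is an assumption, though it appears consistent with the example output (e.g. chi2 = 61.17 alongside p of roughly 5.2e-15).

```r
# Made-up per-feature counts in the target and reference groups
n_target    <- c(the = 50, war = 12, peace = 2)
n_reference <- c(the = 60, war = 3,  peace = 15)
total_target    <- 400
total_reference <- 500

# Hypothetical helper: signed, uncorrected chi-squared for one feature
signed_chi2 <- function(a, b, tt, tr) {
  obs <- matrix(c(a, b, tt - a, tr - b), nrow = 2)
  expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  sign(obs[1, 1] - expected[1, 1]) * sum((obs - expected)^2 / expected)
}

chi2 <- mapply(signed_chi2, n_target, n_reference,
               MoreArgs = list(tt = total_target, tr = total_reference))

# One row per feature, feature names as row names
result <- data.frame(chi2 = chi2,
                     p = pchisq(abs(chi2), df = 1, lower.tail = FALSE),
                     n_target = n_target,
                     n_reference = n_reference)

# Equivalent of sort = TRUE: descending order of the statistic
result <- result[order(result$chi2, decreasing = TRUE), ]
result
```

Features over-represented in the target rise to the top and under-represented ones, with negative scores, sink to the bottom, which is why the examples use head() and tail() to inspect both ends.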

References

Bondi, Marina, and Mike Scott, eds. 2010. Keyness in Texts. Amsterdam, Philadelphia: John Benjamins.

Stubbs, Michael. 2010. "Three Concepts of Keywords." In Keyness in Texts, edited by Marina Bondi and Mike Scott, pp. 21–42. Amsterdam, Philadelphia: John Benjamins.

Scott, Mike, and Christopher Tribble. 2006. Textual Patterns: keyword and corpus analysis in language education. Amsterdam: John Benjamins, p. 55.

Dunning, Ted. 1993. "Accurate Methods for the Statistics of Surprise and Coincidence." Computational Linguistics 19(1): 61–74.

Examples

# compare pre- v. post-war terms using grouping
period <- ifelse(docvars(data_corpus_inaugural, "Year") < 1945, "pre-war", "post-war")
mydfm <- dfm(data_corpus_inaugural, groups = period)
head(mydfm) # 'pre-war' is in the first row, so it is the default target (target = 1L)
#> Document-feature matrix of: 2 documents, 6 features (8.33% sparse).
#> 2 x 6 sparse Matrix of class "dfmSparse"
#>           features
#> docs       fellow-citizens   of  the senate  and house
#>   pre-war               39 5666 8094     13 3854     8
#>   post-war               0 1437 1988      2 1456     3
head(result <- textstat_keyness(mydfm), 10)
#>                   chi2            p n_target n_reference
#> the          299.85716 0.000000e+00     8094        1988
#> of           179.17847 0.000000e+00     5666        1437
#> which        160.14560 0.000000e+00      911          95
#> should        83.10741 0.000000e+00      309          15
#> be            72.21927 0.000000e+00     1224         257
#> constitution  61.16666 5.218048e-15      200           6
#> states        59.13009 1.476597e-14      305          28
#> public        55.99225 7.271961e-14      213          11
#> it            52.69911 3.888001e-13     1132         257
#> upon          51.91189 5.804246e-13      332          39
tail(result, 10)
#>                chi2 p n_target n_reference
#> new       -141.3690 0       97         150
#> americans -150.6480 0        7          67
#> world     -175.1354 0      123         188
#> america   -176.6114 0       54         130
#> :         -178.1283 0       29         105
#> us        -186.0132 0      216         262
#> our       -187.8329 0     1307         874
#> .         -226.6654 0     3141        1804
#> -         -395.1361 0      312         450
#> we        -702.0019 0      779         960
# compare pre- v. post-war terms using a logical vector (post-war documents form the target)
mydfm2 <- dfm(data_corpus_inaugural)
head(textstat_keyness(mydfm2, docvars(data_corpus_inaugural, "Year") >= 1945), 10)
#>                chi2 p n_target n_reference
#> we         702.0019 0      960         779
#> -          395.1361 0      450         312
#> .          226.6654 0     1804        3141
#> our        187.8329 0      874        1307
#> us         186.0132 0      262         216
#> :          178.1283 0      105          29
#> america    176.6114 0      130          54
#> world      175.1354 0      188         123
#> americans  150.6480 0       67           7
#> new        141.3690 0      150          97
# compare Trump 2017 to other post-war presidents
pwdfm <- dfm(corpus_subset(data_corpus_inaugural, period == "post-war"))
head(textstat_keyness(pwdfm, target = "2017-Trump"), 10)
#>                  chi2            p n_target n_reference
#> protected    76.64466 0.000000e+00        5           1
#> will         51.44795 7.351897e-13       40         299
#> while        48.23022 3.790079e-12        6           7
#> obama        47.85727 4.584000e-12        3           0
#> we've        47.85727 4.584000e-12        3           0
#> america      31.45537 2.040775e-08       18         112
#> again        27.81145 1.337322e-07        9          33
#> everyone     27.67876 1.432269e-07        4           5
#> your         26.67898 2.402201e-07       11          50
#> transferring 25.54569 4.320292e-07        2           0
# using the likelihood ratio method
head(textstat_keyness(dfm_smooth(pwdfm), measure = "lr", target = "2017-Trump"), 10)
#>                  G2            p n_target n_reference
#> will      24.604106 7.040156e-07       41         317
#> america   14.040255 1.789387e-04       19         130
#> your      10.435140 1.236402e-03       12          68
#> again      9.758516 1.784939e-03       10          51
#> while      9.504990 2.049139e-03        7          25
#> american   8.877690 2.886766e-03       12          76
#> protected  8.820562 2.978550e-03        6          19
#> back       6.853526 8.846653e-03        7          34
#> you        6.713202 9.570175e-03       14         121
#> country    5.821599 1.583055e-02       10          72