textmodel_wordscores implements the "Wordscores" method of Laver, Benoit and Garry (2003) for scaling texts on a single dimension, given a set of anchoring or reference texts whose positions are fixed by reference scores. The scale can be fitted in linear space (as in LBG 2003) or in logit space (as in Beauchamp 2012). Scores for virgin or unknown texts are obtained by calling predict() on a fitted textmodel_wordscores object.

textmodel_wordscores(x, y, scale = c("linear", "logit"), smooth = 0)

Arguments

x

the dfm on which the model will be trained

y

vector of training scores associated with each document in x; as in the Examples below, NA marks a document that has no reference score

scale

scale on which to score the words; "linear" for classic LBG linear posterior weighted word class differences, or "logit" for log posterior differences

smooth

a smoothing parameter for word counts; defaults to zero to match the LBG (2003) method. See Value below for additional information on the behaviour of this argument.
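
To make the linear scale and the smooth argument concrete, the following base-R sketch computes word scores in the way described by LBG (2003): each word's score is the mean of the reference scores, weighted by the probability of each reference document given that word. The helper lbg_word_scores() and the toy counts are hypothetical and are not part of quanteda. The sketch also assumes that smoothing simply adds the constant to every count before frequencies are computed; it illustrates the idea and is not the package's implementation.

```r
# Hypothetical helper, not part of quanteda: linear LBG (2003) word scores.
# counts: reference documents in rows, words in columns
# scores: one reference score per row of counts
# smooth: constant added to every count (an assumption about `smooth`)
lbg_word_scores <- function(counts, scores, smooth = 0) {
  counts <- counts + smooth
  F_wr <- counts / rowSums(counts)             # word frequency within each document
  P_wr <- sweep(F_wr, 2, colSums(F_wr), "/")   # P(document r | word w)
  drop(scores %*% P_wr)                        # weighted mean of reference scores
}

# Toy data (hypothetical): two reference texts scored -1 and 1
ref_counts <- matrix(c(2, 2, 0,
                       0, 2, 2),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("R1", "R2"), c("a", "b", "c")))
ref_scores <- c(-1, 1)

unsmoothed <- lbg_word_scores(ref_counts, ref_scores)              # a = -1,   b = 0, c = 1
smoothed   <- lbg_word_scores(ref_counts, ref_scores, smooth = 1)  # a = -0.5, b = 0, c = 0.5
```

In this toy case, words that occur in only one reference text receive that text's score exactly when smooth = 0, and smoothing pulls them toward the centre. A word that occurs in no reference text would produce NaN in this sketch when smooth = 0.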

Value

A fitted textmodel_wordscores object. The object contains a copy of the input data in its original form, with no smoothing applied. For instance, calling predict.textmodel_wordscores on this object without specifying a value for newdata predicts on the unsmoothed data. This behaviour differs from versions of quanteda <= 1.2.

Details

The textmodel_wordscores() function and the associated predict() method are designed to function in the same manner as predict.lm. coef() extracts the word coefficients from the fitted textmodel_wordscores object, and summary() prints the call, the reference document statistics, and the word scores.
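
For the linear scale, LBG (2003) score an unseen text as the mean of the word scores, weighted by the relative frequency of each scored word in that text. The base-R sketch below shows that calculation with hypothetical word scores and counts; it illustrates the raw (unrescaled) score and is not the package's predict() code.

```r
# Word scores for three words (hypothetical values, of the kind coef() returns)
word_scores <- c(a = -1, b = 0, c = 1)

# Counts for one unscored text; "d" has no word score and is left out
virgin_counts <- c(a = 1, b = 1, c = 2, d = 5)

# Keep only the scored words, then convert counts to relative frequencies
scored <- virgin_counts[names(word_scores)]
F_wv <- scored / sum(scored)

# Raw text score: frequency-weighted mean of the word scores
text_score <- sum(F_wv * word_scores)   # 0.25
```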

References

Laver, M., Benoit, K.R., & Garry, J. (2003). Estimating Policy Positions from Political Text using Words as Data. American Political Science Review, 97(2), 311–331.

Beauchamp, N. (2012). Using Text to Scale Legislatures with Uninformative Voting. New York University Mimeo.

Martin, L.W. & Vanberg, G. (2007). A Robust Transformation Procedure for Interpreting Political Text. Political Analysis, 16(1), 93–100.

See also

predict.textmodel_wordscores for methods of applying a fitted textmodel_wordscores model object to predict quantities from (other) documents.

Examples

(tmod <- textmodel_wordscores(data_dfm_lbgexample, y = c(seq(-1.5, 1.5, .75), NA)))
#>
#> Call:
#> textmodel_wordscores.dfm(x = data_dfm_lbgexample, y = c(seq(-1.5,
#>     1.5, 0.75), NA))
#>
#> Scale: linear; 5 reference scores; 37 scored features.
summary(tmod)
#>
#> Call:
#> textmodel_wordscores.dfm(x = data_dfm_lbgexample, y = c(seq(-1.5,
#>     1.5, 0.75), NA))
#>
#> Reference Document Statistics:
#>    score total min max  mean median
#> R1 -1.50  1000   0 158 27.03      0
#> R2 -0.75  1000   0 158 27.03      0
#> R3  0.00  1000   0 158 27.03      0
#> R4  0.75  1000   0 158 27.03      0
#> R5  1.50  1000   0 158 27.03      0
#> V1    NA  1000   0 158 27.03      0
#>
#> Wordscores:
#> (showing first 30 elements)
#>       A       B       C       D       E       F       G       H       I       J
#> -1.5000 -1.5000 -1.5000 -1.5000 -1.5000 -1.4812 -1.4809 -1.4519 -1.4083 -1.3233
#>       K       L       M       N       O       P       Q       R       S       T
#> -1.1846 -1.0370 -0.8806 -0.7500 -0.6194 -0.4508 -0.2992 -0.1306  0.0000  0.1306
#>       U       V       W       X       Y       Z      ZA      ZB      ZC      ZD
#>  0.2992  0.4508  0.6194  0.7500  0.8806  1.0370  1.1846  1.3233  1.4083  1.4519
coef(tmod)
#>          A          B          C          D          E          F          G
#> -1.5000000 -1.5000000 -1.5000000 -1.5000000 -1.5000000 -1.4812500 -1.4809322
#>          H          I          J          K          L          M          N
#> -1.4519231 -1.4083333 -1.3232984 -1.1846154 -1.0369898 -0.8805970 -0.7500000
#>          O          P          Q          R          S          T          U
#> -0.6194030 -0.4507576 -0.2992424 -0.1305970  0.0000000  0.1305970  0.2992424
#>          V          W          X          Y          Z         ZA         ZB
#>  0.4507576  0.6194030  0.7500000  0.8805970  1.0369898  1.1846154  1.3232984
#>         ZC         ZD         ZE         ZF         ZG         ZH         ZI
#>  1.4083333  1.4519231  1.4809322  1.4812500  1.5000000  1.5000000  1.5000000
#>         ZJ         ZK
#>  1.5000000  1.5000000
predict(tmod)
#>            R1            R2            R3            R4            R5
#> -1.317931e+00 -7.395598e-01 -8.673617e-18  7.395598e-01  1.317931e+00
#>            V1
#> -4.480591e-01
predict(tmod, rescaling = "lbg")
#>          R1          R2          R3          R4          R5          V1
#> -1.58967683 -0.88488724  0.01632248  0.91753220  1.62232179 -0.52967149
predict(tmod, se.fit = TRUE, interval = "confidence", rescaling = "mv")
#> Warning: More than two reference scores found with MV rescaling; using only min, max values.
#> $fit
#>           fit         lwr         upr
#> R1 -1.5000000 -1.51494501 -1.48505499
#> R2 -0.8417280 -0.86723325 -0.81622274
#> R3  0.0000000 -0.02678045  0.02678045
#> R4  0.8417280  0.81622274  0.86723325
#> R5  1.5000000  1.48505499  1.51494501
#> V1 -0.5099572 -0.53649769 -0.48341678
#>
#> $se.fit
#>          R1          R2          R3          R4          R5          V1
#> 0.007625147 0.013013126 0.013663743 0.013013126 0.007625147 0.013541297
#>
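
The raw predict(tmod) scores above are compressed toward the centre (R1 is predicted at about -1.32 rather than its reference score of -1.5), which is what the rescaling options address. As the warning indicates, the "mv" option uses only the lowest and highest reference scores. The base-R sketch below applies the Martin and Vanberg (2007) idea as a linear map that sends the raw scores of the two anchoring texts back onto their reference scores. The raw values are copied from the predict(tmod) output above; the variable names are hypothetical, and this is an illustration rather than the package's code.

```r
# Raw scores copied from the predict(tmod) output (R3 is numerically zero)
raw <- c(R1 = -1.317931, R2 = -0.7395598, R3 = 0,
         R4 = 0.7395598, R5 = 1.317931, V1 = -0.4480591)

# Reference scores of the two anchoring texts (lowest and highest)
anchor_scores <- c(R1 = -1.5, R5 = 1.5)

# Linear map sending the raw anchor scores onto their reference scores
lo <- raw[["R1"]]
hi <- raw[["R5"]]
rescaled <- (raw - lo) / (hi - lo) *
  (anchor_scores[["R5"]] - anchor_scores[["R1"]]) + anchor_scores[["R1"]]

round(rescaled, 4)
```

Up to the rounding of the copied inputs, the results agree with the fit column of the rescaling = "mv" output above.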