textmodel_wordscores.Rd
textmodel_wordscores
textmodel_wordscores() implements Laver, Benoit and Garry's (2003)
"Wordscores" method for scaling texts on a single dimension, given a set of
anchoring or reference texts whose values are set through reference scores.
The scale can be fitted in the linear space (as per LBG 2003) or in the
logit space (as per Beauchamp 2012). Estimates for virgin or unknown texts
are obtained using the predict() method to score documents from a fitted
textmodel_wordscores object.
textmodel_wordscores(x, y, scale = c("linear", "logit"), smooth = 0)
x | the dfm on which the model will be trained
---|---
y | vector of training scores associated with each document in x
scale | scale on which to score the words; "linear" (the default) or "logit"
smooth | a smoothing parameter for word counts; defaults to zero to match the LBG (2003) method. See Value below for additional information on the behaviour of this argument.
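To make the linear-scale fitting step concrete, below is a minimal Python sketch of the LBG (2003) word-score computation: each word's score is the average of the reference scores, weighted by the probability that an occurrence of the word came from each reference text. The data, variable names, and dictionary-based layout are invented for illustration; this is not the package's implementation.

```python
# Hypothetical counts of three words in two reference texts.
ref_counts = {
    "R1": {"taxes": 10, "spend": 2, "the": 8},
    "R2": {"taxes": 2, "spend": 10, "the": 8},
}
# Reference (anchoring) scores, i.e. the `y` argument.
ref_scores = {"R1": -1.0, "R2": 1.0}
smooth = 0  # the `smooth` argument adds this constant to every count

words = {w for counts in ref_counts.values() for w in counts}

# F[r][w]: relative frequency of word w within reference text r
F = {}
for r, counts in ref_counts.items():
    total = sum(counts.get(w, 0) + smooth for w in words)
    F[r] = {w: (counts.get(w, 0) + smooth) / total for w in words}

# Linear word score: S_w = sum_r P(r|w) * A_r,
# where P(r|w) = F[r][w] / sum_r' F[r'][w] and A_r is the reference score.
wordscores = {}
for w in words:
    denom = sum(F[r][w] for r in ref_counts)
    wordscores[w] = sum(F[r][w] / denom * ref_scores[r] for r in ref_counts)
```

A word used mostly in the low-scored text ("taxes") gets a score near -1, its mirror image ("spend") near +1, and a word used equally in both ("the") scores 0, the midpoint of the reference scores.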
A fitted textmodel_wordscores object. This object contains a copy of the
input data in its original form, without any smoothing applied. Calling
predict.textmodel_wordscores() on this object without specifying a value
for newdata, for instance, will predict on the unsmoothed object. This
behaviour differs from versions of quanteda <= 1.2.
The textmodel_wordscores() function and the associated predict() method are
designed to function in the same manner as predict.lm(). coef() can also be
used to extract the word coefficients from the fitted textmodel_wordscores
object, and summary() will print a concise summary of the fitted object.
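The prediction step can likewise be sketched in a few lines: a raw document score is the frequency-weighted mean of the scores of the document's words, with frequencies renormalized over only those words that received a score in fitting. The word scores and counts below are invented for illustration, and this is a sketch of the LBG formula rather than the package's implementation.

```python
# Hypothetical word scores from a fitted model, and counts for a
# "virgin" (unscored) text containing one word unseen in fitting.
wordscores = {"taxes": -0.667, "spend": 0.667, "the": 0.0}
virgin = {"taxes": 4, "spend": 1, "the": 5, "unseen": 3}

# Only words with a fitted score contribute; renormalize over them.
scored = {w: n for w, n in virgin.items() if w in wordscores}
total = sum(scored.values())

# Raw score: v_d = sum_w (f_wd / n_d) * S_w over the scored words.
raw_score = sum(n / total * wordscores[w] for w, n in scored.items())
```

Because raw scores are compressed toward the centre of the reference scale, LBG (2003) and Martin & Vanberg (2007) propose rescalings of this raw quantity; see the rescaling argument of predict.textmodel_wordscores.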
Laver, M., Benoit, K.R., & Garry, J. (2003). Estimating Policy Positions from Political Text using Words as Data. American Political Science Review, 97(2), 311--331.
Beauchamp, N. (2012). Using Text to Scale Legislatures with Uninformative Voting. New York University Mimeo.
Martin, L.W. & Vanberg, G. (2007). A Robust Transformation Procedure for Interpreting Political Text. Political Analysis 16(1), 93--100.
predict.textmodel_wordscores() for methods of applying a fitted
textmodel_wordscores model object to predict quantities from (other)
documents.
tmod <- textmodel_wordscores(data_dfm_lbgexample, y = c(seq(-1.5, 1.5, 0.75), NA))
tmod
#>
#> Call:
#> textmodel_wordscores.dfm(x = data_dfm_lbgexample, y = c(seq(-1.5,
#>     1.5, 0.75), NA))
#>
#> Scale: linear; 5 reference scores; 37 scored features.

summary(tmod)
#>
#> Call:
#> textmodel_wordscores.dfm(x = data_dfm_lbgexample, y = c(seq(-1.5,
#>     1.5, 0.75), NA))
#>
#> Reference Document Statistics:
#>    score total min max  mean median
#> R1 -1.50  1000   0 158 27.03      0
#> R2 -0.75  1000   0 158 27.03      0
#> R3  0.00  1000   0 158 27.03      0
#> R4  0.75  1000   0 158 27.03      0
#> R5  1.50  1000   0 158 27.03      0
#> V1    NA  1000   0 158 27.03      0
#>
#> Wordscores:
#> (showing first 30 elements)
#>       A       B       C       D       E       F       G       H       I       J
#> -1.5000 -1.5000 -1.5000 -1.5000 -1.5000 -1.4812 -1.4809 -1.4519 -1.4083 -1.3233
#>       K       L       M       N       O       P       Q       R       S       T
#> -1.1846 -1.0370 -0.8806 -0.7500 -0.6194 -0.4508 -0.2992 -0.1306  0.0000  0.1306
#>       U       V       W       X       Y       Z      ZA      ZB      ZC      ZD
#>  0.2992  0.4508  0.6194  0.7500  0.8806  1.0370  1.1846  1.3233  1.4083  1.4519

coef(tmod)
#>          A          B          C          D          E          F          G
#> -1.5000000 -1.5000000 -1.5000000 -1.5000000 -1.5000000 -1.4812500 -1.4809322
#>          H          I          J          K          L          M          N
#> -1.4519231 -1.4083333 -1.3232984 -1.1846154 -1.0369898 -0.8805970 -0.7500000
#>          O          P          Q          R          S          T          U
#> -0.6194030 -0.4507576 -0.2992424 -0.1305970  0.0000000  0.1305970  0.2992424
#>          V          W          X          Y          Z         ZA         ZB
#>  0.4507576  0.6194030  0.7500000  0.8805970  1.0369898  1.1846154  1.3232984
#>         ZC         ZD         ZE         ZF         ZG         ZH         ZI
#>  1.4083333  1.4519231  1.4809322  1.4812500  1.5000000  1.5000000  1.5000000
#>         ZJ         ZK
#>  1.5000000  1.5000000

predict(tmod)
#>            R1            R2            R3            R4            R5
#> -1.317931e+00 -7.395598e-01 -8.673617e-18  7.395598e-01  1.317931e+00
#>            V1
#> -4.480591e-01

predict(tmod, rescaling = "lbg")
#>          R1          R2          R3          R4          R5          V1
#> -1.58967683 -0.88488724  0.01632248  0.91753220  1.62232179 -0.52967149

predict(tmod, se.fit = TRUE, interval = "confidence", rescaling = "mv")
#> Warning: More than two reference scores found with MV rescaling; using only min, max values.
#> $fit
#>           fit         lwr         upr
#> R1 -1.5000000 -1.51494501 -1.48505499
#> R2 -0.8417280 -0.86723325 -0.81622274
#> R3  0.0000000 -0.02678045  0.02678045
#> R4  0.8417280  0.81622274  0.86723325
#> R5  1.5000000  1.48505499  1.51494501
#> V1 -0.5099572 -0.53649769 -0.48341678
#>
#> $se.fit
#>          R1          R2          R3          R4          R5          V1
#> 0.007625147 0.013013126 0.013663743 0.013013126 0.007625147 0.013541297