Apply one of several term frequency weighting schemes to a dfm.

tf(x, scheme = c("count", "prop", "propmax", "boolean", "log", "augmented",
  "logave"), base = 10, K = 0.5)

Arguments

x

object for which term frequencies will be computed (a document-feature matrix)

scheme

scheme used to weight or normalize feature frequencies within each document. Valid types include:

count

default; each feature count is left unchanged, equivalent to dividing by 1

prop

feature proportions within document, equivalent to dividing each term by the total count of features in the document.

propmax

feature proportions relative to the most frequent term of the document, equivalent to dividing term counts by the frequency of the most frequent term in the document.

boolean

recode all non-zero counts as 1

log

take the logarithm of (1 + each count), using the base given by base

augmented

equivalent to K + (1 - K) * tf(x, "propmax")

logave

(1 + the log of the counts) / (1 + the log of the average count within document)

base

base for the logarithm when scheme is "log" or "logave"

K

the K for the augmentation when scheme = "augmented"
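
As an illustration of the non-logarithmic schemes described above, the following is a minimal base R sketch that operates on a plain count matrix rather than a dfm. The helper name tf_sketch and the toy data are hypothetical and not part of the package. The sketch applies the "augmented" formula literally to every cell, so zero counts become K; a sparse implementation may instead leave zero counts at zero.

```r
# Toy document-feature matrix: 2 documents (rows) by 3 features (columns).
# A plain base R matrix stands in for a dfm here (an assumption of this sketch).
counts <- matrix(c(2, 1, 0,
                   4, 0, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# Hypothetical helper mirroring the scheme descriptions; not the package code.
tf_sketch <- function(m,
                      scheme = c("count", "prop", "propmax", "boolean", "augmented"),
                      K = 0.5) {
  scheme <- match.arg(scheme)
  row_max <- apply(m, 1, max)  # frequency of the most frequent feature per document
  switch(scheme,
    count     = m,                 # counts unchanged
    prop      = m / rowSums(m),    # divide by total count in the document
    propmax   = m / row_max,       # divide by the largest count in the document
    boolean   = (m > 0) * 1,       # recode non-zero counts as 1
    augmented = K + (1 - K) * (m / row_max)  # literal formula, applied to all cells
  )
}

prop_weights <- tf_sketch(counts, "prop")            # each row sums to 1
aug_weights  <- tf_sketch(counts, "augmented", K = 0.5)
```

With K = 0.5, the most frequent feature in each document receives weight 1 and all other weights fall between K and 1.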

Value

A document-feature matrix to which the weighting scheme has been applied.

Details

tf(x, scheme = "prop") is equivalent to weight(x, "relFreq").
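
The two logarithmic schemes, "log" and "logave", can be sketched in base R on a plain count matrix. The helper name tf_log_sketch and the toy data are hypothetical. This sketch assumes that, for "logave", the average is taken over the non-zero counts in each document and that zero counts stay at zero; neither assumption is confirmed by this page.

```r
# Toy count matrix standing in for a dfm (an assumption of this sketch).
counts <- matrix(c( 9, 99, 0,
                   10, 10, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("d1", "d2"), c("a", "b", "c")))

# Hypothetical helper for the "log" and "logave" schemes; not the package code.
tf_log_sketch <- function(m, scheme = c("log", "logave"), base = 10) {
  scheme <- match.arg(scheme)
  if (scheme == "log") {
    # logarithm of (1 + count), so zero counts map to zero
    return(log(1 + m, base = base))
  }
  # Assumed: average count over the non-zero features of each document.
  avg <- rowSums(m) / rowSums(m > 0)
  out <- (1 + log(m, base = base)) / (1 + log(avg, base = base))
  out[m == 0] <- 0  # assumed: zero counts remain zero rather than -Inf
  out
}

log_weights    <- tf_log_sketch(counts, "log", base = 10)
logave_weights <- tf_log_sketch(counts, "logave", base = 10)
```

Under these assumptions, a document whose non-zero counts are all equal (such as d2) receives a "logave" weight of 1 for each of those features.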

References

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.

https://en.wikipedia.org/wiki/Tf-idf#Term_frequency_2