Returns a count of the number of syllables in texts. For English
words, the syllable count is exact and looked up from the CMU pronunciation
dictionary, which is supplied as the default syllable dictionary, `data_int_syllables`.
For any word not in the dictionary, the syllable count is estimated by
counting vowel clusters.
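The fallback for out-of-dictionary words can be illustrated with a minimal base R sketch. It treats each maximal run of the letters a, e, i, o, u, y as one syllable. The function name `count_vowel_clusters` and the exact vowel set are assumptions made for illustration; this is not necessarily how quanteda implements the estimate.

```r
# Hypothetical helper (not part of quanteda): estimate syllables by counting
# runs of consecutive vowels. The vowel set "aeiouy" is an assumption.
count_vowel_clusters <- function(words) {
  words <- tolower(words)
  # gregexpr() returns match start positions, or -1 when there is no match
  matches <- gregexpr("[aeiouy]+", words)
  # count only real matches (positions > 0)
  vapply(matches, function(m) sum(m > 0), integer(1))
}

estimates <- count_vowel_clusters(c("cat", "Brexit", "tsk"))
```

An estimate of this kind is only approximate: a silent final "e", for instance, would be counted as its own cluster.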
data_int_syllables is a quanteda-supplied data object consisting of a
named numeric vector of syllable counts for the words used as names. This
is the default object used to count English syllables. This object
can be accessed directly, but we strongly encourage you to access it only
through the `nsyllable()` wrapper function.
nsyllable(x, syllable_dictionary = quanteda::data_int_syllables, use.names = FALSE)
x: character vector or tokens object
syllable_dictionary: optional named integer vector of syllable counts where
the names are lower case tokens. Defaults to `data_int_syllables`.
If x is a character vector, a named numeric vector of the
counts of the syllables in each element. If
x is a tokens
object, a list of syllable counts where each list element corresponds
to the tokens in a document.
All tokens are automatically converted to lowercase to perform the
matching with the syllable dictionary, so there is no need to perform this
step prior to calling `nsyllable()`.
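The lowercase matching step and the per-document list result can be sketched in base R as follows. The dictionary `toy_dict` and the helper `lookup_syllables` are hypothetical and exist only for this illustration. The sketch covers the lookup step alone, so tokens without a dictionary entry are left as `NA` rather than estimated.

```r
# Hypothetical toy dictionary: names are lower case tokens, values are counts.
toy_dict <- c(cat = 1L, example = 3L, sentence = 2L)

# Hypothetical helper: lower-case the tokens, then index the named vector.
# Tokens with no matching name come back as NA.
lookup_syllables <- function(tokens, dict) {
  unname(dict[tolower(tokens)])
}

# One character vector of tokens per document, as in a tokens object
docs <- list(doc1 = c("Cat", "EXAMPLE"), doc2 = c("sentence", "zzz"))
counts <- lapply(docs, lookup_syllables, dict = toy_dict)
```

Because the tokens are lower-cased inside the helper, "Cat" and "EXAMPLE" match the same entries as their lower case forms.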
`nsyllable()` only works reliably for English, as the only syllable count
dictionary we could find is the freely available CMU pronunciation
dictionary, at http://www.speech.cs.cmu.edu/cgi-bin/cmudict. If you
have a dictionary for another language, please email the package
maintainer as we would love to include it.
# character
nsyllable(c("cat", "syllable", "supercalifragilisticexpialidocious",
            "Brexit", "Administration"), use.names = TRUE)
#>                                cat                           syllable
#>                                  1                                  3
#> supercalifragilisticexpialidocious                             Brexit
#>                                 13                                  2
#>                     Administration
#>                                  5

# tokens
txt <- c(doc1 = "This is an example sentence.",
         doc2 = "Another of two sample sentences.")
nsyllable(tokens(txt, remove_punct = TRUE))
#> $doc1
#> [1] 1 1 1 3 2
#>
#> $doc2
#> [1] 3 1 1 2 3
#>

# punctuation is not counted
nsyllable(tokens(txt), use.names = TRUE)
#> $doc1
#>     This       is       an  example sentence        .
#>        1        1        1        3        2       NA
#>
#> $doc2
#>   Another        of       two    sample sentences         .
#>         3         1         1         2         3        NA
#>