Sorts a dfm by the total frequency of each feature, by the total feature count in each document, or by both. The sort is in descending order by default.

dfm_sort(x, decreasing = TRUE, margin = c("features", "documents", "both"))

Arguments

x

Document-feature matrix created by dfm()

decreasing

logical; if TRUE, sort in descending order; otherwise, sort in increasing order

margin

which margin to sort on: "features" to sort by frequency of features, "documents" to sort by total feature counts in documents, and "both" to sort by both
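
To make the three margin options concrete, the following base R sketch applies the same idea to an ordinary count matrix, ordering columns by their column sums and rows by their row sums. The helper sort_counts() and the toy matrix m are hypothetical names introduced here for illustration; this is an assumption about the behaviour described above, not quanteda's actual implementation.

```r
# Toy count matrix standing in for a dfm (rows = documents, columns = features).
m <- matrix(c(1, 0, 3,
              5, 2, 2),
            nrow = 2, byrow = TRUE,
            dimnames = list(docs = c("doc1", "doc2"),
                            features = c("x", "y", "z")))

# Hypothetical helper (not part of quanteda): reorder a count matrix by
# feature totals (column sums), document totals (row sums), or both.
sort_counts <- function(m, decreasing = TRUE,
                        margin = c("features", "documents", "both")) {
  margin <- match.arg(margin)
  rows <- seq_len(nrow(m))
  cols <- seq_len(ncol(m))
  if (margin %in% c("documents", "both")) {
    rows <- order(rowSums(m), decreasing = decreasing)
  }
  if (margin %in% c("features", "both")) {
    cols <- order(colSums(m), decreasing = decreasing)
  }
  m[rows, cols, drop = FALSE]
}

colnames(sort_counts(m))                        # "x" "z" "y"
rownames(sort_counts(m, margin = "documents"))  # "doc2" "doc1"
sort_counts(m, decreasing = FALSE, margin = "both")
```

Here the feature totals are x = 6, y = 2 and z = 5, and the document totals are doc1 = 4 and doc2 = 9, so the default call puts x first and margin = "documents" puts doc2 first.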

Value

A sorted dfm object

Author

Ken Benoit

Examples

dfmat <- dfm(tokens(data_corpus_inaugural))
head(dfmat)
#> Document-feature matrix of: 6 documents, 9,437 features (93.83% sparse) and 4 docvars.
#>                  features
#> docs              fellow-citizens  of the senate and house representatives :
#>   1789-Washington               1  71 116      1  48     2               2 1
#>   1793-Washington               0  11  13      0   2     0               0 1
#>   1797-Adams                    3 140 163      1 130     0               2 0
#>   1801-Jefferson                2 104 130      0  81     0               0 1
#>   1805-Jefferson                0 101 143      0  93     0               0 0
#>   1809-Madison                  1  69 104      0  43     0               0 0
#>                  features
#> docs              among vicissitudes
#>   1789-Washington     1            1
#>   1793-Washington     0            0
#>   1797-Adams          4            0
#>   1801-Jefferson      1            0
#>   1805-Jefferson      7            0
#>   1809-Madison        0            0
#> [ reached max_nfeat ... 9,427 more features ]
head(dfm_sort(dfmat))
#> Document-feature matrix of: 6 documents, 9,437 features (93.83% sparse) and 4 docvars.
#>                  features
#> docs              the  of   , and  . to in  a our we
#>   1789-Washington 116  71  70  48 23 48 31 14   1  1
#>   1793-Washington  13  11   5   2  4  5  3  0   0  0
#>   1797-Adams      163 140 201 130 33 72 47 51   6  3
#>   1801-Jefferson  130 104 128  81 37 61 24 21  24 10
#>   1805-Jefferson  143 101 142  93 41 83 35 20  24 13
#>   1809-Madison    104  69  47  43 21 61 34 19   9  2
#> [ reached max_nfeat ... 9,427 more features ]
head(dfm_sort(dfmat, decreasing = FALSE, margin = "both"))
#> Document-feature matrix of: 6 documents, 9,437 features (96.33% sparse) and 4 docvars.
#>                  features
#> docs              notification 14th fondest predilection flattering asylum
#>   1793-Washington            0    0       0            0          0      0
#>   1945-Roosevelt             0    0       0            0          0      0
#>   1865-Lincoln               0    0       0            0          0      0
#>   1905-Roosevelt             0    0       0            0          0      0
#>   1849-Taylor                0    0       0            0          0      0
#>   1829-Jackson               0    0       0            0          0      0
#>                  features
#> docs              interruptions awaken distrustful despondence
#>   1793-Washington             0      0           0           0
#>   1945-Roosevelt              0      0           0           0
#>   1865-Lincoln                0      0           0           0
#>   1905-Roosevelt              0      0           0           0
#>   1849-Taylor                 0      0           0           0
#>   1829-Jackson                0      0           0           0
#> [ reached max_nfeat ... 9,427 more features ]
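
The examples above do not cover margin = "documents" on its own, and the inaugural corpus is too large to check the ordering by eye. The sketch below uses a two-document toy corpus (the object names toks_toy and dfmat_toy are made up for illustration) so that the feature and document totals can be verified by hand. It assumes the quanteda package is installed.

```r
library(quanteda)

# Toy corpus: feature totals are a = 5, b = 2, c = 3;
# document totals are d1 = 6 and d2 = 4.
toks_toy <- tokens(c(d1 = "a b b c c c", d2 = "a a a a"))
dfmat_toy <- dfm(toks_toy)

# Default: features in descending order of total frequency.
featnames(dfm_sort(dfmat_toy))
# expected: "a" "c" "b"

# Documents in increasing order of their total feature counts.
docnames(dfm_sort(dfmat_toy, decreasing = FALSE, margin = "documents"))
# expected: "d2" "d1"
```

Naming the margin argument explicitly keeps calls readable when decreasing is also supplied.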