Stem a set of terms using one of the algorithms provided by the Snowball stemming library.
stem_snowball(x, algorithm = "en")
| Argument | Description |
|---|---|
| x | character vector of terms to stem. |
| algorithm | stemming algorithm; see ‘Details’ for the valid choices. |
Apply a Snowball stemming algorithm to a vector of input terms, x,
returning the result in a character vector of the same length with the
same names.
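As a rough illustration of the length-and-names guarantee described above, the sketch below stems a small named vector. It assumes `stem_snowball` is provided by the corpus package and that the package is installed; the input vector and its names are made up for the example.

```r
library(corpus)  # assumption: stem_snowball() is exported by the corpus package

# hypothetical named input; the names are arbitrary labels
x <- c(first = "winning", second = "winner", third = "win")

# stem with the default algorithm ("en")
stems <- stem_snowball(x)

# the result should line up with the input element by element
length(stems) == length(x)
names(stems)
```

Because the result keeps the input's names, it can be indexed by the same labels as `x`, for example `stems["first"]`.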
The algorithm argument specifies the stemming algorithm. Valid choices
include the following:
"ar" ("arabic"),
"da" ("danish"),
"de" ("german"),
"en" ("english"),
"es" ("spanish"),
"fi" ("finnish"),
"fr" ("french"),
"hu" ("hungarian"),
"it" ("italian"),
"nl" ("dutch"),
"no" ("norwegian"),
"pt" ("portuguese"),
"ro" ("romanian"),
"ru" ("russian"),
"sv" ("swedish"),
"ta" ("tamil"),
"tr" ("turkish"),
and "porter".
Setting algorithm = NULL gives a stemmer that returns its input
unchanged.
The function only stems single-word terms of kind "letter"; it leaves other inputs (multi-word terms, and terms of kind "number", "punct", and "symbol") unchanged.
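The rules above can be sketched as follows. This is a minimal example, again assuming `stem_snowball` comes from the corpus package; the input terms are invented to cover one single-word letter term, one multi-word term, one number, and one term that starts with a symbol.

```r
library(corpus)  # assumption: stem_snowball() is exported by the corpus package

# hypothetical inputs: a letter term, a multi-word term, a number,
# and a term beginning with a symbol
terms <- c("winning", "winning streak", "2019", "#winning")

# algorithm = NULL selects the stemmer that returns its input unchanged
unstemmed <- stem_snowball(terms, algorithm = NULL)

# with the default "en" algorithm, only the single-word letter term
# is a candidate for stemming; the other entries pass through
stemmed <- stem_snowball(terms)

# the short code and the long name refer to the same algorithm
same <- identical(stem_snowball(terms, "en"),
                  stem_snowball(terms, "english"))

print(unstemmed)
print(stemmed)
print(same)
```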
The Snowball stemming library
provides the underlying implementation. The wordStem function from
the SnowballC package provides a similar interface, but that function
applies the algorithm to all input terms, regardless of the kind of the term.
A character vector with the same length and names as the input, x,
whose entries are the corresponding stems.
# apply English stemming algorithm; don't stem non-letter terms
stem_snowball(c("win", "winning", "winner", "#winning"))
#> [1] "win" "win" "winner" "#winning"

# compare with SnowballC, which stems all kinds, not just letter
# NOT RUN {
SnowballC::wordStem(c("win", "winning", "winner", "#winning"), "en")
# }