Lists of common function words (‘stop’ words).

stopwords_da
stopwords_de
stopwords_en
stopwords_es
stopwords_fi
stopwords_fr
stopwords_hu
stopwords_it
stopwords_nl
stopwords_no
stopwords_pt
stopwords_ru
stopwords_sv

Details

The stopwords_ objects are character vectors of case-folded ‘stop’ words. These are common function words that are often discarded before other text analysis tasks are performed.

Lists are available for the following languages: Danish (stopwords_da), Dutch (stopwords_nl), English (stopwords_en), Finnish (stopwords_fi), French (stopwords_fr), German (stopwords_de), Hungarian (stopwords_hu), Italian (stopwords_it), Norwegian (stopwords_no), Portuguese (stopwords_pt), Russian (stopwords_ru), Spanish (stopwords_es), and Swedish (stopwords_sv).

These built-in word lists are reasonable defaults, but they may require further tailoring to suit your particular task. The original lists were compiled by the Snowball stemming project. Following the Quanteda text analysis software, we have tailored the original lists by adding the word "will" to the English list.
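
Because each list is an ordinary character vector, it can be tailored with base R set operations. The following is a minimal sketch rather than a prescribed workflow. The choice of words to remove and add, the sample tokens, and the small stand-in vector (used only when stopwords_en is not attached in the current session) are all assumptions made for illustration.

```r
# Use the built-in English list when it is available in the session;
# otherwise fall back to a tiny stand-in vector (illustration only).
stops <- if (exists("stopwords_en")) {
  stopwords_en
} else {
  c("the", "a", "of", "will", "not")
}

# Remove a word we want to keep in the analysis (here, a negation).
custom <- setdiff(stops, "not")

# Add task-specific words (hypothetical domain terms).
custom <- union(custom, c("fig", "et", "al"))

# The lists are case-folded, so fold tokens to lower case before matching.
tokens <- c("The", "model", "will", "not", "converge", "Fig")
kept <- tokens[!(tolower(tokens) %in% custom)]
print(kept)
```

The tailored vector can then be used wherever a stop word list is expected. For instance, it could presumably be supplied as the drop property of a text_filter, though that usage is not described on this page and should be checked against the text_filter documentation.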

Format

A character vector of unique stop words.

See also

text_filter