Get or measure the set of types (unique token values).
text_types(x, filter = NULL, collapse = FALSE, ...)
text_ntype(x, filter = NULL, collapse = FALSE, ...)
| Argument | Description |
|---|---|
| x | a text or character vector. |
| filter | if non- |
| collapse | a logical value indicating whether to collapse the aggregation over all rows of the input. |
| … | additional properties to set on the text filter. |
text_ntype counts the number of unique types in each text;
text_types returns the set of unique types, as a character
vector. Types are determined according to the filter argument.
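Because the filter decides what counts as a type, changing a filter property changes the result. The sketch below shows the two routes the arguments table describes: setting a property through `...`, or passing a filter object as `filter`. It assumes the corpus package is installed and that `drop_punct` is a property accepted by `text_filter()`. The input `x` is illustrative, and no output is shown because it depends on the filter defaults.

```r
# Guarded so the snippet is a no-op when the corpus package is unavailable.
if (requireNamespace("corpus", quietly = TRUE)) {
  x <- c("I saw Mr. Jones today.", "What. Are. You. Doing????")

  # Route 1: set a text filter property through `...`
  # (assumption: `drop_punct` is a valid filter property)
  n_dots <- corpus::text_ntype(x, drop_punct = TRUE)

  # Route 2: build the filter explicitly and pass it as `filter`
  f <- corpus::text_filter(drop_punct = TRUE)
  n_filter <- corpus::text_ntype(x, filter = f)

  print(n_dots)
  print(n_filter)
}
```

Comparing either result with `text_ntype(x)` under the default filter should show how much of each count comes from punctuation types.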
If collapse = FALSE, then text_ntype produces a numeric
vector with the same length and names as the input text, with the elements
giving the number of unique types in the corresponding texts. For
text_types, the result is a list of character vectors, with each
vector giving the unique types in the corresponding text, ordered
according to the sort function.
If collapse = TRUE, then we aggregate over all rows of the input.
In this case, text_ntype produces a scalar indicating the number
of unique types in x, and text_types produces a character
vector with the unique types.
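The difference between the two collapse modes can be sketched in base R. This is not the package's implementation: `toy_tokens`, `toy_types`, and `toy_ntype` are hypothetical helpers, and the tokenizer (lower-case, split on non-letters) is a deliberately crude stand-in for a real text filter.

```r
# Hypothetical stand-in tokenizer: lower-case, then split on runs of
# non-letter characters. NOT the tokenizer the package uses.
toy_tokens <- function(x) {
  lapply(strsplit(tolower(x), "[^a-z]+"), function(tok) tok[nzchar(tok)])
}

# Unique types per text (collapse = FALSE) or over all texts (collapse = TRUE).
toy_types <- function(x, collapse = FALSE) {
  tokens <- toy_tokens(x)
  if (collapse) {
    sort(unique(unlist(tokens, use.names = FALSE)))
  } else {
    result <- lapply(tokens, function(tok) sort(unique(tok)))
    names(result) <- names(x)
    result
  }
}

# Number of unique types: one count per text, or a single count.
toy_ntype <- function(x, collapse = FALSE) {
  types <- toy_types(x, collapse = collapse)
  if (collapse) length(types) else lengths(types)
}

docs <- c(a = "The cat saw the dog.", b = "The dog ran.")
toy_ntype(docs)                    # named vector: a = 4, b = 3
toy_ntype(docs, collapse = TRUE)   # 5
toy_types(docs, collapse = TRUE)   # "cat" "dog" "ran" "saw" "the"
```

Note that the collapsed count (5) is smaller than the sum of the per-text counts (7), because types shared between texts are counted once.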
    text <- c("I saw Mr. Jones today.",
              "Split across\na line.",
              "What. Are. You. Doing????",
              "She asked 'do you really mean that?' and I said 'yes.'")

    # count the number of unique types
    text_ntype(text)
    #> [1] 6 5 6 14
    text_ntype(text, collapse = TRUE)
    #> [1] 25

    # get the type sets
    text_types(text)
    #> [[1]]
    #> [1] "." "i" "jones" "mr" "saw" "today"
    #>
    #> [[2]]
    #> [1] "." "a" "across" "line" "split"
    #>
    #> [[3]]
    #> [1] "." "?" "are" "doing" "what" "you"
    #>
    #> [[4]]
    #> [1] "'" "." "?" "and" "asked" "do" "i" "mean"
    #> [9] "really" "said" "she" "that" "yes" "you"
    #>
    text_types(text, collapse = TRUE)
    #> [1] "'" "." "?" "a" "across" "and" "are" "asked"
    #> [9] "do" "doing" "i" "jones" "line" "mean" "mr" "really"
    #> [17] "said" "saw" "she" "split" "that" "today" "what" "yes"
    #> [25] "you"