Get or measure the set of types (unique token values).
text_types(x, filter = NULL, collapse = FALSE, ...)
text_ntype(x, filter = NULL, collapse = FALSE, ...)
Argument | Description |
---|---|
x | a text or character vector. |
filter | if non- |
collapse | a logical value indicating whether to collapse the aggregation over all rows of the input. |
... | additional properties to set on the text filter. |
text_ntype counts the number of unique types in each text; text_types returns the set of unique types, as a character vector. Types are determined according to the filter argument.
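Because the filter determines the types, changing it changes both the type sets and the counts. The sketch below is illustrative only: it assumes the corpus package is installed and that `drop_punct` is a text filter property accepted by `text_filter`. The sample sentences are borrowed from the examples on this page.

```r
library(corpus)

x <- c("I saw Mr. Jones today.", "What. Are. You. Doing????")

# Default filter: punctuation tokens count as types.
n_default <- text_ntype(x)

# Set a filter property through `...`
# (assumption: `drop_punct` is a supported text filter property).
n_dots <- text_ntype(x, drop_punct = TRUE)

# Or build a filter explicitly and pass it through `filter`.
f <- text_filter(drop_punct = TRUE)
n_filter <- text_ntype(x, filter = f)

# Inspect the type sets under the modified filter.
text_types(x, filter = f)
```

Either route should describe the same filter, so `n_dots` and `n_filter` are expected to agree, and neither should exceed `n_default`.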
If collapse = FALSE, then text_ntype produces a numeric vector with the same length and names as the input text, with the elements giving the number of unique types in the corresponding texts. For text_types, the result is a list of character vectors, with each vector giving the unique types in the corresponding text, ordered according to the sort function.
If collapse = TRUE, then we aggregate over all rows of the input. In this case, text_ntype produces a scalar indicating the number of unique types in x, and text_types produces a character vector with the unique types.
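The relationship between the two functions, and between the per-text and collapsed results, can be checked directly. This is a minimal sketch assuming the corpus package is installed. The input `x` and its names `a` and `b` are made up for illustration.

```r
library(corpus)

# Hypothetical named input, used only for this illustration.
x <- c(a = "A rose is a rose.", b = "A daisy is not a rose.")

# Per-text results: one count and one type set per input text.
n_each <- text_ntype(x)
types_each <- text_types(x)
stopifnot(length(n_each) == length(x),
          all(n_each == lengths(types_each)))

# Collapsed results: a single count and a single type set.
n_all <- text_ntype(x, collapse = TRUE)
types_all <- text_types(x, collapse = TRUE)
stopifnot(n_all == length(types_all),
          setequal(types_all, unlist(types_each)))
```

The collapsed count is generally smaller than the sum of the per-text counts, because a type that occurs in several texts is counted only once after aggregation.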
text <- c("I saw Mr. Jones today.",
          "Split across\na line.",
          "What. Are. You. Doing????",
          "She asked 'do you really mean that?' and I said 'yes.'")

# count the number of unique types
text_ntype(text)
#> [1]  6  5  6 14
text_ntype(text, collapse = TRUE)
#> [1] 25

# get the type sets
text_types(text)
#> [[1]]
#> [1] "."     "i"     "jones" "mr"    "saw"   "today"
#>
#> [[2]]
#> [1] "."      "a"      "across" "line"   "split"
#>
#> [[3]]
#> [1] "."     "?"     "are"   "doing" "what"  "you"
#>
#> [[4]]
#>  [1] "'"      "."      "?"      "and"    "asked"  "do"     "i"      "mean"
#>  [9] "really" "said"   "she"    "that"   "yes"    "you"
#>
text_types(text, collapse = TRUE)
#>  [1] "'"      "."      "?"      "a"      "across" "and"    "are"    "asked"
#>  [9] "do"     "doing"  "i"      "jones"  "line"   "mean"   "mr"     "really"
#> [17] "said"   "saw"    "she"    "split"  "that"   "today"  "what"   "yes"
#> [25] "you"