Get or measure the set of types (unique token values).

text_types(x, filter = NULL, collapse = FALSE, ...)

text_ntype(x, filter = NULL, collapse = FALSE, ...)

## Arguments

| Argument | Description |
|---|---|
| x | a text or character vector. |
| filter | if non-NULL, a text filter to use instead of the default text filter for x. |
| collapse | a logical value indicating whether to collapse the aggregation over all rows of the input. |
| ... | additional properties to set on the text filter. |

## Details

text_ntype counts the number of unique types in each text; text_types returns the set of unique types, as a character vector. Types are determined according to the filter argument.

## Value

If collapse = FALSE, then text_ntype produces a numeric vector with the same length and names as the input text, with the elements giving the number of unique types in the corresponding texts. For text_types, the result is a list of character vectors, with each vector giving the unique types in the corresponding text, ordered according to the sort function.

If collapse = TRUE, then we aggregate over all rows of the input. In this case, text_ntype produces a scalar indicating the number of unique types in x, and text_types produces a character vector with the unique types.
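
The difference between the two collapse settings can be sketched in base R. This is only an illustration of the aggregation: `toy_tokens` is a hypothetical stand-in that lowercases and splits on whitespace, and it does not reproduce the package's text filter or tokenizer.

```r
# Hypothetical stand-in for a tokenizer: lowercase, then split on whitespace.
# This is NOT the package's text filter; it only illustrates the aggregation.
toy_tokens <- function(x) strsplit(tolower(x), "[[:space:]]+")

x <- c(a = "the cat sat", b = "the dog sat down")
tokens <- toy_tokens(x)

# Analogue of collapse = FALSE: one sorted type set and one count per text,
# keeping the names of the input.
types_per_text <- lapply(tokens, function(tok) sort(unique(tok)))
ntype_per_text <- lengths(types_per_text)

# Analogue of collapse = TRUE: a single type set and a single count,
# computed over the tokens of all texts pooled together.
types_all <- sort(unique(unlist(tokens, use.names = FALSE)))
ntype_all <- length(types_all)
```

Note that the collapsed count is not the sum of the per-text counts, because a type shared by several texts is counted once.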

## See also

text_filter, text_tokens.

## Examples

text <- c("I saw Mr. Jones today.",
          "Split across\na line.",
          "What. Are. You. Doing????",
          "She asked 'do you really mean that?' and I said 'yes.'")

# count the number of unique types
text_ntype(text)
#> [1]  6  5  6 14
text_ntype(text, collapse = TRUE)
#> [1] 25

# get the type sets
text_types(text)
#> [[1]]
#> [1] "."     "i"     "jones" "mr"    "saw"   "today"
#>
#> [[2]]
#> [1] "."      "a"      "across" "line"   "split"
#>
#> [[3]]
#> [1] "."     "?"     "are"   "doing" "what"  "you"
#>
#> [[4]]
#>  [1] "'"      "."      "?"      "and"    "asked"  "do"     "i"      "mean"
#>  [9] "really" "said"   "she"    "that"   "yes"    "you"
#>
text_types(text, collapse = TRUE)
#>  [1] "'"      "."      "?"      "a"      "across" "and"    "are"    "asked"
#>  [9] "do"     "doing"  "i"      "jones"  "line"   "mean"   "mr"     "really"
#> [17] "said"   "saw"    "she"    "split"  "that"   "today"  "what"   "yes"
#> [25] "you"