Look for instances of one or more terms in a set of texts.
text_locate(x, terms, filter = NULL, ...)
text_count(x, terms, filter = NULL, ...)
text_detect(x, terms, filter = NULL, ...)
text_match(x, terms, filter = NULL, ...)
text_sample(x, terms, size = NULL, filter = NULL, ...)
text_subset(x, terms, filter = NULL, ...)
x | a text or character vector. |
---|---|
terms | a character vector of search terms. |
filter | if non- |
size | the maximum number of results to return, or |
… | additional properties to set on the text filter. |
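The following is a minimal sketch of how the filter and … arguments might be used, assuming the corpus package is installed and that map_case is one of the properties accepted by text_filter. The two-element text vector is made up for illustration, and the expectation that both calls agree is an assumption based on the argument descriptions above, not a documented guarantee.

```r
library(corpus)

# Illustrative input, not taken from the package documentation
text <- c("Rose is a rose.", "Snow White and Rose Red")

# Default filter: search with the text's default filter
text_count(text, "rose")

# Set a filter property through `...` (assumed: map_case controls case folding)
by_dots <- text_count(text, "rose", map_case = FALSE)

# Build a filter explicitly and pass it in through `filter`
f <- text_filter(map_case = FALSE)
by_filter <- text_count(text, "rose", filter = f)

# Both routes are expected to give the same counts
identical(by_dots, by_filter)
```

Passing a property through … is convenient for one-off searches; building a filter object once is tidier when the same settings are reused across several calls.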
text_locate finds all instances of the search terms in the input text, along with their contexts.
text_count counts the number of search term instances in each element of the text vector.
text_detect indicates whether each text contains at least one of the search terms.
text_match reports the matching instances as a factor variable with levels equal to the terms argument.
text_subset returns the texts that contain the search terms.
text_sample returns a random sample of the results from text_locate, in random order. This is useful for hand-inspecting a subset of the text_locate matches.
text_count and text_detect return a numeric vector and a logical vector, respectively, with length equal to the number of input texts and names equal to the text names.
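As a small illustration of these return values, the sketch below runs both functions on a named version of the example texts. The element names (stein, shakespeare, grimm) are made up for the example; the rest relies only on the behavior described above.

```r
library(corpus)

# Named character vector; the names are illustrative labels
text <- c(stein = "Rose is a rose is a rose is a rose.",
          shakespeare = "A rose by any other name would smell as sweet.",
          grimm = "Snow White and Rose Red")

n <- text_count(text, "rose")     # numeric, one entry per text
has <- text_detect(text, "rose")  # logical, one entry per text

# Both results carry the names of the input texts
names(n)
names(has)

# A text is detected exactly when its count is positive
all(has == (n > 0))
```

Because the results are aligned with the input, they can be used directly for indexing, for example text[has] or text[n >= 2].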
text_locate and text_sample both return a data frame with one row for each search result and columns named ‘text’, ‘before’, ‘instance’, and ‘after’. The ‘text’ column gives the name of the text containing the instance; ‘before’ and ‘after’ are text vectors giving the text before and after the instance. The ‘instance’ column gives the token or tokens matching the search term.
text_match returns a data frame with one row for each search result, with columns named ‘text’ and ‘term’. Both columns are factors. The ‘text’ column has levels equal to the text labels, and the ‘term’ column has levels equal to the terms argument.
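Since both columns are factors, the result of text_match can be tabulated directly. The sketch below assumes the corpus package is installed; the search term "lily" is a made-up term that does not occur in the texts, included to show that it still appears in the table with a zero count.

```r
library(corpus)

text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Snow White and Rose Red")

# "lily" is a hypothetical term with no matches in these texts
terms <- c("rose", "snow white", "lily")
m <- text_match(text, terms)

# Overall frequency of each term; unmatched terms keep a zero entry
# because the 'term' factor retains every search term as a level
term_counts <- table(m$term)
term_counts

# Text-by-term contingency table
table(m$text, m$term)
```

Keeping unmatched terms as factor levels means tables built from different searches over the same term list have the same shape and can be compared directly.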
text_subset returns the subset of texts that contain the given search terms. The result has its text_filter set to the passed-in filter argument.
text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Snow White and Rose Red")

text_count(text, "rose")
#> [1] 4 1 1

text_detect(text, "rose")
#> [1] TRUE TRUE TRUE

text_locate(text, "rose")
#> text before instance after
#> 1 1 Rose is a rose is a rose is a rose.
#> 2 1 Rose is a rose is a rose is a rose.
#> 3 1 Rose is a rose is a rose is a rose.
#> 4 1 Rose is a rose is a rose is a rose .
#> 5 2 A rose by any other name would smell …
#> 6 3 Snow White and Rose Red

text_match(text, "rose")
#> text term
#> 1 1 rose
#> 2 1 rose
#> 3 1 rose
#> 4 1 rose
#> 5 2 rose
#> 6 3 rose

text_sample(text, "rose", 3)
#> text before instance after
#> 1 1 Rose is a rose is a rose is a rose.
#> 2 2 A rose by any other name would smell …
#> 3 1 Rose is a rose is a rose is a rose.

text_subset(text, "a rose")
#> [1] "Rose is a rose is a rose is a rose."
#> [2] "A rose by any other name would smell as sweet."

# search for multiple terms
text_locate(text, c("rose", "rose red", "snow white"))
#> text before instance after
#> 1 1 Rose is a rose is a rose is a rose…
#> 2 1 Rose is a rose is a rose is a rose.
#> 3 1 Rose is a rose is a rose is a rose.
#> 4 1 …ose is a rose is a rose is a rose .
#> 5 2 A rose by any other name would smell…
#> 6 3 Snow White and Rose Red
#> 7 3 Snow White and Rose Red
#> 8 3 Snow White and Rose Red