Create or test for corpus objects.
corpus_frame(..., row.names = NULL, filter = NULL)
as_corpus_frame(x, filter = NULL, ..., row.names = NULL)
is_corpus_frame(x)
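| Argument | Description |
|---|---|
| … | data frame columns for corpus_frame; further arguments passed to as_corpus_text for as_corpus_frame. |
| row.names | character vector of row names for the corpus object. |
| filter | text filter object for the "text" column. |
| x | object to be coerced or tested. |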
These functions create a corpus object or convert another object to one.
A corpus object is just a data frame with special functions for
printing, and a column named "text" of type "corpus_text".
corpus_frame has similar semantics to the data.frame
function, except that string columns do not get converted to factors.
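As a small sketch of the point above, the call below builds a corpus object with an extra string column and checks that it stays a character vector. The `label` column and its values are made up for illustration. Note that since R 4.0.0 data.frame also leaves strings unconverted by default, so the difference mainly matters on older R versions.

```r
library(corpus)

# 'label' is a hypothetical extra column, added only for illustration
cf <- corpus_frame(text = c("Goodnight moon.", "Goodnight stars."),
                   label = c("moon", "stars"))

is.character(cf$label)  # the string column is not converted to a factor
class(cf)               # class attribute of the corpus object
```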
as_corpus_frame converts another object to a corpus data frame
object. By default, the method converts x to a data frame with
a column named "text" of type "corpus_text", and sets the
class attribute of the result to c("corpus_frame", "data.frame").
is_corpus_frame tests whether x is a data frame with a column
named "text" of type "corpus_text".
as_corpus_frame is generic: you can write methods to
handle specific classes of objects.
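A minimal sketch of such a method, assuming a hypothetical S3 class "my_notes" that stores its texts in a `body` element (neither the class nor its fields are part of the package). The method builds an ordinary data frame with a "text" column and hands it on to the generic, which then dispatches on the data frame.

```r
library(corpus)

# hypothetical container class, not part of the corpus package
notes <- structure(list(body = c("First note.", "Second note."),
                        author = c("ann", "bob")),
                   class = "my_notes")

# method for the hypothetical class: build a data frame with a "text"
# column, then let as_corpus_frame handle the data frame
as_corpus_frame.my_notes <- function(x, filter = NULL, ..., row.names = NULL) {
    df <- data.frame(text = x$body, author = x$author,
                     stringsAsFactors = FALSE)
    as_corpus_frame(df, filter = filter, ..., row.names = row.names)
}

cf <- as_corpus_frame(notes)
is_corpus_frame(cf)
```

Keeping the same argument list as the generic lets callers pass filter and row.names through unchanged.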
corpus_frame creates a data frame with a column named "text"
of type "corpus_text", and a class attribute set to
c("corpus_frame", "data.frame").
as_corpus_frame attempts to coerce its argument to a corpus
data frame object, setting the row.names and calling
as_corpus_text on the "text" column with
the filter and … arguments.
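The following sketch illustrates the coercion described above, assuming a small made-up data frame. It sets row names and attaches a text filter with the drop_punct property switched on; the sample texts and row names are illustrative only.

```r
library(corpus)

df <- data.frame(text = c("Hello, world!", "Goodnight, moon."),
                 stringsAsFactors = FALSE)

# the filter is applied to the "text" column via as_corpus_text
cf <- as_corpus_frame(df, filter = text_filter(drop_punct = TRUE),
                      row.names = c("greeting", "farewell"))

rownames(cf)                     # row names taken from the row.names argument
text_filter(cf$text)$drop_punct  # filter property stored on the "text" column
```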
is_corpus_frame returns TRUE or FALSE depending on
whether its argument is a valid corpus object or not.
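A short sketch of the test, using a made-up one-row data frame: a plain data frame whose "text" column is an ordinary character vector should not pass, while the same data after coercion should.

```r
library(corpus)

plain <- data.frame(text = "goodnight moon", stringsAsFactors = FALSE)
coerced <- as_corpus_frame(plain)

is_corpus_frame(plain)    # "text" is a plain character column
is_corpus_frame(coerced)  # "text" is now of type "corpus_text"
is_corpus_frame(1:3)      # not a data frame at all
```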
corpus-package, print.corpus_frame,
corpus_text, read_ndjson.
# convert a data frame:
emoji <- data.frame(text = sapply(0x1f600 + 1:30, intToUtf8),
                    stringsAsFactors = FALSE)
as_corpus_frame(emoji)
#>    text
#> 1  😁
#> 2  😂
#> 3  😃
#> 4  😄
#> 5  😅
#> 6  😆
#> 7  😇
#> 8  😈
#> 9  😉
#> 10 😊
#> 11 😋
#> 12 😌
#> 13 😍
#> 14 😎
#> 15 😏
#> 16 😐
#> 17 😑
#> 18 😒
#> 19 😓
#> 20 😔
#> ⋮  (30 rows total)

# construct directly (no need for stringsAsFactors = FALSE):
corpus_frame(text = sapply(0x1f600 + 1:30, intToUtf8))
#>    text
#> 1  😁
#> 2  😂
#> 3  😃
#> 4  😄
#> 5  😅
#> 6  😆
#> 7  😇
#> 8  😈
#> 9  😉
#> 10 😊
#> 11 😋
#> 12 😌
#> 13 😍
#> 14 😎
#> 15 😏
#> 16 😐
#> 17 😑
#> 18 😒
#> 19 😓
#> 20 😔
#> ⋮  (30 rows total)

# convert a character vector:
as_corpus_frame(c(a = "goodnight", b = "moon")) # keeps names
#>   text
#> a goodnight
#> b moon
as_corpus_frame(c(a = "goodnight", b = "moon"), row.names = NULL) # drops names
#>   text
#> a goodnight
#> b moon