Create or test for corpus objects.
corpus_frame(..., row.names = NULL, filter = NULL)
as_corpus_frame(x, filter = NULL, ..., row.names = NULL)
is_corpus_frame(x)
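Argument | Description |
---|---|
… | data frame columns for corpus_frame, or further arguments passed to as_corpus_text from as_corpus_frame. |
row.names | character vector of row names for the corpus object. |
filter | text filter object for the "text" column. |
x | object to be coerced or tested. |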
These functions create a corpus object or convert another object to one. A corpus object is just a data frame with special functions for printing and a column named "text" of type "corpus_text".
corpus_frame has similar semantics to the data.frame function, except that string columns do not get converted to factors.
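
The difference from data.frame can be checked directly. The following is a minimal sketch, assuming the corpus package is installed; the column names and values are made up for illustration. (In R versions before 4.0, data.frame would convert the author column to a factor by default.)

```r
library(corpus)

# Illustrative data: the column names and values are made up for this sketch.
cf <- corpus_frame(
    text   = c("The quick brown fox.", "Jumped over the lazy dog."),
    author = c("alice", "bob")
)

class(cf)               # class attribute set by corpus_frame
is.character(cf$author) # string columns stay character, not factor
is_corpus_text(cf$text) # the "text" column has type "corpus_text"
```
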
as_corpus_frame converts another object to a corpus data frame object. By default, the method converts x to a data frame with a column named "text" of type "corpus_text", and sets the class attribute of the result to c("corpus_frame", "data.frame").
is_corpus_frame tests whether x is a data frame with a column named "text" of type "corpus_text".
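
As a rough illustration of that test (a sketch assuming the corpus package is installed, with made-up data), a plain data frame whose "text" column is an ordinary character vector should not pass, while the converted object should:

```r
library(corpus)

# Plain data frame: "text" is an ordinary character column here.
plain <- data.frame(text = c("goodnight", "moon"), stringsAsFactors = FALSE)
is_corpus_frame(plain)     # "text" is not of type "corpus_text"

# After conversion, "text" has type "corpus_text".
converted <- as_corpus_frame(plain)
is_corpus_frame(converted)
```
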
as_corpus_frame is generic: you can write methods to handle specific classes of objects.
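
One possible shape for such a method is sketched below. The memo_list class and its id and body fields are hypothetical, invented only for this example; the method builds an ordinary data frame and delegates to the existing data frame conversion, which is one reasonable design rather than a required one.

```r
library(corpus)

# Hypothetical class for illustration: a list with `id` and `body` vectors.
memos <- structure(
    list(id   = c("m1", "m2"),
         body = c("Budget approved.", "Meeting moved to noon.")),
    class = "memo_list"
)

# Method for the generic: build a plain data frame with a "text" column,
# then delegate so that filter, ..., and row.names are handled as usual.
as_corpus_frame.memo_list <- function(x, filter = NULL, ..., row.names = NULL) {
    df <- data.frame(id = x$id, text = x$body, stringsAsFactors = FALSE)
    as_corpus_frame(df, filter = filter, ..., row.names = row.names)
}

cf <- as_corpus_frame(memos)
is_corpus_frame(cf)
```

Delegating keeps the class-specific code limited to mapping fields onto columns.
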
corpus_frame creates a data frame with a column named "text" of type "corpus_text", and a class attribute set to c("corpus_frame", "data.frame").
as_corpus_frame attempts to coerce its argument to a corpus data frame object, setting the row.names and calling as_corpus_text on the "text" column with the filter and … arguments.
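
A small sketch of passing filter and row.names through as_corpus_frame, assuming the corpus package is installed. The sample sentences and row names are made up, and drop_punct is used here only as one example of a text_filter property.

```r
library(corpus)

df <- data.frame(text = c("Hello, world!", "Goodnight, moon."),
                 stringsAsFactors = FALSE)

# Attach a filter that drops punctuation tokens and set explicit row names.
cf <- as_corpus_frame(df, filter = text_filter(drop_punct = TRUE),
                      row.names = c("first", "second"))

text_filter(cf$text)$drop_punct # inspect the filter attached to "text"
rownames(cf)                    # row names supplied above
text_tokens(cf$text)            # tokenize using the attached filter
```
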
is_corpus_frame returns TRUE or FALSE depending on whether its argument is a valid corpus object or not.
See also: corpus-package, print.corpus_frame, corpus_text, read_ndjson.
# convert a data frame:
emoji <- data.frame(text = sapply(0x1f600 + 1:30, intToUtf8),
                    stringsAsFactors = FALSE)
as_corpus_frame(emoji)
#>    text
#> 1  😁
#> 2  😂
#> 3  😃
#> 4  😄
#> 5  😅
#> 6  😆
#> 7  😇
#> 8  😈
#> 9  😉
#> 10 😊
#> 11 😋
#> 12 😌
#> 13 😍
#> 14 😎
#> 15 😏
#> 16 😐
#> 17 😑
#> 18 😒
#> 19 😓
#> 20 😔
#> ⋮  (30 rows total)

# construct directly (no need for stringsAsFactors = FALSE):
corpus_frame(text = sapply(0x1f600 + 1:30, intToUtf8))
#>    text
#> 1  😁
#> 2  😂
#> 3  😃
#> 4  😄
#> 5  😅
#> 6  😆
#> 7  😇
#> 8  😈
#> 9  😉
#> 10 😊
#> 11 😋
#> 12 😌
#> 13 😍
#> 14 😎
#> 15 😏
#> 16 😐
#> 17 😑
#> 18 😒
#> 19 😓
#> 20 😔
#> ⋮  (30 rows total)

# convert a character vector:
as_corpus_frame(c(a = "goodnight", b = "moon")) # keeps names
#>   text
#> a goodnight
#> b moon
as_corpus_frame(c(a = "goodnight", b = "moon"), row.names = NULL) # drops names
#>   text
#> a goodnight
#> b moon