Create or test for corpus objects.

corpus_frame(..., row.names = NULL, filter = NULL)

as_corpus_frame(x, filter = NULL, ..., row.names = NULL)

is_corpus_frame(x)

Arguments

...

data frame columns for corpus_frame; further arguments passed to as_corpus_text from as_corpus_frame.

row.names

character vector of row names for the corpus object.

filter

text filter object for the "text" column in the corpus object.

x

object to be coerced or tested.

Details

These functions create a corpus object or convert another object to one. A corpus object is just a data frame with special printing methods and a column named "text" of type "corpus_text".

corpus_frame has semantics similar to the data.frame function, except that string columns are not converted to factors.
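For comparison, the conversion that corpus_frame skips is the one base R's data.frame performs when stringsAsFactors = TRUE, which was the default before R 4.0.0. This base-R sketch uses no corpus functions; the vector `words` is only illustrative.

```r
# Character vector to store as a column (illustrative data).
words <- c("goodnight", "moon")

# Pre-R 4.0.0 default behaviour: strings are converted to factors.
df_factor <- data.frame(text = words, stringsAsFactors = TRUE)

# Strings kept as character, which corpus_frame does without being asked.
df_char <- data.frame(text = words, stringsAsFactors = FALSE)

class(df_factor$text)  # "factor"
class(df_char$text)    # "character"
```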

as_corpus_frame converts another object to a corpus data frame object. By default, the method converts x to a data frame with a column named "text" of type "corpus_text", and sets the class attribute of the result to c("corpus_frame", "data.frame").
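The two-element class vector matters because S3 dispatch tries "corpus_frame" first and then falls back to "data.frame", so corpus-specific methods such as print.corpus_frame take precedence while ordinary data frame functions keep working. The base-R sketch below shows only that inheritance on a plain data frame; it sets the class vector by hand and does not create a real corpus_text column, so it is not a replacement for as_corpus_frame.

```r
df <- data.frame(text = c("goodnight", "moon"), stringsAsFactors = FALSE)

# Set the class vector described above by hand (illustration only;
# the "text" column is still a plain character vector).
class(df) <- c("corpus_frame", "data.frame")

inherits(df, "corpus_frame")  # TRUE
is.data.frame(df)             # TRUE: "data.frame" is still in the class vector
nrow(df)                      # 2, via the data.frame method for dim()
```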

is_corpus_frame tests whether x is a data frame with a column named "text" of type "corpus_text".
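A minimal base-R sketch of the test just described. `looks_like_corpus_frame` is a hypothetical name, not the package's implementation, and the `mock` column below carries only the "corpus_text" class label, so it is not a usable corpus_text object.

```r
# Hypothetical re-implementation of the check; the real is_corpus_frame
# may differ in its details.
looks_like_corpus_frame <- function(x) {
  is.data.frame(x) &&
    "text" %in% names(x) &&
    inherits(x[["text"]], "corpus_text")
}

plain <- data.frame(text = c("goodnight", "moon"), stringsAsFactors = FALSE)
looks_like_corpus_frame(plain)        # FALSE: "text" is plain character
looks_like_corpus_frame(list(a = 1))  # FALSE: not a data frame

# Mock only: attach the class label to the column to exercise the TRUE branch.
mock <- plain
mock$text <- structure(mock$text, class = "corpus_text")
looks_like_corpus_frame(mock)         # TRUE
```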

as_corpus_frame is generic: you can write methods to handle specific classes of objects.
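As a sketch of such a method, assume a hypothetical class "tweet_list": a list of records, each with character fields id and body (the class and field names are invented for illustration). The method flattens the records into a data frame with a "text" column and hands it back to the generic, which dispatches to the data frame method, forwarding filter, ... and row.names unchanged. Defining the method needs only base R; calling it requires the corpus package to be installed.

```r
# Method for a hypothetical "tweet_list" class, following the generic's
# signature as_corpus_frame(x, filter = NULL, ..., row.names = NULL).
as_corpus_frame.tweet_list <- function(x, filter = NULL, ..., row.names = NULL) {
  df <- data.frame(
    id   = vapply(x, function(t) t$id, character(1)),
    text = vapply(x, function(t) t$body, character(1)),
    stringsAsFactors = FALSE
  )
  # Re-dispatch on the plain data frame; that method builds the
  # "text" column with as_corpus_text.
  corpus::as_corpus_frame(df, filter = filter, ..., row.names = row.names)
}

# Hypothetical input object.
tweets <- structure(
  list(list(id = "1", body = "goodnight"), list(id = "2", body = "moon")),
  class = "tweet_list"
)

if (requireNamespace("corpus", quietly = TRUE)) {
  print(corpus::as_corpus_frame(tweets))
}
```

Because the method only converts to a data frame and then re-dispatches, the row.names and filter handling stays in one place rather than being duplicated for each new class.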

Value

corpus_frame creates a data frame with a column named "text" of type "corpus_text", and a class attribute set to c("corpus_frame", "data.frame").

as_corpus_frame attempts to coerce its argument to a corpus data frame object, setting the row names and calling as_corpus_text on the "text" column with the filter and any further arguments.

is_corpus_frame returns TRUE or FALSE depending on whether its argument is a valid corpus object or not.

See also

corpus-package, print.corpus_frame, corpus_text, read_ndjson.

Examples

# convert a data frame:
emoji <- data.frame(text = sapply(0x1f600 + 1:30, intToUtf8),
                    stringsAsFactors = FALSE)
as_corpus_frame(emoji)
#>    text
#> 1  😁
#> 2  😂
#> 3  😃
#> 4  😄
#> 5  😅
#> 6  😆
#> 7  😇
#> 8  😈
#> 9  😉
#> 10 😊
#> 11 😋
#> 12 😌
#> 13 😍
#> 14 😎
#> 15 😏
#> 16 😐
#> 17 😑
#> 18 😒
#> 19 😓
#> 20 😔
#> ⋮  (30 rows total)
# construct directly (no need for stringsAsFactors = FALSE):
corpus_frame(text = sapply(0x1f600 + 1:30, intToUtf8))
#>    text
#> 1  😁
#> 2  😂
#> 3  😃
#> 4  😄
#> 5  😅
#> 6  😆
#> 7  😇
#> 8  😈
#> 9  😉
#> 10 😊
#> 11 😋
#> 12 😌
#> 13 😍
#> 14 😎
#> 15 😏
#> 16 😐
#> 17 😑
#> 18 😒
#> 19 😓
#> 20 😔
#> ⋮  (30 rows total)
# convert a character vector:
as_corpus_frame(c(a = "goodnight", b = "moon")) # keeps names
#>   text
#> a goodnight
#> b moon
as_corpus_frame(c(a = "goodnight", b = "moon"), row.names = NULL) # drops names
#>   text
#> a goodnight
#> b moon