Why is there no c.factor?
Dear Thomas and Hadley,

I'd propose the following: if the sets of levels of all arguments are the same, then c.factor() would return a factor with the common set of levels; if the sets of levels differ, then, as Hadley suggests, the level set of the result would be the union of the sets of levels of the arguments, but a warning would be issued.

Best,
John
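A minimal sketch of the proposal above, under a few stated assumptions: the helper name c_factor_warn is hypothetical (it is not part of base R), "the same set of levels" is read as set equality regardless of order, and the levels of the result are ordered by first appearance across the arguments.

```r
# Hypothetical helper illustrating the proposal; c_factor_warn is an
# illustrative name, not a base R function.
c_factor_warn <- function(...) {
  factors <- list(...)
  level_sets <- lapply(factors, levels)

  # Union of all levels, in order of first appearance across the arguments.
  all_levels <- unique(unlist(level_sets))

  # Assumption: level sets are compared as sets, ignoring order.
  same <- all(vapply(level_sets,
                     function(l) setequal(l, all_levels),
                     logical(1)))
  if (!same) {
    warning("arguments have different sets of levels; using their union")
  }

  factor(unlist(lapply(factors, as.character)), levels = all_levels)
}

# Same level set {a, b} in both arguments: no warning.
x <- c_factor_warn(factor(c("a", "b")), factor(c("b", "a")))

# Different level sets: the result uses the union, and a warning is issued.
y <- c_factor_warn(factor("a"), factor("d"))
```

Whether a warning is the right severity, as opposed to an error or silence, is exactly the point on which the views in this thread differ.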
-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Thomas Lumley
Sent: February-04-10 12:07 PM
To: Hadley Wickham
Cc: r-devel at r-project.org
Subject: Re: [Rd] Why is there no c.factor?

On Thu, 4 Feb 2010, Hadley Wickham wrote:
Hi all,
Is there a reason that there is no c.factor method? Analogous to
c.Date, I'd expect something like the following to be useful:
c.factor <- function(...) {
  factors <- list(...)
  levels <- unique(unlist(lapply(factors, levels)))
  char <- unlist(lapply(factors, as.character))
  factor(char, levels = levels)
}

c(factor("a"), factor("b"), factor(c("c", "b", "a")), factor("d"))
# [1] a b c b a d
# Levels: a b c d
It's well established that different people have different views on what factors should do, but this doesn't match mine. I think of factors as enumerated data types where the factor levels already specify all the valid values for the factor, so I wouldn't want to be able to combine two factors with different sets of levels.
For example:
A <- factor("orange", levels = c("orange", "yellow", "red", "purple"))
B <- factor("orange", levels = c("orange", "apple", "mango", "banananana"))
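Under the enumerated-type view, combining A and B would simply be refused. A rough sketch of such a strict combiner follows; the name c_factor_strict is hypothetical (not part of base R), and it assumes at least one argument and that "the same levels" means identical levels in identical order.

```r
# Hypothetical strict combiner; c_factor_strict is an illustrative name,
# not a base R function. Assumes at least one argument is supplied.
c_factor_strict <- function(...) {
  factors <- list(...)
  lev <- levels(factors[[1]])

  # Assumption: every argument must have identical levels, in the same order.
  ok <- all(vapply(factors,
                   function(f) identical(levels(f), lev),
                   logical(1)))
  if (!ok) stop("factors have different sets of levels")

  factor(unlist(lapply(factors, as.character)), levels = lev)
}

A <- factor("orange", levels = c("orange", "yellow", "red", "purple"))
B <- factor("orange", levels = c("orange", "apple", "mango", "banananana"))
C <- factor("red", levels = levels(A))

# A and C share the same enumeration, so they combine into one factor.
ac <- c_factor_strict(A, C)

# A and B do not, so the call stops; the error message is captured here.
res <- tryCatch(c_factor_strict(A, B),
                error = function(e) conditionMessage(e))
```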
On the other hand, I think the current behaviour, which reduces them to
numbers, is just wrong.
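To illustrate what is lost when factors are reduced to numbers, the sketch below extracts the underlying integer codes explicitly with as.integer(), since what c() itself returns for factors depends on the R version. The values "yellow" and "apple" are chosen here for illustration and differ from the example above.

```r
# Two factors over different enumerations; the values are illustrative.
A <- factor("yellow", levels = c("orange", "yellow", "red", "purple"))
B <- factor("apple",  levels = c("orange", "apple", "mango", "banananana"))

# Each factor stores its value as a position within its own levels.
codes <- c(as.integer(A), as.integer(B))
codes
# [1] 2 2
```

Both values end up as the code 2, so "yellow" and "apple" can no longer be told apart once the levels are dropped.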
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel