What is behind class coercion of a factor into a character

Tal:

There was a recent discussion on this list about this (Sam Steingold
was the OP IIRC).

The explanation is in ?c. In particular:

"c is sometimes used for its side effect of removing attributes except
names, for example to turn an array into a vector."

Hence the factor's attributes (its class and levels) are removed, leaving
only the underlying integer codes, and you get what you saw. As
regards its "rationale," you may find Bill Dunlap's comments on
"c()'s unfortunate history" relevant. The problem with factors is
"what should concatenation do, anyway?" If a <- factor(c("x", "y"))
and b <- factor(c("y", "z")), what should c(a,b) be? -- There is no
reason to assume that the "y" in a is the same as the "y" in b!
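
To make the attribute stripping concrete, here is a minimal sketch. The names m and x are chosen only for illustration; the matrix case is the "array into a vector" use mentioned in ?c, and the factor case is the one from the original question.

```r
# c() returns a plain vector: attributes other than names are dropped.
m <- matrix(1:4, nrow = 2)
attributes(c(m))           # NULL: the dim attribute is gone

x <- factor(c("one", "two"))
unclass(x)                 # the integer codes plus a "levels" attribute

# Combined with a character string, it is the integer codes
# (not the labels) that get coerced to character.
c(x, "3")                  # "1" "2" "3"

# Converting explicitly first keeps the labels.
c(as.character(x), "3")    # "one" "two" "3"

# Combined with a number, the codes stay numeric.
c(x, 3)                    # 1 2 3
```

So if the labels are what you want, convert with as.character() before calling c().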

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics

WARNING:  Use with caution!

There is a way to effect the catenation of factors:  The data.frame
method for rbind() does this.  E.g.

set.seed(42)
f1 <- factor(sample(letters[1:3], 42, TRUE))
f2 <- factor(sample(letters[1:4], 66, TRUE))
d1 <- data.frame(f = f1)
d2 <- data.frame(f = f2)
dd <- rbind(d1, d2)   # rbind's data.frame method merges the level sets
ff <- dd[, 1]

et voilà, ff is the "desired" catenation of f1 and f2.
But heed Bert's words of caution above!
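
A more direct route to the same kind of result, sketched here under the assumption that equal labels in the two factors really do mean the same thing (exactly the assumption Bert warns about), is to go through the character labels and rebuild the factor on the union of the level sets. The helper concat_factors is hypothetical, not part of base R, and a and b are the factors from Bert's example.

```r
a <- factor(c("x", "y"))
b <- factor(c("y", "z"))

# Hypothetical helper (not part of base R): concatenate by label,
# using the union of the two level sets as the new levels.
concat_factors <- function(f, g) {
  factor(c(as.character(f), as.character(g)),
         levels = union(levels(f), levels(g)))
}

ab <- concat_factors(a, b)
levels(ab)        # "x" "y" "z"
as.integer(ab)    # 1 2 2 3
```

As far as I know, newer versions of R (4.1.0 and later) also return a factor from c() when every argument is a factor, but the sketch above does not rely on that.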

     cheers,

         Rolf Turner

On Mon, Oct 22, 2012 at 6:46 AM, Tal Galili <tal.galili at gmail.com> wrote:
Hello all,

Please review the following simple code:

# make a factor:
x <- factor(c("one", "two"))
# what should be the output of the following expression?
c(x, "3")    # <===  ????
# I expected it to be the same as the output of:
c(as.character(x), "3")
# But in fact, the output is what we would get had we run the next line:
c(as.character(as.numeric(x)), "3")
# p.s.: c(x, 3) would of course behave differently...

I imagine the above behavior is a "feature" (not a bug), but I am curious
as to what the rationale behind it is.  Is it because of computational
efficiency, or does it address some particular use case?