tapply bug? - levels of a factor in a data frame after tapply are intermixed
Both Greg and Marc - thank you so much! It helped a lot. What I just discovered also works (similar to Greg's suggestions): first make it a character vector and THEN do as.factor(as.numeric(original character vector)). Wow! R never stops surprising one - and I am just at the beginning of the journey! Thank you!
Dimitri
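A minimal sketch of the workaround described above, assuming the original data is a character vector of numbers. The names x_chr, f_chr and f_num are hypothetical and the values are borrowed from Greg's example, not from the original data frame:

```r
# Hypothetical character vector standing in for the original data
x_chr <- c("9", "3", "15", "9", "15", "9", "3")

# Factoring the strings directly sorts the levels as text
f_chr <- factor(x_chr)

# Converting to numeric first sorts the levels by numeric value
f_num <- as.factor(as.numeric(x_chr))

levels(f_chr)  # "15" "3" "9"
levels(f_num)  # "3" "9" "15"
```

The data values are the same in both factors; only the order of the levels differs.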
On Fri, Feb 13, 2009 at 1:13 PM, Greg Snow <Greg.Snow at imail.org> wrote:
It comes down to 2 simple rules:
1. If you don't care about the order of the factor levels, then it doesn't matter how R codes the relationship.
2. If you do care about the order, then tell R what order you want.
Consider the following:
x <- c(9,3,15,9,15,9,3)
factor(x)
[1] 9 3 15 9 15 9 3
Levels: 3 9 15
factor(as.character(x))
[1] 9 3 15 9 15 9 3
Levels: 15 3 9
factor(x, levels=unique(x))
[1] 9 3 15 9 15 9 3
Levels: 9 3 15

The last looks most like what you want, but for many uses, all 3 will give equivalent results.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
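Greg's second rule can be illustrated with a short sketch: when the order matters, pass it explicitly through the levels argument, and summaries such as table() then follow that order. The descending order chosen here and the names f_sorted and f_custom are only assumptions for illustration:

```r
x <- c(9, 3, 15, 9, 15, 9, 3)

# Default: levels are the sorted unique values
f_sorted <- factor(x)

# Explicit: levels in whatever order is wanted (descending, as an example)
f_custom <- factor(x, levels = c(15, 9, 3))

# Counts are reported in level order, not in order of appearance
counts_sorted <- table(f_sorted)
counts_custom <- table(f_custom)

names(counts_sorted)  # "3" "9" "15"
names(counts_custom)  # "15" "9" "3"
```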
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Dimitri Liakhovitski
Sent: Friday, February 13, 2009 10:54 AM
To: marc_schwartz at comcast.net
Cc: R-Help List
Subject: Re: [R] tapply bug? - levels of a factor in a data frame after tapply are intermixed

Sorry - one clarification: When I run:
test$xx
what I am currently seeing is:
[1] 9 3 15
Levels: 3 9 15
But what I am expecting to see is:
[1] 9 3 15
Levels: 9 3 15
Or maybe:
Levels: 2 1 3

On Fri, Feb 13, 2009 at 12:38 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
On Fri, Feb 13, 2009 at 12:24 PM, Marc Schwartz <marc_schwartz at comcast.net> wrote:
on 02/13/2009 11:09 AM Dimitri Liakhovitski wrote:
Hello! I have encountered a really weird problem. Maybe you've encountered it before? I have a large data frame "importances". It has one factor ($A) with 3 levels: 3, 9, and 15. $B is a regular numeric variable. Below I am picking a really small sub-frame (just 3 rows) based on "indices". "indices" were chosen so that all 3 levels of A are present:
indices=c(14329,14209,14353)
test=data.frame(yy=importances[["B"]][indices],xx=importances[["A"]][indices])
Here is what the new data frame "test" looks like:
yy xx
1 -0.009984006 9
2 -2.339904131 3
3 -0.008427385 15
Here is the structure of "test":
str(test)
'data.frame': 3 obs. of 2 variables:
 $ yy: num -0.00998 -2.3399 -0.00843
 $ xx: Factor w/ 3 levels "3","9","15": 2 1 3
Notice - the order of factor levels for xx is not 1 2 3 as it should be but 2 1 3. How come? Or also look at this:
test$xx
[1] 9 3 15
Levels: 3 9 15
Same thing. Do you know what might be the reason? Thank you very much!
The output of str() is showing you the factor levels of test$xx, followed by the internal integer codes for the three actual values of test$xx, 9, 3, and 15:
str(test$xx)
Factor w/ 3 levels "3","9","15": 2 1 3
levels(test$xx)
[1] "3" "9" "15"
as.integer(test$xx)
[1] 2 1 3

9 is the second level, hence the 2
3 is the first level, hence the 1
15 is the third level, hence the 3.

No problems, just clarification needed on what you are seeing. Note that you do not reference anything above regarding tapply() as per your subject line, though I suspect that I have an idea as to why you did...
HTH, Marc Schwartz
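The relationship between levels and integer codes can be checked with a small sketch. The factor xx below is a hypothetical stand-in for test$xx, rebuilt from the three values shown in the thread rather than from the original "importances" data frame:

```r
# Stand-in for test$xx: the values 9, 3, 15 in that order
xx <- factor(c(9, 3, 15))

lev   <- levels(xx)       # the sorted set of distinct values
codes <- as.integer(xx)   # position of each value within the levels

# Indexing the levels by the codes recovers the values as text
labels_back <- lev[codes]

# Going through character gives back the original numbers
values_back <- as.numeric(as.character(xx))

lev          # "3" "9" "15"
codes        # 2 1 3
labels_back  # "9" "3" "15"
values_back  # 9 3 15
```

Note that as.numeric(xx) on its own would return the codes 2 1 3, not the values, which is why the conversion goes through as.character() first.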
Marc (and everyone),
I expected it to show:
 $ xx: Factor w/ 3 levels "3","9","15": 1 2 3
rather than what I am seeing:
 $ xx: Factor w/ 3 levels "3","9","15": 2 1 3
Because 3 is level 1, 9 is level 2 and 15 is level 3. I have several other factors in my original data frame. And I've done that tapply for all of them (for the same dependent variable) - and in all of them the first level was 1, the second 2, etc.
Why am I concerned about the problem? Because I am plotting the means of the numeric variable against the levels of the factor and it's important to me that the factor levels are correct (in the right order)...
Dimitri
--
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
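For the plotting concern, a sketch along these lines may help: tapply() returns one result per level, in level order, regardless of the order in which the rows appear. The data frame below is rebuilt from the three rows printed in the thread, and the name means is hypothetical:

```r
# The three rows shown in the thread, with xx as a factor
test <- data.frame(
  yy = c(-0.009984006, -2.339904131, -0.008427385),
  xx = factor(c(9, 3, 15))
)

# One mean per level of xx; the result is named by level, in level order
means <- tapply(test$yy, test$xx, mean)

names(means)  # "3" "9" "15"
```

Each mean is labelled with its own level, so plotting means against names(means) should keep values and levels aligned even though the codes in the data read 2 1 3.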
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.