tapply bug? - levels of a factor in a data frame after tapply are intermixed

on 02/13/2009 11:38 AM Dimitri Liakhovitski wrote:
Dimitri,

The examples you posted show the expected output for the data that you
provided, including the ordering of the explicit row indices that you
used.

If we create some sample data, using something along the lines of your
original description:

set.seed(1)
A <- sample(factor(c(3, 9, 15)), 100, replace = TRUE)

set.seed(2)
B <- rnorm(100)

DF <- data.frame(A = A, B = B)

head(DF)
   A           B
1  3 -0.89691455
2  9  0.18484918
3  9  1.58784533
4 15 -1.13037567
5  3 -0.08025176
6 15  0.13242028

str(DF)
'data.frame':	100 obs. of  2 variables:
 $ A: Factor w/ 3 levels "3","9","15": 1 2 2 3 1 3 3 2 2 1 ...
 $ B: num  -0.8969 0.1848 1.5878 -1.1304 -0.0803 ...


I then use tapply() to get the means:
A
          3           9          15
 0.10620274  0.08577537 -0.26276438
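
As a minimal sketch of why the result comes out in that order: tapply() returns one cell per factor level, in the order given by levels(), regardless of the order in which the values appear in the data. The toy vectors below are hypothetical and are not the data from this thread.

```r
# Toy data (hypothetical values, not the data from the thread)
A <- factor(c("15", "9", "3", "9", "3", "15"), levels = c("3", "9", "15"))
B <- c(3, 2, 1, 4, 5, 6)

# One mean per level of A
res <- tapply(B, A, mean)

levels(A)    # the level order of the factor
names(res)   # the tapply() result uses the same order
```

So if the output looks "intermixed", levels() on the grouping factor is the first thing to inspect.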

The output follows the order of the factor levels of 'A', which is what
one would expect. If you want something else, check the levels of 'A'
and change them to the ordering that you actually want. For example:

DF$A <- factor(DF$A, levels = c("9", "3", "15"))

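  or (not levels(DF$A) <- c("9", "3", "15"), which renames the levels in
  place rather than reordering them):

DF$A <- relevel(DF$A, ref = "9")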

str(DF)
'data.frame':	100 obs. of  2 variables:
 $ A: Factor w/ 3 levels "9","3","15": 2 1 1 3 2 3 3 1 1 2 ...
 $ B: num  -0.8969 0.1848 1.5878 -1.1304 -0.0803 ...


which would then adjust the ordering of the tapply() output to:
A
          9           3          15
 0.08577537  0.10620274 -0.26276438
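
To check that a reordering changed only the level order and not the data, it can help to compare the values as character before and after. A minimal sketch on a hypothetical toy factor (not the data from this thread), which also contrasts factor(x, levels = ...) with assignment to levels():

```r
# Toy factor (hypothetical values, not the data from the thread)
x <- factor(c("3", "9", "9", "15", "3"), levels = c("3", "9", "15"))

# Reorder the levels; the underlying values are unchanged
reordered <- factor(x, levels = c("9", "3", "15"))
levels(reordered)
identical(as.character(reordered), as.character(x))

# Assigning to levels() renames the levels by position instead,
# so every "3" becomes "9" and every "9" becomes "3"
renamed <- x
levels(renamed) <- c("9", "3", "15")
as.character(renamed)
```

Only the first approach leaves each observation attached to its original value.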


Is that perhaps what you are looking for?

Marc