Skip to content

plyr: a*ply with functions that return matrices-- possible bug in aaply?

3 messages · Michael Friendly, Hadley Wickham

#
I have an application where I have a function to calculate results for 
a 2-way table or matrix, which
returns a matrix with one less row and column. To keep this short, the 
function below captures the structure:

fun2way <- function(f){
     if (!length(dim(f)) ==2) stop("only for 2-way arrays")
     R <- dim(f)[1]
     C <- dim(f)[2]
     f[1:(R-1), 1:(C-1)]
}

Now, I want to extend this to higher-way arrays, using apply-like 
methods over the strata (all but the first two dimensions),
and returning an array in which the last dimensions correspond to 
strata.  That is, I want to define something like the
following using an a*ply method, but aaply gives a result in which the 
applied .margin(s) do not appear last in the
result, contrary to the documentation for ?aaply.  I think this is a 
bug, either in the function or the documentation,
but perhaps there's something I misunderstand for this case.

fun <- function(f, stratum=NULL) {
     L <- length(dim(f))
   if (L > 2 && is.null(stratum))
         stratum <- 3:L
   if (is.null(stratum)) {
     result <- fun2way(f)
   }
   else {
         require(plyr)
     result <- aaply(f, stratum, fun2way)  ## order of dimensions 
screwed up!
     }
   result
}


For example, by hand (or with a loop) I can calculate the
pieces and combine them as I want using abind():

 > # apply separately to strata
 > t1<-fun2way(HairEyeColor[,,1])
 > t2<-fun2way(HairEyeColor[,,2])
 >
 > library(abind)
 > abind(t1, t2, along=3)
, , 1

       Brown Blue Hazel
Black    32   11    10
Brown    53   50    25
Red      10   10     7

, , 2

       Brown Blue Hazel
Black    36    9     5
Brown    66   34    29
Red      16    7     7

alply() gives me what I want, but with the strata as list elements, 
rather than an array

 > library(plyr)
 > # strata define separate list elements
 > alply(HairEyeColor, 3, fun2way)
$`1`
        Eye
Hair    Brown Blue Hazel
   Black    32   11    10
   Brown    53   50    25
   Red      10   10     7

$`2`
        Eye
Hair    Brown Blue Hazel
   Black    36    9     5
   Brown    66   34    29
   Red      16    7     7

attr(,"split_type")
[1] "array"
attr(,"split_labels")
      Sex
1   Male
2 Female
 >

However, with aaply(), dim[3] ends up as first dimension, not last

 > # dim[3] ends up as first dimension, not last
 > aaply(HairEyeColor, 3, fun2way)
, , Eye = Brown

         Hair
Sex      Black Brown Red
   Female    36    66  16
   Male      32    53  10

, , Eye = Blue

         Hair
Sex      Black Brown Red
   Female     9    34   7
   Male      11    50  10

, , Eye = Hazel

         Hair
Sex      Black Brown Red
   Female     5    29   7
   Male      10    25   7

 > str(aaply(as.array(HairEyeColor), 3, fun2way))
  num [1:2, 1:3, 1:3] 36 32 66 53 16 10 9 11 34 50 ...
  - attr(*, "dimnames")=List of 3
   ..$ Sex : chr [1:2] "Female" "Male"
   ..$ Hair: chr [1:3] "Black" "Brown" "Red"
   ..$ Eye : chr [1:3] "Brown" "Blue" "Hazel"
 >
 > ## aaply should return this....
 > aperm(aaply(HairEyeColor, 3, fun2way), c(2,3,1))
, , Sex = Female

        Eye
Hair    Brown Blue Hazel
   Black    36    9     5
   Brown    66   34    29
   Red      16    7     7

, , Sex = Male

        Eye
Hair    Brown Blue Hazel
   Black    32   11    10
   Brown    53   50    25
   Red      10   10     7

 >

On the other hand, aaply() does work as I expect, with an array of size 
2 x C x strata

 > library(vcd)
 > fun2way(Employment[,,1])
<1Mo  1-3Mo 3-12Mo  1-2Yr  2-5Yr
      8     35     70     62     56
 > fun2way(Employment[,,2])
<1Mo  1-3Mo 3-12Mo  1-2Yr  2-5Yr
     40     85    181     85    118
 >
 > aaply(Employment, 3, fun2way)

LayoffCause <1Mo 1-3Mo 3-12Mo 1-2Yr 2-5Yr
    Closure     8    35     70    62    56
    Replaced   40    85    181    85   118
#
Maybe the documentation isn't clear but I think this is behaving as expected:

 * the margin you split on comes first in the output
 * followed by the dimensions created by the applied function.

Hadley
1 day later
#
On 10/4/2010 9:22 AM, hadley wickham wrote:
OK, this is clear (and more explicit than the .Rd doc description of 
Value).
What is still not clear is why alply() returns the split levels in their 
original order, while aaply() returns them in sorted order when the 
level names are character.  My application is for
contingency tables, where, e.g., a dimension may have levels
c("Lo", "Med", "Hi") and I need to preserve this order in the result.

 > alply(HairEyeColor, 3, fun2way) # gives c("Male", "Female")
$`1`
        Eye
Hair    Brown Blue Hazel
   Black    32   11    10
   Brown    53   50    25
   Red      10   10     7

$`2`
        Eye
Hair    Brown Blue Hazel
   Black    36    9     5
   Brown    66   34    29
   Red      16    7     7

attr(,"split_type")
[1] "array"
attr(,"split_labels")
      Sex
1   Male
2 Female
 >
 > aaply(HairEyeColor, 3, fun2way) # gives c("Female", "Male")
, , Eye = Brown

         Hair
Sex      Black Brown Red
   Female    36    66  16
   Male      32    53  10

, , Eye = Blue

         Hair
Sex      Black Brown Red
   Female     9    34   7
   Male      11    50  10

, , Eye = Hazel

         Hair
Sex      Black Brown Red
   Female     5    29   7
   Male      10    25   7

 >