Skip to content

Applying functions to partitions

7 messages · Titus von der Malsburg, David Winsemius, Martin Morgan +2 more

#
Hi list!  I have a large matrix which I'd like to partition into blocks
and for each block I'd like to compute the mean.  Following a example
where each letter marks a block of the partition:

     a a a d g g 
     a a a d g g
     a a a d g g
     b b b e h h
     b b b e h h
     c c c f i i

I'm only interested in the resulting matrix of means.  How can this be
done efficiently?

Thanks!  Titus
#
Its not clear that the object returned from such an operation would be  
a matrix, but if things remain very regular then perhapos you will  
succeed with this:

 > markmtx <- matrix(scan(textConnection("a a a d g g
+     a a a d g g
+     a a a d g g
+     b b b e h h
+     b b b e h h
+     c c c f i i"), what="character"), nrow=6)
Read 36 items
 > nummtx <-matrix(rnorm(36),nrow=6)
 > nummtx
             [,1]       [,2]        [,3]         [,4]       [,5]        
[,6]
[1,] -1.49492952 -0.1000962 -0.54546587 -0.536216056  0.1065169  
-1.3368842
[2,] -0.64393278  0.3343573  0.76247880  0.282666215  0.2236401   
0.8210809
[3,]  1.42879752 -1.3246770  0.06403316 -0.002843621 -0.2990221  
-0.4885461
[4,] -0.38740975  0.7800235  0.12819144  0.206188106  0.8481351   
0.2572268
[5,] -0.07082702 -0.7870970  0.60560030 -1.381615740  1.4935228   
0.1165892
[6,] -0.06916424 -0.5869168  0.39492984  0.016430970 -0.6531722  
-0.1194990

 > tapply(nummtx, markmtx, FUN=mean)
            a            b            c            d             
e            f            g
-0.168826070 -0.037543099 -0.334783145  0.173601720  0.527161589   
0.257226798 -0.085579151
            h            i
-0.131208561 -0.001454904

 > matrix(tapply(nummtx, markmtx, FUN=mean), nrow=3)
            [,1]      [,2]         [,3]
[1,] -0.1688261 0.1736017 -0.085579151
[2,] -0.0375431 0.5271616 -0.131208561
[3,] -0.3347831 0.2572268 -0.001454904
#
Stavros Macrakis <macrakis at alum.mit.edu> writes:
or the variant

  idx <- outer(rfact, (cfact - 1) * max(rfact), "+")
  matrix(tapply(m, idx, mean), max(rfact))

The assumption is that cfact, rfact are integer valued with max(rfact)
<= nrow(m), max(cfact) <= ncol(m).

I think Stavros' solution will run in to trouble when there are more
than 9 row blocks, and '10 1' sorts before '2 1', for instance.

Martin

  
    
#
"... 

I suppose the clean way to do this would be to define a cartesian product of
two factors with the induced lexicographic order (is there a standard
function for doing this?):"


Of course. ?interaction.

-- Bert Gunter
Genentch
#
On Mon, Feb 16, 2009 at 7:52 PM, Bert Gunter <gunter.berton at gene.com> wrote:
Perhaps my specification was unclear, because interaction does not do this:

g1 <- factor(c("b","a","c"),levels=c("b","a","c"),ordered=TRUE)
g2 <- factor(c(3,10,1,10), levels=c(3,10,1),ordered=TRUE)

interaction(g1,g2,lex.order=TRUE)
[1] b.3  a.10 c.1  b.10
Levels: b.3 b.10 b.1 a.3 a.10 a.1 c.3 c.10 c.1
Warning message:
In ans + ll * if1 :
  longer object length is not a multiple of shorter object length

First problem: it calculates the cartesian product of the *levels*,
not of the *values*, so ignores duplicate values, which are important.
Second problem: it reorders them alphabetically, ignoring the original orders.
Third problem: it gives a warning if the number of levels is not the same.

expand.grid is closer, but it produces the cartesian product as a data
frame, not a factor, and the first argument varies fastest.  Using
expand.grid, I suppose you could do:

`*.factor` <- function(f1,f2)
      { val <- do.call(paste,rev(expand.grid(f2,f1)))
        factor(val, levels=unique(val), ordered=TRUE) }

or

`*.factor` <- function(f1,f2)
       factor(  do.call(paste,rev(expand.grid(f2,f1))),

levels=do.call(paste,rev(expand.grid(unique(f2),unique(f1))))
                    ordered=TRUE)

but to my eye those hardly look nicer than the code I gave before
which doesn't use expand.grid.

Is there something I'm missing here?

          -s