Hi list! I have a large matrix which I'd like to partition into blocks
and for each block I'd like to compute the mean. Following a example
where each letter marks a block of the partition:
a a a d g g
a a a d g g
a a a d g g
b b b e h h
b b b e h h
c c c f i i
I'm only interested in the resulting matrix of means. How can this be
done efficiently?
Thanks! Titus
Applying functions to partitions
7 messages · Titus von der Malsburg, David Winsemius, Martin Morgan +2 more
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090216/fedefb81/attachment-0001.pl>
Its not clear that the object returned from such an operation would be
a matrix, but if things remain very regular then perhapos you will
succeed with this:
> markmtx <- matrix(scan(textConnection("a a a d g g
+ a a a d g g
+ a a a d g g
+ b b b e h h
+ b b b e h h
+ c c c f i i"), what="character"), nrow=6)
Read 36 items
> nummtx <-matrix(rnorm(36),nrow=6)
> nummtx
[,1] [,2] [,3] [,4] [,5]
[,6]
[1,] -1.49492952 -0.1000962 -0.54546587 -0.536216056 0.1065169
-1.3368842
[2,] -0.64393278 0.3343573 0.76247880 0.282666215 0.2236401
0.8210809
[3,] 1.42879752 -1.3246770 0.06403316 -0.002843621 -0.2990221
-0.4885461
[4,] -0.38740975 0.7800235 0.12819144 0.206188106 0.8481351
0.2572268
[5,] -0.07082702 -0.7870970 0.60560030 -1.381615740 1.4935228
0.1165892
[6,] -0.06916424 -0.5869168 0.39492984 0.016430970 -0.6531722
-0.1194990
> tapply(nummtx, markmtx, FUN=mean)
a b c d
e f g
-0.168826070 -0.037543099 -0.334783145 0.173601720 0.527161589
0.257226798 -0.085579151
h i
-0.131208561 -0.001454904
> matrix(tapply(nummtx, markmtx, FUN=mean), nrow=3)
[,1] [,2] [,3]
[1,] -0.1688261 0.1736017 -0.085579151
[2,] -0.0375431 0.5271616 -0.131208561
[3,] -0.3347831 0.2572268 -0.001454904
David Winsemius On Feb 16, 2009, at 12:43 PM, Titus von der Malsburg wrote: > > Hi list! I have a large matrix which I'd like to partition into > blocks > and for each block I'd like to compute the mean. Following a example > where each letter marks a block of the partition: > > a a a d g g > a a a d g g > a a a d g g > b b b e h h > b b b e h h > c c c f i i > > I'm only interested in the resulting matrix of means. How can this be > done efficiently? > > Thanks! Titus > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Stavros Macrakis <macrakis at alum.mit.edu> writes:
Assuming your matrix is:
mm <- matrix(runif(6*6),6,6)
And your blocks are defined by integers or factors:
cfact <- c(1,1,1,2,3,3)
rfact <- c(1,1,1,2,2,3)
Then the following should do the trick:
matrix(tapply(mm, outer(rfact,cfact,paste), mean),
length(unique(rfact)))
or the variant idx <- outer(rfact, (cfact - 1) * max(rfact), "+") matrix(tapply(m, idx, mean), max(rfact)) The assumption is that cfact, rfact are integer valued with max(rfact) <= nrow(m), max(cfact) <= ncol(m). I think Stavros' solution will run in to trouble when there are more than 9 row blocks, and '10 1' sorts before '2 1', for instance. Martin
The 'outer' calculates a joint factor for each element of the matrix; the
'tapply' treats the matrix as a vector, grouping by factor and calculating
means; the 'matrix' rearranges them as a matrix corresponding to the
original block structure.
Is that what you had in mind?
-s
On Mon, Feb 16, 2009 at 12:43 PM, Titus von der Malsburg <malsburg at gmail.com
wrote:
Hi list! I have a large matrix which I'd like to partition into blocks
and for each block I'd like to compute the mean. Following a example
where each letter marks a block of the partition:
a a a d g g
a a a d g g
a a a d g g
b b b e h h
b b b e h h
c c c f i i
I'm only interested in the resulting matrix of means. How can this be
done efficiently?
Thanks! Titus
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090216/33905526/attachment-0001.pl>
"... I suppose the clean way to do this would be to define a cartesian product of two factors with the induced lexicographic order (is there a standard function for doing this?):" Of course. ?interaction. -- Bert Gunter Genentch
On Mon, Feb 16, 2009 at 7:52 PM, Bert Gunter <gunter.berton at gene.com> wrote:
I suppose the clean way to do this would be to define a cartesian product of two factors with the induced lexicographic order (is there a standard function for doing this?):" Of course. ?interaction.
Perhaps my specification was unclear, because interaction does not do this:
g1 <- factor(c("b","a","c"),levels=c("b","a","c"),ordered=TRUE)
g2 <- factor(c(3,10,1,10), levels=c(3,10,1),ordered=TRUE)
interaction(g1,g2,lex.order=TRUE)
[1] b.3 a.10 c.1 b.10
Levels: b.3 b.10 b.1 a.3 a.10 a.1 c.3 c.10 c.1
Warning message:
In ans + ll * if1 :
longer object length is not a multiple of shorter object length
First problem: it calculates the cartesian product of the *levels*,
not of the *values*, so ignores duplicate values, which are important.
Second problem: it reorders them alphabetically, ignoring the original orders.
Third problem: it gives a warning if the number of levels is not the same.
expand.grid is closer, but it produces the cartesian product as a data
frame, not a factor, and the first argument varies fastest. Using
expand.grid, I suppose you could do:
`*.factor` <- function(f1,f2)
{ val <- do.call(paste,rev(expand.grid(f2,f1)))
factor(val, levels=unique(val), ordered=TRUE) }
or
`*.factor` <- function(f1,f2)
factor( do.call(paste,rev(expand.grid(f2,f1))),
levels=do.call(paste,rev(expand.grid(unique(f2),unique(f1))))
ordered=TRUE)
but to my eye those hardly look nicer than the code I gave before
which doesn't use expand.grid.
Is there something I'm missing here?
-s