replace values in array with means of certain values in array
On Mon, Jan 10, 2011 at 03:21:18PM +0100, joke R wrote:
Dear all,
I have a problem with arrays.
Simplified, I have two arrays:
A =
[,,1]
1 2 3
4 5 6
7 8 9
[,,2]
10 11 12
13 14 15
16 17 18
B=
1 1 2
1 1 2
3 3 2
Basically, B declares to which cluster the values of A belong to. Now I
want an array C where for each cluster of B, the mean of A is shown (the
mean on the first 2 dimensions):
C=
[,,1]
3 3 6
3 3 6
7.5 7.5 6
[,,2]
12 12 15
12 12 15
16.5 16.5 15
The following short solution works:
A <-
aperm(array(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),dim=c(3,3,2)),c(2,1,3))
B <- matrix(c(1,1,2,1,1,2,3,3,2),nrow=3,byrow=TRUE)
C <- array(NA,dim=c(3,3,2))
for(a in 1:3){
TS <- matrix(A[B==a],nrow=2,byrow=TRUE)
C[B==a] <- rep(rowMeans(TS),each=sum(B==a))
}
But unfortunately, my dataset is a 64x64x64x120 array, and I have
approximately 29000 clusters, and I'd have to repeat the process on 500
simulated datasets.
If I try the code above, it takes about 2 hours...
Strange enough, it is the first line in the loop that takes so long:
reconstructing a dataset with only the datapoints belonging to a certain
cluster in it...
Let me suggest to consider the following approach, which avoids the
first line of the above code, computes the means in a different way and
puts them back using the second line of the above code.
First, transform A into a matrix, where the first two dimensions
represent the rows and the remaining dimensions represent the columns.
This does not reorder the elements of the array, only changes the
dimension information.
A1 <- matrix(A, nrow=3*3, ncol=2)
Compute the means
B1 <- c(B)
C1 <- rowsum(A1, group=B1)/c(table(B1))
Put the means to the required positions
for(a in 1:3){
C[B==a] <- rep(C1[a, ], each=sum(B==a))
}
I hope this can help.
Petr Savicky.