Column-mean-values for targeted rows
set.seed(123)
N = 30000
K = 400
theData = matrix(rnorm(N*K), ncol=K)
theData = as.data.frame(theData)
theData = cbind(indicator = sample(0:1, N, replace=TRUE), theData)
> system.time(results <- colMeans(subset(theData, indicator == 1)))
   user  system elapsed
  2.309   1.319   3.853
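Note that the call above also averages the indicator column itself, so the first element of the result is always 1. If the data can stay a plain numeric matrix, logical row indexing should give the same column means without the data.frame step. The following is a minimal sketch on a smaller example, not a benchmark; the names X, keep, means_matrix and means_df are only illustrative.

```r
set.seed(123)
N <- 1000
K <- 5
X <- matrix(rnorm(N * K), ncol = K)           # values only, no indicator column
indicator <- sample(0:1, N, replace = TRUE)

# Logical row index; drop = FALSE keeps a matrix even if only one row matches
keep <- indicator == 1
means_matrix <- colMeans(X[keep, , drop = FALSE])

# Same computation via the data.frame route, dropping the indicator column
theData <- cbind(indicator = indicator, as.data.frame(X))
means_df <- colMeans(subset(theData, indicator == 1, select = -indicator))

# The two results agree apart from the column names
stopifnot(isTRUE(all.equal(unname(means_df), means_matrix)))
```

Whether this is faster than the subset() version on the full 30000 x 400 data would need to be timed on your own machine.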
b
On Jul 20, 2007, at 6:17 PM, Diogo Alagador wrote:
Hi all,
I'm handling massive data.frames and matrices in R (30000 x 400).
In the 1st column, say, I have 0s and 1s indicating rows that
matter; other columns have probability values.
One simple task I would like to do is to get the column mean
values for the flagged rows (the ones with 1).
As a very fresh "programmer" I have built a simple function in R,
which is surely not very efficient! It works well for
ordinary-sized matrices, but it does not cope well with huge ones.
meanprob <- function(Robj) {
  NLINE <- dim(Robj)[1]
  NCOLUMN <- dim(Robj)[2]
  mprob <- c(rep(0, (NCOLUMN - 1)))
  for (i in 2:NCOLUMN) {
    sumprob <- 0
    pa <- 0
    for (j in 1:NLINE) {
      if (Robj[j, 1] != 0) {
        pa <- pa + 1
        sumprob <- Robj[j, i] + sumprob
      }
    }
    mprob[i - 1] <- sumprob / pa
  }
  return(mprob)
}
So I "only" see 3 ways to get through the problem:
- to reformulate the function to gain efficiency;
- to write a C routine (for example), where loops are more
"speedy", and then interface it with R;
- to find some function/package that already does that.
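For the first option, one possible reformulation is sketched below. It keeps the interface of meanprob (first column is the indicator, the result is a plain vector of means for the remaining columns) but replaces both loops with a logical row index and colMeans(). The name meanprob_vec is hypothetical, and this is a sketch under those assumptions rather than a tested drop-in replacement.

```r
# Hypothetical vectorized counterpart of meanprob():
# column 1 of Robj flags the rows to use, the other columns hold the values.
meanprob_vec <- function(Robj) {
  keep <- Robj[, 1] != 0                       # same test as the inner loop
  vals <- as.matrix(Robj[keep, -1, drop = FALSE])
  unname(colMeans(vals))                       # plain vector, like mprob
}

# Small check: rows 1, 3 and 4 are flagged
m <- cbind(c(1, 0, 1, 1),
           c(0.2, 0.9, 0.4, 0.6),
           c(0.1, 0.5, 0.3, 0.8))
res <- meanprob_vec(m)
stopifnot(isTRUE(all.equal(res, c(0.4, 0.4))))
```

The as.matrix() call is there so the same function accepts either a matrix or a data.frame.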
Can anybody illuminate my way here?
Many thanks,
Diogo Andre' Alagador