Parallel Linear Model
On Aug 22, 2012, at 12:38 PM, "Hao Yu" <hyu at stats.uwo.ca> wrote:
Here is my test with 8 cores.
y<-rnorm(1000)
x<-matrix(rnorm(1000*10000),ncol=10000)
dimx<-dim(x)
library(parallel)
cl <- makeCluster(8, methods=FALSE)
print(system.time(
pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~ x[,i]))$coefficients[2,4]))
))
user system elapsed
25.46 0.02 25.62
... just to clarify, are you on Windows? You can't use multicore on Windows because the OS does not support it ... there you have to use snow. Cheers, S
# here is apply (not parallel)
system.time(pval<-apply(x,2, function(x)summary(lm(y~x))$coeff[2,4]))
user system elapsed
24.54 0.00 24.65
clusterExport(cl,"y")
system.time(pval<-parApply(cl, x,2, function(x)summary(lm(y~x))$coeff[2,4]))
user system elapsed
0.72 0.47 6.73
stopCluster(cl)
Hao
Patrik Waldmann wrote:
That seems to be a good idea (for 8 cores):
y<-rnorm(1000)
x<-matrix(rnorm(1000*10000),ncol=10000)
dimx<-dim(x)
library(doParallel)
library(foreach)
cl <- makeCluster(8, methods=FALSE)
registerDoParallel(cl)
print(system.time(
pval <- foreach (i =1:dimx[2], .combine=c) %dopar% {
mod <- lm(y ~ x[,i])
summary(mod)$coefficients[2,4]
}
))
user system elapsed
12.28 2.75 231.93
stopCluster(cl)
library(parallel)
cl <- makeCluster(8, methods=FALSE)
print(system.time(
pval <- unlist(mclapply(1:dimx[2], function(i) summary(lm(y ~
x[,i]))$coefficients[2,4]))
))
user system elapsed
21.80 1.33 25.78
stopCluster(cl)
Patrik
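The timings above suggest that how the work is divided among workers matters as much as the number of cores. As a minimal sketch (not taken from the thread, and not a claim about why foreach was slow here), one option is to cut the matrix into one block of columns per worker and let each worker loop over its own block. The helper name slope_pval, the argument names yvec and pfun, the 2-worker cluster, and the small problem size are all assumptions made for illustration.

```r
library(parallel)

set.seed(1)
n <- 200; p <- 40                 # illustrative sizes, far smaller than in the thread
y <- rnorm(n)
x <- matrix(rnorm(n * p), ncol = p)

# p-value of the slope in a simple regression of yvec on one predictor column
# (slope_pval is a hypothetical helper, not part of any package)
slope_pval <- function(xcol, yvec) summary(lm(yvec ~ xcol))$coefficients[2, 4]

cl <- makeCluster(2)              # PSOCK cluster; 2 workers is an arbitrary choice

# split the columns into one contiguous block per worker
blocks <- lapply(splitIndices(p, length(cl)),
                 function(idx) x[, idx, drop = FALSE])

# each worker receives only its own block plus y, and loops over columns locally
pval_list <- parLapply(cl, blocks, function(block, yvec, pfun) {
  apply(block, 2, pfun, yvec = yvec)
}, yvec = y, pfun = slope_pval)
stopCluster(cl)

pval_par <- unlist(pval_list)
pval_seq <- apply(x, 2, slope_pval, yvec = y)   # sequential reference
```

The extra arguments are deliberately not named x or fun, because parLapply passes those names on internally and a clash would stop with a "matched by multiple actual arguments" error. Comparing pval_par with pval_seq is a cheap check that the parallel version returns the same p-values in the same column order.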
Simon Urbanek <simon.urbanek at r-project.org> 22/08/2012 17:20 >>>
On Aug 22, 2012, at 10:47 AM, Patrik Waldmann <patrik.waldmann at boku.ac.at> wrote:
I did not manage to implement this example in foreach, could anyone point me to a similar example?
I wouldn't even bother with foreach for something this simple - you can write it trivially as
library(parallel)
pval <- unlist(mclapply(1:n, function(i) summary(lm(y ~ x[,i]))$coefficients[2,4]))
Cheers, Simon
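A hedged sketch of the same mclapply call with the core count made explicit. mclapply relies on forking, so on Windows it can only run with mc.cores = 1, which then behaves like a plain lapply. The platform check, the choice of 2 cores elsewhere, and the small problem size are assumptions for illustration, not something stated in the thread.

```r
library(parallel)

set.seed(1)
n <- 200; p <- 40                 # illustrative sizes only
y <- rnorm(n)
x <- matrix(rnorm(n * p), ncol = p)

# mclapply forks, which Windows lacks; fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# one simple regression per column; keep the p-value of the slope
pval <- unlist(mclapply(seq_len(p), function(i) {
  summary(lm(y ~ x[, i]))$coefficients[2, 4]
}, mc.cores = n_cores))
```

Because the forked children share the parent's memory, y and x do not have to be exported as they would be with a cluster created by makeCluster.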
Patrik
Jay Emerson <jayemerson at gmail.com> 22/08/2012 14:05 >>>
Patrik,
Your question (at least from your example) is really about general parallel computing. Nothing you want to do with the linear model in your short example requires a special type of parallelism. I recommend the package 'foreach' with its parallel backends, or else the package 'parallel' that comes with newer versions of R. You could also have a look at Dirk's HPC page: http://cran.r-project.org/web/views/HighPerformanceComputing.html
Jay
--
John W. Emerson (Jay)
Associate Professor of Statistics, Adjunct, and Acting Director of Graduate Studies
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
-- Department of Statistics & Actuarial Sciences Office Phone#:(519)-661-3622 Fax Phone#:(519)-661-3813 The University of Western Ontario London, Ontario N6A 5B7 http://www.stats.uwo.ca/yu