Problems parallelizing glmnet

Hasn't the caret package already solved this problem?

You can pass the tuneGrid parameter to specify your custom alpha and
lambda sequence, and the trControl parameter (built with trainControl())
to specify what kind of cross-validation you wish to use.

Caret uses foreach, so you can register a parallel backend of your choice.
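
A minimal sketch of that suggestion, under some assumptions not in the original message: the data are small and simulated, doParallel is chosen as the foreach backend (any registered backend should do), and the alpha/lambda grid and the 5-fold setting are arbitrary illustrative values.

```r
library(parallel)
library(caret)
library(doParallel)   # assumption: doParallel as the foreach backend

set.seed(1)
# Small simulated data so the sketch runs quickly
x <- matrix(rnorm(100 * 20), ncol = 20)
colnames(x) <- paste0("V", seq_len(ncol(x)))
y <- 2 * x[, 1] + rnorm(100)

# Register a parallel backend; caret's resampling loop runs through foreach
cl <- makeCluster(2)
registerDoParallel(cl)

# Custom alpha and lambda values (illustrative choices)
grid <- expand.grid(alpha = c(0, 0.5, 1),
                    lambda = 10^seq(-3, 0, length.out = 10))

# 5-fold cross-validation (illustrative choice)
ctrl <- trainControl(method = "cv", number = 5)

fit <- train(x, y, method = "glmnet", tuneGrid = grid, trControl = ctrl)

stopCluster(cl)
registerDoSEQ()       # fall back to sequential execution

fit$bestTune          # alpha and lambda with the best resampled RMSE
```

The full table of resampled performance for every alpha/lambda pair is in fit$results.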


I want to run the cv.glmnet function on the same data (y and x) with different values of the alpha parameter, the number of values being determined by the number of cores, but the result is absurd. What is wrong in the code below?

Patrik Waldmann

x <- matrix(rnorm(2000*10000),ncol=10000)
y <- matrix(rnorm(2000),ncol=1)

library(parallel)
cvglmnet <- function(...) {
library(glmnet)
cv.glmnet(x,y,alpha=alphasplit)
}
system.time(cores<-detectCores())
system.time(cl <- makeCluster(cores, methods=FALSE))
alpha<-seq(0, 1,by=1/(cores-1))
alphasplit<-clusterSplit(cl,alpha)
system.time(clusterExport(cl, c("x","y","cvglmnet","alphasplit")))
system.time(outbrlist<-clusterEvalQ(cl, cvglmnet(x,y,alphasplit)))
system.time(totoutbr<-do.call(cbind,outbrlist))
stopCluster(cl)


_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

You're evaluating exactly the same expression on all nodes, which I don't think you intended: clusterEvalQ runs the identical call everywhere, so the whole alphasplit list is passed as alpha on every node, and I don't think that makes sense. Isn't this closer to the intention:

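# assumes glmnet is loaded on each worker, e.g. clusterEvalQ(cl, library(glmnet)),
# and that x and y were exported to the workers as in the original code
alphas <- seq(0, 1, length.out = cores)
out <- clusterApply(cl, alphas, function(alpha) cv.glmnet(x, y, alpha = alpha))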
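
For completeness, the two lines above can be expanded into a self-contained sketch. The assumptions here are mine, not from the thread: the data are much smaller than the original 2000 x 10000 matrix, the number of workers is fixed at 2 rather than taken from detectCores(), and the final summary by minimum cross-validated error is only one possible way to compare the fits.

```r
library(parallel)

set.seed(1)
# Small simulated data so the sketch runs quickly
x <- matrix(rnorm(100 * 20), ncol = 20)
y <- rnorm(100)

cores <- 2                       # assumption: fixed instead of detectCores()
cl <- makeCluster(cores)

# Load glmnet on every worker and ship the data to each of them once
invisible(clusterEvalQ(cl, library(glmnet)))
clusterExport(cl, c("x", "y"), envir = environment())

# One alpha per worker: clusterApply sends each element to a different node
alphas <- seq(0, 1, length.out = cores)
out <- clusterApply(cl, alphas, function(alpha) cv.glmnet(x, y, alpha = alpha))

stopCluster(cl)

# Minimum mean cross-validated error for each alpha
cv_min <- sapply(out, function(fit) min(fit$cvm))
names(cv_min) <- alphas
cv_min
```

The result out is a list with one cv.glmnet object per alpha, so it is summarised element by element rather than combined with cbind.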

Cheers,
Simon