more efficient way to parallel
On 08/06/2012 09:41 AM, Jie wrote:
After searching online, I found that clusterCall or foreach might be the solution.
Rewrite your outer loop as an lapply; then on non-Windows systems use
parallel::mclapply, or on Windows use makePSOCKcluster and parLapply. I
ended with
library(parallel)
library(MASS)
Maxi <- 10
Maxj <- 1000
doit <- function(i, Maxi, Maxj)
{
    ## initialization, not of interest
    Sigmahalf <- matrix(sample(10000, replace=TRUE), 100)
    Sigma <- t(Sigmahalf) %*% Sigmahalf
    x <- mvrnorm(n=Maxj, rep(0, 100), Sigma)
    xlist <- lapply(seq_len(nrow(x)), function(i, x) matrix(x[i,], 10), x)
    ## end of initialization

    fun <- function(x) {
        v <- eigen(x, symmetric=FALSE, only.values=TRUE)$values
        min(abs(v))
    }
    dd1 <- sapply(xlist, fun)
    dd2 <- dd1 + dd1 / sum(dd1)
    sum(dd1 * dd2)
}
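
As a small illustration of the eigen() call used in doit, the sketch below compares the default call with the values-only call. The matrix m is a hypothetical random 10 x 10 matrix, not data from the simulation, and the variable names are made up for the example.

```r
set.seed(1)
m <- matrix(rnorm(100), 10)   # hypothetical non-symmetric 10 x 10 matrix

## Default call: checks for symmetry, then computes values and vectors.
full <- eigen(m)

## Values-only call, as in doit(): no symmetry check, no eigenvectors.
vals <- eigen(m, symmetric=FALSE, only.values=TRUE)

## The quantity of interest only needs the eigenvalues.
smallest_full <- min(abs(full$values))
smallest_vals <- min(abs(vals$values))
```

Both calls should give the same smallest absolute eigenvalue up to floating-point error; the second just skips work that is thrown away here.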
> system.time(lapply(1:8, doit, Maxi, Maxj))
   user  system elapsed
  6.677   0.016   6.714
> system.time(mclapply(1:64, doit, Maxi, Maxj, mc.cores=8))
   user  system elapsed
 68.857   1.032  10.398
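
On Windows, where mclapply cannot fork, the same pattern might look roughly like the sketch below. It is only an outline: sim_one is a hypothetical base-R stand-in for doit (so the example needs no extra packages), and the worker count of 2 and the seed are arbitrary choices.

```r
library(parallel)

## Hypothetical stand-in for doit(): per iteration, build Maxj random
## 10 x 10 matrices and combine their smallest absolute eigenvalues.
sim_one <- function(i, Maxj) {
    xlist <- lapply(seq_len(Maxj), function(j) matrix(rnorm(100), 10))
    dd1 <- vapply(xlist, function(m) {
        min(abs(eigen(m, symmetric=FALSE, only.values=TRUE)$values))
    }, numeric(1))
    dd2 <- dd1 + dd1 / sum(dd1)
    sum(dd1 * dd2)
}

cl <- makePSOCKcluster(2)        # 2 worker processes; adjust to the machine
clusterSetRNGStream(cl, 123)     # independent, reproducible RNG streams
res <- parLapply(cl, 1:8, sim_one, Maxj=50)
stopCluster(cl)

result.vec <- unlist(res)        # one value per outer iteration
```

With the real doit, each worker would also need MASS loaded, for example via clusterEvalQ(cl, library(MASS)), before the parLapply call.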
The extra arguments to eigen are important (symmetric=FALSE skips the
symmetry check, only.values=TRUE skips the eigenvectors), as is avoiding
unnecessary repeated calculations (the original calls eigen twice per
matrix, once for dd1 and again for dd2). The strategy of allocate-and-grow
(result.vec=numeric(); result.vec[i] <- ...) is very inefficient
(result.vec is copied in its entirety for each new value of i); better to
preallocate-and-fill (result.vec = numeric(Maxi); result.vec[i] = ...)
or let lapply manage the allocation.
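
The three allocation strategies can be sketched side by side as below. The function names and the i^2 payload are made up for illustration; wrap the calls in system.time() to compare them, keeping in mind that the size of the gap depends on the R version and on n.

```r
## Allocate-and-grow: the vector is extended on every iteration.
grow <- function(n) {
    v <- numeric()
    for (i in seq_len(n)) v[i] <- i^2
    v
}

## Preallocate-and-fill: the vector has its final length from the start.
fill <- function(n) {
    v <- numeric(n)
    for (i in seq_len(n)) v[i] <- i^2
    v
}

## Let the apply family manage the allocation.
via_apply <- function(n) vapply(seq_len(n), function(i) i^2, numeric(1))

n <- 10000
r_grow  <- grow(n)
r_fill  <- fill(n)
r_apply <- via_apply(n)
```

All three return the same vector; only the way the storage is obtained differs.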
Martin
Best wishes,
Jie

On Sun, Aug 5, 2012 at 10:23 PM, Jie <jimmycloud at gmail.com> wrote:
Dear All,
Suppose I have a program as below: the outer loop runs a simulation (with
randomly generated data), and inside it there are several sapply() calls
(10~100) over the data plus some other work, but these sapply calls have to
run sequentially. Each sapply does not involve very intensive calculation
(a few seconds only), so the outer loop takes minutes to finish one
iteration.
I guess the better way is to parallelize the outer loop rather than the
sapply calls, but I have no idea how to modify it. I have some simple code
here, with only two sapply calls for simplicity. The logic inside the
sapply calls is not important.
Thank you for your attention and suggestion.
library(parallel)
library(MASS)
result.seq=c()
Maxi <- 100
for (i in 1:Maxi)
{
    ## initialization, not of interest
    Sigmahalf <- matrix(sample(1:10000, size = 10000, replace = TRUE), 100)
    Sigma <- t(Sigmahalf) %*% Sigmahalf
    ## Sigma is 100 x 100, so the mean vector must have length 100
    x <- mvrnorm(n=1000, rep(0, 100), Sigma)
    xlist <- list()
    for (j in 1:1000)
    {
        ## each row of 100 values becomes a square 10 x 10 matrix
        xlist[[j]] <- list(X = matrix(x[j, ], 10))
    }
    ## end of initialization
    dd1 <- sapply(xlist, function(s) {min(abs((eigen(s$X))$values))})
    ##
    sumdd1 = sum(dd1)
    for (j in 1:1000)
    {
        xlist[[j]]$dd1 <- dd1[j]/sumdd1
    }
    ## Assume dd2 and dd1 can not be combined in one sapply()
    dd2 <- sapply(xlist, function(s) {min(abs((eigen(s$X))$values)) + s$dd1})
    result.seq[i] <- sum(dd1*dd2)
}
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793