I'm a newbie running parallel computing, so, sorry for this simple question.
My original code without parallel computing is like this:
runAna <- function(myData, Model, ...) {
  myStat <- wFun(myData, Model, ...)  # myStat: a vector of length nStat
  return(myStat)
}
rStat <- array(0, dim=c(dimx, dimy, dimz, nStat))
# each analysis is on the 4th dimension and returns nStat numbers,
# which are stored in the 4th dimension of rStat
for (i in 1:dimx)
  for (j in 1:dimy)
    for (k in 1:dimz)
      rStat[i, j, k,] <- runAna(rData[i, j, k,], Model, ...)
I'm trying to run the above analysis using snow on a machine with two
processors, but could not figure out how to correctly set it up:
nNodes <- 2
library(snow)
cl <- makeCluster(nNodes, type = "SOCK")
I thought I would use parApply, but how should I combine the looping
with parApply? Or no looping at all with something like parApply(cl,
rData, c(1,2,3), ...)?
Thanks in advance,
Gang
Using snow on a looping structure
4 messages · Gang Chen, Martin Morgan
Hi Gang --
I think what you want is along the lines of

a <- array(1:(2*3*4*5), c(2,3,4,5))
b <- apply(a, c(1,2,3), range)

and then, as you guessed,

library(snow)
cl <- makeCluster(nNodes, type="SOCK")
d <- parApply(cl, a, c(1,2,3), range)
identical(b, d)
[1] TRUE

so for your example, I'd guess

runStat <- parApply(cl, rData, c(1,2,3), runAna, Model=Model)

This is not quite what you want -- the 'result' dimension is the first rather than last:

dim(a)
[1] 2 3 4 5
dim(b)
[1] 2 2 3 4

Array-munging is not a speciality of mine, but a simple work-around is to reorder the dimensions of the original array, so you're applying to, and writing in, the slice indexed by the first entry:

a <- array(1:(5*2*3*4), c(5,2,3,4))
b <- apply(a, c(2,3,4), range)
dim(a)
[1] 5 2 3 4
dim(b)
[1] 2 2 3 4
d <- parApply(cl, a, c(2,3,4), range)

Hope that helps,
Martin
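
The suggestion above can be tried end to end with a toy data set. The following is only a sketch under stated assumptions: it uses the parallel package that ships with current R (it provides makeCluster and parApply with the same interface as snow), and wFun, Model and the array sizes are hypothetical stand-ins, since the real ones are not shown in the thread. It also exports wFun to the workers, which is likely needed whenever runAna calls a function defined only on the master.

```r
library(parallel)  # assumption: stands in for snow, same makeCluster/parApply interface

# Hypothetical stand-in for wFun: returns nStat = 2 numbers per series
wFun <- function(myData, Model) c(mean(myData) * Model, sd(myData))
runAna <- function(myData, Model, ...) wFun(myData, Model, ...)

set.seed(1)
rData <- array(rnorm(3 * 4 * 5 * 6), dim = c(3, 4, 5, 6))  # toy data
Model <- 2  # placeholder; the real Model object is not shown in the thread

cl <- makeCluster(2)
# runAna looks up wFun by name, so make it available on every worker
clusterExport(cl, "wFun", envir = environment())

serial  <- apply(rData, c(1, 2, 3), runAna, Model = Model)
runStat <- parApply(cl, rData, c(1, 2, 3), runAna, Model = Model)
stopCluster(cl)

print(dim(runStat))                  # the nStat dimension comes first
print(all.equal(serial, runStat))    # parallel and serial results agree
```

With snow itself, the same calls should apply after library(snow) and makeCluster(nNodes, type = "SOCK").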
_______________________________________________ R-sig-hpc mailing list R-sig-hpc at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
1 day later
Hi Martin,

Your suggestion really helps! It's exactly what I wanted. I really appreciate it...

Regarding the array-munging part, the following will do:

b <- aperm(b, c(2,3,4,1))

I have a couple of related issues now:

(1) When running the following I get two warnings on my Mac OS X 10.4.11 (one from each processor, I guess):

cl <- makeCluster(2, type = "SOCK")
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME

Why does this warning appear, and how can I correct it?

(2) Previously I could follow the progress of the job by putting

print(format(Sys.time(), "%D %H:%M:%OS3"))

inside the outermost for loop (with ii index), but now with parallel computing I couldn't find a similar way to trace the progress. Do you or anybody know how to do that?

Thanks again,
Gang
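
The aperm call can be checked against the original nested loops on a small array. This is a minimal serial sketch: runAna here is a hypothetical stand-in that returns range(), and the dimensions are made up for illustration.

```r
# Hypothetical stand-in for runAna: two statistics per series (nStat = 2)
runAna <- function(myData) range(myData)

rData <- array(as.numeric(1:(2 * 3 * 4 * 5)), dim = c(2, 3, 4, 5))

# Reference result from the original nested loops
rStat <- array(0, dim = c(2, 3, 4, 2))
for (i in 1:2)
  for (j in 1:3)
    for (k in 1:4)
      rStat[i, j, k, ] <- runAna(rData[i, j, k, ])

b <- apply(rData, c(1, 2, 3), runAna)  # dim: nStat, dimx, dimy, dimz
b <- aperm(b, c(2, 3, 4, 1))           # dim: dimx, dimy, dimz, nStat

print(dim(b))
print(all.equal(rStat, b))  # same layout and values as the loop version
```

The permutation c(2,3,4,1) moves the first (result) dimension to the end, so b[i, j, k, ] holds the statistics for rData[i, j, k, ].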
(In reply to Martin Morgan's message of Wed, Dec 3, 2008 at 12:05 PM.)
"Gang Chen" <gangchen6 at gmail.com> writes:

> (1) When running the following I get two warnings on my Mac OS X 10.4.11 (one from each processor, I guess):
> cl <- makeCluster(2, type = "SOCK")
> WARNING: ignoring environment value of R_HOME
> WARNING: ignoring environment value of R_HOME
> Why is this warning? How to correct it?

I guess you have a variable R_HOME specified in your environment, and it is different from the location where the R workers are running from. I suspect though that R is easily fooled, e.g., by symbolic links. You might want to make sure that you're actually starting the right R (ask, e.g., for the worker sessionInfo()).
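
One way to ask the workers which R they are running is sketched below. It assumes the parallel package bundled with current R in place of snow (clusterCall has the same form in both); the fields collected are an illustrative choice, and clusterCall(cl, sessionInfo) would return the full session details instead.

```r
library(parallel)  # assumption: used here in place of snow

cl <- makeCluster(2)

# Ask each worker where its R installation lives and what R_HOME it sees
workerInfo <- clusterCall(cl, function() {
  list(rHome   = R.home(),
       envHome = Sys.getenv("R_HOME"),
       version = R.version.string)
})
stopCluster(cl)

# Compare each worker's R home with the master's
masterHome <- R.home()
for (w in workerInfo)
  cat(w$version, "|", w$rHome, "| same as master:",
      identical(w$rHome, masterHome), "\n")
```

If a worker reports a different R home than the master, that would be consistent with the mismatch described above.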

> (2) Previously I could follow up the progress of the job by sticking the following
> print(format(Sys.time(), "%D %H:%M:%OS3"))
> inside the outermost for loop (with ii index), but now with parallel computing I couldn't find a similar way to trace the progress. Do you or anybody know how to do that?

You might modify runAna to write a timestamp to a (worker-specific) file, and track that. It is not a nice hack. Maybe others have better ideas?

Martin
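
The timestamp-file idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the parallel package stands in for snow, runAna is a placeholder that returns range(), the log directory is a temporary one, and the files are named by each worker's process ID so that workers never write to the same file.

```r
library(parallel)  # assumption: used here in place of snow

logDir <- tempfile("progress")  # hypothetical log location
dir.create(logDir)

# Hypothetical runAna that also appends a timestamp to a per-worker file
runAna <- function(myData, logDir) {
  logFile <- file.path(logDir, paste0("worker-", Sys.getpid(), ".log"))
  cat(format(Sys.time(), "%D %H:%M:%OS3"), "\n", file = logFile, append = TRUE)
  range(myData)  # placeholder for the real analysis
}

rData <- array(rnorm(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))

cl <- makeCluster(2)
res <- parApply(cl, rData, c(1, 2, 3), runAna, logDir = logDir)
stopCluster(cl)

# On the master: count how many series each worker has logged
logFiles <- list.files(logDir, full.names = TRUE)
done <- vapply(logFiles, function(f) length(readLines(f)), integer(1))
print(unname(done))
print(sum(done))  # total number of series processed
```

While a long job is running, the same line counts can be read from a second R session or with a tool such as tail. Writing one line per series adds file I/O to every call, so for large arrays it may be preferable to log only occasionally.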