Using snow on a looping structure

4 messages · Gang Chen, Martin Morgan

#
I'm new to parallel computing, so apologies for this simple question.

My original code without parallel computing is like this:

runAna <- function(myData, Model, ...) {
      # myStat: a vector with a length of nStat
      myStat <- wFun(myData, Model, ...)
      return(myStat)
}

# Each analysis runs along the 4th dimension of rData and returns nStat
# numbers, which are stored in the 4th dimension of rStat
rStat <- array(0, dim=c(dimx, dimy, dimz, nStat))
for (i in 1:dimx)
for (j in 1:dimy)
for (k in 1:dimz)
     rStat[i, j, k,] <- runAna(rData[i, j, k,], Model, ...)

I'm trying to run the above analysis using snow on a machine with two
processors, but could not figure out how to correctly set it up:

nNodes <- 2
library(snow)
cl <- makeCluster(nNodes, type = "SOCK")	

I thought I would use parApply, but how should I combine the looping
with parApply? Or no looping at all with something like parApply(cl,
rData, c(1,2,3), ...)?

Thanks in advance,
Gang
#
Hi Gang --

"Gang Chen" <gangchen6 at gmail.com> writes:
I think what you want is along the lines of
and then as you guessed
[1] TRUE

so for your example, I'd guess

runStat <- parApply(cl, rData, c(1,2,3), runAna, Model=Model)
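
A minimal, self-contained sketch of that call, in case it helps. It assumes the parallel package that ships with R (which offers the same makeCluster/parApply interface) in place of snow, and the toy runAna, the Model list, and the random data are all invented for illustration.

```r
library(parallel)   # assumption: used here in place of snow

# Hypothetical stand-in for runAna: returns nStat = 2 numbers per call
runAna <- function(myData, Model) {
  c(mean(myData) * Model$scale, sd(myData))
}

Model <- list(scale = 2)                                   # made-up model object
rData <- array(rnorm(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))  # made-up data

cl <- makeCluster(2)
# One call replaces the three nested loops; Model is passed through '...'
runStat <- parApply(cl, rData, c(1, 2, 3), runAna, Model = Model)
stopCluster(cl)

dim(runStat)   # result dimension (nStat) first, then dimx, dimy, dimz
```

The serial equivalent is apply(rData, c(1, 2, 3), runAna, Model = Model), which is a handy check before moving to the cluster.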

This is not quite what you want -- the 'result' dimension is the first
rather than last
[1] 2 3 4 5
[1] 2 2 3 4

array-munging is not a speciality of mine, but a simple work-around is
to reorder the dimensions of the original array, so you're applying
to, and writing in, the slice indexed by the first entry
[1] 5 2 3 4
[1] 2 2 3 4
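
The dimension behaviour described above can be sketched with plain apply, which lays out its result the same way. The toyStat function and the array sizes are assumptions chosen to match the dimensions printed above, not the code from the original message.

```r
# Hypothetical stand-in for runAna: returns 2 numbers per call
toyStat <- function(x) c(mean(x), sd(x))

a <- array(rnorm(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))
dim(a)    # 2 3 4 5

# Applying over the first three margins puts the result dimension first
r1 <- apply(a, c(1, 2, 3), toyStat)
dim(r1)   # 2 2 3 4

# Work-around: move the analysed dimension to the front of the input,
# so both the input slice and the result are indexed by the first entry
b <- aperm(a, c(4, 1, 2, 3))
dim(b)    # 5 2 3 4
r2 <- apply(b, c(2, 3, 4), toyStat)
dim(r2)   # 2 2 3 4
```

Both routes give the same numbers; only the layout of the input differs.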
Hope that helps,

Martin

  
    
1 day later
#
Hi Martin,

Your suggestion really helps! It's exactly what I wanted. I really
appreciate it...

Regarding the array-munging part, the following will do:

b <- aperm(b, c(2,3,4,1))
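
As a sanity check, here is a small sketch comparing the original triple loop with the apply-then-aperm route. It uses serial apply so that it runs without a cluster; toyStat and the array sizes are made up for the example.

```r
# Hypothetical stand-in for runAna: returns nStat = 2 numbers
toyStat <- function(x) c(mean(x), max(x))

dimx <- 2; dimy <- 3; dimz <- 4; nStat <- 2
rData <- array(rnorm(dimx * dimy * dimz * 5), dim = c(dimx, dimy, dimz, 5))

# Original approach: nested loops, results stored in the 4th dimension
rStat <- array(0, dim = c(dimx, dimy, dimz, nStat))
for (i in seq_len(dimx))
  for (j in seq_len(dimy))
    for (k in seq_len(dimz))
      rStat[i, j, k, ] <- toyStat(rData[i, j, k, ])

# apply() returns the result dimension first ...
b <- apply(rData, c(1, 2, 3), toyStat)
# ... so move it to the end to match rStat
b <- aperm(b, c(2, 3, 4, 1))

all.equal(rStat, b)
```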

I have a couple of related issues now:

(1) When running the following I get two warnings on my Mac OS X
10.4.11 (one from each processor, I guess):
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME

Why does this warning appear, and how can I correct it?

(2) Previously I could follow the progress of the job by putting
the following

print(format(Sys.time(), "%D %H:%M:%OS3"))

inside the outermost for loop (with index ii), but now with parallel
computing I can't find a similar way to track the progress. Does
anybody know how to do that?

Thanks again,
Gang
On Wed, Dec 3, 2008 at 12:05 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
#
"Gang Chen" <gangchen6 at gmail.com> writes:
I guess you have a variable R_HOME specified in your environment, and
it is different from the location where the R workers are running
from. I suspect though that R is easily fooled, e.g., by symbolic
links. You might want to make sure that you're actually starting the
right R (ask, e.g., for the worker sessionInfo()).
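
One way to ask the workers which R they are running, sketched under the assumption that the parallel package (which ships with R and mirrors snow's clusterCall) is used; the fields collected here are just an illustrative choice.

```r
library(parallel)   # assumption: used here in place of snow

cl <- makeCluster(2)
# Ask every worker where its R lives and which version it is
workerInfo <- clusterCall(cl, function() {
  list(home    = R.home(),
       version = R.version.string,
       envHome = Sys.getenv("R_HOME"))
})
stopCluster(cl)

# Compare with the master session
R.home()
sapply(workerInfo, function(w) w$home)
```

If the worker paths differ from the master's, or from the R_HOME set in your shell, that would be consistent with the warning.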
You might modify runAna to write a timestamp to a (worker-specific)
file, and track that. It is not a nice hack. Maybe others have better
ideas?
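
A rough sketch of that timestamp idea, assuming all workers run on the same machine and can write to a shared directory. The name runAnaLogged, the log file naming, and the toy statistics are hypothetical; only the timestamp format comes from the earlier message.

```r
# Hypothetical wrapper: appends a timestamp to a per-process log file,
# then does the (toy) analysis
runAnaLogged <- function(myData, logDir) {
  logFile <- file.path(logDir, sprintf("progress-%d.log", Sys.getpid()))
  cat(format(Sys.time(), "%D %H:%M:%OS3"), "\n",
      file = logFile, append = TRUE, sep = "")
  c(mean(myData), sd(myData))
}

logDir <- tempdir()   # assumption: a directory every worker can write to

# Serial demonstration; with a cluster, logDir would be passed through
# the '...' of parApply in the same way as Model
res <- runAnaLogged(rnorm(5), logDir)
res <- runAnaLogged(rnorm(5), logDir)

logFile <- file.path(logDir, sprintf("progress-%d.log", Sys.getpid()))
readLines(logFile)    # one timestamp per call
```

Each worker has its own process ID, so each writes to its own file, and the files can be watched from a terminal while the job runs.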

Martin