
SNOW Hybrid Cluster in R, Network problems

Hi,

I solved the problem of the heavy network load when using parLapply!!

I will explain it in short with a similar example:

cl<-makeCluster(4,type="SOCK")

spectrum.lomb<-function(x,y) { ....}    # x... space vector, e.g. time or location
                                        # y... measured data

spectrum.lomb calculates a periodogram (frequency analysis) depending on 
x and y

We have many data series y that all share the same space vector x. So a data 
list can be created:

dat<-list(all_y)        # dat[[1]]=y1; dat[[2]]=y2; .... and so on

If you want to use lapply (or parLapply), you have to ensure that the first 
argument of the function FUN is a list element of 'dat'. So in the case of 
our function we need to change the order of the arguments, because x is 
always the same!!!
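To make the point about the argument order concrete, here is a minimal sketch with plain lapply. The functions toy.xy and toy.yx are hypothetical stand-ins, NOT my real spectrum.lomb, and the numbers are made up:

```r
# toy.xy / toy.yx are hypothetical stand-ins for spectrum.lomb,
# they only differ in the order of their arguments
toy.xy <- function(x, y) sum(x * y^2)
toy.yx <- function(y, x) sum(x * y^2)

x   <- c(1, 2, 3)                      # constant space vector
dat <- list(c(1, 1, 1), c(2, 0, 1))    # dat[[1]] = y1, dat[[2]] = y2

# variant 1: a wrapper swaps the arguments, each list element arrives as 'a'
res  <- lapply(dat, function(a, b) toy.xy(b, a), b = x)

# variant 2: the function already takes y first, no wrapper needed
res2 <- lapply(dat, toy.yx, x = x)

print(identical(res, res2))            # both variants give the same result
```

In both variants every element of 'dat' becomes the first argument of FUN, and everything written after FUN is handed over unchanged to every call.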

fun2<-function()
{
     # export ONLY the functions, not the DATA, to each node
     clusterExport(cl,ls.str(mode="function",envir=.GlobalEnv ))

     # change of argument order via an anonymous wrapper function!!
     result<-parLapply(cl,dat,function(a,b) spectrum.lomb(b,a), b<-x)
     return(result)
}
fun2()    # produces a lot of network load

Now take into account that parallel computing needs to distribute all 
relevant data to each node. In this case that should be the data list and 
the constant x vector. But this call ends up in heavy network load and a 
network malfunction on 2 of my Windows workers. For some reason the 
indirect call of spectrum.lomb(x,y) causes R to distribute the whole 
environment to each node. In my case sensor data is processed and 
the environment is about 500 MB large. The result is that n times 500 MB 
are moved around, taking up a lot of memory on the machines.
!!! Now if you call 'clusterExport' and 'parLapply' directly from the 
console, everything is fine and the network load is as low as possible. !!!
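One way to look at this without a cluster is sketched below. It rests on my assumption that the function handed to parLapply is sent to the workers with serialize(), and that a function created inside another function carries that function's local environment along. The names make.local, payload, f.local and f.global are made up for illustration only:

```r
# make.local is a hypothetical helper: it creates a function INSIDE
# another function, next to a large local object
make.local <- function() {
  payload <- numeric(1e6)      # about 8 MB living in the local frame
  function(a) a + 1            # the enclosing environment is this frame
}

f.local  <- make.local()
f.global <- function(a) a + 1  # defined at top level, environment is .GlobalEnv

# number of bytes that would have to travel to a worker
size.local  <- length(serialize(f.local,  NULL))
size.global <- length(serialize(f.global, NULL))

print(c(local = size.local, global = size.global))
```

If the assumption holds, size.local includes the whole 'payload' vector although the function never uses it, while size.global stays small because .GlobalEnv is only stored as a reference. That would fit the difference between the anonymous wrapper inside fun2 and a function defined at top level, but I have not verified that this is really what happens on my workers.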

# changing the argument order in the definition of the function

spectrum.lomb<-function(y,x) { ....}    # x... space vector, e.g. time or location
                                        # y... measured data
fun3<-function()
{
     # export ONLY the functions, not the DATA, to each node
     clusterExport(cl,ls.str(mode="function",envir=.GlobalEnv ))

     # argument order changed in the definition, no wrapper function needed!!
     result<-parLapply(cl,dat, spectrum.lomb, x<-constant_vector)
     return(result)
}
fun3()    # causes minimal network load

So in the end I rewrote my definition and the problem is gone. But could 
anybody explain that to me?! I see the point with the environments, but 
I don't understand why the two variants behave so differently.


Cu and many thanks!

Martin





Dipl.-Ing. Martin Seilmayer

Helmholtz-Zentrum Dresden-Rossendorf e. V.

Institut fuer Fluiddynamik
Abteilung Magnetohydrodynamik
Bautzner Landstraße 400
01328 Dresden, Germany

@fon: +49 351 260  3165
@fax: +49 351 260 12969
@web: www.hzdr.de

On 03.07.2012 20:03, Stephen Weston wrote: