SNOW Hybrid Cluster in R, Network problems

Hi all of you,

I successfully created a hybrid cluster of several Windows and Linux
machines using snow and MPICH2. Basically I set up a SOCK cluster, and
MPICH2 comes into play to start the Rscript processes on each machine.
Because it is platform independent, one can start processes more or
less remotely on both operating systems, Windows and Linux. I know SSH
is possible on Linux, but I'd like to have a clean solution for Windows
too.

The first problem I have now is the following. When I started writing
scripts that run parallel code, I noticed that parLapply and the
related functions distribute a lot of data to the nodes IF they are
called in a subroutine of a script. When I call them from the console,
the network load is minimal; calling a function which then calls
parLapply causes a big load on the network. Now I have a big array to
calculate, and all the traffic is slowing it down. I tried to read the
R code of parApply and deeper, but can't find a useful hint.
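
To make the difference measurable without a cluster, here is a minimal
sketch that compares how many bytes R produces when serializing a
worker function defined inside another function versus the same
function rebound to the global environment. It assumes that snow ships
the worker function to the nodes with serialize(); the names
make_worker, big, f_local and f_fixed are made up for illustration and
are not taken from my real script. Rebinding the environment is only
one possible workaround, not a confirmed fix.

```r
# Worker function defined at top level (the "console" case).
f_global <- function(i) i^2

# Worker function defined inside a subroutine (the "script" case).
make_worker <- function() {
  big <- numeric(1e6)   # about 8 MB local object, never used by the worker
  function(i) i^2       # the closure keeps a reference to this local frame
}
f_local <- make_worker()

# Number of bytes that would have to travel to a node for each function.
size_global <- length(serialize(f_global, NULL))
size_local  <- length(serialize(f_local, NULL))

# Possible workaround (assumption): detach the closure from the local
# frame by rebinding it to the global environment before sending it.
f_fixed <- f_local
environment(f_fixed) <- globalenv()
size_fixed <- length(serialize(f_fixed, NULL))

cat("top level:        ", size_global, "bytes\n")
cat("inside a function:", size_local,  "bytes\n")
cat("after rebinding:  ", size_fixed,  "bytes\n")
```

If the second number is far larger than the other two, the local
variables of the calling function are travelling along with the worker
function, which would match the network load I see.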

The second problem is connected to the first. Two of the four Windows
machines start at 100 Mbit/s and collapse to 2.8 Mbit/s after 1 s.
Now imagine snow trying to transfer a lot of data... this slows down
the whole process enormously. So why does the data transfer break down?
I checked the cables, switches, firewalls and everything else related
to physical networking. Nothing, everything is fine. One can transfer
files via FTP on R's communication port (10187) without any
restriction; 100 Mbit/s is absolutely possible. So my opinion is that
this must be another software problem, maybe in R itself?!


Many thanks for any idea!

Cu
Martin