SNOW Hybrid Cluster in R, Network problems
Hi all,

I successfully created a hybrid cluster of several Windows and Linux machines using snow and MPICH2. Basically I set up a SOCK cluster. MPICH2 comes into play to start the Rscript processes on each machine. Because it is platform independent, one can start processes more or less remotely on both operating systems, Windows and Linux. I know SSH is possible on Linux, but I would like to have a clean solution for Windows too.

The first problem I have now is the following. When I started writing scripts that run parallel code, I noticed that parLapply and the related functions distribute a large amount of data to the nodes IF they are called from inside a function in a script. When they are called from the console, the network load is minimal; calling a function which then calls parLapply causes a heavy load on the network. I now have a big array to compute, and all this traffic is slowing it down. I tried to read the R code of parApply and the functions below it, but could not find a useful hint.

The second problem is connected to the first. Two of the four Windows machines start at 100 Mbit/s and collapse to 2.8 Mbit/s after 1 s. Now imagine snow trying to transfer a lot of data... this slows down the whole process enormously. So why does the data transfer break down? I checked the cables, switches, firewalls and everything else related to the physical network. Nothing, everything is fine. Files can be transferred via FTP over the communication port of R (10187) without any restriction, so 100 Mbit/s is absolutely possible. My opinion is therefore that this must be another software problem, maybe in R itself?!

Many thanks for any ideas!

Cu
Martin
Dipl.-Ing. Martin Seilmayer
Helmholtz-Zentrum Dresden-Rossendorf e. V.
Institut fuer Fluiddynamik
Abteilung Magnetohydrodynamik
Bautzner Landstraße 400
01328 Dresden, Germany
@fon: +49 351 260 3165
@fax: +49 351 260 12969
@web: www.hzdr.de
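
A possible explanation for the first problem, offered as an assumption and not a confirmed diagnosis: R serializes a function together with its enclosing environment, unless that environment is the global environment, which is stored only as a reference. A worker function created inside another function may therefore drag every local object of that call frame across the network. The sketch below measures this without a cluster by using `serialize()`; the names `make_worker`, `local_big`, `nested_fun` and `detached_fun` are hypothetical and only serve the illustration.

```r
# Sketch under stated assumptions: compare the number of bytes R would
# serialize for a worker function depending on its enclosing environment.

# Worker defined at top level: when this script is run at top level, its
# enclosing environment is the global environment (stored as a reference).
top_level_fun <- function(i) i^2

# Hypothetical wrapper standing in for "a function which calls parLapply".
# The returned closure keeps the call frame alive, including local_big.
make_worker <- function() {
  local_big <- numeric(1e6)   # about 8 MB, stands in for the big array
  function(i) i^2             # does not even use local_big
}
nested_fun <- make_worker()

# Bytes produced when each function is serialized.
size_top    <- length(serialize(top_level_fun, NULL))
size_nested <- length(serialize(nested_fun, NULL))

# Possible workaround: point the closure at the global environment so the
# call frame is no longer serialized with it.
detached_fun <- nested_fun
environment(detached_fun) <- globalenv()
size_detached <- length(serialize(detached_fun, NULL))

cat("top level:", size_top, "bytes\n")
cat("nested:   ", size_nested, "bytes\n")
cat("detached: ", size_detached, "bytes\n")
```

If the nested size turns out to be in the range of the big array while the detached one stays small, defining the worker function at top level (or resetting its environment as shown) and shipping only the data each node really needs might reduce the traffic. Whether this is what happens inside snow in this particular setup would still have to be verified.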