On Sun, Dec 7, 2008 at 12:37 AM, Dan Bode <dbode at univaud.com> wrote:
The file transfer still relies on full reads and writes on the filesystem; I just fixed a bug that could cause objects to be written more than once in some cases.
OK.
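Dan says above that the file transfer relies on full reads and writes on the filesystem. The minimal Python sketch below illustrates that kind of file-based hand-off between the submitting host and the execution hosts. It assumes a shared directory that both sides can see. The function names and the `job_<n>.in.pkl` / `job_<n>.out.pkl` file layout are hypothetical. This is not Rsge's actual implementation, which is written in R.

```python
import os
import pickle
import tempfile


def write_job_input(shared_dir, job_id, chunk):
    """Master side: serialize one chunk of input data to the shared filesystem."""
    path = os.path.join(shared_dir, f"job_{job_id}.in.pkl")
    with open(path, "wb") as fh:
        pickle.dump(chunk, fh)
    return path


def run_job(shared_dir, job_id, func):
    """Execution-host side: read the whole input file, compute, write the whole result."""
    in_path = os.path.join(shared_dir, f"job_{job_id}.in.pkl")
    out_path = os.path.join(shared_dir, f"job_{job_id}.out.pkl")
    with open(in_path, "rb") as fh:
        chunk = pickle.load(fh)
    result = func(chunk)
    # Write under a temporary name, then rename, so the master never reads a
    # half-written result file (rename is atomic on POSIX filesystems).
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(result, fh)
    os.replace(tmp_path, out_path)
    return out_path


def read_job_output(shared_dir, job_id):
    """Master side: read one job's result back into memory."""
    path = os.path.join(shared_dir, f"job_{job_id}.out.pkl")
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    chunks = [[1, 2, 3], [4, 5], [6]]
    with tempfile.TemporaryDirectory() as shared:
        for i, chunk in enumerate(chunks):
            write_job_input(shared, i, chunk)
        # On a real cluster each run_job call would be a separate qsub job.
        for i in range(len(chunks)):
            run_job(shared, i, sum)
        results = [read_job_output(shared, i) for i in range(len(chunks))]
    print(results)  # [6, 9, 6]
```

Every chunk and every result makes a full round trip through the disk. This is why the approach pays off mainly when each job computes for a long time relative to the size of its data.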
Maybe one day I can sit down and think about ways to speed it up. I wanted to optimize memory usage, but I didn't want to go that low-level in R. The pre- and post-processing is heavy, but something similar will be required with any solution. This function is only recommended for applications where the compute time is large relative to the amount of data that has to be transferred.
Yes - that is exactly what I need: long processing, and no communication between the different processes. So my approach will work nicely with Rsge. The only reason I tried to use snow and Rmpi/MPI was that it seemed the easiest approach. But Rsge is much easier.
I have only ever gotten snow to work with the client/server method (TCP mode). I was thinking about using it as the base code for the project, but there was a feature I did not like, something about the timing of job completions; I have to search through my notes to find what it was. (My package actually requires snow, because it uses snow's function for splitting the data.)
OK - I was wondering already.
I have never used Rmpi. MPI can be tricky, though; you may want to ask your SGE admin (which is a fun and rewarding job :) whether they can help you set up the parallel execution environments.
I spoke to them. They have not used R and therefore have no experience with Rmpi and snow. In addition, the main admin was given the job on top of his other duties... The parallel environment with MPI is working - that's fine.
I ported Rsge from Rlsf, which provided support for Rmpi. Porting the Rmpi code was not a requirement, so I haven't gotten around to it. There is some commented-out code in the remote execution script that refers to the Rmpi code. It could be added with some work, but again, I am not sure when I will find the time. Just to be sure: do you need message passing for your algorithm? Is there some kind of boundary condition or similar? I have to ask because I see a lot of people using MPI where it would be easy to rework the algorithm to fit the scatter-gather paradigm. MPI adds a lot of complexity to anything it touches.
As mentioned above, I don't need any communication between the nodes (at the moment), so Rsge will be perfect.
Thanks,
Rainer
-----Original Message-----
From: Rainer M Krug [mailto:r.m.krug at gmail.com]
Sent: Fri 12/5/2008 11:57 PM
To: Dan Bode
Cc: r-sig-hpc at stat.math.ethz.ch
Subject: Re: Sun Grid Engine (SGE) and Rsge package

On Fri, Dec 5, 2008 at 7:42 PM, Dan Bode <dbode at univaud.com> wrote:
I wrote the package for a customer that is just starting to build use cases around it. In the next week or so I will be updating to the latest, and probably last planned, version for now (0.6). The version available now should be enough to get you started, but 0.6 contains some code changes that increase performance dramatically. It also provides more flexibility in distributing the local and global environments to the parallel jobs.
Sounds great - the performance increase will probably come from streamlined submission (writing files, ...) and result collection (reading files)? Looking forward to that. I actually managed to use it in a few examples, and apart from the overhead of submitting, it really looks great.
I would be more than willing to assist you with getting started with the package.
Great - thanks.
I am assuming that you already have an SGE cluster installed?
Yes - I am just a normal user on the cluster and have (luckily?) nothing to do with its administration.
We are working on open source packages around SGE for provisioning and configuration management with additional GLOBUS features (http://www.grid.org/downloads). All of the functions are documented at http://cran.r-project.org/web/packages/Rsge/Rsge.pdf (although I am sure you have already found this).
Yes - but I was missing a simple example. I found one in the "test" directory in the source package.
In particular you should be interested in sge.parApply or sge.submit, depending on your use cases.
True.
sge.apply(X, MARGIN, FUN, ..., njobs, packages, savelist): those are probably all the parameters you will need to get started. This splits the data structure X into njobs data structures. FUN is applied to each of them on the remote execution hosts, and the results are then merged together using the join.method.
OK - sounds straightforward.
packages and savelist are both character vectors. savelist is used to add objects from the global environment (this will change in the next version).
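The split / apply / merge cycle Dan describes for sge.apply can be sketched locally. The Python below only illustrates that scatter-gather pattern and is not Rsge code. Several pieces are assumptions: the row-wise split (comparable to MARGIN = 1 in R's apply), the in-order list concatenation standing in for join.method, and the names split_rows and par_apply, which are hypothetical. A thread pool stands in for the SGE execution hosts.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, njobs):
    """Split a list of rows into at most njobs contiguous, nearly equal chunks."""
    njobs = max(1, min(njobs, len(rows)))
    size, extra = divmod(len(rows), njobs)
    chunks, start = [], 0
    for i in range(njobs):
        # The first `extra` chunks get one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def par_apply(rows, func, njobs):
    """Apply func to every row, farming chunks out to workers, then merge.

    Each chunk plays the role of one SGE job. The workers never talk to each
    other, which is what makes this scatter-gather rather than MPI-style.
    """
    chunks = split_rows(rows, njobs)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Scatter: each worker applies func row by row to its own chunk.
        partial = pool.map(lambda chunk: [func(r) for r in chunk], chunks)
        # Gather: pool.map yields results in chunk order, so concatenating
        # them restores the original row order.
        return [value for part in partial for value in part]


if __name__ == "__main__":
    matrix = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    print(split_rows(matrix, 2))  # [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10]]]
    print(par_apply(matrix, sum, njobs=2))  # [3, 7, 11, 15, 19]
```

Splitting into contiguous chunks and merging in chunk order is what keeps the results aligned with the input rows. With a real scheduler, jobs can finish in any order, so results would have to be collected by job index rather than by completion time.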
OK.
packages lists the packages that need to be loaded in the remote execution environments
Nice.
and X, MARGIN, FUN, and ... should be the same as regular apply. I hope this is enough to get you started.
Yes - thanks a lot for the additional info.
Also, you are the first person to contact me, how did you find the package?
I am having tremendous problems getting parallel processing in R to work on a cluster running OpenMPI and SGE (MPICH is also available). I tried to use snow and Rmpi, but all the processes were spawned on the same node as the master R session. So I googled for SGE and R and found your package. It looks really promising. Do you have any idea whether I can use snow (through Rmpi) on an SGE/OpenMPI cluster? If so, how? I have to use qsub to start the job.
Enjoy the package!
I already do. Thanks,
Rainer
Dan Bode

-----Original Message-----
From: Rainer M Krug [mailto:r.m.krug at gmail.com]
Sent: 12/4/2008, Thu 6:49 PM
To: r-sig-hpc at stat.math.ethz.ch
Cc: Dan Bode
Subject: Sun Grid Engine (SGE) and Rsge package

Hi

I just found the Rsge package, which seems to suit my need to do parallel processing on a cluster with SGE. I did not manage to use Rmpi or snow (which would have been nice). My only problem is that I can't find any documentation on how to use Rsge (apart from what is included in the source package). It seems to use several options which might be useful to set. Does anybody have any experience with this package, or can anyone show me a way of using snow or Rmpi in an environment where I have to use SGE (qsub ...)?

Thanks

Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa

---------------------------------------------------------------------
Notice from Univa UD Postmaster:
This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
This message has been content scanned by the Univa UD Tumbleweed MailGate.
---------------------------------------------------------------------