Amazon AWS, RGenoud, Parallel Computing
----------------------------------------
Date: Sat, 11 Jun 2011 13:03:10 +0200
From: lui.r.project at googlemail.com
To: r-help at r-project.org
Subject: [R] Amazon AWS, RGenoud, Parallel Computing

Dear R group,
[...]
I am a little bit puzzled now about what I could do... It seems like I have only very limited options for increasing performance. Does anybody have experience with parallel computations with rGenoud or with parallelized sorting algorithms? I think one major problem is that each sort finishes rather quickly (only a few hundred entries to sort) but has to be done very frequently (population size >2000, iterations >500), so I guess the "housekeeping" overhead of the parallel computation wipes out any benefit.
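To make the "housekeeping" concern concrete, here is a rough back-of-the-envelope model in Python. It is a sketch under stated assumptions, not a description of how rGenoud actually dispatches work: it assumes every chunk of tasks sent to a worker costs a fixed overhead and that chunks run in rounds of `workers` at a time. The function names and all timings are hypothetical. Sending each tiny task separately lets the overhead dominate, while batching many tasks per dispatch amortizes it.

```python
import math


def estimated_parallel_time(n_tasks, task_time, workers, dispatch_overhead, chunk_size=1):
    """Rough wall-clock estimate (seconds) for n_tasks independent tasks.

    Hypothetical model: each chunk of `chunk_size` tasks is sent to a worker
    at a fixed cost `dispatch_overhead`, and chunks run in rounds of
    `workers` at a time. Every chunk is charged as if it were full, so this
    slightly overestimates when chunk_size does not divide n_tasks.
    """
    if workers < 1 or chunk_size < 1:
        raise ValueError("workers and chunk_size must be >= 1")
    n_chunks = math.ceil(n_tasks / chunk_size)
    rounds = math.ceil(n_chunks / workers)
    per_chunk = chunk_size * task_time + dispatch_overhead
    return rounds * per_chunk


def serial_time(n_tasks, task_time):
    """Wall-clock time with no parallel overhead at all."""
    return n_tasks * task_time


if __name__ == "__main__":
    # Hypothetical numbers: 2000 small sorts of 0.1 ms each per generation,
    # 1 ms overhead per dispatch, 4 workers.
    n, t, w, o = 2000, 1e-4, 4, 1e-3
    print("serial:         ", serial_time(n, t))
    print("1 task/chunk:   ", estimated_parallel_time(n, t, w, o, chunk_size=1))
    print("500 tasks/chunk:", estimated_parallel_time(n, t, w, o, chunk_size=500))
```

With these made-up numbers, the serial run takes about 0.2 s per generation, dispatching one sort at a time takes about 0.55 s (slower than serial), and sending 500 sorts per chunk takes about 0.051 s. Over more than 500 generations the difference adds up, which is why coarse-grained batching usually matters more than the number of workers.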
Is your sort part of the algorithm, or do you have to sort results after getting them back out of order from asynchronous processes? One of my favorite anecdotes is how I used a bash sort on a huge data file to make a program run faster: it went from impractically slow, at nearly zero percent CPU, to very fast with full CPU usage. A lack of CPU saturation is exactly what you are complaining about.

A couple of comments. First, if you have specialized applications that need optimizing, you may want to write dedicated C++ code. However, this won't help if you don't find the bottleneck. Lack of CPU saturation could easily be due to "waiting for stuff" such as disk I/O or VM swapping. You really ought to find the bottleneck first; it could be anything (except the CPU, maybe, LOL).

The sort I used prevented VM thrashing with no change to the application code: the app received sorted data, so VM paging became infrequent. If you can specify the problem precisely, you may be able to find a simple solution.
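One cheap first check for the "waiting for stuff" case is to compare CPU time with wall-clock time for a suspect piece of code. The Python sketch below uses `time.process_time()`, which counts CPU time of the current process only (summed over its threads, excluding sleep and child processes), and `time.perf_counter()` for wall time; the helper name `cpu_utilization` is hypothetical. A ratio well below 1 suggests the code is waiting on disk I/O, paging, locks, or other processes rather than computing. It does not say which, so a profiler is still the next step.

```python
import time


def cpu_utilization(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) once; return (result, cpu_seconds / wall_seconds).

    CPU time covers this process only (all threads, not child processes),
    so for multi-process parallel code, measure inside each worker.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)


if __name__ == "__main__":
    # Waiting: sleeping uses almost no CPU, so the ratio should be near 0.
    _, idle_ratio = cpu_utilization(time.sleep, 0.2)
    # Computing: a CPU-bound sum keeps one core busy, so the ratio should be
    # near 1 on an otherwise idle machine.
    _, busy_ratio = cpu_utilization(sum, range(10**7))
    print(f"sleep: {idle_ratio:.2f}  compute: {busy_ratio:.2f}")
```

In R itself, `system.time()` reports `user`, `system`, and `elapsed` times, which support the same comparison.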