Is R more heavy on memory or processor?

4 messages · Simon Urbanek, Booman, M, Brian Ripley

On Mar 24, 2009, at 14:55 , Booman, M wrote:

I don't use BioC [you may want to ask on the BioC list instead (or  
hopefully some BioC users will chip in)], so my recommendations may be  
based on slightly different problems.
Unfortunately I cannot comment on Nehalems, but in general with Xeons
you do feel quite a difference in clock speed, so I wouldn't trade
2.93GHz for 2.26GHz regardless of the CPU generation. It is true that
pre-Nehalem Mac Pros cannot feed 8 cores, so you will want to go for
the new Mac Pros, but I wouldn't even think about the 2.26GHz option.
Some benchmarks suggest that the 2.26GHz Nehalem can still compete
favorably when a lot of memory/IO is involved, but the evidence was
not very convincing and I cannot tell first-hand.
R can use multiple cores in many ways: through BLAS (the default in R
for Mac OS X), vector-op parallelization (Luke's pnmath), or explicit
parallelization such as forking (multicore) or parallel processes
(snow). The amount of parallelization achievable depends heavily on
your applications. I routinely use all cores, but then I'm usually
modeling my problems that way.
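[As a small illustration of the explicit-parallelization route mentioned above, here is a sketch using mclapply(), which originated in the multicore package and was later folded into R's base parallel package with the same interface. The task (a Monte Carlo estimate of pi) is only a stand-in for a CPU-bound job; note that forking works on Unix-alikes such as Mac OS X but not on Windows.]

```r
## Sketch: explicit parallelization via forking, as in the 'multicore'
## package (mclapply is now shipped in the base 'parallel' package).
library(parallel)

## A toy CPU-bound task: one Monte Carlo estimate of pi per chunk.
est_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x * x + y * y <= 1)  # fraction of points inside the unit circle
}

## mclapply forks the R process, so on an 8-core Mac Pro all chunks
## can run at once; mc.cores caps how many forks run concurrently.
res <- mclapply(rep(250000L, 4), est_pi, mc.cores = 2)
mean(unlist(res))  # average of the per-chunk estimates
```

[snow would express the same job with clusterApply() over worker processes instead of forks, which is the route to take across multiple machines.]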
I cannot comment on details of ongoing work due to the NDA associated
with Snow Leopard, but technically, from the Apple announcements, you
can deduce that the only possible improvements directly related to R
can be achieved in implicit parallelization, which is essentially the
pnmath path. There is not much more you can do in R short of rewriting
the methods you want to deal with.

In fact, the hope is rather that the packages for R start using  
parallelization more effectively, but that's not something Snow  
Leopard alone can change.
In my line of work (which is not bioinf, though) RAM turned out to be
more important, because the drop-off when you run out of memory is
sudden and devastatingly huge. With CPUs you'll just have to wait a
bit longer, and the difference is directly proportional to the CPU
speed you get, so it is never as bad as running out of wired RAM.
(BTW: in general you don't want to buy RAM from Apple - as much as I
like Apple, there are compatible RAM sets at a fraction of the cost of
what Apple charges, especially for Mac Pros - but there is always the
1st-generation issue *).
6GB is very little RAM, so I don't think that's an option ;) - but
yes, you should care about the size first. The channels and timings
only define how you populate the slots. Note that the 4-core Nehalem
has only 4 slots, so it's not very expandable - I'd definitely get an
8-core old one with 16GB RAM or more rather than something that can
take only 8GB ...
I would keep an eye on the RAM expandability - even if you buy less
RAM now, a ceiling of 8GB is very low. It may turn out that larger
DIMMs will become available, but 16GB for the future is not enough,
either. As with all 1st-generation products the prices will go down a
lot over time, so you may plan to upgrade later. Another point worth
considering is that you can always upgrade RAM easily, but a CPU
upgrade is much more difficult.

Cheers,
Simon
On Tue, 24 Mar 2009, Simon Urbanek wrote:

Simon,

We've some experience with recent Xeons on Linux servers, and that
says that the size of the L2 cache is at least as important as clock
speed.  The following figures are from memory and rounded ....  A dual
quad-core 2.5GHz 12MB-cache system (we've an identical pair, one my
server, bought in January) outperforms a dual quad-core 3GHz 6MB-cache
system bought 9 months earlier.  That's running R, and in particular
multiple R jobs.  At least here, the extra cost of the 2.93GHz
processor is phenomenal.
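[A rough way to reproduce this kind of machine-to-machine comparison yourself is to time a few cache- and BLAS-sensitive linear-algebra operations; the sketch below only shows the measurement technique, not Brian's figures, and the matrix size is an arbitrary choice.]

```r
## Sketch: a minimal cross-machine timing probe for R.
## Dense multiply and LU solve exercise both the BLAS and the CPU cache.
set.seed(1)
n <- 500L
A <- matrix(rnorm(n * n), n, n)

t_mult  <- system.time(B <- A %*% A)       # BLAS-bound dense multiply
t_solve <- system.time(X <- solve(A, B))   # LU factorization + solve

cat("multiply:", t_mult["elapsed"], "s;",
    "solve:", t_solve["elapsed"], "s\n")
```

[Running the same script with the same seed on each candidate box, and once per core simultaneously to mimic "multiple R jobs", gives a crude but directly comparable number.]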

Also, it looks to us like the Achilles' heel of the Mac Pro is its
disk system.  Even if you load it up with a RAID controller and extra
discs (pretty exorbitant, too) it is still, on paper, well below my
server -- and the 3GHz server does considerably outperform mine on
disc I/O, as it has more discs and a better RAID controller, and our
Solaris servers are better still.

Just a bit of background,

Brian