
recommended computing server for R (March 2009)?

7 messages · ivo welch, Dirk Eddelbuettel, Douglas Bates +1 more

#
On 28 March 2009 at 13:37, ivowel at gmail.com wrote:
| I need to speed up my monte-carlo simulations. my code is written in R (and  
| it was also the cause of my many questions here over the last few days). my  
[...]
| with $10/GB of DRAM, this is no longer a bottleneck. For my application,  
| parallelism is a given, since most of it is monte-carlo simulations. (I  
[...]
| My operating system will probably be ubuntu. (I also run a little of it on  
| an OSX Mac Pro I own.)

One thing you could consider is renting the compute hours from the cloud:

    http://aws.amazon.com/ec2

as EC2 now has a choice of Debian and Ubuntu (among others) and Debian /
Ubuntu have R and Open MPI working out of the box.  Examples as in my 'Intro to
High Performance Computing with R' tutorials from UseR and the BoC [ google
for the pdf slides if interested ] should apply 'as is', you don't need to
fiddle with (physical) hardware and can scale up CPU resources as needed.
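
For what it's worth, the kind of setup those tutorials describe can be
sketched in a few lines with the 'snow' package.  Everything below is a
made-up placeholder -- the cluster size, the payoff function, and the
replication count are illustrative, not from the tutorials:

```r
## Hypothetical sketch: farm Monte-Carlo replications out over a snow
## cluster.  type = "MPI" (via Rmpi / Open MPI) would be the variant to
## use on an EC2 cluster; "SOCK" works on a single multi-core box.
library(snow)

cl <- makeCluster(4, type = "SOCK")   # 4 workers; or type = "MPI"
clusterSetupRNG(cl)                   # per-node RNG streams (needs rlecuyer)

one.rep <- function(i) mean(rnorm(1e5))   # stand-in for one simulation draw

res <- unlist(clusterApplyLB(cl, 1:1000, one.rep))  # load-balanced apply
stopCluster(cl)
summary(res)
```

Because the replications are independent, load-balanced scheduling
(clusterApplyLB) keeps all workers busy even when individual draws vary
in run time.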

Dirk
#
On 28 March 2009 at 14:36, ivowel at gmail.com wrote:
| thanks, dirk. I just read your tutorial. great information for our needs.

My pleasure.

| alas, the Amazon economics do not work well for us. the server that I am  
| planning to purchase should cost around $800 and is the equivalent of the  
| high-intensive CPU, which goes for $0.80/hour. that's about 2 months of  
| amazon server time for the same price. if administering the hardware is  
| very costly, then amazon is cost-effective. fortunately, we believe we can  
| run the hardware easily ourselves.

A steady supply of grad students can do that, I suppose.
 
| I wonder how long it will take before debian will offer a GUI program that  
| allows users like us to "rent out" a server for cash, and credit them via  
| paypal. alas, maybe a special R cloud distribution (that has "everything R"  
| already installed, too) could support the R project itself?! I would donate  
| our free CPU time to the R project when the CPU is not otherwise used.  
| probably some others would do the same, too.

Debian never will as it is strictly a non-profit. Canonical (Ubuntu's parent)
might -- the next Ubuntu release will already contain what is said to be an
'amazon-ec2-compatible' "build your own cloud" system based on the Eucalyptus
system from UCSB: http://eucalyptus.cs.ucsb.edu/

A PS to your original question:  IIRC you can also buy systems from Dell and
HP with Ubuntu pre-installed.  Ubuntu gives you Atlas, you can try to add the
non-free Goto BLAS, or the commercial MKL BLAS, or ... to further speed up
your inner linear models (now that you learned about lm.fit() et al).
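
A small illustration of the lm.fit() point (made-up data, not from the
thread): in a tight simulation loop the formula and model-frame
machinery of lm() dominates, while lm.fit() on a pre-built model matrix
gives the same estimates with far less overhead:

```r
## Made-up illustration: identical OLS estimates, but lm.fit() skips
## the formula parsing / model-frame construction done on every lm() call.
set.seed(1)
X <- cbind(1, matrix(rnorm(5000), 1000, 5))   # model matrix with intercept
y <- drop(X %*% rnorm(6)) + rnorm(1000)

b1 <- coef(lm(y ~ X - 1))            # full interface
b2 <- lm.fit(X, y)$coefficients      # bare fitter, pivoted QR underneath
all.equal(unname(b1), unname(b2))    # TRUE
```

In a Monte-Carlo loop where X stays fixed and only y changes, the
lm.fit() call is the one worth timing against lm().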

The rest of the discussion, incl the hardware and esp administration aspect,
may be more appropriate for r-sig-hpc (subscription needed for posting...)

Dirk
#
On Sat, Mar 28, 2009 at 9:55 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
Well, there is a bit of a sad story there.  Accelerated BLAS are most
effective in speeding up numerical linear algebra when the level-3
BLAS are used. (Level-1 BLAS are vector-vector operations, level-2 are
matrix-vector and level-3 are matrix-matrix operations.)  Lapack is
based on level-3 BLAS whereas Linpack is based on level-1 BLAS.
Almost all numerical linear algebra in R uses Lapack.  The one
exception is - wait for it - the QR decomposition used in lm.fit,
because of the choice of pivoting schemes.  It is a long story but
when fitting a linear model you want to detect near-singularity in the
model matrix and move the offending columns to be the last columns in
the matrix but otherwise retain the original order.  That is, you
don't want to scramble columns corresponding to different terms in the
model.  Neither Linpack nor Lapack offered that type of pivoting but
it was retrofitted onto the dqrdc subroutine from Linpack.  (Notice
that the default path in qr.default calls a Fortran subroutine called
"dqrdc2".)

You could use the LAPACK = TRUE argument to R's qr function to get the
unconstrained pivoting scheme and use that to get coefficient
estimates according to the estimated rank of the model matrix (see
example(qr)) but that won't give you the information needed for the
analysis of variance decompositions.

Try

example(qr)
qr(hilbert(20))$pivot
qr(hilbert(20), LAPACK = TRUE)$pivot
#
Douglas Bates wrote:

Yup. However, my gut feeling is that there could be a way out:

First, how important are (sequential) ANOVA decompositions anyway; and 
secondly, is it crucial that they can be read directly off the QR 
decomposition? There's a lot of code that assumes that this is the case, 
so you can't _easily_ plug in a pivoting QR, but the ANOVA can obviously 
be obtained by other means - basically just fit the relevant sequence of 
models and look at the SSD differences (as I suppose glm() must already 
do for deviance tables).
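
The "fit the relevant sequence of models" idea can be sketched directly
(made-up data): the sequential ANOVA sums of squares are just
differences of residual sums of squares between nested fits, with no
need to read anything off a pivoted QR:

```r
## Sketch of the scheme above: sequential (type-I) sums of squares as
## RSS differences between nested model fits.  Data are made up.
set.seed(1)
d <- data.frame(y = rnorm(30), a = gl(3, 10), x = runif(30))

rss <- function(fm) sum(resid(fm)^2)
m0 <- lm(y ~ 1,     data = d)
m1 <- lm(y ~ a,     data = d)
m2 <- lm(y ~ a + x, data = d)

c(a = rss(m0) - rss(m1), x = rss(m1) - rss(m2))
## compare with anova(m2)[["Sum Sq"]][1:2]
```

The cost is refitting one model per term, which is exactly the price
Peter alludes to for giving up the QR-based shortcut.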
#
On Sat, Mar 28, 2009 at 11:47 AM, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:
I had proposed another scheme which is to do the unpivoted QR
decomposition and check for rank deficiency.  If the model matrix is
judged to have full column rank then return.  Otherwise check the
diagonal elements of R for the first apparent singularity, pivot that
column to the end, and either update the current decomposition or
recalculate the decomposition, then iterate.  John Chambers said that
he might have a student work on something like this for a course
project in the Statistical Computing course he is teaching at
Stanford.
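
A rough R-level rendering of that scheme (illustration only -- the
function name is mine, a real implementation would update the
factorization rather than refactor from scratch, and passing tol = 0 to
qr() is assumed here to suppress dqrdc2's own column-shuffling so each
pass is effectively unpivoted):

```r
## Rough sketch of the defer-and-iterate scheme described above: take an
## (assumed) unpivoted QR, find the first apparently singular diagonal
## of R, move that column to the end, refactor, and repeat.
qr.defer.singular <- function(X, tol = 1e-7) {
  p <- ncol(X); perm <- seq_len(p); deferred <- 0
  repeat {
    qrX <- qr(X[, perm, drop = FALSE], tol = 0)   # unpivoted pass (assumed)
    d   <- abs(diag(qr.R(qrX)))
    bad <- which(d < tol * max(d))                # apparent singularities
    bad <- bad[bad <= p - deferred][1]            # skip already-deferred cols
    if (is.na(bad)) break                         # nothing new to defer: done
    perm     <- c(perm[-bad], perm[bad])          # move offender to the end
    deferred <- deferred + 1                      # refactor and iterate
  }
  list(qr = qrX, pivot = perm, rank = p - deferred)
}

X <- cbind(1, 1:5, (1:5)^2, 1 + 1:5)   # 4th column = 1st + 2nd
qr.defer.singular(X)[c("pivot", "rank")]
```

Note how this preserves the property the thread cares about: columns
that are not offending keep their original relative order, so terms in
the model are not scrambled.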