
recommended computing server for R (March 2009)?

7 messages · ivo welch, Dirk Eddelbuettel, Douglas Bates +1 more

#
On 28 March 2009 at 13:37, ivowel at gmail.com wrote:
| I need to speed up my monte-carlo simulations. my code is written in R (and  
| it was also the cause of my many questions here over the last few days). my  
[...]
| with $10/GB of DRAM, this is no longer a bottleneck. For my application,  
| parallelism is a given, since most of it is monte-carlo simulations. (I  
[...]
| My operating system will probably be ubuntu. (I also run a little of it on  
| an OSX Mac Pro I own.)

One thing you could consider is renting the compute hours from the cloud:

    http://aws.amazon.com/ec2

as EC2 now has a choice of Debian and Ubuntu (among others) and Debian /
Ubuntu have R and Open MPI working out of the box.  Examples as in my 'Intro to
High Performance Computing with R' tutorials from UseR and the BoC [ google
for the pdf slides if interested ] should apply 'as is', you don't need to
fiddle with (physical) hardware and can scale up CPU resources as needed.
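
For what it's worth, the kind of setup those tutorials describe can be
sketched in a few lines with the 'snow' package.  Everything below is a
made-up placeholder -- the cluster size, the payoff function, and the
replication count are illustrative, not from the tutorials:

```r
## Hypothetical sketch: farm Monte-Carlo replications out over a snow
## cluster.  type = "MPI" (via Rmpi / Open MPI) would be the variant to
## use on an EC2 cluster; "SOCK" works on a single multi-core box.
library(snow)

cl <- makeCluster(4, type = "SOCK")   # 4 workers; or type = "MPI"
clusterSetupRNG(cl)                   # per-node RNG streams (needs rlecuyer)

one.rep <- function(i) mean(rnorm(1e5))   # stand-in for one simulation draw

res <- unlist(clusterApplyLB(cl, 1:1000, one.rep))  # load-balanced apply
stopCluster(cl)
summary(res)
```

Because the replications are independent, load-balanced scheduling
(clusterApplyLB) keeps all workers busy even when individual draws vary
in run time.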

Dirk
#
On 28 March 2009 at 14:36, ivowel at gmail.com wrote:
| thanks, dirk. I just read your tutorial. great information for our needs.

My pleasure.

| alas, the Amazon economics do not work well for us. the server that I am  
| planning to purchase should cost around $800 and is the equivalent of the  
| high-intensive CPU, which goes for $0.80/hour. that's about 2 months of  
| amazon server time for the same price. if administering the hardware is  
| very costly, then amazon is cost-effective. fortunately, we believe we can  
| run the hardware easily ourselves.

A steady supply of grad students can do that, I suppose.
 
| I wonder how long it will take before debian will offer a GUI program that  
| allows users like us to "rent out" a server for cash, and credit them via  
| paypal. alas, maybe a special R cloud distribution (that has "everything R"  
| already installed, too) could support the R project itself?! I would donate  
| our free CPU time to the R project when the CPU is not otherwise used.  
| probably some others would do the same, too.

Debian never will as it is strictly a non-profit. Canonical (Ubuntu's parent)
might -- the next Ubuntu release will already contain what is said to be an
'amazon-ec2-compatible' "build your own cloud" system based on the Eucalyptus
system from UCSB: http://eucalyptus.cs.ucsb.edu/

A PS to your original question:  IIRC you can also buy systems from Dell and
HP with Ubuntu pre-installed.  Ubuntu gives you Atlas, you can try to add the
non-free Goto BLAS, or the commercial MKL BLAS, or ... to further speed up
your inner linear models (now that you learned about lm.fit() et al).
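
A small illustration of the lm.fit() point (made-up data, not from the
thread): in a tight simulation loop the formula and model-frame
machinery of lm() dominates, while lm.fit() on a pre-built model matrix
gives the same estimates with far less overhead:

```r
## Made-up illustration: identical OLS estimates, but lm.fit() skips
## the formula parsing / model-frame construction done on every lm() call.
set.seed(1)
X <- cbind(1, matrix(rnorm(5000), 1000, 5))   # model matrix with intercept
y <- drop(X %*% rnorm(6)) + rnorm(1000)

b1 <- coef(lm(y ~ X - 1))            # full interface
b2 <- lm.fit(X, y)$coefficients      # bare fitter, pivoted QR underneath
all.equal(unname(b1), unname(b2))    # TRUE
```

In a Monte-Carlo loop where X stays fixed and only y changes, the
lm.fit() call is the one worth timing against lm().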

The rest of the discussion, incl the hardware and esp administration aspect,
may be more appropriate for r-sig-hpc (subscription needed for posting...)

Dirk
#
On Sat, Mar 28, 2009 at 9:55 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
Well, there is a bit of a sad story there.  Accelerated BLAS are most
effective in speeding up numerical linear algebra when the level-3
BLAS are used. (Level-1 BLAS are vector-vector operations, level-2 are
matrix-vector and level-3 are matrix-matrix operations.)  Lapack is
based on level-3 BLAS whereas Linpack is based on level-1 BLAS.
Almost all numerical linear algebra in R uses Lapack.  The one
exception is - wait for it - the QR decomposition used in lm.fit,
because of the choice of pivoting schemes.  It is a long story but
when fitting a linear model you want to detect near-singularity in the
model matrix and move the offending columns to be the last columns in
the matrix but otherwise retain the original order.  That is, you
don't want to scramble columns corresponding to different terms in the
model.  Neither Linpack nor Lapack offered that type of pivoting but
it was retrofitted onto the dqrdc subroutine from Linpack.  (Notice
that the default path in qr.default calls a Fortran subroutine called
"dqrdc2".)

You could use the LAPACK = TRUE argument to R's qr function to get the
unconstrained pivoting scheme and use that to get coefficient
estimates according to the estimated rank of the model matrix (see
example(qr)) but that won't give you the information needed for the
analysis of variance decompositions.

Try

example(qr)
qr(hilbert(20))$pivot
qr(hilbert(20), LAPACK = TRUE)$pivot
#
Douglas Bates wrote:

Yup. However, my gut feeling is that there could be a way out:

First, how important are (sequential) ANOVA decompositions anyway; and 
secondly, is it crucial that they can be read directly off the QR 
decomposition? There's a lot of code that assumes that this is the case, 
so you can't _easily_ plug in a pivoting QR, but the ANOVA can obviously 
be obtained by other means - basically just fit the relevant sequence of 
models and look at the SSD differences (as I suppose glm() must already 
do for deviance tables).
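
The "fit the relevant sequence of models" idea can be sketched directly
(made-up data): the sequential ANOVA sums of squares are just
differences of residual sums of squares between nested fits, with no
need to read anything off a pivoted QR:

```r
## Sketch of the scheme above: sequential (type-I) sums of squares as
## RSS differences between nested model fits.  Data are made up.
set.seed(1)
d <- data.frame(y = rnorm(30), a = gl(3, 10), x = runif(30))

rss <- function(fm) sum(resid(fm)^2)
m0 <- lm(y ~ 1,     data = d)
m1 <- lm(y ~ a,     data = d)
m2 <- lm(y ~ a + x, data = d)

c(a = rss(m0) - rss(m1), x = rss(m1) - rss(m2))
## compare with anova(m2)[["Sum Sq"]][1:2]
```

The cost is refitting one model per term, which is exactly the price
Peter alludes to for giving up the QR-based shortcut.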
#
On Sat, Mar 28, 2009 at 11:47 AM, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:
I had proposed another scheme which is to do the unpivoted QR
decomposition and check for rank deficiency.  If the model matrix is
judged to have full column rank then return.  Otherwise check the
diagonal elements of R for the first apparent singularity, pivot that
column to the end, and either update the current decomposition or
recalculate the decomposition, then iterate.  John Chambers said that
he might have a student work on something like this for a course
project in the Statistical Computing course he is teaching at
Stanford.
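
A rough R-level rendering of that scheme (illustration only -- the
function name is mine, a real implementation would update the
factorization rather than refactor from scratch, and passing tol = 0 to
qr() is assumed here to suppress dqrdc2's own column-shuffling so each
pass is effectively unpivoted):

```r
## Rough sketch of the defer-and-iterate scheme described above: take an
## (assumed) unpivoted QR, find the first apparently singular diagonal
## of R, move that column to the end, refactor, and repeat.
qr.defer.singular <- function(X, tol = 1e-7) {
  p <- ncol(X); perm <- seq_len(p); deferred <- 0
  repeat {
    qrX <- qr(X[, perm, drop = FALSE], tol = 0)   # unpivoted pass (assumed)
    d   <- abs(diag(qr.R(qrX)))
    bad <- which(d < tol * max(d))                # apparent singularities
    bad <- bad[bad <= p - deferred][1]            # skip already-deferred cols
    if (is.na(bad)) break                         # nothing new to defer: done
    perm     <- c(perm[-bad], perm[bad])          # move offender to the end
    deferred <- deferred + 1                      # refactor and iterate
  }
  list(qr = qrX, pivot = perm, rank = p - deferred)
}

X <- cbind(1, 1:5, (1:5)^2, 1 + 1:5)   # 4th column = 1st + 2nd
qr.defer.singular(X)[c("pivot", "rank")]
```

Note how this preserves the property the thread cares about: columns
that are not offending keep their original relative order, so terms in
the model are not scrambled.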