recommended computing server for R (March 2009)?
7 messages · ivo welch, Dirk Eddelbuettel, Douglas Bates, Peter Dalgaard
On 28 March 2009 at 13:37, ivowel at gmail.com wrote:
| I need to speed up my monte-carlo simulations. my code is written in R (and
| it was also the cause of my many questions here over the last few days). my
[...]
| with $10/GB of DRAM, this is no longer a bottleneck. For my application,
| parallelism is a given, since most of it is monte-carlo simulations. (I
[...]
| My operating system will probably be ubuntu. (I also run a little of it on
| an OSX Mac Pro I own.)
One thing you could consider is renting the compute hours from the cloud:

   http://aws.amazon.com/ec2

as EC2 now has a choice of Debian and Ubuntu (among others), and Debian /
Ubuntu have R and Open MPI working out of the box. Examples as in my 'Intro to
High Performance Computing with R' tutorials from UseR and the BoC [ google
for the pdf slides if interested ] should apply 'as is'; you don't need to
fiddle with (physical) hardware and can scale up CPU resources as needed.
Dirk
Three out of two people have difficulties with fractions.
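Because Monte Carlo replications are independent, they split naturally across cores, whether local or rented. A minimal sketch, assuming a Unix-alike machine (mclapply() forks, and mc.cores > 1 is not supported on Windows) and the parallel package, which ships with R from 2.14.0 and therefore postdates this thread. The replication function one_rep() is a hypothetical stand-in for real simulation work; spreading work across several EC2 instances would instead need an MPI or cluster layer (e.g. via Rmpi or snow), which this sketch does not cover.

```r
library(parallel)

# L'Ecuyer-CMRG lets each forked worker draw from its own independent RNG stream
RNGkind("L'Ecuyer-CMRG")
set.seed(2009)

# Hypothetical stand-in for one Monte Carlo replication:
# simulate y = 1 + 2 x + noise and return the estimated slope via lm.fit()
one_rep <- function(i, n = 100) {
  x <- rnorm(n)
  y <- 1 + 2 * x + rnorm(n)
  lm.fit(cbind(1, x), y)$coefficients[[2]]
}

# 200 replications spread over 2 forked processes (Unix-alikes only)
est <- unlist(mclapply(seq_len(200), one_rep, mc.cores = 2))
mean(est)  # should be close to the true slope of 2
```

Calling lm.fit() on a prebuilt model matrix skips the formula and model-frame overhead of lm(), which matters when the fit sits in the inner loop of a simulation.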
On 28 March 2009 at 14:36, ivowel at gmail.com wrote:
| thanks, dirk. I just read your tutorial. great information for our needs.

My pleasure.

| alas, the Amazon economics do not work well for us. the server that I am
| planning to purchase should cost around $800 and is the equivalent of the
| high-intensive CPU, which goes for $0.80/hour. that's about 2 months of
| amazon server time for the same price. if administering the hardware is
| very costly, then amazon is cost-effective. fortunately, we believe we can
| run the hardware easily ourselves.

A steady supply of grad students can do that as a project, I suppose.

| I wonder how long it will take before debian will offer a GUI program that
| allow users like us to "rent out" a server for cash, and credit them via
| paypal. alas, maybe a special R cloud distribution (that has "everything R"
| already installed, too) could support the R project itself?! I would donate
| our free CPU time to the R project when the CPU is not otherwise used.
| probably some others would do the same, too.

Debian never will, as it is strictly a non-profit. Canonical (Ubuntu's parent)
might -- the next Ubuntu release will already contain what is said to be an
'amazon-ec2-compatible' "build your own cloud" system based on the Eucalyptus
system from UCSB:

   http://eucalyptus.cs.ucsb.edu/

A PS to your original question: IIRC you can also buy systems from Dell and HP
with Ubuntu pre-installed. Ubuntu gives you Atlas; you can try to add the
non-free Goto blas, or the commercial MKL blas, or ... to further speed up your
inner linear models (now that you learned about lm.fit() et al).

The rest of the discussion, incl the hardware and esp administration aspect,
may be more appropriate for r-sig-hpc (subscription needed for posting...)

Dirk
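Taking ivo's quoted figures at face value ($800 for the server versus $0.80/hour for a comparable high-CPU EC2 instance, 2009 prices), the break-even point is plain arithmetic; a quick check in R:

```r
# Figures quoted in the exchange above, taken at face value
server_cost <- 800    # USD, one-off purchase
ec2_rate    <- 0.80   # USD per hour for a comparable instance

breakeven_hours <- server_cost / ec2_rate  # 1000 hours
breakeven_days  <- breakeven_hours / 24    # about 41.7 days
breakeven_hours
breakeven_days
```

That is roughly six weeks of round-the-clock use, somewhat less than the "about 2 months" estimate, and it assumes the purchased machine would otherwise be kept fully busy.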
On Sat, Mar 28, 2009 at 9:55 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
| [...]
| A PS to your original question: IIRC you can also buy systems from Dell and
| HP with Ubuntu pre-installed. Ubuntu gives you Atlas; you can try to add the
| non-free Goto blas, or the commercial MKL blas, or ... to further speed up
| your inner linear models (now that you learned about lm.fit() et al).
Well, there is a bit of a sad story there. Accelerated BLAS are most effective
in speeding up numerical linear algebra when the level-3 BLAS are used.
(Level-1 BLAS are vector-vector operations, level-2 are matrix-vector and
level-3 are matrix-matrix operations.) Lapack is based on level-3 BLAS whereas
Linpack is based on level-1 BLAS. Almost all numerical linear algebra in R uses
Lapack. The one exception is - wait for it - the QR decomposition used in
lm.fit, because of the choice of pivoting schemes.

It is a long story, but when fitting a linear model you want to detect
near-singularity in the model matrix and move the offending columns to be the
last columns in the matrix while otherwise retaining the original order. That
is, you don't want to scramble columns corresponding to different terms in the
model. Neither Linpack nor Lapack offered that type of pivoting, but it was
retrofitted onto the dqrdc subroutine from Linpack. (Notice that the default
path in qr.default calls a Fortran subroutine called "dqrdc2".)

You could use the LAPACK = TRUE argument to R's qr function to get the
unconstrained pivoting scheme and use that to get coefficient estimates
according to the estimated rank of the model matrix (see example(qr)), but
that won't give you the information needed for the analysis of variance
decompositions. Try

   example(qr)    # also defines the hilbert() function used below
   qr(hilbert(20))$pivot
   qr(hilbert(20), LAPACK = TRUE)$pivot
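The same contrast can be seen on something closer to a model matrix than hilbert(20). A minimal sketch with made-up data and a deliberately aliased column (dup = 2 * x1): the default LINPACK-based path keeps the original column order and only moves the aliased column to the end, while LAPACK = TRUE uses column pivoting by norm that is free to reorder every column and, per ?qr, always reports full rank.

```r
set.seed(1)
x1 <- rnorm(10)
x2 <- rnorm(10)
# Model matrix whose third column is exactly twice the second
X <- cbind(int = 1, x1 = x1, dup = 2 * x1, x2 = x2)

q_linpack <- qr(X)                 # default: LINPACK dqrdc2, limited pivoting
q_linpack$rank                     # 3
q_linpack$pivot                    # 1 2 4 3: "dup" moved last, others keep their order

q_lapack <- qr(X, LAPACK = TRUE)   # LAPACK: unconstrained column pivoting
q_lapack$pivot                     # order driven by column norms, not by model terms
q_lapack$rank                      # always full (here 4) in the LAPACK case
```

The limited pivoting is what lets lm() report the aliased coefficient as NA while keeping the remaining terms in formula order.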
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Douglas Bates wrote:
| You could use the LAPACK = TRUE argument to R's qr function to get the
| unconstrained pivoting scheme and use that to get coefficient estimates
| according to the estimated rank of the model matrix (see example(qr)) but
| that won't give you the information needed for the analysis of variance
| decompositions.
Yup. However, my gut feeling is that there could be a way out: First, how important are (sequential) ANOVA decompositions anyway; and secondly, is it crucial that they can be read directly off the QR decomposition? There's a lot of code that assumes that this is the case, so you can't _easily_ plug in a pivoting QR, but the ANOVA can obviously be obtained by other means - basically just fit the relevant sequence of models and look at the SSD differences (as I suppose glm() must already do for deviance tables).
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
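The "fit the relevant sequence of models" route can be sketched directly: fit the nested models one term at a time and difference their residual sums of squares. A minimal sketch with made-up data (d, x1, x2, x3 are illustrative names); for a full-rank model the differences reproduce the sequential "Sum Sq" column of anova(). Here lm() does the fitting, but in principle any routine that returns the RSS, including one built on a pivoting QR, could be swapped in.

```r
set.seed(42)
n <- 30
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + d$x1 - 0.5 * d$x2 + rnorm(n)

# Nested sequence of models, adding one term at a time
forms <- list(y ~ 1, y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3)

# Residual sum of squares of each fit (for an lm fit, deviance() is the RSS)
rss <- sapply(forms, function(f) deviance(lm(f, data = d)))

# Sequential sums of squares are the successive drops in RSS
seq_ss <- -diff(rss)

# The usual table, read off the QR decomposition of the full model
tab <- anova(lm(y ~ x1 + x2 + x3, data = d))
cbind(nested_fits = seq_ss, anova_table = tab[["Sum Sq"]][1:3])
```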
On Sat, Mar 28, 2009 at 11:47 AM, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:
| [...] the ANOVA can obviously be obtained by other means - basically just
| fit the relevant sequence of models and look at the SSD differences (as I
| suppose glm() must already do for deviance tables).
I had proposed another scheme, which is to do the unpivoted QR decomposition
and check for rank deficiency. If the model matrix is judged to have full
column rank, then return. Otherwise check the diagonal elements of R for the
first apparent singularity, pivot that column to the end, and either update
the current decomposition or recalculate it, then iterate. John Chambers said
that he might have a student work on something like this for a course project
in the Statistical Computing course he is teaching at Stanford.
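The scheme can be sketched in a few lines of R. This is an illustration under stated assumptions, not the implementation anyone in the thread wrote: iterative_pivot_qr() is a hypothetical helper, it takes the "recalculate" branch (recomputing the whole decomposition after each move rather than updating it), and it assumes that qr(..., tol = 0) makes the default LINPACK path skip its own limited pivoting, which the code checks via the returned pivot. The singularity test compares |R[j, j]| with tol times the column's norm, similar in spirit to the relative-norm check in dqrdc2.

```r
# Hypothetical helper: unpivoted QR, move the first apparently singular
# column to the end, recompute, and repeat until the leading block has
# full column rank.
iterative_pivot_qr <- function(X, tol = 1e-7) {
  p <- ncol(X)
  piv <- seq_len(p)   # current column order
  live <- p           # columns piv[1:live] have not been declared aliased
  repeat {
    Xp <- X[, piv, drop = FALSE]
    # Assumption: tol = 0 stops dqrdc2 from cycling columns (no pivoting)
    q <- qr(Xp, tol = 0)
    stopifnot(identical(q$pivot, seq_len(p)))
    rdiag <- abs(diag(qr.R(q)))
    cnorm <- sqrt(colSums(Xp^2))
    # Live columns whose remaining norm |R[j, j]| is negligible
    # relative to the column's original norm
    bad <- which(rdiag[seq_len(live)] <= tol * cnorm[seq_len(live)])
    if (length(bad) == 0L) break
    j <- bad[1L]                 # act on the first apparent singularity only
    piv <- c(piv[-j], piv[j])    # move it to the end, keep the rest in order
    live <- live - 1L
  }
  list(qr = q, rank = live, pivot = piv)
}

# Example: the third column is exactly twice the second
set.seed(1)
x1 <- rnorm(10)
x2 <- rnorm(10)
X <- cbind(int = 1, x1 = x1, dup = 2 * x1, x2 = x2)
res <- iterative_pivot_qr(X)
res$rank    # 3
res$pivot   # 1 2 4 3, the same order the default qr(X) produces
```

Only the first apparent singularity is acted on in each pass because, once one column is aliased, the later diagonal elements of an unpivoted R are computed against a basis that includes a near-noise direction and are less trustworthy until the decomposition is redone.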