Ultimate Linux Box (was RE: [R] optimal windows R machine)

4 messages · Liaw, Andy, Peter Dalgaard, Mark Myatt +1 more

#
Well, maybe this is sort of related...

Eric Raymond has written an article for the Linux Journal on how to build an
Ultimate Linux Box.  You can find the whole thing at

http://www2.linuxjournal.com/cgi-bin/frames.pl/articles/style/0014.html

The machine is said to compile the Linux 2.4.9 kernel in 1 min. 50 sec.  I
wonder how long it might take to build R on this beast...

Cheers,
Andy
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
"Liaw, Andy" <andy_liaw at merck.com> writes:
My guess is that it's about twice as fast as this beast (2xPentiumIII,866):

[Current R-devel]
291.64user 13.57system 4:01.62elapsed 126%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (586156major+348522minor)pagefaults 0swaps

The R builds spend quite a lot of time building the documentation,
and are locked to uniprocessor mode for that part. This is because
the documentation build runs as a single perl script that processes
all .Rd files in a package. There's a good reason for that, but
splitting the work across a couple of perl runs would speed things up
considerably. The compile part can be done (setting MAKE="make -j5") in 

165.89user 12.78system 1:46.26elapsed 168%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (634972major+360981minor)pagefaults 48swaps

(This machine is still running RedHat 6.2 with its default dodgy
compiler, so make check doesn't run...)
#
All,

A handful of users have reported problems extracting the PDF file from
the ZIP archive (Rex1031.zip). I don't know what caused the problem so I
have recreated the archive and posted it to my website this morning
(19 Oct 2001):

        http://www.myatt.demon.co.uk

I hope that this solves the problem. It may take some time for the new
file to migrate to the public-facing servers, so it may be better to
wait a day before retrieving it (dated 19 Oct 2001).

Thanks to all of you who reported the problems and sorry for any
inconvenience caused.

Best wishes,

Mark


--
Mark Myatt


#
Hi,

I have a question about a simulation I've tried to write in R.  I'd
appreciate any suggestions for improvements in my R code or logic used to
construct the simulation.

Background:  I've recently come across a paper that cautions against using
the normal approximation to the binomial when P (population proportion)
is very small (or very large).  As an applied data analyst who is about to
do some work with differences between small p-hats (sample proportions), I
wanted to demonstrate to myself that even though the parent distributions
may not be well approximated by the normal distribution, the differences
between two proportions may be.  There may be an elegant proof of this
(an extension of the Central Limit Theorem?), but I wanted both to
demonstrate the concept to myself and to sharpen my R usage in doing
so.
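A quick way to see why the difference can behave better than either
p-hat alone is a skewness check (my sketch, not from the original post;
the moment formula is the standard one for the binomial):

```r
# Sketch: skewness of phat = X/n for X ~ Binomial(n, p) is
# (1 - 2p) / sqrt(n p (1 - p)).  For the difference of two
# independent, identically distributed p-hats the odd central
# moments cancel, so the difference is exactly symmetric even
# when each p-hat is noticeably skewed.
n <- 100
p <- 0.05
skew.phat <- (1 - 2 * p) / sqrt(n * p * (1 - p))  # about 0.41
skew.diff <- 0                                    # by symmetry
```

Symmetry alone is not normality, of course, which is what the
simulation below is meant to explore.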

Here's my strategy.  Assume two populations with population proportions of
.05.  Under the null hypothesis of no difference, the population
difference between proportions is 0.  Sample p-hats from these two
proportions repeatedly to create an empirical sampling distribution for
p-hat differences.  Bootstrap sample from the empirical sampling
distribution.  Find the mean p-hat difference for each bootstrap sample.
Then plot the resulting mean differences with a histogram, density plot,
and qq plot for visual inspection.  Test for normality just for kicks.

Here's my code:

### begin code

# an attempt to sample from two binomial distributions both
# with P=.05 to create a sampling distribution of p-hat1 - p-hat2
# differences

# create the binomial variable x and p(x)
source<-seq(0, 100, by=1)
px<-dbinom(source, 100, .05)

# sample from binomial distributions with probability weights px,
# divide by 100 to create p-hats, and take 500 differences of
# randomly drawn p-hats to create an empirical sampling
# distribution for p-hat differences
dif4 <- sample(source, 500, replace=TRUE, prob=px)/100 -
        sample(source, 500, replace=TRUE, prob=px)/100

# get 5000 bootstrap samples from the empirical sampling
# distribution for p-hat differences
library(bootstrap)
sim4 <- bootstrap(dif4, 5000, mean)

# pull out the p-hat difference statistics to plot
par(mfrow=c(2,2))
hist(sim4$thetastar, main="Histogram Prop<=.05, binomial draws")
plot(density(sim4$thetastar),
     main="Density Plot Prop<=.05, binomial draws")

# a significance test for normality
shapiro.test(sim4$thetastar)
qqnorm(sim4$thetastar, main="QQ Plot Prop<=.05, binomial draws")
# fourth panel of the 2x2 layout: type="n" draws an empty frame
plot(mean(sim4$thetastar), type="n", main="Density Plot Prop<=.05,
binomial draws")

### end code


This is my first attempt at something like this, so I'd consider any
suggestions as learning opportunities.  Does this make sense?  Does it
seem to do what I intend?  Are there faster/cleaner ways to do this?
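On the faster/cleaner question, one possible simplification (my sketch,
not from the original post): rbinom() draws the counts directly, which
is equivalent to sampling 0:100 with dbinom() weights, and the
bootstrap loop can be done with base sapply() instead of the bootstrap
package:

```r
# Sketch of a more direct version of the simulation above.
set.seed(1)  # assumed seed, for reproducibility only
dif <- rbinom(500, 100, 0.05)/100 - rbinom(500, 100, 0.05)/100
# 5000 bootstrap means from the empirical difference distribution
boot.means <- sapply(1:5000, function(i) mean(sample(dif, replace=TRUE)))
```

boot.means then plays the role of sim4$thetastar in the plots and the
normality test.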

I'm not sure it's relevant for this question but just in case:

platform i386-pc-mingw32
arch     x86
os       Win32
system   x86, Win32
status
major    1
minor    3.1
year     2001
month    08
day      31
language R


Thanks,

Scot McNary


--
  Scot W. McNary  email:smcnary at charm.net

