Loop: noob question
Inline below. On Fri, Aug 5, 2011 at 10:23 AM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
Bert, You are absolutely correct: I was wrong not to vectorize in this case.
No. That wasn't my point at all. In this case, vectorizing doesn't seem to help because you still must do a loop (via *ply) in R. My point was only that replicate() is another form of a loop and that, when possible, vectorizing is preferable. Here I don't think it's possible. However, I interpreted your advice as a general prescription, and that was the problematic part.
I am surprised, however, by your remark that sapply() (or really lapply()) is faster than apply() -- is there a reason for this?
I apologize for being unclear. I **do not know** this to be true, although one might guess it to be so by looking at the apply() code, which has some overhead in it before calling vapply() to do the actual loop. However, I defer to experts for confirmation -- or not. Cheers, Bert I would have guessed
that the major difference between the two would have been memory management since replicate only holds 250 numbers at a time, rather than 2.5e6. I don't know how memory management is implemented on the C level for R so I might be entirely wrong about that though. For UnitRoot: if you go with Bert's code, there's a little typo: rmat <- matrix(rnorm(2.5e7, nrow=250)) ?## each column is a sample of 250 move a parenthesis: rmat <- matrix(rnorm(2.5e7), nrow=250)? ## each column is a sample of 250 Michael Weylandt On Fri, Aug 5, 2011 at 12:51 PM, Bert Gunter <gunter.berton at gene.com> wrote:
Michael: I'm sorry, but this "advice" is wrong. replicate() **IS** essentially a loop: it uses sapply(), which is basically an interpreted loop (with suitable caveats that R experts can provide). The correct advice is: whenever possible, move the loops down to underlying C code by vectorizing. In the example below, one can partly do this by removing the random number generation from the loop and structuring the result as a matrix: rmat <- matrix(rnorm(2.5e7, nrow=250)) ?## each column is a sample of 250 Now one must use apply() -- which is a loop for -- quantile() out <- apply(rmat,2,quantile, probs = .95) Unfortunately, in this case, it won't make much difference, as random number generation is very fast anyway and the looping for quantile() is the same either way. In fact, both versions took almost the same time on my computer (replicate() was actually 3 seconds faster --- 48 vs 51 seconds -- perhaps because sapply() is slightly more efficient than apply() ). As here (I think), one often cannot vectorize and must loop in interpreted code, and for this replicate() is fine and yields nice clean code. But it's still a loop. Cheers, Bert On Fri, Aug 5, 2011 at 9:15 AM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:
This is a textbook of when NOT to use a loop in R: rather make a
function
that does what you want and use the replicate function to do it
repeatedly.
f <- function(){
return(-1000*quantile(rnorm(250,0,0.2),0.95)
}
x = replicate(1e5,f())
There are your desired numbers.
Some general coding principles: firstly, you don't need to convert to
time
series: quantile immediately undoes that so you've just wasted the time
doing the coercions each direction. Secondly, the quantiles of c*X are
exactly c times the quantiles of X so you can cut down on the
multiplications needed by doing the -1000 multiplication after
quantilization.
Specific to R: don't use loops unless entirely necessary. (And it's
hardly
ever necessary) -- replicate is great for repeated operations, apply is
great for looping "over" thins.
More broadly, what do you intend to do with the v95 values? There are
probably much more efficient ways to get the desired numbers or even
closed
form results. I believe this idea is widely studied as VaR in finance.
Feel free to write back if we can be of more help.
Hope this helps,
Michael Weylandt
On Fri, Aug 5, 2011 at 11:35 AM, UnitRoot <akhussanov at gmail.com> wrote:
Hi, Can someone help me out to create a (for?) loop for the following procedure: x=rnorm(250,0,0.02) library(timeSeries) x=timeSeries(x) P=1000 loss=-P*x loss v95=quantile(loss,0.95) v95 I would like to generate 100 000 v95 values. Also, can anybody name a source where I could read up basic programming in R? Thank you. -- View this message in context: http://r.789695.n4.nabble.com/Loop-noob-question-tp3721500p3721500.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
? ? ? ?[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics