Rolling through fixed-length time windows
On Mon, Nov 7, 2011 at 8:50 AM, Matthew Clegg <matthewcleggphd at gmail.com> wrote:
On Fri, Nov 4, 2011 at 9:24 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
On Fri, Nov 4, 2011 at 9:09 AM, Matthew Clegg <matthewcleggphd at gmail.com> wrote:
Hello R-Sig-Finance members:

I was wondering if anyone has contributed functions that are similar to the zoo roll* functions but which operate on fixed-length time windows? For example, suppose I have a zoo-based object consisting of the daily closing prices of a stock, and I wish to know, for each date, what the volatility was over the succeeding 30 calendar days.

Probably many people would settle for something like:

rollapply(log(lag(P)) - log(P), 21, sd, align="left") * sqrt(252)

(where P is the price series). However, this is an approximation. Not all periods of 30 calendar days include precisely 21 trading days.

This seems like an obvious enough question that I would think that it has been asked (and answered) many times before, but I could not find a reference to the recommended solution. If no one has tackled this problem before, I might try to put together a small library of functions that are like roll* but which operate on fixed time windows. I am including an example of one such function below.

Matthew Clegg

[snip]
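To make the point about the 21-day approximation concrete, here is a small base-R sketch that counts the weekdays in every 30-calendar-day window of 2011. It is only an illustration: the calendar is made up, weekends are the only non-trading days, and exchange holidays are ignored.

```r
# Calendar days for 2011 (illustrative; holidays are NOT removed).
days <- seq(as.Date("2011-01-03"), as.Date("2011-12-30"), by = "day")

# wday is 0 for Sunday and 6 for Saturday, independent of locale.
is_weekday <- !(as.POSIXlt(days)$wday %in% c(0, 6))
trading <- days[is_weekday]

# Keep only start dates whose 30-day window lies fully inside the calendar.
starts <- trading[trading + 29 <= max(days)]

# Number of weekdays in [start, start + 30) for each start date.
n_trading <- sapply(seq_along(starts), function(i)
  sum(trading >= starts[i] & trading < starts[i] + 30))

table(n_trading)
```

Even in this holiday-free calendar the count is not constant (windows starting on a Friday hold 21 weekdays, the others 22), and real holidays would presumably push some windows lower still.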
Here is a one liner (two if you count making the result into a zoo object):
z <- zoo(1:25)
zz <- sapply(seq_along(z), function(i) sum(z[time(z) <= time(z)[i] & time(z) > time(z)[i] - 3]))
zoo(zz, time(z))
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 1  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72

--
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
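The same idea can be pointed at the original question. The following is a minimal sketch, not code from this thread: roll_time_window is a hypothetical helper name, the price series is simulated, and the window is taken to be the half-open interval [t, t + 30 days) looking forward from each date.

```r
library(zoo)

# Hypothetical helper (not part of zoo): for each timestamp t_i, apply FUN
# to the observations whose timestamps fall in [t_i, t_i + width).
roll_time_window <- function(z, width, FUN, ...) {
  tt <- time(z)
  x <- coredata(z)
  out <- sapply(seq_along(tt), function(i) {
    in_window <- tt >= tt[i] & tt < tt[i] + width
    FUN(x[in_window], ...)
  })
  zoo(out, tt)
}

# Simulated weekday-only price series, purely for illustration.
set.seed(1)
days <- seq(as.Date("2011-01-03"), as.Date("2011-06-30"), by = "day")
days <- days[!(as.POSIXlt(days)$wday %in% c(0, 6))]
P <- zoo(100 * exp(cumsum(rnorm(length(days), sd = 0.01))), days)

r <- diff(log(P))                                  # daily log returns
vol30 <- roll_time_window(r, 30, sd) * sqrt(252)   # annualised as in the question
```

Windows near the end of the series are truncated, so the last values rest on very few returns (the final one is NA, since sd of a single value is undefined); whether to drop them is a judgment call.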
Aha! That's an elegant solution and another great illustration of the power of vector processing in R. I found that after tweaking my code, I could achieve a significant improvement in running time over this sapply()-based one liner. The following table compares the running times for various lengths of the underlying zoo vector:
The rollapply slowdown was reported and fixed in the development version of zoo already. It only affected recent versions of zoo since rollapply was rewritten to add certain features. See:

http://r.789695.n4.nabble.com/zoo-performance-regression-noticed-1-6-5-is-faster-tt3990753.html#a3993387

Certainly zoo indexing can be expensive and in those cases that do involve indexing in an inner loop, replacing zoo object z with zc <- coredata(z) and tt <- time(z) speeds things up. Typically that covers fewer computations than you might think because most R code takes the whole-object approach.
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
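For reference, the coredata()/time() replacement described above might look like this when applied to the earlier one-liner. It is a sketch only; no timings are claimed here, and the variable names slow and fast are just labels for the two variants.

```r
library(zoo)

z <- zoo(1:25)

# Variant 1: index the zoo object inside the loop, as in the one-liner.
slow <- sapply(seq_along(z), function(i)
  sum(z[time(z) <= time(z)[i] & time(z) > time(z)[i] - 3]))

# Variant 2: extract the data and the index once, then loop over plain vectors.
zc <- coredata(z)
tt <- time(z)
fast <- sapply(seq_along(zc), function(i)
  sum(zc[tt <= tt[i] & tt > tt[i] - 3]))

# Both variants should agree; wrap the result back into a zoo object.
stopifnot(isTRUE(all.equal(as.numeric(slow), as.numeric(fast))))
zz <- zoo(fast, tt)
```

The inner loop in the second variant touches only ordinary vectors, so zoo's indexing method is never dispatched inside it.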