[An embedded text attachment was scrubbed by the archive: https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20111104/7632502c/attachment.pl]
Rolling through fixed-length time windows
4 messages · Matthew Clegg, Gabor Grothendieck
On Fri, Nov 4, 2011 at 9:09 AM, Matthew Clegg <matthewcleggphd at gmail.com> wrote:
Hello R-Sig-Finance members:
I was wondering if anyone has contributed functions that are similar
to the zoo roll* functions but which operate on fixed-length time
windows? For example, suppose I have a zoo-based object consisting
of the daily closing prices of a stock, and I wish to know, for each
date, what the volatility was over the succeeding 30 calendar days.
Probably many people would settle for something like:
rollapply (log(lag(P))-log(P), 21, sd, align="left") * sqrt(252)
(where P is the price series). However, this is an approximation.
Not all periods of 30 calendar days include precisely 21 trading days.
This seems like an obvious enough question that I would think that it
has been asked (and answered) many times before, but I could not find
a reference to the recommended solution.
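One way to compute the exact calendar-window quantity asked about here is sketched below in base R. For each date it takes the prices dated on or before that date plus 30 calendar days, computes the daily log returns between them and annualises their standard deviation with sqrt(252), as in the rollapply line above. The helper name `calendar_vol` is hypothetical, and two choices are assumptions rather than anything from the thread: windows that would run past the last observation are left as NA, and a window with fewer than two returns is also NA. With a zoo price series `P` it could be called as `calendar_vol(coredata(P), index(P))`.

```r
# Forward-looking annualised volatility over a fixed calendar window (base R).
# Hypothetical helper; assumes dates are strictly increasing Date values.
calendar_vol <- function(prices, dates, days = 30, annualise = 252) {
  d <- as.numeric(dates)                  # Date -> days since 1970-01-01
  n <- length(prices)
  stopifnot(length(d) == n, !is.unsorted(d, strictly = TRUE))
  logp <- log(prices)
  # last[i] = index of the last observation dated on or before dates[i] + days
  last <- findInterval(d + days, d)
  vol <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    # assumption: leave NA where the window runs past the end of the data
    if (d[n] < d[i] + days) next
    r <- diff(logp[i:last[i]])            # daily log returns inside the window
    if (length(r) >= 2) vol[i] <- sd(r) * sqrt(annualise)
  }
  vol
}
```

Because `findInterval()` does a binary search for every date, the window boundaries adapt to however many trading days actually fall in each 30-day span, instead of assuming 21.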
If no one has tackled this problem before, I might try to put together
a small library of functions that are like roll* but which operate
on fixed time windows. I am including an example of one such function
below.
Matthew Clegg
library(zoo)
ztw_sum <- function (X, delta, align="right", partial=FALSE) {
  # Zoo Time Window Sum
  #
  # On input, X is a zoo-based numeric vector and delta is a time difference.
  # Constructs a zoo-based numeric vector of partial sums from X. The values
  # included in a partial sum are those whose associated timestamps are
  # within delta of the corresponding element from X.
  #
  # If align="right", then result[i] is a sum of those elements
  # X[j] such that
  #     0 <= timestamp[i] - timestamp[j] <= delta,
  # where timestamp[i] is the timestamp (index) associated with the
  # i-th element of X. Conversely, if align="left", then result[i] is a
  # sum of those elements X[j] such that
  #     0 <= timestamp[j] - timestamp[i] <= delta.
  #
  # Parameters:
  # X:        A zoo-based numeric vector with a time-based index type.
  # delta:    An object of type difftime specifying the size of
  #           the time window.
  # align:    Specifies whether the sum for a given index should
  #           be computed using elements of lower timestamps ("right")
  #           or higher timestamps ("left").
  # partial:  If TRUE, then partial sums are computed for elements
  #           at the left (respectively, right) end of the vector.
  #
  # Returns a zoo-based numeric vector of partial sums.
  #
  # Running time is O(length(X)).
  if (!inherits(X, "zoo") || !is.numeric(coredata(X))) {
    stop ("X must be a numeric vector of type zoo");
  } else if (delta <= 0) {
    stop ("delta must be positive");
  } else if ((align != "left") && (align != "right")) {
    stop ("align must be from c('left', 'right')");
  }
  timestamp <- index(X)
  R <- zoo(NA, order.by = timestamp); # The result vector
  sum <- 0;  # The current partial sum
  if (align == "right") {
    # Invariants:
    #   (a) 0 < i <= j <= length(X)
    #   (b) 0 <= timestamp(j) - timestamp(i) <= delta
    i <- 1;  # The leftmost index in the current window
    for (j in 1:length(X)) {
      if (!is.na(X[j])) {
        sum <- sum + as.numeric(X[j]);
      }
      while (timestamp[j] - timestamp[i] > delta) {
        if (!is.na(X[i])) {
          sum <- sum - as.numeric(X[i]);
        }
        i <- i+1;
      }
      if ((i > 1) || partial) {
        R[j] <- sum;
      }
    }
  } else { # align == "left"
    # Invariants:
    #   (a) 0 < j <= i <= length(X)
    #   (b) 0 <= timestamp(i) - timestamp(j) <= delta
    i <- length(X);  # The rightmost index in the current window
    for (j in length(X):1) {
      if (!is.na(X[j])) {
        sum <- sum + as.numeric(X[j]);
      }
      while (timestamp[i] - timestamp[j] > delta) {
        if (!is.na(X[i])) {
          sum <- sum - as.numeric(X[i]);
        }
        i <- i-1;
      }
      if ((i < length(X)) || partial) {
        R[j] <- sum;
      }
    }
  }
  R
}
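For comparison, the same right-aligned window sum can be sketched without an explicit loop by combining prefix sums with binary search, using base R's `cumsum()` and `findInterval()`. This is a swapped-in technique, not the code posted above. The helper name `ztw_sum_cumsum` is hypothetical; it works on a plain numeric vector `x` and strictly increasing times `tt` (for a zoo series, `coredata(X)` and `index(X)`), and it assumes `delta` is expressed in the same units as `as.numeric(tt)` (days for `Date`, seconds for `POSIXct`).

```r
# Right-aligned time-window sum via prefix sums and binary search (base R only).
# Hypothetical helper: result[i] = sum of x[j] with 0 <= tt[i] - tt[j] <= delta.
# Assumes tt is strictly increasing and delta uses the units of as.numeric(tt).
ztw_sum_cumsum <- function(x, tt, delta, partial = FALSE) {
  x <- as.numeric(x)
  tt <- as.numeric(tt)
  delta <- as.numeric(delta)
  stopifnot(length(x) == length(tt), !is.unsorted(tt, strictly = TRUE), delta > 0)
  x[is.na(x)] <- 0                        # NAs contribute nothing to a sum
  cs <- c(0, cumsum(x))                   # cs[k + 1] = x[1] + ... + x[k]
  # lo[i] = number of timestamps strictly before tt[i] - delta
  lo <- findInterval(tt - delta, tt, left.open = TRUE)
  hi <- seq_along(x)                      # each window ends at element i itself
  out <- cs[hi + 1] - cs[lo + 1]
  # as in ztw_sum, a window counts as full only once an element has been dropped
  if (!partial) out[lo == 0] <- NA
  out
}
```

For example, `ztw_sum_cumsum(1:25, 1:25, 2, partial = TRUE)` gives the same values as the sapply() one-liner below. The binary searches cost O(n log n) in total; one trade-off is that differences of large cumulative sums can lose floating-point precision on long series.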
Here is a one liner (two if you count making the result into a zoo object):
library(zoo)
z <- zoo(1:25)
zz <- sapply(seq_along(z), function(i) sum(z[time(z) <= time(z)[i] & time(z) > time(z)[i] - 3]))
zoo(zz, time(z))
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 1  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
[An embedded text attachment was scrubbed by the archive: https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20111107/50a82b82/attachment.pl]
On Mon, Nov 7, 2011 at 8:50 AM, Matthew Clegg <matthewcleggphd at gmail.com> wrote:
On Fri, Nov 4, 2011 at 9:24 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
[snip]
Aha! That's an elegant solution and another great illustration of the power of vector processing in R. I found that after tweaking my code, I could achieve a significant improvement in running time over this sapply()-based one-liner. The following table compares the running times for various lengths of the underlying zoo vector:
The rollapply slowdown was reported and has already been fixed in the development version of zoo. It only affected recent versions of zoo, in which rollapply was rewritten to add certain features. See: http://r.789695.n4.nabble.com/zoo-performance-regression-noticed-1-6-5-is-faster-tt3990753.html#a3993387
Zoo indexing can certainly be expensive. Where an inner loop does index into a zoo object z, replacing z with zc <- coredata(z) and tt <- time(z) speeds things up. Such cases are rarer than you might think, because most R code operates on whole objects rather than on individual elements.
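To illustrate that suggestion, here is a sketch of the right-aligned loop from `ztw_sum` rewritten so that the inner loop touches only plain vectors. It assumes the caller has already run `zc <- coredata(z)` and `tt <- time(z)` once, outside the loop. The name `window_sum_plain` is hypothetical and the sketch covers only the align="right" case.

```r
# Right-aligned time-window sum over plain vectors: no zoo indexing in the loop.
# Hypothetical helper; pass xc = coredata(z) and tt = time(z), extracted once.
window_sum_plain <- function(xc, tt, delta, partial = FALSE) {
  xc <- as.numeric(xc)
  n <- length(xc)
  stopifnot(length(tt) == n, delta > 0)
  xc[is.na(xc)] <- 0            # NAs contribute nothing, as in ztw_sum
  out <- rep(NA_real_, n)       # plain result vector, converted to zoo by the caller
  total <- 0
  i <- 1                        # leftmost index of the current window
  for (j in seq_len(n)) {
    total <- total + xc[j]
    # drop elements that have fallen out of the window [tt[j] - delta, tt[j]]
    while (tt[j] - tt[i] > delta) {
      total <- total - xc[i]
      i <- i + 1
    }
    # a window counts as full once at least one element has been dropped
    if (i > 1 || partial) out[j] <- total
  }
  out
}
```

A zoo result can then be rebuilt once at the end with `zoo(window_sum_plain(zc, tt, delta), tt)`, so zoo's indexing methods are never called inside the loop.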