Excessive data needed for volatility{TTR} calculation? - R-SIG-Finance

James

Fri, May 27, 2011 4:52 PM #

Hi,

I have been using the volatility function from the TTR package and I
noticed something that I thought was a bit unusual. I expected that I
should be able to calculate the default 10-day volatility using the
close estimator starting with 10 or maybe 11 days of data.  That's not
what I found.  It appears that 18 days of data is necessary to
calculate a 10-day volatility.  For example:

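library(quantmod)  # provides getSymbols(); attaches TTR for volatility()
getSymbols("SPY")

[1] "SPY"

volatility(tail(SPY, 10), n = 10, calc = "close", N = 260)

Error in `[.xts`(x, beg:(n + beg - 1)) : subscript out of bounds

volatility(tail(SPY, 11), n = 10, calc = "close", N = 260)

Error in `[.xts`(x, beg:(n + beg - 1)) : subscript out of bounds

volatility(tail(SPY, 18), n = 10, calc = "close", N = 260)

                 [,1]
2011-05-03         NA
2011-05-04         NA
2011-05-05         NA
- edited for brevity -
2011-05-23         NA
2011-05-24         NA
2011-05-25         NA
2011-05-26 0.09481466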

Stranger still (at least to me), it appears that 38 days worth of data
is necessary to start calculating a 20-day volatility.

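volatility(tail(SPY, 37), n = 20, calc = "close", N = 260)

Error in `[.xts`(x, beg:(n + beg - 1)) : subscript out of bounds

volatility(tail(SPY, 38), n = 20, calc = "close", N = 260)

                [,1]
2011-04-04        NA
2011-04-05        NA
2011-04-06        NA
- edited for brevity -
2011-05-23        NA
2011-05-24        NA
2011-05-25        NA
2011-05-26 0.1088309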

58 days of data is necessary for a 30-day volatility calculation.
From looking at the code for the volatility function, I'm not seeing
why so much additional data is needed to calculate the volatility.
Does anybody have an idea of why so much additional data is necessary?
Thanks.

James

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

James

Fri, May 27, 2011 7:33 PM #

Hi again,

I've been trying to figure out the problem and I believe there is a
problem with the vectorization in volatility, which results in the
volatility calculations for the close to close method being
inaccurate.  I believe the issue is with this part of line 14.

runSum((r - rBar)^2, n - 1)

Within each window, the 9 returns all have to be differenced against
that window's single rBar, not against a running series of rBars.  I
believe a better way to accomplish this would be:

s <- sqrt(N) * runSD(r, (n - 1))

For reference, the relevant part of the current function is:

function (OHLC, n = 10, calc = "close", N = 260, ...)
{
    OHLC <- try.xts(OHLC, error = as.matrix)
    calc <- match.arg(calc, c("close", "garman.klass", "parkinson",
        "rogers.satchell", "gk.yz", "yang.zhang"))
    if (calc == "close") {
        if (NCOL(OHLC) == 1) {
            r <- ROC(OHLC[, 1], 1, ...)
        }
        else {
            r <- ROC(OHLC[, 4], 1, ...)
        }
        rBar <- runSum(r, n - 1)/(n - 1)
        s <- sqrt(N/(n - 2) * runSum((r - rBar)^2, n - 1))       # line 14
    }
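
A small base-R sketch may help show where the 18, 38 and 58 come from.
It assumes simplified stand-ins for runSum and runSD (the helper
run_apply below is hypothetical, not TTR's code, and it returns NA
instead of raising an error when the data is too short) and synthetic
prices rather than SPY.  Because rBar is itself NA until row n, the
outer running sum has to wait another n - 2 rows, so the first value
only appears at row 2n - 2:

```r
# Hypothetical base-R stand-in for a trailing rolling function:
# NA until the window of k values contains no NA.
run_apply <- function(x, k, f) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= k) {
      w <- x[(i - k + 1):i]
      if (!anyNA(w)) out[i] <- f(w)
    }
  }
  out
}

set.seed(1)
n <- 10
N <- 260
p <- 100 * cumprod(1 + rnorm(30) / 100)  # 30 synthetic closing prices
r <- c(NA, diff(log(p)))                 # log returns; the first is NA

# Close-to-close branch as currently written: a rolling sum of
# deviations from a rolling mean, i.e. two nested windows.
rBar  <- run_apply(r, n - 1, sum) / (n - 1)
s_old <- sqrt(N / (n - 2) * run_apply((r - rBar)^2, n - 1, sum))

# Proposed replacement: a single rolling standard deviation.
s_new <- sqrt(N) * run_apply(r, n - 1, sd)

which(!is.na(s_old))[1]  # 18, i.e. 2n - 2
which(!is.na(s_new))[1]  # 10, i.e. n
```

With n = 20 and n = 30 the same arithmetic gives 38 and 58 rows for the
nested version.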

Please let me know if this makes sense to anyone else, or if I'm
mistaken.  Thanks.

James

On Fri, May 27, 2011 at 6:52 PM, J Toll <jctoll at gmail.com> wrote:

Hi,

I have been using the volatility function from the TTR package and I
noticed something that I thought was a bit unusual. I expected that I
should be able to calculate the default 10-day volatility using the
close estimator starting with 10 or maybe 11 days of data.  That's not
what I found.  It appears that 18 days of data is necessary to
calculate a 10-day volatility.  For example:

getSymbols("SPY")

[1] "SPY"

volatility(tail(SPY, 10), n = 10, calc = "close", N = 260)

Error in `[.xts`(x, beg:(n + beg - 1)) : subscript out of bounds

volatility(tail(SPY, 11), n = 10, calc = "close", N = 260)

Error in `[.xts`(x, beg:(n + beg - 1)) : subscript out of bounds

volatility(tail(SPY, 18), n = 10, calc = "close", N = 260)

                 [,1]
2011-05-03         NA
2011-05-04         NA
2011-05-05         NA
- edited for brevity -
2011-05-23         NA
2011-05-24         NA
2011-05-25         NA
2011-05-26 0.09481466

Stranger still (at least to me), it appears that 38 days worth of data
is necessary to start calculating a 20-day volatility.

volatility(tail(SPY, 37), n = 20, calc = "close", N = 260)

Error in `[.xts`(x, beg:(n + beg - 1)) : subscript out of bounds

volatility(tail(SPY, 38), n = 20, calc = "close", N = 260)

                [,1]
2011-04-04        NA
2011-04-05        NA
2011-04-06        NA
- edited for brevity -
2011-05-23        NA
2011-05-24        NA
2011-05-25        NA
2011-05-26 0.1088309

58 days of data is necessary for a 30-day volatility calculation.
From looking at the code for the volatility function, I'm not seeing
why so much additional data is needed to calculate the volatility.
Does anybody have an idea of why so much additional data is necessary?
Thanks.

James

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

Joshua Ulrich

Fri, May 27, 2011 8:39 PM #

Hi James,

On Fri, May 27, 2011 at 9:33 PM, J Toll <jctoll at gmail.com> wrote:

Thanks for digging into this.  I've recently received one or two
emails about this off-list, but have not had time to look into the
issue.

I think your solution will work, but using 'n' instead of 'n-1'.  The
code below shows the same results using your solution and a formula
similar to the one found here (which I mis-interpreted when I
originally wrote the function):
http://web.archive.org/web/20081224134043/http://www.sitmo.com/eq/172

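library(xts)   # last()
library(TTR)   # runSD()
set.seed(21)
N <- 260
n <- 100
r <- rnorm(n)/100
last(sqrt(N) * runSD(r, n))        # rolling SD over the full sample
sqrt(N/(n-1)*sum((r-mean(r))^2))   # textbook formula, same value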

Thanks!
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com


James

Fri, May 27, 2011 9:25 PM #

On Fri, May 27, 2011 at 10:39 PM, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:

Hi Joshua,

Thanks for replying and confirming my suspicions. However, I'm curious
why you would use 'n' rather than 'n-1'.  My thinking is that a 10-day
volatility (n = 10) is calculated as the annualized standard deviation
of 9 (n - 1) price returns (i.e. ln(p1/p0), ROC()).  The sample
standard deviation of 9 price returns would be the sum of the squared
deviations divided by 9 - 1, or n - 2.  Therefore, I believe your line

sqrt(N / (n - 1) * sum((r - mean(r)) ^ 2))

should actually be

sqrt(N / (n - 2) * sum((r - mean(r)) ^ 2))

I've been double-checking my work and went ahead and calculated 10 and
20-day vols by hand and I'm pretty sure

s <- sqrt(N) * runSD(r, (n - 1))

is correct, unless you're defining 10-day volatility as 11 days of data
and 10 price returns.  Please let me know otherwise. Thanks.
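
To make the counting concrete, here is a short base-R check.  The ten
closing prices are made up for illustration (they are not SPY data).
Ten prices give nine log returns, and the sample standard deviation of
nine returns divides by eight, that is n - 2:

```r
N <- 260
# Ten hypothetical closing prices, so n = 10.
p <- c(100, 101, 100.5, 102, 101.2, 103, 102.4, 104, 103.1, 105)
n <- length(p)

r <- diff(log(p))   # n - 1 = 9 log returns
length(r)           # 9

# The sample variance of 9 returns divides by 9 - 1 = n - 2.
by_hand <- sqrt(N / (n - 2) * sum((r - mean(r))^2))
via_sd  <- sqrt(N) * sd(r)

all.equal(by_hand, via_sd)  # TRUE
```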

James

Joshua Ulrich

Sat, May 28, 2011 5:13 AM #

Hi James,

On Fri, May 27, 2011 at 11:25 PM, J Toll <jctoll at gmail.com> wrote:

Actually, because the first return in the moving window would always
be NA, it should be:
sqrt(N/(n-2)*sum((r[-1]-mean(r[-1]))^2))

which yields the same result as:
last(sqrt(N) * runSD(r, n-1))

After getting some sleep, it's clear that your initial solution (n-1)
is correct.

Your patch will be on R-forge shortly.  Many thanks again!

Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com

James

Sat, May 28, 2011 7:44 AM #

Joshua,

On Sat, May 28, 2011 at 7:13 AM, Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:

I've been trying both lines of code and unfortunately I'm not getting
the same results.  The first line seems to only work properly for me
in those instances when NROW(OHLC) == n.  For the more common situation
where NROW(OHLC) > n, you would want a rolling window of vol
calculations.  I'm still thinking that the code should be:

s <- sqrt(N) * runSD(r, (n - 1))

As a frame of reference, I believe the output should be:

                [,1]
2011-05-20 0.1206382
2011-05-23 0.1181380
2011-05-24 0.1095445
2011-05-25 0.1069024
2011-05-26 0.1068434
2011-05-27 0.1038008

I've manually calculated the value for 2011-05-27 using a spreadsheet
to confirm the value. I believe the other values to be correct also.

You may want to hold off on a patch in the short term.  I still think
there might be an error in there.  I'm sorry to be such a nuisance
about this, but thanks so much for your help.

James

Joshua Ulrich

Sat, May 28, 2011 11:16 AM #

Hi James,

On Sat, May 28, 2011 at 9:44 AM, J Toll <jctoll at gmail.com> wrote:

<snip>

My last email wasn't very clear; I apologize.

I still agree with your suggestion and plan to use it as a patch.  The
first line in my prior email was to illustrate (and convince myself)
that your solution matched the formula here:
http://web.archive.org/web/20081224134043/http://www.sitmo.com/eq/172

And it only matches when NROW(OHLC) == n because your solution
operates on a rolling window and my first line operates on everything.
 Try something like this:

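library(TTR)   # ROC(), runSD()
library(zoo)   # rollapply()
set.seed(21)
N <- 260
r <- rnorm(100)/100   # simulated returns, as in the earlier example
n <- 5
R <- cumprod(1+r)     # price series implied by the returns
FUN <- function(x) {
  r <- ROC(x); n <- NROW(x)
  # n prices give n-1 returns, so the sample variance divides by n-2
  sqrt(N/(n-2)*sum((r-mean(r, na.rm=TRUE))^2, na.rm=TRUE))
}
head(sqrt(N) * runSD(ROC(R), n-1),15)
head(rollapply(R, n, FUN, align="right", fill=NA),15)
n <- 10
head(sqrt(N) * runSD(ROC(R), n-1),15)
head(rollapply(R, n, FUN, align="right", fill=NA),15)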

Sorry for the confusion.

Best,
--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com