Hi all, I am currently implementing the Engle & Rangel (2008) Spline GARCH model. I use the nlminb optimizer which does not provide a hessian unfortunately to get the standard errors of the coefficients. I can get around this using the 'hessian' function in numDeriv, but usually get NaN values for the omega parameter. Can anybody recommend additional optimizers that directly return a hessian? How sensitive are the coefficients to the initial starting values? Thanks in advance!
Spline GARCH
6 messages · Paul Gilbert, Alexios Ghalanos, Bastian Offermann
On 02/07/2014 08:19 AM, Bastian Offermann wrote:
Hi all, I am currently implementing the Engle & Rangel (2008) Spline GARCH model. I use the nlminb optimizer which does not provide a hessian unfortunately to get the standard errors of the coefficients. I can get around this using the 'hessian' function in numDeriv, but usually get NaN values for the omega parameter.
Do you know why this happens, or can you provide a simple example? An NaN value from hessian() is often because the function fails to evaluate in a small neighbourhood of the point where it is being calculated, that is, at your parameter estimate. Are you on the boundary of the feasible region?
Can anybody recommend additional optimizers that directly return a hessian?
A hessian returned by an optimizer is usually one that is built up by some approximation during the optimization process. One of the original purposes of hessian() was to try to do something that is usually better than that, specifically because you want a good approximation if you are going to use it to calculate standard errors. (And, of course, you want the conditions to hold for the hessian to be an approximation of the variance.) Just because an optimizer returns something for the hessian, it it not clear that you would want to use it to calculate standard errors. The purpose of the hessian built up by an optimizer is to speed the optimization, not necessarily to provide a good approximation to the hessian. In the case where hessian() is returning NaNs I would be concerned that anything returned by an optimizer could be simply bogus.
How sensitive are the coefficients to the initial starting values?
This depends on a number of things, the optimizer you use being one of them. Most optimizers have some mechanism to specify something different from the default for the stopping criteria and you can, for a problem without local optimum issues (e.g. convex level sets), reduce sensitivity to the starting value by tightening the stopping criteria. The more serious problem is when you have local optimum issues. Then you will get false convergence and thus extreme sensitivity to starting values. Even for a parameter space that is generally good, there are often parameter values for which the optimization is a bit sensitive. And, of course, all this also depends on your dataset. Generally, the sensitivity will increase with short datasets. The previous paragraph is about the coefficient estimate. At the same coefficient estimate hessian() will return the same thing, but a hessian built up by an optimizer will depend on the path, and generally needs a fairly large number of final steps in the vicinity of the optimum to give a good approximation. Thus, somewhat counter intuitively, if you do an optimization starting with values for the coefficients that are very close to the optimum you will get quick convergence but often a bad hessian approximation from the optimizer. Paul
Thanks in advance!
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
You might also want to consider the issue of scaling when it comes to the intercepts (mean or variance), as well as your positivity and stationarity conditions/constraints. -Alexios
On 7 Feb 2014, at 15:28, Paul Gilbert <pgilbert902 at gmail.com> wrote:
On 02/07/2014 08:19 AM, Bastian Offermann wrote: Hi all, I am currently implementing the Engle & Rangel (2008) Spline GARCH model. I use the nlminb optimizer which does not provide a hessian unfortunately to get the standard errors of the coefficients. I can get around this using the 'hessian' function in numDeriv, but usually get NaN values for the omega parameter.
Do you know why this happens, or can you provide a simple example? An NaN value from hessian() is often because the function fails to evaluate in a small neighbourhood of the point where it is being calculated, that is, at your parameter estimate. Are you on the boundary of the feasible region?
Can anybody recommend additional optimizers that directly return a hessian?
A hessian returned by an optimizer is usually one that is built up by some approximation during the optimization process. One of the original purposes of hessian() was to try to do something that is usually better than that, specifically because you want a good approximation if you are going to use it to calculate standard errors. (And, of course, you want the conditions to hold for the hessian to be an approximation of the variance.) Just because an optimizer returns something for the hessian, it it not clear that you would want to use it to calculate standard errors. The purpose of the hessian built up by an optimizer is to speed the optimization, not necessarily to provide a good approximation to the hessian. In the case where hessian() is returning NaNs I would be concerned that anything returned by an optimizer could be simply bogus.
How sensitive are the coefficients to the initial starting values?
This depends on a number of things, the optimizer you use being one of them. Most optimizers have some mechanism to specify something different from the default for the stopping criteria and you can, for a problem without local optimum issues (e.g. convex level sets), reduce sensitivity to the starting value by tightening the stopping criteria. The more serious problem is when you have local optimum issues. Then you will get false convergence and thus extreme sensitivity to starting values. Even for a parameter space that is generally good, there are often parameter values for which the optimization is a bit sensitive. And, of course, all this also depends on your dataset. Generally, the sensitivity will increase with short datasets. The previous paragraph is about the coefficient estimate. At the same coefficient estimate hessian() will return the same thing, but a hessian built up by an optimizer will depend on the path, and generally needs a fairly large number of final steps in the vicinity of the optimum to give a good approximation. Thus, somewhat counter intuitively, if you do an optimization starting with values for the coefficients that are very close to the optimum you will get quick convergence but often a bad hessian approximation from the optimizer. Paul
Thanks in advance!
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
Thanks a lot, Paul. Here is some reproducible code. Do you know of any
good reference how to build up these models from scratch, e.g.
calculating standard errors of estimates coming from a numerical
optimizer, information criteria etc. ?
library('BB')
library('alabama')
library('nloptr')
### Spline GARCH ###
r = rnorm(13551, 0.0004, 0.016) # pseudo returns, 13551 is the number
of obs. of real market data I apply the model to, inputs arbitrary
### Specifying time trend
k = 3 # number of
knots
bounds = floor(1:k * 13551/k) # partition of time horizon
bounds = c(0, bounds[1:(k-1)])
time.lin = 0:13550 # linear time trend
time.nonlin <- matrix(rep(time.lin, k), length(time.lin), k) # quadratic
time trend
for(i in 1:k) { #
time.nonlin[,i] <- time.nonlin[,i] - bounds[i] #
time.nonlin[which(time.nonlin[,i] < 0), i] <- 0 #
time.nonlin[, i] <- time.nonlin[, i]^2 #
} #
time.trend = cbind(time.lin, time.nonlin)
for(i in 1:dim(time.trend)[2]) time.trend[,i] <-
time.trend[,i]/time.trend[dim(time.trend)[1], i] # normalizing
between 0 and 1
head(time.trend)
tail(time.trend)
### Spline function
splgarch <- function(para) {
mu <- para[1]
omega <- para[2]
alpha <- para[3]
beta <- para[4]
cc <- para[5]
w <- para[6:(k+6)]
Tau <- cc * exp( apply(t(diag(w)%*%t(time.trend)), 1, sum) )
e2 <- (r-mu)^2
e2t <- omega + alpha * c(mean(e2), e2[-length(r)]) / Tau^2
s2 <- filter(e2t, beta, "recursive", init = mean(e2))
sig2 <- s2 * Tau
0.5*sum( log(2*pi) + log(sig2) + e2/sig2) }
### Spline parameter initialization
mu <- mean(r)
small <- 1e-6
alpha <- 0.1
beta <- 0.8
omega <- (1-alpha-beta)
para <- c(mu, omega, alpha, beta, 1, rep(small, length(4:(k+4))))
lo <- c(-10*abs(mu), small, small, small, rep(-10, length(3:(k+4))))
hi <- c(10*abs(mu), 100*abs(mu), 1-small, 1-small, rep(10, length(3:(k+4))))
### Spline optimization
fit <- nlminb(start = para, objective = splgarch, lower = lo, upper =
hi, control = list(x.tol = 1e-8,trace=0))
names(fit$par) <- c('alpha', 'beta', 'c', paste('w', sep = '', 0:k))
round(fit$par, 6)
fit.hessian = hessian(splgarch, fit$par, method="complex")
fit.hessian
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] nloptr_0.9.6 alabama_2011.9-1 numDeriv_2012.9-1 BB_2014.1-1
Am 2/7/2014 10:28 AM, schrieb Paul Gilbert:
On 02/07/2014 08:19 AM, Bastian Offermann wrote:
Hi all, I am currently implementing the Engle & Rangel (2008) Spline GARCH model. I use the nlminb optimizer which does not provide a hessian unfortunately to get the standard errors of the coefficients. I can get around this using the 'hessian' function in numDeriv, but usually get NaN values for the omega parameter.
Do you know why this happens, or can you provide a simple example? An NaN value from hessian() is often because the function fails to evaluate in a small neighbourhood of the point where it is being calculated, that is, at your parameter estimate. Are you on the boundary of the feasible region?
Can anybody recommend additional optimizers that directly return a hessian?
A hessian returned by an optimizer is usually one that is built up by some approximation during the optimization process. One of the original purposes of hessian() was to try to do something that is usually better than that, specifically because you want a good approximation if you are going to use it to calculate standard errors. (And, of course, you want the conditions to hold for the hessian to be an approximation of the variance.) Just because an optimizer returns something for the hessian, it it not clear that you would want to use it to calculate standard errors. The purpose of the hessian built up by an optimizer is to speed the optimization, not necessarily to provide a good approximation to the hessian. In the case where hessian() is returning NaNs I would be concerned that anything returned by an optimizer could be simply bogus.
How sensitive are the coefficients to the initial starting values?
This depends on a number of things, the optimizer you use being one of them. Most optimizers have some mechanism to specify something different from the default for the stopping criteria and you can, for a problem without local optimum issues (e.g. convex level sets), reduce sensitivity to the starting value by tightening the stopping criteria. The more serious problem is when you have local optimum issues. Then you will get false convergence and thus extreme sensitivity to starting values. Even for a parameter space that is generally good, there are often parameter values for which the optimization is a bit sensitive. And, of course, all this also depends on your dataset. Generally, the sensitivity will increase with short datasets. The previous paragraph is about the coefficient estimate. At the same coefficient estimate hessian() will return the same thing, but a hessian built up by an optimizer will depend on the path, and generally needs a fairly large number of final steps in the vicinity of the optimum to give a good approximation. Thus, somewhat counter intuitively, if you do an optimization starting with values for the coefficients that are very close to the optimum you will get quick convergence but often a bad hessian approximation from the optimizer. Paul
Thanks in advance!
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
There was an error in the previous code, this should be correct...
library('BB')
library('alabama')
library('nloptr')
r = rnorm(13551, 0.0004, 0.016) # pseudo returns
### Spline GARCH ###
### Specifying time trend
k = 3 # number of
knots
bounds = floor(1:k * 13551/k) # partition of time horizon
bounds = c(0, bounds[1:(k-1)])
time.lin = 0:13550 # linear time trend
time.nonlin <- matrix(rep(time.lin, k), length(time.lin), k) # quadratic
time trend
for(i in 1:k) { #
time.nonlin[,i] <- time.nonlin[,i] - bounds[i] #
time.nonlin[which(time.nonlin[,i] < 0), i] <- 0 #
time.nonlin[, i] <- time.nonlin[, i]^2 #
} #
time.trend = cbind(time.lin, time.nonlin)
for(i in 1:dim(time.trend)[2]) time.trend[,i] <-
time.trend[,i]/time.trend[dim(time.trend)[1], i] # normalizing between
0 and 1
head(time.trend)
tail(time.trend)
### Spline function
splgarch <- function(para) {
mu <- para[1]
omega <- para[2]
alpha <- para[3]
beta <- para[4]
cc <- para[5]
w <- para[6:(k+6)]
Tau <- cc * exp( apply(t(diag(w)%*%t(time.trend)), 1, sum) )
e2 <- (r-mu)^2
e2t <- omega + alpha * c(mean(e2), e2[-length(r)]) / Tau^2
s2 <- filter(e2t, beta, "recursive", init = mean(e2))
sig2 <- s2 * Tau
0.5*sum( log(2*pi) + log(sig2) + e2/sig2) }
### Spline parameter initialization
mu <- mean(r)
small <- 1e-6
alpha <- 0.1
beta <- 0.8
omega <- (1-alpha-beta)
para <- c(mu, omega, alpha, beta, 1, rep(small, length(4:(k+4))))
lo <- c(-10*abs(mu), small, small, small, rep(-10, length(3:(k+4))))
hi <- c(10*abs(mu), 100*abs(mu), 1-small, 1-small, rep(10, length(3:(k+4))))
### Spline optimization
fit <- nlminb(start = para, objective = splgarch, lower = lo, upper =
hi, hessian = TRUE, control = list(x.tol = 1e-8,trace=0))
names(fit$par) <- c('mu', 'omega', 'alpha', 'beta', 'c', paste('w', sep
= '', 0:k))
round(fit$par, 6)
fit.hessian = hessian(splgarch, fit$par, method="complex")
fit.hessian
Am 2/7/2014 10:28 AM, schrieb Paul Gilbert:
On 02/07/2014 08:19 AM, Bastian Offermann wrote:
Hi all, I am currently implementing the Engle & Rangel (2008) Spline GARCH model. I use the nlminb optimizer which does not provide a hessian unfortunately to get the standard errors of the coefficients. I can get around this using the 'hessian' function in numDeriv, but usually get NaN values for the omega parameter.
Do you know why this happens, or can you provide a simple example? An NaN value from hessian() is often because the function fails to evaluate in a small neighbourhood of the point where it is being calculated, that is, at your parameter estimate. Are you on the boundary of the feasible region?
Can anybody recommend additional optimizers that directly return a hessian?
A hessian returned by an optimizer is usually one that is built up by some approximation during the optimization process. One of the original purposes of hessian() was to try to do something that is usually better than that, specifically because you want a good approximation if you are going to use it to calculate standard errors. (And, of course, you want the conditions to hold for the hessian to be an approximation of the variance.) Just because an optimizer returns something for the hessian, it it not clear that you would want to use it to calculate standard errors. The purpose of the hessian built up by an optimizer is to speed the optimization, not necessarily to provide a good approximation to the hessian. In the case where hessian() is returning NaNs I would be concerned that anything returned by an optimizer could be simply bogus.
How sensitive are the coefficients to the initial starting values?
This depends on a number of things, the optimizer you use being one of them. Most optimizers have some mechanism to specify something different from the default for the stopping criteria and you can, for a problem without local optimum issues (e.g. convex level sets), reduce sensitivity to the starting value by tightening the stopping criteria. The more serious problem is when you have local optimum issues. Then you will get false convergence and thus extreme sensitivity to starting values. Even for a parameter space that is generally good, there are often parameter values for which the optimization is a bit sensitive. And, of course, all this also depends on your dataset. Generally, the sensitivity will increase with short datasets. The previous paragraph is about the coefficient estimate. At the same coefficient estimate hessian() will return the same thing, but a hessian built up by an optimizer will depend on the path, and generally needs a fairly large number of final steps in the vicinity of the optimum to give a good approximation. Thus, somewhat counter intuitively, if you do an optimization starting with values for the coefficients that are very close to the optimum you will get quick convergence but often a bad hessian approximation from the optimizer. Paul
Thanks in advance!
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.
3 days later
Sorry for the delay, I'm travelling and not always on my email. (See below.)
On 02/07/2014 02:40 PM, Bastian Offermann wrote:
There was an error in the previous code, this should be correct...
library('BB')
library('alabama')
library('nloptr')
r = rnorm(13551, 0.0004, 0.016) # pseudo returns
### Spline GARCH ###
### Specifying time trend
k = 3 # number of
knots
bounds = floor(1:k * 13551/k) # partition of time horizon
bounds = c(0, bounds[1:(k-1)])
time.lin = 0:13550 # linear time trend
time.nonlin <- matrix(rep(time.lin, k), length(time.lin), k) # quadratic
time trend
for(i in 1:k) { #
time.nonlin[,i] <- time.nonlin[,i] - bounds[i] #
time.nonlin[which(time.nonlin[,i] < 0), i] <- 0 #
time.nonlin[, i] <- time.nonlin[, i]^2 #
} #
time.trend = cbind(time.lin, time.nonlin)
for(i in 1:dim(time.trend)[2]) time.trend[,i] <-
time.trend[,i]/time.trend[dim(time.trend)[1], i] # normalizing between
0 and 1
head(time.trend)
tail(time.trend)
### Spline function
splgarch <- function(para) {
mu <- para[1]
omega <- para[2]
alpha <- para[3]
beta <- para[4]
cc <- para[5]
w <- para[6:(k+6)]
Tau <- cc * exp( apply(t(diag(w)%*%t(time.trend)), 1, sum) )
e2 <- (r-mu)^2
e2t <- omega + alpha * c(mean(e2), e2[-length(r)]) / Tau^2
s2 <- filter(e2t, beta, "recursive", init = mean(e2))
sig2 <- s2 * Tau
0.5*sum( log(2*pi) + log(sig2) + e2/sig2) }
### Spline parameter initialization
mu <- mean(r)
small <- 1e-6
alpha <- 0.1
beta <- 0.8
omega <- (1-alpha-beta)
para <- c(mu, omega, alpha, beta, 1, rep(small, length(4:(k+4))))
lo <- c(-10*abs(mu), small, small, small, rep(-10, length(3:(k+4))))
hi <- c(10*abs(mu), 100*abs(mu), 1-small, 1-small, rep(10,
length(3:(k+4))))
### Spline optimization
fit <- nlminb(start = para, objective = splgarch, lower = lo, upper =
hi, hessian = TRUE, control = list(x.tol = 1e-8,trace=0))
names(fit$par) <- c('mu', 'omega', 'alpha', 'beta', 'c', paste('w', sep
= '', 0:k))
round(fit$par, 6)
fit.hessian = hessian(splgarch, fit$par, method="complex")
This gives (at least with my working copy) Error in filter(e2t, beta, "recursive", init = mean(e2)) : invalid input In addition: Warning messages: 1: In filter(e2t, beta, "recursive", init = mean(e2)) : imaginary parts discarded in coercion 2: In filter(e2t, beta, "recursive", init = mean(e2)) : imaginary parts discarded in coercion Error in grad.default(func = fn, x = x, method = "complex", method.args = list(eps = .Machine$double.eps), : function does not accept complex argument as required by method 'complex'.
fit.hessian
Error: object 'fit.hessian' not found
The "complex" method is truly powerful in cases where it can be used,
but the requirement on the function (complex analytic, even though you
may be only interested in it as a real valued function with real
arguments) is non-trivial. I clearly need to beef up the warnings in the
documentation. They are there, but perhaps you did not follow the
suggestion to see ?grad, and even there I may be assuming people might
actual look at the reference. I will work on the documentation. Your
function does not even handle a complex argument, so it is definitely
not suitable for this method.
Try
fit.hessian = hessian(splgarch, fit$par, method="Richardson")
fit.hessian
[,1] [,2] [,3] [,4] [,5]
[1,] 54159760.8324 -391558.2 3.018716e+05 -4467.786 1.101539e+02
[2,] -391558.2296 90404150206.1 2.220223e+09 224552735.372 2.183666e+08
[3,] 301871.5678 2220223211.2 6.070358e+07 5516188.460 5.352351e+06
[4,] -4467.7858 224552735.4 5.516188e+06 558096.767 5.425405e+05
[5,] 110.1539 218366571.2 5.352351e+06 542540.452 5.277932e+05
[6,] -1624.6496 12373678.7 2.983966e+05 30769.665 2.989512e+04
[7,] -2190.8683 8249523.2 1.948851e+05 20514.137 1.993109e+04
[8,] -2569.8729 5499872.3 1.264006e+05 13676.544 1.328788e+04
[9,] -2991.0444 2750144.2 6.090795e+04 6838.777 6.644464e+03
[,6] [,7] [,8] [,9]
[1,] -1624.6496 -2190.8683 -2569.8729 -2991.0444
[2,] 12373678.7285 8249523.2423 5499872.3398 2750144.2107
[3,] 298396.5695 194885.1344 126400.6003 60907.9503
[4,] 30769.6650 20514.1374 13676.5444 6838.7768
[5,] 29895.1161 19931.0875 13287.8823 6644.4640
[6,] 2258.4950 1693.8479 1254.6936 690.5527
[7,] 1693.8479 1355.1300 1070.7734 636.5426
[8,] 1254.6936 1070.7734 903.6127 584.6161
[9,] 690.5527 636.5426 584.6161 453.8935
I think this may be what you are looking for, but beware that your
alpha parameter is on the boundary. The fact that hessian() works on the
boundary suggests that the function actually evaluates beyond the
boundary without returning an error, but you need to be sure the
function is not doing something strange outside the boundary. I'm not
familiar with how to interpret the hessian as an approximation of the
variance at a boundary point. Do you think of the se as symmetric in
this case?
Paul
Am 2/7/2014 10:28 AM, schrieb Paul Gilbert:
On 02/07/2014 08:19 AM, Bastian Offermann wrote:
Hi all, I am currently implementing the Engle & Rangel (2008) Spline GARCH model. I use the nlminb optimizer which does not provide a hessian unfortunately to get the standard errors of the coefficients. I can get around this using the 'hessian' function in numDeriv, but usually get NaN values for the omega parameter.
Do you know why this happens, or can you provide a simple example? An NaN value from hessian() is often because the function fails to evaluate in a small neighbourhood of the point where it is being calculated, that is, at your parameter estimate. Are you on the boundary of the feasible region?
Can anybody recommend additional optimizers that directly return a hessian?
A hessian returned by an optimizer is usually one that is built up by some approximation during the optimization process. One of the original purposes of hessian() was to try to do something that is usually better than that, specifically because you want a good approximation if you are going to use it to calculate standard errors. (And, of course, you want the conditions to hold for the hessian to be an approximation of the variance.) Just because an optimizer returns something for the hessian, it it not clear that you would want to use it to calculate standard errors. The purpose of the hessian built up by an optimizer is to speed the optimization, not necessarily to provide a good approximation to the hessian. In the case where hessian() is returning NaNs I would be concerned that anything returned by an optimizer could be simply bogus.
How sensitive are the coefficients to the initial starting values?
This depends on a number of things, the optimizer you use being one of them. Most optimizers have some mechanism to specify something different from the default for the stopping criteria and you can, for a problem without local optimum issues (e.g. convex level sets), reduce sensitivity to the starting value by tightening the stopping criteria. The more serious problem is when you have local optimum issues. Then you will get false convergence and thus extreme sensitivity to starting values. Even for a parameter space that is generally good, there are often parameter values for which the optimization is a bit sensitive. And, of course, all this also depends on your dataset. Generally, the sensitivity will increase with short datasets. The previous paragraph is about the coefficient estimate. At the same coefficient estimate hessian() will return the same thing, but a hessian built up by an optimizer will depend on the path, and generally needs a fairly large number of final steps in the vicinity of the optimum to give a good approximation. Thus, somewhat counter intuitively, if you do an optimization starting with values for the coefficients that are very close to the optimum you will get quick convergence but often a bad hessian approximation from the optimizer. Paul
Thanks in advance!
_______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go.