On 16-07-31 12:27 PM, Steven Orzack wrote:
Ben, Many thanks for the informative reply. I will work on this in the next few days and update. Once upon a time, I knew about Householder transformations. I will try to dust off that part of my brain..... I am especially interested in the fact that ar and gls are using different algorithms. If you could expand on that, it would very helpful. I will also investigate. any information and thoughts are most welcome.
gls uses generalized least squares -- described in detail in Pinheiro and Bates 2000 -- basically, nonlinear optimization over the correlation parameter, minimizing the sums of squares of the 'whitened' residuals. For ar, see ?ar: the default solves the Yule-Walker equations (see ar.yw) ... As I hinted, if I had to do this I would (1) build my own brute-force AR solver (probably based on nlme::corARMA), (2) use try() to catch the cases where AR(2) failed and revert to AR(1)
S. On 16-07-29 06:12 PM, Steven Orzack wrote:
I am analyzing time series of a statistic (coefficient of variation or
CV) generated by a Monte Carlo simulation of population dynamics. A
given simulation might generate thousands of such time series.
For any given time series, the sequential estimates of CV are correlated
with one another (because they are based on partially overlapping sets
of the population numbers). Accordingly, I am using gls models with
autocorrelated errors to analyze each time series.
Although the stochastic process generating all sample paths is the same,
any given sample path might consist of a time series of CV for which
AR(0), AR(1), AR(2), etc. generates the smallest AIC value, as produced
by ar() (see below).
In practice, I have seen time series for which AR(0), AR(1), AR(2)
generate the smallest AIC value and so always just using a specific
order (e.g., AR(1)) does not seem appropriate. This is especially so
because I am developing methods that would apply to time series for
which the generating dynamics are not known.
For any given sample path y, the code is
#assess order of AR model using ar. use order with smallest AIC
arorder <- ar(moving_statistics$CV, order.max = 5)$order
#evaluate model with constant and time
if (arorder < 1) {
summary(int_model <- lm(CV ~ 1, data = moving_statistics))
summary(time_model <- lm(CV ~ 1 + time, data = moving_statistics))
#extract statistics for sample path y
pvalue_cv$pvalue[y] <- anova(int_model,time_model)$Pr
pvalue_cv$slope_time[y] <- time_model$coefficients[2]
} else {
summary(int_model <- gls(CV ~ 1, data = moving_statistics, correlation =
corARMA(p=arorder), method="ML"))
summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
correlation = corARMA(p=arorder), method="ML"))
#extract statistics for sample path y
pvalue_cv$pvalue[y] <- anova(int_model,time_model)$p[2]
pvalue_cv$slope_time[y] <- time_model$coefficients[2]
}
When I run the simulation, there are some sample paths for which AR(2)
has the lowest AIC and so this is the model called, as in
summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
correlation = corARMA(p=2), method="ML"))
This call appears to always generate this error message
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
Coefficient matrix not invertible
Here are my questions:
1. how does ar generate an AIC value for p =2 even though the
coefficient matrix is not invertible and so the model fit fails?
gls and ar are using different algorithms. Furthermore, I think ar() is assuming a constant mean whereas your gls() has a linear effect of time, so they're actually trying to fit different models.
2. Is there a way to actually fit such a model, say, by adjusting tolerances?
Don't know. I would start by debugging my way through the gls code to say where it is actually breaking. I don't think I have much to say other than what I already said in the SO thread you reference below.
3. a related thread
indicates that there are too many parameters being estimated given the length of the time series. Is there a numerical criterion that can be queried before the gls call so that I can test for invertibility and fit, say, a lower order model (e.g., AR(1)), so as to avoid an error?
You could certainly wrap this attempt in a try() or tryCatch() clause so that it wouldn't break your simulation run and you could then fall back on AR(1) ...
I cannot figure out from the code for gls what actually generates the error message. A pointer to the specific numerical criterion that generates this error would be very much appreciated.
Unpacking the nlme source code and searching for the error message
finds it on line 557 of src/corStruct.c:
F77_CALL(dqrdc2) (coef, &P, &P, &P, &sqrt_eps, &i, qraux, pivot, w\
ork);
if (i < P)
error(_("Coefficient matrix not invertible" ));
https://svn.r-project.org/R/trunk/src/appl/dqrdc2.f
says that dqrdc2 is a Householder transformation, but you'd really have
to figure this out for yourself ...
the i parameter corresponds to a k argument inside the FORTRAN
function: "k contains the number of columns of x judged to be linearly
independent." so it looks like you end up with a rank-deficient matrix
at some point.
In the stackoverflow thread, this is mentioned: The general rule of thumb is that you should have at least 10 times as many data points as parameters, and that's for standard fixed effect/regression parameters. (Generally variance structure parameters such as AR parameters are even a bit harder/require a bit more data than regression parameters to estimate.) Where in the literature is this rule discussed?
I got it from Frank Harrell's _Regression Modeling Strategies_ book.
4. If the only option is to fit a lower order model of error, say, AR(1), instead of AR(2), what kind of bias does this generate in the analysis?
Don't know. Since you're running simulations, you could find out ...
Many thanks in advance, S.