
Time series and GLS

6 messages · LisaB, Kingsford Jones, Scott Foster

#
Hello -

I need to analyze some time series data in an ANOVA framework, but am unsure
of how to go about it.  I have data on nest success (response) over a 22
year period for two populations.  For each year I have one value of nest
success per population.  I am interested in determining 1) whether there are
differences in nest success over time between these two populations and 2)
what are the trends for each population over time.  My thought is to use GLS
and model temporal autocorrelation if the acf function indicates this is an
issue, but since population is a categorical variable I'm unsure if this is
appropriate.  Any advice would be much appreciated. Thank you. Lisa
#
The gls function in nlme fits a general linear model, so yes, you can
have categorical predictors.  The advantage over the lm function is
that the error covariance matrix may have non-zero off-diagonals (for
example, from an autocorrelation structure) and non-constant diagonals.

hth,
Kingsford Jones
On Fri, Jan 1, 2010 at 2:44 PM, LisaB <lisabaril at hotmail.com> wrote:
#
Thanks Kingsford.  I thought it would be appropriate.  As a follow-up
question: My first thought is to set up the data file with three columns:
year, population (A,B), and nest success and then to input the following
formula: success.gls <- gls(success ~ year * population).  This would allow me to
test for the effect of year for each population and then also test for
differences between the two populations.  My questions are 1) have I
specified the model right for those questions and 2) would the acf function
calculate the autocorrelation correctly even though my 'year' in the data
file is repeated twice (once for each value of nest success/population)? 
Thanks. Lisa (hope all is well with you)
#
On Sat, Jan 2, 2010 at 3:34 PM, LisaB <lisabaril at hotmail.com> wrote:
Hi Lisa -- I didn't realize that was you in the first email -- all is
well, thanks.

To answer your first question, assuming normality and linearity I
would say that success ~ year*population is indeed a good place to
start.  The right-hand side expands to 1 + year + population +
year:population, and those four terms respectively will produce
estimates of the baseline (probably level A) intercept, baseline
slope, adjustment of that line up or down for population B, and
adjustment of the slope of the line for population B. So, for example,
for population B the predicted increase in mean success for a one unit
increase in year would come from the sum of the beta-hats from the 2nd
and 4th terms.  Checking for population effects could be done with an
LRT between models with and without the last two terms.
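In nlme that comparison might look like the following sketch.  The data
frame 'nests' and its column names are hypothetical, and models compared
by LRT on fixed effects should be fit by maximum likelihood rather than
the REML default:

```r
## Minimal sketch, assuming a data frame 'nests' with columns
## year (numeric), population (factor with levels A and B), and success.
library(nlme)

full    <- gls(success ~ year * population, data = nests, method = "ML")
reduced <- gls(success ~ year,              data = nests, method = "ML")

## Likelihood ratio test for the two population terms
anova(reduced, full)
```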

To answer the second question, you would be interested in modeling
autocorrelation within each of the two trajectories.  So if for
example an AR(1) structure was appropriate, the correlation argument
could be specified as corAR1(form = ~ year | population).
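As a concrete sketch (nlme's AR(1) class is corAR1; the data frame
'nests' and its columns are hypothetical names):

```r
## AR(1) errors within each population's series; grouping by
## population keeps the two trajectories' residuals separate.
library(nlme)

fit <- gls(success ~ year * population, data = nests,
           correlation = corAR1(form = ~ year | population))
summary(fit)
```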

Be sure to do lots of plotting as you build and check your model. I
would use lattice or ggplot to plot the fits within populations and to
get a feeling for the plausibility of linearity of the relationships,
normality and homogeneity of scatter around the two lines, and
independence.  As you mention, independence can be further checked
with an ACF plot (and semivariograms are useful for time series as
well as spatial data).  QQ plots of resids within populations are good
for normality checks and you can calibrate your judgement for 22
sample points by repeatedly using something like

par(mfrow = c(5, 5), mar = rep(1, 4))
for (i in 1:25) qqnorm(rnorm(22), main = '')
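For the independence check, nlme also provides an ACF method for gls
fits, so the residual autocorrelation can be inspected directly from
your fitted object (here the 'success.gls' fit you mentioned; a sketch,
not run on your data):

```r
## Empirical autocorrelation of the normalized residuals of a gls fit;
## bands at alpha = 0.05 give rough pointwise critical limits.
library(nlme)
plot(ACF(success.gls, resType = "normalized"), alpha = 0.05)
```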


best,

Kingsford
#
Hi,

Just a few quick thoughts.

*) Your success.gls model contains a linear effect for year.  Is this 
really likely over the time period you mention?  I would highly doubt it 
(but this is really just a guess).  If not, your residuals are likely 
to show falsely high autocorrelation -- not because it is there, but 
because the residuals come from an inappropriate mean model.
*) With the previous point in mind: have you considered using GAM 
models?  It seems like a perfect application as you can specify 
different smooth functions for each of the populations and then see if 
they really are all that different (through LRTs).
*) The gls function will assume normality (albeit with correlated 
errors).  Is this really all that believable?  In the GAM framework you 
could specify binomial data, an assumption that is much more likely to 
make sense.  Of course, your data may contain enough sampled nests, and 
a favourable enough probability of success, to make the normality 
assumption very plausible.
*) The GAM model, when viewed as a random effects model, does specify a 
correlation structure amongst the outcomes.  It may not be the most 
appropriate correlation structure, nor even *an* appropriate structure 
but it may be a suitable starting place.  Most analysts would consider 
it a useful finishing place too (but you can extend the GAM model -- 
Richard Morton has done some work in this line although I can't find the 
reference right now). 
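One way this GAM suggestion might look in mgcv (a sketch only: the data
frame 'nests', and the per-year counts 'succ' and 'fail', are
hypothetical names; population must be a factor for the by= smooth):

```r
## Binomial GAM with a separate smooth trend in year for each population.
library(mgcv)

nests$population <- factor(nests$population)
fit.gam <- gam(cbind(succ, fail) ~ population + s(year, by = population),
               family = binomial, data = nests, method = "REML")
summary(fit.gam)
plot(fit.gam, pages = 1)   # the two estimated smooths
```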

Be careful taking the acf of residuals in GAM models -- residuals from 
the model conditional on the random effects may not tell you much about 
the correlation structure (you need the marginal distribution for this).

I just noticed that Kingsford Jones has sent through some more pointers.  
They are excellent suggestions, especially the one about plotting up 
your data -- visualisation is much more important than formal testing 
(in my opinion).  I believe that the above points are complementary to 
Kingsford's comments.

I hope that this helps,

Scott

PS  An excellent reference for GAMs is Simon Wood's Book (with 
accompanying R package).
1 day later
#
Thanks for the very helpful advice.  I've been doing a lot of plotting and it
looks like portions of my time series are linearly related with year, but
there are clear deviations from this so I've also been looking at using gam
with smoothers, but still have a ways to go before settling on a model that
adequately describes the data.  After describing trends over time for nest
success, I'll eventually want to relate nest success with food availability
(also a time series variable).  Both nest success and food availability
generally decline over time in a similar way.  Since year and food
availability are correlated, I probably should not use them in the same
model to describe nest success, but I still need to control for year --
correct?  Is it valid to specify the general model as
nest success ~ food availability + error even though both the response
and explanatory variables are negatively correlated with time?  I ask
because in the "Mixed models and
extensions in R" book, chapter 14 describes a similar problem in which they
leave out year in a model where the explanatory variable exhibited a trend
over time.  However, in other sources I read that this could be a major
problem.  Sorry for the rather long-winded question.  I'm new to time series
analysis.  Thanks to anyone who can provide some insight.