Generalised Linear Mixed Model with Moving Average

2 messages · Mark Bilton, Ben Bolker

Fri, Aug 6, 2010 7:39 AM #

Dear all potential friends,

I am having a few problems with a glmer I am trying to create in R.
Some of the them are the normal basic problems (defining random factors
correctly) and some are a bit complicated (ie including a moving average). I
have done a lot of reading and I've been going round in circles and I'm just not
sure about some things. So, any help will be greatly appreciated.

Firstly I will define my dataset and objectives:

My data to be modelled is the abundance of a plant species in a fixed quadrat
(and for the purposes of this post, just 1 species).
I have a spatial hierachical design. At a site, I have 15 'plots', each plot
contains 5 'quadrats'.

I have a 'treatment' of 3 climates (control, +water, -water), and so for the 15
plots 5 of them were assigned to each treatment (therefore 5 is my 'real'
replication per treatment).

The data for each species was also recorded for 8 'years'.
The first year no treatment was applied to any plots.

My objective is to find if there is a change in species abundance across time
(slope) for the different climate treatments.

Firstly, I will only talk about the full model (and not the comparisons for full
significance testing).

Year is a continuous variable.
Plot, Treatment, and Quadrat are factors.

The model I have then is:

Model1<-glmer(Species~Year*Treatment+(1|Plot), REML = FALSE, family=poisson,
na.action = na.omit)

You will notice that I am looking at log-likelihood to compare my models. That I
am modelling with a poisson link function. And that my data contains some NA's
which are omitted in the analysis.

QUESTION 1:
I do not include 'Quadrat' as a random variable (as stated in Crawley R book for
the lowest level of the hierarchy). However, I am not sure if this is correct,
as I do get slightly different results when included or not. So, for the spatial
nesting should I use ((1|Plot/Quadrat) ??

QUESTION 2:
Do I need to include 'Year' in the random variables as well ?
I have tried  (1|Plot) + (Year|Plot) (doesn't converge) and just (Year|Plot)
(doesn't converge) and (as.factor(Year)|Plot) (doesn't converge). The lack of
converging may be due to a large amount of zero's in my dataset, but not sure if
this is the case or not.

QUESTION 3:
I could include a covariate for the level of the species in the first year.
Although, I am not sure this is necessary as I am really interested in slope (ie
the interaction between Year and Treatment), and a covariate should not alter
this. Do you agree with my statement, or do you think I should include it ? If I
do include it, is this a fixed or random variable. My feeling was random, but a
colleague disagreed.

QUESTION 4:
I also want to test some moving average models using this framework. Is it
possible to do this with glmer ?? I tried coding a simple function myself for
different lag distances, but the averages then are not integers, which the
poisson link function does not like. Any suggestions as to how I could run this,
either within glmer, lmer, or something like the 'arima' function ?? (although I
struggle to set up a time series dataset (ts) with the nested data). For
example, I tried just using (log(Species+1)) for the data, but for the original
model, I lose my significant differences between slopes.

Finally then
QUESTION 5:
Using the full model I can use the default contrasts (contr.treatment) to look
for significance between the slopes. Although clearly I should further
investigate whether these aspects of the model should be included in the final
model (e.g. look at model with the same slope, same intercept etc). The default
contrasts are good for what I want, but I also want to see if I can have a
single slope or intercept for 2 of the treatments, and different for the other
one. Contrasts have always confused me, and my logical thought was
[2,-1,-1];[-1,2,-1];[-1,-1,2] but this doesn't seem to do what I thought it
would. This is not quite as important as the other aspects, but if you do have
any advice, it would be gratefully received.

I know there are a number of questions here. But I hope the details are fairly
clear. Any help you can offer on just one question would be great.

Thank you for your time
Mark

Ben Bolker

Fri, Aug 6, 2010 7:48 PM #

Mark Bilton <mcbilton at ...> writes:

[snip]

A single site, or multiple sites?

Do you really need to keep quadrats separate?  If you take the
mean of quadrats to get a single value for each plot, are your
data adequately/approximately normally distributed?  This would let
you do a LMM instead of a GLMM, which would simplify your life.
In fact, then you might not need a LMM at all.

REML=FALSE is redundant/ignored in glmer, as explained in a recent
thread on this list.

If you include quadrat, then you are essentially allowing for
extra-Poisson variation in the samples, or equivalently using
a lognormal-Poisson rather than a Poisson distribution.  If your
model is feasible and there is a non-negligible amount of variability
estimated at the among-quadrat/within-plot level, you probably
do want to retain quadrat as a random variable (or lump and hope
for normality, see above).

If you include (Year|Plot), you are allowing for
the possibility that there is random variation in slopes (as well
as intercepts) across plots.  as.factor(Year)|Plot allows for random
(not necessarily linear) year-to-year variation that varies across plots.

(1|Plot) and (Year|Plot) should be the same, because the latter includes
an implicit intercept term.

It's certainly not impossible that there could be a relationship
between overall level and slope. Have you tried looking at the data?
Fixed makes more sense to me.

glmer does not allow so-called 'R-side' correlation structures;
nor does lmer.  You can do this in lme, and possibly handle a GLMM
with such correlation structures in glmmPQL (in the MASS package),
but glmmPQL has some disadvantages.

  I would suggest you look at the residuals from an otherwise
full model to see if there is any hint of temporal autocorrelation ...
this is a pretty small/short data set for detecting temporal autocorrelation.

I'm going to let someone else have question 5.