Dear all potential friends, I am having a few problems with a glmer I am trying to create in R. Some of the them are the normal basic problems (defining random factors correctly) and some are a bit complicated (ie including a moving average). I have done a lot of reading and I've been going round in circles and I'm just not sure about some things. So, any help will be greatly appreciated. Firstly I will define my dataset and objectives: My data to be modelled is the abundance of a plant species in a fixed quadrat (and for the purposes of this post, just 1 species). I have a spatial hierachical design. At a site, I have 15 'plots', each plot contains 5 'quadrats'. I have a 'treatment' of 3 climates (control, +water, -water), and so for the 15 plots 5 of them were assigned to each treatment (therefore 5 is my 'real' replication per treatment). The data for each species was also recorded for 8 'years'. The first year no treatment was applied to any plots. My objective is to find if there is a change in species abundance across time (slope) for the different climate treatments. Firstly, I will only talk about the full model (and not the comparisons for full significance testing). Year is a continuous variable. Plot, Treatment, and Quadrat are factors. The model I have then is: Model1<-glmer(Species~Year*Treatment+(1|Plot), REML = FALSE, family=poisson, na.action = na.omit) You will notice that I am looking at log-likelihood to compare my models. That I am modelling with a poisson link function. And that my data contains some NA's which are omitted in the analysis. QUESTION 1: I do not include 'Quadrat' as a random variable (as stated in Crawley R book for the lowest level of the hierarchy). However, I am not sure if this is correct, as I do get slightly different results when included or not. So, for the spatial nesting should I use ((1|Plot/Quadrat) ?? QUESTION 2: Do I need to include 'Year' in the random variables as well ? I have tried (1|Plot) + (Year|Plot) (doesn't converge) and just (Year|Plot) (doesn't converge) and (as.factor(Year)|Plot) (doesn't converge). The lack of converging may be due to a large amount of zero's in my dataset, but not sure if this is the case or not. QUESTION 3: I could include a covariate for the level of the species in the first year. Although, I am not sure this is necessary as I am really interested in slope (ie the interaction between Year and Treatment), and a covariate should not alter this. Do you agree with my statement, or do you think I should include it ? If I do include it, is this a fixed or random variable. My feeling was random, but a colleague disagreed. QUESTION 4: I also want to test some moving average models using this framework. Is it possible to do this with glmer ?? I tried coding a simple function myself for different lag distances, but the averages then are not integers, which the poisson link function does not like. Any suggestions as to how I could run this, either within glmer, lmer, or something like the 'arima' function ?? (although I struggle to set up a time series dataset (ts) with the nested data). For example, I tried just using (log(Species+1)) for the data, but for the original model, I lose my significant differences between slopes. Finally then QUESTION 5: Using the full model I can use the default contrasts (contr.treatment) to look for significance between the slopes. Although clearly I should further investigate whether these aspects of the model should be included in the final model (e.g. look at model with the same slope, same intercept etc). The default contrasts are good for what I want, but I also want to see if I can have a single slope or intercept for 2 of the treatments, and different for the other one. Contrasts have always confused me, and my logical thought was [2,-1,-1];[-1,2,-1];[-1,-1,2] but this doesn't seem to do what I thought it would. This is not quite as important as the other aspects, but if you do have any advice, it would be gratefully received. I know there are a number of questions here. But I hope the details are fairly clear. Any help you can offer on just one question would be great. Thank you for your time Mark
Generalised Linear Mixed Model with Moving Average
2 messages · Mark Bilton, Ben Bolker
Mark Bilton <mcbilton at ...> writes:
I am having a few problems with a glmer I am trying to create in R.
[snip]
My data to be modelled is the abundance of a plant species in a fixed quadrat (and for the purposes of this post, just 1 species). I have a spatial hierachical design. At a site, I have 15 'plots', each plot contains 5 'quadrats'.
A single site, or multiple sites?
I have a 'treatment' of 3 climates (control, +water, -water), and so for the 15 plots 5 of them were assigned to each treatment (therefore 5 is my 'real' replication per treatment). The data for each species was also recorded for 8 'years'. The first year no treatment was applied to any plots. My objective is to find if there is a change in species abundance across time (slope) for the different climate treatments. Firstly, I will only talk about the full model (and not the comparisons for full significance testing). Year is a continuous variable. Plot, Treatment, and Quadrat are factors.
Do you really need to keep quadrats separate? If you take the mean of quadrats to get a single value for each plot, are your data adequately/approximately normally distributed? This would let you do a LMM instead of a GLMM, which would simplify your life. In fact, then you might not need a LMM at all.
The model I have then is: Model1<-glmer(Species~Year*Treatment+(1|Plot), REML = FALSE, family=poisson, na.action = na.omit)
REML=FALSE is redundant/ignored in glmer, as explained in a recent thread on this list.
QUESTION 1: I do not include 'Quadrat' as a random variable (as stated in Crawley R book for the lowest level of the hierarchy). However, I am not sure if this is correct, as I do get slightly different results when included or not. So, for the spatial nesting should I use ((1|Plot/Quadrat) ??
If you include quadrat, then you are essentially allowing for extra-Poisson variation in the samples, or equivalently using a lognormal-Poisson rather than a Poisson distribution. If your model is feasible and there is a non-negligible amount of variability estimated at the among-quadrat/within-plot level, you probably do want to retain quadrat as a random variable (or lump and hope for normality, see above).
QUESTION 2: Do I need to include 'Year' in the random variables as well ? I have tried (1|Plot) + (Year|Plot) (doesn't converge) and just (Year|Plot) (doesn't converge) and (as.factor(Year)|Plot) (doesn't converge). The lack of converging may be due to a large amount of zero's in my dataset, but not sure if this is the case or not.
If you include (Year|Plot), you are allowing for the possibility that there is random variation in slopes (as well as intercepts) across plots. as.factor(Year)|Plot allows for random (not necessarily linear) year-to-year variation that varies across plots. (1|Plot) and (Year|Plot) should be the same, because the latter includes an implicit intercept term.
QUESTION 3: I could include a covariate for the level of the species in the first year. Although, I am not sure this is necessary as I am really interested in slope (ie the interaction between Year and Treatment), and a covariate should not alter this. Do you agree with my statement, or do you think I should include it ? If I do include it, is this a fixed or random variable. My feeling was random, but a colleague disagreed.
It's certainly not impossible that there could be a relationship between overall level and slope. Have you tried looking at the data? Fixed makes more sense to me.
QUESTION 4: I also want to test some moving average models using this framework. Is it possible to do this with glmer ?? I tried coding a simple function myself for different lag distances, but the averages then are not integers, which the poisson link function does not like.
glmer does not allow so-called 'R-side' correlation structures; nor does lmer. You can do this in lme, and possibly handle a GLMM with such correlation structures in glmmPQL (in the MASS package), but glmmPQL has some disadvantages. I would suggest you look at the residuals from an otherwise full model to see if there is any hint of temporal autocorrelation ... this is a pretty small/short data set for detecting temporal autocorrelation.
I'm going to let someone else have question 5.