Mixed models and multivariate methods for temporal-spatial nested data

3 messages · Tania Bird, Bob OHara

#
Hi all,

I am seeking advice on how to analyse my unbalanced, multi-nested
multivariate data set. I realise there are many questions in this
email and I would be willing to consult with someone privately on this
if it is an option.

I am using abundance data for insect species (I have the same
experimental design for reptiles, and annual plants as well). I use
Simpson's diversity as a univariate response and species composition
as a multivariate response.

Experimental Design:
Plots are divided into three habitat types A, B, C based on vegetation.
Each habitat has 3 or 4 replicate control plots that are repeat
sampled (one sample a year always in spring).
In addition B and C have 3 or 4 treatment (vegetation removal) plots.
'S' plots are disturbed (trampling and off-road vehicles) but the
disturbance is unquantified and I don't know the pre-disturbance
habitat type.

The total data set is across a 12 year period, but the sampling was
unbalanced for various reasons. I attach an image of the metadata of the
plots over time to show the unbalanced sampling.
https://www.dropbox.com/s/7vxvo3x9lnywdbm/insects_years.gif?dl=0

Each year the sampling across plots was conducted at the same time,
and so plots are comparable within a year.
In general, As were sampled every year and are considered the 'target'
habitat. B's were sampled in the earlier years and C's later on, and
in the last couple of years all three types were sampled together.

The treatments on B & C were conducted using different methods and in
different years, so in principle I should probably test each
separately just against their own control pairs. However the
hypothesis for both treatments is that treated plots will be more
similar in composition to A plots than the paired control plots (if
possible I want to check if they become more or less similar to A over
time).

So in that regard I thought there might be a way to include all
habitat types in one analysis? Perhaps using time as "number of years
since treatment" rather than a date? (Although I have no environmental
data with which to standardise).  S dunes have no "pre-treatment"  but
the hypothesis is that S plots will be most similar to A compared to
all other (treated and control) plot types.  I am not sure how to
include these plots in a testable model.

Questions regarding the design:
e.g. A's were the only plots sampled in 2010; should I remove that
year completely?
e.g. C1 & C5 were sampled in 2005 while the rest were not until 2011;
should I only include data from 2011 onwards for all C's?
e.g. Should I remove A4 completely since it was only sampled in the last
few years, or is it still usable?

I have already analysed my first research question:
Q1) To understand the differences in diversity and composition
across control habitats, irrespective of time.

The analysis approach I used for this is:
i) Mixed-effects model: GLMM fitted by PQL (Penalised Quasi-Likelihood)
using the MASS R package.
    Diversity ~ habitat type (fixed effect) + year (random effect),
    family = poisson

ii) Pairwise permutational multivariate analysis of variance (PERMANOVA)
with R code based on the adonis2 function, to determine whether the
composition of the habitats (visualised in NMDS) differed significantly
from each other.

iii) RDA with habitat as explanatory and year as covariate to test
explained variance.
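
As a rough illustration of steps ii) and iii), the global (not pairwise) tests might be sketched with the vegan package as below. The data are simulated, and `meta`, `comm`, `year_f` and the species names are hypothetical stand-ins, not the actual data set or the code used for the analysis described here.

```r
library(vegan)

set.seed(4)
# Hypothetical sample metadata: 3 habitats x 4 plots x 3 years
meta <- expand.grid(plot_no = 1:4, habitat = c("A", "B", "C"), year = 2014:2016)
meta$year_f <- factor(meta$year)

# Simulated abundance matrix: one row per sample, 10 made-up species
comm <- matrix(rpois(nrow(meta) * 10, lambda = 5), nrow = nrow(meta),
               dimnames = list(NULL, paste0("sp", 1:10)))

# Simpson's diversity per sample (the univariate response)
meta$simpson <- diversity(comm, index = "simpson")

# Global PERMANOVA on Bray-Curtis dissimilarities: composition ~ habitat
perm <- adonis2(comm ~ habitat, data = meta, method = "bray", permutations = 999)
print(perm)

# Partial RDA: habitat as constraint, year partialled out as a covariate
ord <- rda(comm ~ habitat + Condition(year_f), data = meta)
print(anova(ord, permutations = 999))
```

Because the abundances here are pure noise, no habitat effect should be expected in the output; the point is only the shape of the calls.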

Now I am trying to expand this analysis to include a temporal element
to answer Q2 & Q3:
Q2) To understand the trends in diversity and composition over time in
control habitats.
Q3) To understand the impact of treatment on diversity and composition
(over time if possible?)

The addition of time into the analyses is a bit difficult for me to
work out, due to the multi-nested and unbalanced design of the data; I
am not sure what methods to use to include time as a variable for
looking at a) diversity and b) composition.

Questions regarding analyses:
I thought to create a Principal Response Curve to see relative
differences over time, but as far as I understand, I cannot use a
permutation test here due to the unbalanced design. I also thought to
take the scores on the first RDA axis as a univariate measure, and
then plot this over time, but I'm not sure if it's an appropriate
approach or how to then test this statistically.

I also thought to try and create some measure of "compositional
temporal stability" for each plot and test this using ANOVA (like some
sort of "multivariate Coefficient of Variation"). One such measure
could be distance of each plot-year from the habitat centroid in
ordination space but again, I'm not sure if this is an appropriate
approach. Any suggestions for other measures would be welcome.
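
One possible way to compute such a distance-to-centroid measure is `betadisper()` from the vegan package, sketched here on simulated data. The objects `meta` and `comm` and all column names are hypothetical, and whether this measure suits the design is left open.

```r
library(vegan)

set.seed(5)
# Hypothetical sample metadata: 3 habitats x 4 plots x 6 years
meta <- expand.grid(plot_no = 1:4, habitat = c("A", "B", "C"), year = 2011:2016)
meta$plot <- paste0(meta$habitat, meta$plot_no)

# Simulated abundance matrix: one row per plot-year, 10 made-up species
comm <- matrix(rpois(nrow(meta) * 10, lambda = 5), nrow = nrow(meta),
               dimnames = list(NULL, paste0("sp", 1:10)))

# Distance of every plot-year to its habitat centroid in PCoA space
bray <- vegdist(comm, method = "bray")
bd <- betadisper(bray, meta$habitat, type = "centroid")
meta$dist_centroid <- bd$distances

# Mean distance per plot, as one candidate univariate "stability" summary
stab <- aggregate(dist_centroid ~ plot + habitat, data = meta, FUN = mean)
print(stab)
```

Using `meta$plot` instead of `meta$habitat` as the grouping would give distances to each plot's own centroid across years.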

I would like to see if I can detect some form of resistance to, or
recovery from, the treatment over time. But if not, can I test the
overall treatment effect and use time as a random effect like I did
for my first question?

Thank you for any suggestions of analyses and/or ways to subset the
data that would allow me to answer these questions.

With kind regards,

Tania

PhD Student
Geo-Ecology Lab
Ben Gurion University
#
Hm, this is a big job. The optimal solution is to see if your university 
offers a statistical consulting service. I don't see any big conceptual 
problems, but getting a good analysis will take a bit of time and 
exploration. I think you can probably 'just' use a GLMM, but getting the 
right GLMM and deciding what a good model is will take time and some 
poking of the data.

Anyway, some answers below, which may (or may not) help.
On 02/27/2017 04:27 PM, Tania Bird wrote:
Yes, you can. You obviously need a Treatment effect, and you should
expect to have a Treatment by Habitat interaction.

There may also be some sort of interaction with time (either as Time, or
Time Since Treatment).

No, you should be able to use all of the data; you just have to be a bit
careful about how you model Time.

Yes, in principle. It just doesn't have a Habitat:Treatment interaction.

There are better tools than glmmPQL nowadays. Have a look at the lme4 
package, for example.
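
A minimal sketch of how the Q1 model (habitat as fixed effect, year as random intercept) might look in lme4. The data are simulated and all column names are hypothetical; a Poisson count response is assumed only to mirror the family mentioned in the question.

```r
library(lme4)

set.seed(1)
# Hypothetical design: 3 habitats x 4 plots x 6 years
d <- expand.grid(plot_no = 1:4, habitat = c("A", "B", "C"), year = 2005:2010)
d$year_f <- factor(d$year)

# Simulated counts with a habitat B effect and random year-to-year variation
year_eff <- rnorm(nlevels(d$year_f), sd = 0.3)
d$count <- rpois(nrow(d),
                 exp(2 + 0.4 * (d$habitat == "B") + year_eff[as.integer(d$year_f)]))

# Habitat as fixed effect, year as random intercept
m1 <- glmer(count ~ habitat + (1 | year_f), family = poisson, data = d)
print(summary(m1))
```

For a continuous response such as a diversity index, `lmer()` with the same formula would be the Gaussian analogue.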

Take a look at repeated measures models. There are a few ways this could
be set up, depending a bit on the data.

There are probably several. :-) For example you could include Time as a
continuous covariate, alongside the random effect. You could also just 
include it as a fixed effect, but that could get messy.
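
One way this might be written, assuming lme4, simulated data and hypothetical column names: centred year enters as a continuous fixed effect for the trend, while year as a factor stays in as a random intercept for year-to-year noise. The random intercept for plot is an added assumption, meant to reflect the repeated sampling of the same plots.

```r
library(lme4)

set.seed(2)
# Hypothetical design: 3 habitats x 4 plots x 12 years
d <- expand.grid(plot_no = 1:4, habitat = c("A", "B", "C"), year = 2005:2016)
d$plot   <- factor(paste0(d$habitat, d$plot_no))
d$year_f <- factor(d$year)
d$year_c <- d$year - mean(d$year)   # centred year for the linear trend

# Simulated diversity: linear trend plus year, plot and residual noise
year_eff <- rnorm(nlevels(d$year_f), sd = 0.05)
plot_eff <- rnorm(nlevels(d$plot), sd = 0.05)
d$diversity <- 0.6 + 0.01 * d$year_c +
  year_eff[as.integer(d$year_f)] + plot_eff[as.integer(d$plot)] +
  rnorm(nrow(d), sd = 0.05)

# Keep a random two thirds of the plot-years to mimic unbalanced sampling
d <- droplevels(d[sample(nrow(d), 96), ])

# Time as continuous covariate alongside the year random effect
m2 <- lmer(diversity ~ habitat + year_c + (1 | year_f) + (1 | plot), data = d)
print(fixef(m2)["year_c"])
```

Mixed models of this kind do not need every plot to be sampled in every year, which is why the unbalanced subset can be fitted directly.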

That's essentially a question about the variance in responses. There are
doubly hierarchical models that you could try, but you might not want to
go there.

Essentially you need some structure on the time covariate. You could
start by using time since treatment as a factor, and plot those 
estimates. Again, there should be a bit of playing around with the 
model, to see what makes sense.
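
A sketch of that first step, assuming lme4 and simulated data for treated plots only; `tst` (years since treatment) and the other names are hypothetical. Dropping the intercept makes each coefficient the estimated mean response at that time since treatment, which can then be plotted.

```r
library(lme4)

set.seed(3)
# Hypothetical: 6 treated plots, each followed for 0 to 5 years after treatment
d <- expand.grid(plot = factor(paste0("B", 1:6)), tst = 0:5)
d$tst_f <- factor(d$tst)

# Simulated diversity that rises with time since treatment
plot_eff <- rnorm(nlevels(d$plot), sd = 0.05)
d$diversity <- 0.4 + 0.05 * d$tst + plot_eff[as.integer(d$plot)] +
  rnorm(nrow(d), sd = 0.05)

# Time since treatment as a factor; no intercept, so one estimate per level
m3 <- lmer(diversity ~ 0 + tst_f + (1 | plot), data = d)
est <- fixef(m3)
se  <- sqrt(diag(as.matrix(vcov(m3))))

# Plot the estimates with approximate +/- 2 SE bars
plot(0:5, est, type = "b", ylim = range(est - 2 * se, est + 2 * se),
     xlab = "Years since treatment", ylab = "Estimated diversity")
arrows(0:5, est - 2 * se, 0:5, est + 2 * se, angle = 90, code = 3, length = 0.05)
```

The shape of the plotted estimates can then suggest whether a linear, saturating or other structure for time is worth trying.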

Bob
#
Many thanks for your useful advice Bob!
Unfortunately I did try to use my University's statistical consulting
department, but they were not able to provide advice at this level for
either the multivariate or mixed effect models. :(
I would be happy to consult with someone else if anyone is offering
such a service?

Tania Bird
On 27 February 2017 at 18:14, Bob OHara <bohara at senckenberg.de> wrote: