[EXTERNAL] Re: Using GLMs or GLMMs for diversity metrics?

Hello Tom,
    Thanks! That's great news about gllvm.

Yes..I think it is a very useful statistical method, the gllvm package 
is great, and the people who have programmed it are very helpful (which 
is also important).

    I'm a bit worried about your comment that you don't like to use
    (1 | yearF) as a random effect.

Well...opinions may (and will) differ....but the way I see a random 
effect is as a random grab out of a large number of 
possible values. Say you have 10000s of birds out there in the wild, and 
somehow you decided to catch and tag a random selection of them....say 
100. You measure multiple observations from the same bird, and then you 
use a random intercept 'Bird' to model a different mean value per bird, 
and en-passant the random intercept models dependency. And you then want 
to generalise that back to all those birds out there. Three for the 
price of one. But you do assume that the random effects are iid, as in 
'independent and identically distributed', with the emphasis on independent. 
Whether that assumption holds, I don't know.

Now to your 'year' as random effect....if you collect 5, 10 or 20 years 
of sequential data...then that is not a random grab out of a 
large number of possible years. They are (mostly) sequential. And if 
you plot the estimated random effects versus time, then I am pretty sure 
that you are going to see a nice temporal trend in those estimated 
random effects, which is not iid. What is the consequence of this? I 
don't know. That will be very data and model specific. Sometimes, if you 
do the right thing and you compare it with a GLMM in which year is a 
random effect, then you see minimal differences in the fixed effects, 
but sometimes you do see differences. And finally...if you use Year with 
only (say) 5 levels as a random effect....what exactly do you 
generalise it to?
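
A minimal sketch of that check in R, on simulated data and assuming lme4 is installed. The data frame dat, the effect sizes and the AR1 coefficient of 0.8 are all made up for illustration; only the yearC / yearF / siteID names come from the formula discussed in this thread. It fits year as a random intercept, extracts the estimated year effects and looks at them against time.

```r
library(lme4)

set.seed(1)
n_site <- 6
n_year <- 15

# One row per site-year combination
dat <- expand.grid(siteID = factor(1:n_site), year = 1:n_year)
dat$yearC <- dat$year - mean(dat$year)   # centred continuous year
dat$yearF <- factor(dat$year)            # year as a factor

# Simulated year effects that follow an AR1 process (so NOT iid),
# shared by all sites within a year
year_eff <- as.numeric(arima.sim(list(ar = 0.8), n = n_year, sd = 0.5))
site_eff <- rnorm(n_site, sd = 0.5)
dat$Y <- 2 + 0.05 * dat$yearC +
  year_eff[dat$year] +
  site_eff[as.integer(dat$siteID)] +
  rnorm(nrow(dat), sd = 0.3)

# Year as an (assumed iid) random intercept
m_year <- lmer(Y ~ yearC + (1 | yearF) + (1 | siteID), data = dat)

# Estimated year random effects, in year order
re_year <- ranef(m_year)$yearF[, "(Intercept)"]

# Any visible trend or autocorrelation here is at odds with iid
plot(1:n_year, re_year, type = "b",
     xlab = "Year", ylab = "Estimated year effect")
acf(re_year)
```

With real data you would of course skip the simulation part and start at the lmer() call.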

Also...if you are going to do simulations/predictions from the model 
(e.g. as in DHARMa model validation).....assuming iid for a random 
effect that is in fact AR1 (or something else) may give you some trouble.
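
One possible way to relax the iid assumption, as a sketch only and assuming glmmTMB is installed, is its ar1() covariance structure. The data are simulated, and the constant grouping factor grp is a made-up helper that makes all sites share a single year series; none of this is taken from an actual analysis in this thread.

```r
library(glmmTMB)

set.seed(2)
n_site <- 6
n_year <- 20

dat <- expand.grid(siteID = factor(1:n_site), year = 1:n_year)
dat$yearF <- factor(dat$year)
dat$grp   <- factor(rep(1, nrow(dat)))   # hypothetical dummy: one shared series

# Simulated AR1 year effects, concordant across sites
year_eff <- as.numeric(arima.sim(list(ar = 0.8), n = n_year, sd = 0.5))
site_eff <- rnorm(n_site, sd = 0.5)
dat$Y <- 2 + year_eff[dat$year] +
  site_eff[as.integer(dat$siteID)] +
  rnorm(nrow(dat), sd = 0.3)

# Year effects assumed iid
m_iid <- glmmTMB(Y ~ (1 | yearF) + (1 | siteID), data = dat)

# Year effects assumed AR1 (yearF must be a factor with levels in time order)
m_ar1 <- glmmTMB(Y ~ ar1(yearF + 0 | grp) + (1 | siteID), data = dat)

AIC(m_iid, m_ar1)
```

With only a handful of years the AR1 parameter will be poorly estimated, so treat any such comparison with caution.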
    When there are multiple sites, how else do you account for the
    temporal fluctuations being concordant across sites? I assume you
    don't just treat them as if each site is experiencing year qualities
    independently?

I guess this is where the R-INLA fun starts provided that you have 
enough spatial locations. And if you don't...then yeah...maybe use year 
as a random effect after all?

But as I said...I'm pretty sure that there are plenty of people with a 
different opinion.

This may be a relevant link (within the context of GLLVM) as well:

https://openresearch-repository.anu.edu.au/server/api/core/bitstreams/a7eecd7d-e78a-410d-ab4e-7c12222a6b4a/content
    Thanks for all your contributions to ecology-l, and for your
    books. I've recommended them to several of my co-workers.

Thanks......urgently begging for second editions.

Kind regards,

Alain
Tom

*From:* highstat at highstat.com <highstat at highstat.com>
*Sent:* Tuesday, October 15, 2024 11:56 AM
*To:* r-sig-ecology at r-project.org; acbarton at mun.ca; Philippi, Tom 
<Tom_Philippi at nps.gov>
*Cc:* tephilippi <tephilippi at gmail.com>; Bert van der Veen 
<bert.v.d.veen at ntnu.no>
*Subject:* RE: [EXTERNAL] Re: [R-sig-eco] Using GLMs or GLMMs for 
diversity metrics?

Hello Tom,

The development version of gllvm allows you to include multiple random 
effects. And it can also do random slopes. Hence, I think that the 
models below can be fitted in gllvm. But you may want to double check 
that with the gllvm-folks (they are super helpful).

With a package available like gllvm, one should really stop doing the 
more classical multivariate methods.

Personally, I do not like to use year (your yearF) as a random 
intercept, but that is a different discussion.

Kind regards,

Alain

On 15 Oct 2024 at 20:24 +0200, Philippi, Tom <Tom_Philippi at nps.gov>, 
wrote:

    One reason Alana might not want to use GLLVM instead is the
    repeated sites and repeated years structure.

    The standard model for a trend in a single species with multiple
    sites revisited multiple times (years) from VanLeeuwen et al 1996
    and Piepho & Ogutu 2002 includes random effect intercepts and
    (temporal) slopes among sites, and random effects of temporal
    (year to year) fluctuations concordant across all sites. In lme4
    formula notation (where yearC is centered continuous year and
    yearF is year as a factor):
    Y ~ yearC + (1 | yearF) + (1 + yearC | siteID)

    One of those papers notes that what I denote as "yearC" could just
    as well be a predictor covariate that varies over years: the
    general model still needs to account for the correlated /
    concordant across all sites fluctuations in the covariate. In this
    case with an interest in warm v cold years, the model would be:

    Y ~ temperature + (1 | yearF) + (1 + temperature | siteID)
    with temperature a factor of {warm, cold}

    The appropriate "test" of warm v cold years is against year to
    year fluctuations, lest a wet spring or some other event affecting
    all sites be interpreted as 6 independent events across the 6 sites.

    Unless there are a largish number of years in the study, given the
    binary nature of the temperature predictor, with only 6 sites
    the simpler model might need to be fit:
    Y ~ temperature + (1 | yearF) + (1 | siteID)

    GLLVM is an awesome tool, but it addresses different questions,
    and to the best of my knowledge & experimentation with it doesn't
    accommodate this form of sampling design. I would love to be
    corrected by an example of how to specify this random effect
    structure driven by the sampling process. And, I would probably
    include something from GLLVM as an additional perspective on
    patterns in the data.

    That Piepho & Ogutu model in glmer works for counts of a species,
    and for species richness, via Poisson or negative binomial
    families. [brms or glmmTMB may do a better job on the estimation
    than glmer.] Diversity indices & evenness are a bit trickier
    because they are continuous but their error distributions are
    rarely normal and can be constrained or truncated.
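
    As a rough illustration of the simpler model above for a count
    response, here is a sketch with a Poisson family in glmer. It
    assumes lme4 is installed; the data are simulated and every number
    (6 sites, 12 years, effect sizes) is invented for the example.
    Temperature is assigned per year, so it is tested against the
    year-to-year variation.

    ```r
    library(lme4)

    set.seed(3)
    n_site <- 6
    n_year <- 12

    dat <- expand.grid(siteID = factor(1:n_site), year = 1:n_year)
    dat$yearF <- factor(dat$year)

    # Temperature class is a property of the year, not of the site-year
    year_temp <- rep(c("cold", "warm"), length.out = n_year)
    dat$temperature <- factor(year_temp[dat$year])

    # Simulated year and site effects on the log scale
    year_eff <- rnorm(n_year, sd = 0.3)
    site_eff <- rnorm(n_site, sd = 0.4)
    eta <- 2 + 0.5 * (dat$temperature == "warm") +
      year_eff[dat$year] + site_eff[as.integer(dat$siteID)]
    dat$richness <- rpois(nrow(dat), exp(eta))

    # Poisson GLMM with random intercepts for year and site
    m_pois <- glmer(richness ~ temperature + (1 | yearF) + (1 | siteID),
                    family = poisson, data = dat)
    summary(m_pois)$coefficients
    ```

    For overdispersed counts, a negative binomial fit (for example in
    glmmTMB) would be the natural next thing to try.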

    I'm jumping in here because I was dealing with just this issue
    last week for vegetation monitoring at Santa Monica Mountains
    National Recreation Area. They have species richness recorded by
    segments of a single 1x30m transect at each site, and will be
    testing for trends over time in species richness at the spatial
    scales of transects, and of 1x5m segments (nested within
    transects). Their approach to a Shannon diversity metric from 100
    point intercepts at each site is to fit the P&O model with the
    additional covariates as normal error, then densityplot of the
    residuals to see if they are unimodal and close to normally
    distributed. If not, they'll use a different error distribution in
    glmmTMB or brms. Note that they also have a beta diversity among
    segments within each site, and could in theory treat that the same
    way they treat the diversity metric from the point intercepts.
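
    A sketch of that "fit as normal, then look at the residuals" step,
    assuming lme4 and lattice are installed. The Shannon values are
    simulated and all names and numbers are invented for illustration;
    this is not the park's actual data or code.

    ```r
    library(lme4)
    library(lattice)

    set.seed(4)
    n_site <- 6
    n_year <- 12

    dat <- expand.grid(siteID = factor(1:n_site), year = 1:n_year)
    dat$yearC <- dat$year - mean(dat$year)
    dat$yearF <- factor(dat$year)

    # Simulated Shannon index with year and site effects
    year_eff <- rnorm(n_year, sd = 0.1)
    site_eff <- rnorm(n_site, sd = 0.2)
    dat$shannon <- 1.5 + 0.02 * dat$yearC +
      year_eff[dat$year] + site_eff[as.integer(dat$siteID)] +
      rnorm(nrow(dat), sd = 0.15)

    # Normal-error mixed model (random intercepts only, to keep it small)
    m_norm <- lmer(shannon ~ yearC + (1 | yearF) + (1 | siteID), data = dat)

    # Density plot of the residuals: unimodal and roughly symmetric?
    res <- resid(m_norm)
    print(densityplot(~ res, xlab = "Residuals"))
    ```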

    I hope this helps you think about what you are trying to learn
    about your data, especially the not always obvious artifacts of
    the sampling that should be accounted for in the analyses, lest
    your results reflect the sampling design rather than the
    ecological responses.

    Irwin, B.J., Wagner, T., Bence, J.R., Kepler, M.V., Liu, W. and
    Hayes, D.B., 2013. Estimating spatial and temporal components of
    variation for fisheries count data using negative binomial mixed
    models. Transactions of the American Fisheries Society, 142(1),
    pp.171-183.

    Piepho, H.P. and Ogutu, J.O., 2002. A simple mixed model for trend
    analysis in wildlife populations. Journal of agricultural,
    biological, and environmental statistics, 7, pp.350-360.

    VanLeeuwen, D.M., Murray, L.W. and Urquhart, N.S., 1996. A mixed
    model with both fixed and random trend components across time.
    Journal of Agricultural, Biological, and Environmental Statistics,
    pp.435-453.

    Tom Philippi
    Inventory and Monitoring Program Central Support Office
    National Park Service
    Tom_Philippi at nps.gov

    -----Original Message-----
    From: R-sig-ecology <r-sig-ecology-bounces at r-project.org> On
    Behalf Of Alain Zuur via R-sig-ecology
    Sent: Sunday, October 13, 2024 2:41 AM
    To: r-sig-ecology at r-project.org; acbarton at mun.ca
    Subject: [EXTERNAL] Re: [R-sig-eco] Using GLMs or GLMMs for
    diversity metrics?

        Message: 1
        Date: Fri, 11 Oct 2024 18:25:38 -0400
        From: "Barton, Alana Charlotte" <acbarton at mun.ca>
        To: r-sig-ecology at r-project.org
        Subject: [R-sig-eco] Using GLMs or GLMMs for diversity metrics?
        Message-ID:
        <CAP+=Be1jJ_qrckg7TC8fd7NFbwU=8Z4fkUoX-itrgjeNzRDBMQ at mail.gmail.com>
        Content-Type: text/plain; charset="utf-8"

        Hello,
        I would appreciate some help with a question regarding
        statistical analysis.
        I'm looking at species count data where sampling was carried out
        over multiple years in repeated sites. So each year was sampled
        at six different sites, for example. The years were categorized
        into a temperature group with two levels: warm or cold. However,
        I'm only interested in exploring community differences between
        temp. groups and across years. I used the vegan package in R for
        calculating diversity metrics (abundance, richness, diversity
        index, evenness) and want to statistically check differences
        among metrics from factors of group and year.
        I have been using the manyglm-mvabund package with negative
        binomial distribution, but there is the issue that mvabund
        doesn't fit non-integer data well, and I'm worried it's
        incorrectly computing diversity and evenness stats.
        Additionally, I'm wondering if the repeated sites should be
        added as a fixed effect to mitigate this? Or if it's even
        considered a random effect actually and a mixed model is more
        appropriate, using glmmTMB instead in this case? I'm not
        terribly familiar with using mixed models in R so any help is
        appreciated.
        Thank you for your help

    Hello Alana,

    Instead of using a diversity index, why not focus on the original
    species using a multivariate GLMM? You can use a generalised
    linear latent variable model (GLLVM) for this. That is a more
    useful analysis as compared to using 4 different diversity indices
    (which, by the way, are all derived from the same data, and that
    is a problem in itself).

    You can find information on GLLVM here:

    https://jenniniku.github.io/gllvm/articles/vignette1.html

    Or you can join one of our upcoming online workshops on GLLVM:

    https://www.highstat.com/Courses/Flyers/Flyer2024_01_SpatTempGLM.pdf

    This workshop is in the EU time zone, but we are planning the same
    workshop in the 9 December week in the EST time zone.

    The setup of the random effects structure and covariates was
    already discussed by Michael Zyphur, and can be applied in GLLVM
    as well.

    Kind regards,

    Alain

    _______________________________________________
    R-sig-ecology mailing list
    R-sig-ecology at r-project.org
    https://stat.ethz.ch/mailman/listinfo/r-sig-ecology