model advice - R-SIG-mixed-models

Wed, Jul 27, 2016 7:03 PM

Hello!

I'm a master's student studying pollination networks. I have been furiously trying to learn about linear mixed models and GLMMs, but I have some specific questions relating to my project analysis that I am hoping someone can help me with.


Here's the short of my project; I can provide more details if need be. I am looking at 16 pollination metrics (e.g. specialization). A few of the metrics are count data (e.g. floral abundance) and several are proportions (bounded between 0 and 1). I am interested in how rainfall (high vs. low rainfall location) and wildlife exclusion (treatment) affect the pollination metrics. I have constructed 12 networks in total: 6 in the low rainfall area (3 with wildlife excluded and 3 with wildlife allowed) and 6 in the high rainfall area (again 3 with wildlife excluded and 3 with wildlife allowed). So sample size is obviously small. It's a block design with 3 blocks in the low rainfall location and 3 blocks in the south location. Each block has the wildlife excluded treatment and the wildlife allowed treatment.

Here are my questions:

The majority of my metrics fit model assumptions (normality of residuals, variance within groups, normality within groups, normality of random effects, and linearity/absence of heteroskedasticity). However, I have some where normality appears to be violated and the fitted vs residuals plot is no good. Various transformations (log, ln, sqrt, arcsin(sqrt)) don't seem to help. From reading papers by Dr. Ben Bolker, this is where it appears GLMMs come in.

So for the metrics that fit model assumptions my plan is to fit this model

    metric.model <- lmer(metric ~ Treatment + Location + (1 | Blocks), data = UHURUnets)

but for those where model assumptions aren't met, I'm not sure how one picks which exponential family to use and which link to use. How does one go about deciding what family and link to use?

I read in Dr. Bolker's TREE paper that binomial distribution and logit link are best for proportions. Is this generally the case?

    NODF.M1 <- glmer(weighted_NODF ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = binomial(link = "logit"))

For the count data (e.g. floral abundance, insect abundance), it seems like I should use Poisson and log link according to that same paper.

    No.Fl.units.M1 <- glmer(number_of_floral_units ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = poisson(link = "log"))

But what distribution and link would one use for continuous data that is not in proportions?

And once you have made a GLMM model, I am assuming it is okay that this model still does not fit the normality assumptions or the residual vs fitted plots. Is this true?

My models (both GLMMs and lmer) currently only have random intercepts. I have read that it might be wise to also have random slopes, because the effect of treatment and location on the pollination metric could vary depending on which block it is in.

So then I believe I would have a model like this:

    Vuln.LL.M3 <- glmer(vulnerability.LL ~ Treatment + Location + (1 + Treatment | Blocks) + (1 + Location | Blocks), family = gaussian(link = log), data = UHURUnets)

I am not sure if this is correct. I get 2 warnings (failed to converge and unable to evaluate scaled gradient). Interestingly, I do not appear to get these warnings if I am running linear mixed models (lmer). Am I doing this correctly?

Lastly, is it appropriate to use interaction terms in GLMMs and lmers? I imagine that the rainfall level may interact with the treatment to influence the pollination metric.

    metric.model <- glmer(metric ~ Treatment * Location + (1 | Blocks), data = UHURUnets, family = gaussian(link = log))

Many thanks in advance for your help!

Cheers,
Travis

Ben Bolker

Sun, Jul 31, 2016 7:06 PM

On 16-07-27 10:03 PM, Guy,Travis J wrote:
> Hello!
>
> I'm a master's student studying pollination networks. I have been
> furiously trying to learn about linear mixed models and glmms, but I
> have some specific questions relating to my project analysis that I am
> hoping someone can help me with
>
> Here's the short of my project. I can provide more details if need
> be. I am looking at 16 pollination metrics (ex. specialization). A
> few of the metrics are count data (ex. floral abundance) and several
> are proportions (limited to be between 0 and 1). I am interested in
> how rainfall (high and low location) and wildlife exclusion
> (treatment) affect the pollination metrics. I have constructed 12
> networks in total. 6 networks in the low rainfall area having 3
> networks with wildlife excluded and 3 networks allowing
> wildlife. Then there are 6 networks in the high rainfall area again
> with 3 networks with wildlife excluded and 3 with wildlife
> included. So sample size is obviously small. It's a block design
> with 3 blocks in the low rainfall and 3 blocks in the south
> location. Each block has the wildlife excluded treatment and the
> wildlife allowed treatment.
>
> Here are my questions:
>
> The majority of my metrics fit model assumptions (normality of
> residuals, variance within groups, normality within groups,
> normality of random effects, and linearity/absence of
> heteroskedasticity). However I have some where normality appears to
> be violated and the fitted vs residuals plot is no good. Various
> transformations (log, ln, sqrt, arcsin(sqrt)) don't seem to help.
> From reading papers by Dr. Ben Bolker, this is where it appears
> GLMMs come in.
>
> So for the metrics that fit model assumptions my plan is to fit this
> model
>
> metric.model <- lmer(metric ~ Treatment + Location + (1 |Blocks),
>     data = UHURUnets)
>
> but for those where model assumptions aren't met, I'm not sure how
> one picks which exponential family to use and which link to use. How
> does one go about deciding what family and link to use?

This is really a general question about generalized linear models,
not about GLMMs - there are a fair number of questions (& answers)
on CrossValidated:

http://stats.stackexchange.com/search?q=glm+which+distribution

> I read in Dr. Bolker's TREE paper that binomial distribution and
> logit link are best for proportions. Is this generally the case?

   That depends.  If you know the denominator (i.e. maximum possible
number, also referred to as N), which would seem to be the case (in
your case it would be the total number of species available for
pollination, I guess?), then binomial/logit makes sense.

  If your response is weighted (as suggested by
the variable name) it might get a little tricky, but as long
as it seems sensible to set a maximum number on the possible
responses it should be OK (although you will get warnings).
You do need to include the N, in this case probably via a 'weights'
argument
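
As a minimal sketch of the point about including N, the example below fits a binomial GLMM to a proportion with the denominator passed through `weights`, alongside the equivalent two-column `cbind(successes, failures)` form. Everything here is simulated for illustration: the data frame `d`, the denominator `N = 20`, and the column names `successes` and `prop` are assumptions, not the real UHURUnets variables.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the 12 networks: 6 blocks, 2 treatments per block.
d <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Location  = factor(rep(c("low", "high"), each = 6)),
  Treatment = factor(rep(c("excluded", "open"), times = 6)),
  N         = rep(20L, 12)          # hypothetical denominator per network
)
d$successes <- rbinom(12, size = d$N, prob = 0.4)
d$prop      <- d$successes / d$N

# Proportion response; weights = N says how many trials each proportion
# is based on.
m_prop <- glmer(prop ~ Treatment + Location + (1 | Blocks),
                data = d, family = binomial(link = "logit"), weights = N)

# Same model written with an explicit (successes, failures) response.
m_cbind <- glmer(cbind(successes, N - successes) ~ Treatment + Location +
                   (1 | Blocks),
                 data = d, family = binomial(link = "logit"))

# Fixed-effect estimates from the two forms, side by side.
cbind(prop_weights = fixef(m_prop), cbind_form = fixef(m_cbind))
```

With a response that is not a count divided by N (for example a weighted index), `prop * N` is generally not an integer, and that is when the binomial family issues warnings about non-integer successes.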

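> NODF.M1 <- glmer(weighted_NODF ~ Treatment + Location + (1|Blocks),
>     data = UHURUnets, family = binomial(link = "logit"))
>
> For the count data (ex. floral abundance, insect abundance), it
> seems like I should use Poisson and log link according to that same
> paper.
>
> No.Fl.units.M1 <- glmer(number_of_floral_units ~ Treatment +
>     Location + (1|Blocks), data = UHURUnets, family = poisson(link =
>     "log"))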

seems reasonable, although you should make sure to account for
overdispersion
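
One way to check for overdispersion, sketched on simulated counts (the data frame `d`, the `count` column, and the `obs` factor are illustrative assumptions): compare the sum of squared Pearson residuals with the residual degrees of freedom, and, if the ratio is well above 1, add an observation-level random effect. This is one common remedy among several, not the only option.

```r
library(lme4)

set.seed(2)
d <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Location  = factor(rep(c("low", "high"), each = 6)),
  Treatment = factor(rep(c("excluded", "open"), times = 6))
)
# Negative binomial counts, i.e. more variable than a Poisson would allow.
d$count <- rnbinom(12, mu = 30, size = 2)

m_pois <- glmer(count ~ Treatment + Location + (1 | Blocks),
                data = d, family = poisson(link = "log"))

# Rough check: squared Pearson residuals over residual df.
# Values well above 1 point to overdispersion.
disp_ratio <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
disp_ratio

# Observation-level random effect: one level per row of the data.
d$obs  <- factor(seq_len(nrow(d)))
m_olre <- glmer(count ~ Treatment + Location + (1 | Blocks) + (1 | obs),
                data = d, family = poisson(link = "log"))
```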

> But what distribution and link would one use for continuous data
> that is not in proportions?

Generally your best hope for continuous data is a transformation.
You can use a Gamma for data that are positive, but log-transformation
followed by a linear mixed model is often reasonable too.  We would
probably need more information.
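
The two options mentioned here can be sketched side by side on a simulated positive response (the data frame `d` and the `metric` values are made up for illustration): a linear mixed model on the log scale, and a Gamma GLMM with a log link.

```r
library(lme4)

set.seed(3)
d <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Location  = factor(rep(c("low", "high"), each = 6)),
  Treatment = factor(rep(c("excluded", "open"), times = 6))
)
# Simulated positive, right-skewed response with a small block effect.
block_eff <- rnorm(6, sd = 0.2)
mu <- exp(1 + 0.3 * (d$Treatment == "open") + block_eff[as.integer(d$Blocks)])
d$metric <- rgamma(12, shape = 10, rate = 10 / mu)

# Option 1: log-transform the response, then fit a linear mixed model.
m_log <- lmer(log(metric) ~ Treatment + Location + (1 | Blocks), data = d)

# Option 2: Gamma GLMM with a log link on the untransformed response.
m_gamma <- glmer(metric ~ Treatment + Location + (1 | Blocks),
                 data = d, family = Gamma(link = "log"))

# Both sets of coefficients are on the log scale.
cbind(log_lmm = fixef(m_log), gamma_glmm = fixef(m_gamma))
```

With only 12 observations either fit may report a singular fit or a convergence warning; that reflects the small sample rather than a problem with the syntax.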

> And once you have made a GLMM model, I am assuming it is okay that
> this model still does not fit the normality assumptions or the
> residual vs fitted plots. Is this true?

well, the residuals should still **approximately** fit these
assumptions (worst for binary data)

> My models (both glmms and lmer) currently only have random
> intercepts. I have read that it might be wise to also have random
> slopes as well because the pollination metric could vary for each
> treatment and location depending on which block it is in.

Yes, although it can be hard to get enough data to make this
worthwhile.
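
A quick count shows why data are the limiting factor in this particular design. The arithmetic below uses only the numbers given in the question (6 blocks, 2 treatments per block, one network per combination); the variable names are illustrative.

```r
n_blocks <- 6
n_treat  <- 2
n_obs    <- n_blocks * n_treat                # 12 networks in total

# A term like (1 + Treatment | Blocks) gives each block a random intercept
# and a random Treatment slope.
re_per_block     <- 2
n_random_effects <- n_blocks * re_per_block   # 12 random effects

# Variance-covariance parameters for that term: two variances and one
# covariance.
vc_params <- re_per_block * (re_per_block + 1) / 2

c(observations = n_obs, random_effects = n_random_effects,
  vc_parameters = vc_params)
```

With as many random effects as observations, a random Treatment slope cannot be separated from residual variation in a Gaussian model, which is one concrete sense in which there are not enough data here.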

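> So then I believe I would have a model like this
>
> Vuln.LL.M3 <- glmer(vulnerability.LL ~ Treatment + Location + (1 +
> Treatment|Blocks) + (1 + Location|Blocks), family = gaussian(link =
> log), data = UHURUnets)
>
> I am not sure if this is correct. I get 2 warnings (failed to converge
> and unable to evaluate scaled gradient). Interestingly I appear to not
> get these warnings if I am running linear mixed models (lmer). Am I
> doing this correctly?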

  Probably. There are lots of false positives.  See ?convergence
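
One check along the lines of what `?convergence` suggests is to refit with a different optimizer and see whether the estimates change. The sketch below does this for a random-intercept model on simulated data (the data frame `d` and response `y` are assumptions); if the fixed effects agree closely between optimizers, a convergence warning is more likely to be a false positive.

```r
library(lme4)

set.seed(4)
d <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Location  = factor(rep(c("low", "high"), each = 6)),
  Treatment = factor(rep(c("excluded", "open"), times = 6))
)
# Simulated positive response with block-level and residual noise.
block_eff <- rnorm(6, sd = 0.2)
d$y <- exp(1 + 0.3 * (d$Treatment == "open") +
             block_eff[as.integer(d$Blocks)] + rnorm(12, sd = 0.2))

# Fit with the default optimizer settings.
m_default <- glmer(y ~ Treatment + Location + (1 | Blocks),
                   data = d, family = gaussian(link = "log"))

# Refit the same model using bobyqa throughout.
m_bobyqa <- update(m_default, control = glmerControl(optimizer = "bobyqa"))

# Compare fixed-effect estimates from the two fits.
cbind(default = fixef(m_default), bobyqa = fixef(m_bobyqa))
```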

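> Lastly, is it appropriate to use interaction terms in GLMMs and lmers?
> I imagine that the rainfall level may interact with the treatment to
> influence the pollination metric.
>
> metric.model <- glmer(metric ~ Treatment*Location + (1 |Blocks), data
> = UHURUnets, family = gaussian(link = log))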

Definitely OK.
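
For illustration, an interaction model can be compared with the additive one by a likelihood ratio test. This is a sketch on simulated data (the data frame `d` and the `metric` values are made up), using `lmer` for a metric that meets the linear-model assumptions.

```r
library(lme4)

set.seed(5)
d <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Location  = factor(rep(c("low", "high"), each = 6)),
  Treatment = factor(rep(c("excluded", "open"), times = 6))
)
# Simulated response with a treatment effect, block noise and residual noise.
block_eff <- rnorm(6, sd = 0.3)
d$metric <- 2 + 0.5 * (d$Treatment == "open") +
  block_eff[as.integer(d$Blocks)] + rnorm(12, sd = 0.3)

# Additive model and model with the Treatment x Location interaction.
m_add <- lmer(metric ~ Treatment + Location + (1 | Blocks), data = d)
m_int <- lmer(metric ~ Treatment * Location + (1 | Blocks), data = d)

# anova() refits both models with ML before the likelihood ratio test.
lrt <- anova(m_add, m_int)
lrt
```

With 12 observations the test has little power, so a non-significant interaction here should not be read as evidence that no interaction exists.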