Hello!
I'm a master's student studying pollination networks. I have been furiously trying to learn about linear mixed models and GLMMs, but I have some specific questions relating to my project analysis that I am hoping someone can help me with.
Here's the short of my project; I can provide more details if need be. I am looking at 16 pollination metrics (e.g. specialization). A few of the metrics are count data (e.g. floral abundance) and several are proportions (limited to be between 0 and 1). I am interested in how rainfall (high and low location) and wildlife exclusion (treatment) affect the pollination metrics. I have constructed 12 networks in total: 6 networks in the low rainfall area, 3 with wildlife excluded and 3 allowing wildlife, and 6 networks in the high rainfall area, again 3 with wildlife excluded and 3 with wildlife included. So sample size is obviously small. It's a block design with 3 blocks in the low rainfall and 3 blocks in the south location. Each block has the wildlife excluded treatment and the wildlife allowed treatment.
Here are my questions:
The majority of my metrics fit model assumptions (normality of residuals, variance within groups, normality within groups, normality of random effects, and linearity/absence of heteroskedasticity). However I have some where normality appears to be violated and the fitted vs residuals plot is no good. Various transformations (log, ln, sqrt, arcsin(sqrt)) don't seem to help. From reading papers by Dr. Ben Bolker, this is where it appears GLMMs come in.
So for the metrics that fit model assumptions my plan is to fit this model
metric.model <- lmer(metric ~ Treatment + Location + (1 | Blocks), data = UHURUnets)
but for those where model assumptions aren't met, I'm not sure how one picks which exponential family to use and which link to use. How does one go about deciding what family and link to use?
I read in Dr. Bolker's TREE paper that binomial distribution and logit link are best for proportions. Is this generally the case?
NODF.M1 <- glmer(weighted_NODF ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = binomial(link = "logit"))
For the count data (e.g. floral abundance, insect abundance), it seems like I should use Poisson and log link according to that same paper.
No.Fl.units.M1 <- glmer(number_of_floral_units ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = poisson(link = "log"))
But what distribution and link would one use for continuous data that is not in proportions?
And once you have made a GLMM model, I am assuming it is okay that this model still does not fit the normality assumptions or the residual vs fitted plots. Is this true?
My models (both glmms and lmer) currently only have random intercepts. I have read that it might be wise to also have random slopes as well because the pollination metric could vary for each treatment and location depending on which block it is in.
So then I believe I would have a model like this:
Vuln.LL.M3 <- glmer(vulnerability.LL ~ Treatment + Location + (1 + Treatment|Blocks) + (1 + Location|Blocks), family = gaussian(link = log), data = UHURUnets)
I am not sure if this is correct. I get 2 warnings (failed to converge and unable to evaluate scaled gradient). Interestingly, I do not appear to get these warnings if I am running linear mixed models (lmer). Am I doing this correctly?
Lastly, is it appropriate to use interaction terms in GLMMs and lmers? I imagine that the rainfall level may interact with the treatment to influence the pollination metric.
metric.model <- glmer(metric ~ Treatment*Location + (1 | Blocks), data = UHURUnets, family = gaussian(link = log))
Many thanks in advance for your help!
Cheers,
Travis
model advice
2 messages · Guy,Travis J, Ben Bolker
4 days later
On 16-07-27 10:03 PM, Guy,Travis J wrote:
> Hello!
>
> I'm a master's student studying pollination networks. I have been furiously trying to learn about linear mixed models and GLMMs, but I have some specific questions relating to my project analysis that I am hoping someone can help me with.
>
> Here's the short of my project; I can provide more details if need be. I am looking at 16 pollination metrics (e.g. specialization). A few of the metrics are count data (e.g. floral abundance) and several are proportions (limited to be between 0 and 1). I am interested in how rainfall (high and low location) and wildlife exclusion (treatment) affect the pollination metrics. I have constructed 12 networks in total: 6 networks in the low rainfall area, 3 with wildlife excluded and 3 allowing wildlife, and 6 networks in the high rainfall area, again 3 with wildlife excluded and 3 with wildlife included. So sample size is obviously small. It's a block design with 3 blocks in the low rainfall and 3 blocks in the south location. Each block has the wildlife excluded treatment and the wildlife allowed treatment.
>
> Here are my questions:
> The majority of my metrics fit model assumptions (normality of residuals, variance within groups, normality within groups, normality of random effects, and linearity/absence of heteroskedasticity). However I have some where normality appears to be violated and the fitted vs residuals plot is no good. Various transformations (log, ln, sqrt, arcsin(sqrt)) don't seem to help. From reading papers by Dr. Ben Bolker, this is where it appears GLMMs come in.
>
> So for the metrics that fit model assumptions my plan is to fit this model
>
> metric.model <- lmer(metric ~ Treatment + Location + (1 | Blocks), data = UHURUnets)
>
> but for those where model assumptions aren't met, I'm not sure how one picks which exponential family to use and which link to use. How does one go about deciding what family and link to use?

This is really a general question about generalized linear models, not about GLMMs - there are a fair number of questions (& answers) on CrossValidated: http://stats.stackexchange.com/search?q=glm+which+distribution
> I read in Dr. Bolker's TREE paper that binomial distribution and logit link are best for proportions. Is this generally the case?

That depends. If you know the denominator (i.e. maximum possible number, also referred to as N), which would seem to be the case (in your case it would be the total number of species available for pollination, I guess?), then binomial/logit makes sense. If your response is weighted (as suggested by the variable name) it might get a little tricky, but as long as it seems sensible to set a maximum number on the possible responses it should be OK (although you will get warnings). You do need to include the N, in this case probably via a 'weights' argument.

> NODF.M1 <- glmer(weighted_NODF ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = binomial(link = "logit"))
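
A minimal sketch of supplying the denominator, assuming simulated data in place of UHURUnets. The columns `N` (the number of trials behind each proportion), `successes` and `prop` are hypothetical and do not come from the original data set. The same binomial GLMM is written two ways: with the proportion as the response plus `weights = N`, and with a two-column `cbind(successes, failures)` response.

```r
library(lme4)
set.seed(101)

# Simulated stand-in for UHURUnets (hypothetical values):
# 6 blocks (3 per location) x 2 treatments = 12 networks
sim <- expand.grid(Treatment = c("excluded", "open"),
                   Blocks = paste0("B", 1:6))
sim$Location <- factor(ifelse(sim$Blocks %in% paste0("B", 1:3), "low", "high"))

# Hypothetical denominator N and simulated successes out of N
sim$N <- 30
block_eff <- rnorm(6, sd = 0.4)[as.integer(sim$Blocks)]
eta <- -0.3 + 0.6 * (sim$Treatment == "open") + block_eff
sim$successes <- rbinom(nrow(sim), size = sim$N, prob = plogis(eta))
sim$prop <- sim$successes / sim$N

# Proportion response: the denominator enters through 'weights'
m_weights <- glmer(prop ~ Treatment + Location + (1 | Blocks),
                   data = sim, weights = N,
                   family = binomial(link = "logit"))

# Same model with successes and failures as a two-column response
m_cbind <- glmer(cbind(successes, N - successes) ~ Treatment + Location + (1 | Blocks),
                 data = sim, family = binomial(link = "logit"))

# Side-by-side fixed-effect estimates from the two specifications
print(cbind(weights = fixef(m_weights), cbind = fixef(m_cbind)))
```

With only 12 networks a singular-fit message would not be surprising here. Whether a weighted index such as weighted_NODF has a meaningful N at all is a separate question that the sketch does not settle.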
> For the count data (e.g. floral abundance, insect abundance), it seems like I should use Poisson and log link according to that same paper.
>
> No.Fl.units.M1 <- glmer(number_of_floral_units ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = poisson(link = "log"))

Seems reasonable, although you should make sure to account for overdispersion.
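
A rough sketch of one way to look for overdispersion and then allow for it, assuming simulated counts in place of the real data (all values, and the `obs` column, are hypothetical). The ratio of summed squared Pearson residuals to residual degrees of freedom is only an approximate diagnostic for mixed models, and an observation-level random effect is one of several possible remedies, not necessarily the right one for this data set.

```r
library(lme4)
set.seed(102)

# Simulated stand-in for UHURUnets (hypothetical values)
sim <- expand.grid(Treatment = c("excluded", "open"),
                   Blocks = paste0("B", 1:6))
sim$Location <- factor(ifelse(sim$Blocks %in% paste0("B", 1:3), "low", "high"))

# Overdispersed counts drawn from a negative binomial
mu <- exp(3 + 0.4 * (sim$Treatment == "open"))
sim$number_of_floral_units <- rnbinom(nrow(sim), mu = mu, size = 2)

m_pois <- glmer(number_of_floral_units ~ Treatment + Location + (1 | Blocks),
                data = sim, family = poisson(link = "log"))

# Approximate dispersion ratio; values well above 1 suggest overdispersion
disp_ratio <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
print(disp_ratio)

# One possible remedy: an observation-level random effect (one level per row)
sim$obs <- factor(seq_len(nrow(sim)))
m_olre <- glmer(number_of_floral_units ~ Treatment + Location +
                  (1 | Blocks) + (1 | obs),
                data = sim, family = poisson(link = "log"))
print(VarCorr(m_olre))
```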
> But what distribution and link would one use for continuous data that is not in proportions?

Generally your best hope for continuous data is a transformation. You can use a Gamma for data that are positive, but log-transformation followed by a linear mixed model is often reasonable too. We would probably need more information.
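
The two options just mentioned can be sketched side by side, assuming a simulated positive, continuous response (the `metric` column and all values are hypothetical). Both models work on the log scale, so the treatment and location coefficients are roughly comparable, although the intercepts need not match: one models the log of the mean, the other the mean of the log.

```r
library(lme4)
set.seed(103)

# Simulated stand-in for UHURUnets (hypothetical values)
sim <- expand.grid(Treatment = c("excluded", "open"),
                   Blocks = paste0("B", 1:6))
sim$Location <- factor(ifelse(sim$Blocks %in% paste0("B", 1:3), "low", "high"))

# Positive, right-skewed continuous response
block_eff <- rnorm(6, sd = 0.2)[as.integer(sim$Blocks)]
eta <- 1 + 0.3 * (sim$Treatment == "open") + block_eff
sim$metric <- rgamma(nrow(sim), shape = 10, rate = 10 / exp(eta))

# Option 1: Gamma GLMM with a log link
m_gamma <- glmer(metric ~ Treatment + Location + (1 | Blocks),
                 data = sim, family = Gamma(link = "log"))

# Option 2: log-transform the response and fit a linear mixed model
m_loglmm <- lmer(log(metric) ~ Treatment + Location + (1 | Blocks), data = sim)

print(cbind(gamma_log = fixef(m_gamma), log_lmm = fixef(m_loglmm)))
```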
> And once you have made a GLMM model, I am assuming it is okay that this model still does not fit the normality assumptions or the residual vs fitted plots. Is this true?

Well, the residuals should still **approximately** fit these assumptions (worst for binary data).
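
A minimal sketch of the kind of check this implies, assuming a simulated Poisson GLMM (data and column names are hypothetical). It plots Pearson residuals against fitted values and draws a normal Q-Q plot of the estimated block effects. With 12 observations and 6 blocks these plots can only flag gross problems.

```r
library(lme4)
set.seed(104)

# Simulated stand-in for UHURUnets (hypothetical values)
sim <- expand.grid(Treatment = c("excluded", "open"),
                   Blocks = paste0("B", 1:6))
sim$Location <- factor(ifelse(sim$Blocks %in% paste0("B", 1:3), "low", "high"))
block_eff <- rnorm(6, sd = 0.3)[as.integer(sim$Blocks)]
sim$counts <- rpois(nrow(sim),
                    exp(2.5 + 0.4 * (sim$Treatment == "open") + block_eff))

m <- glmer(counts ~ Treatment + Location + (1 | Blocks),
           data = sim, family = poisson(link = "log"))

# Pearson residuals vs fitted values: look for trends or funnel shapes
res <- residuals(m, type = "pearson")
plot(fitted(m), res, xlab = "Fitted values", ylab = "Pearson residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the estimated block-level random intercepts
re <- ranef(m)$Blocks[["(Intercept)"]]
qqnorm(re)
qqline(re)
```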
> My models (both GLMMs and lmer) currently only have random intercepts. I have read that it might be wise to also have random slopes as well because the pollination metric could vary for each treatment and location depending on which block it is in.

Yes, although it can be hard to get enough data to make this worthwhile.

> So then I believe I would have a model like this:
>
> Vuln.LL.M3 <- glmer(vulnerability.LL ~ Treatment + Location + (1 + Treatment|Blocks) + (1 + Location|Blocks), family = gaussian(link = log), data = UHURUnets)
>
> I am not sure if this is correct. I get 2 warnings (failed to converge and unable to evaluate scaled gradient). Interestingly, I do not appear to get these warnings if I am running linear mixed models (lmer). Am I doing this correctly?

Probably. There are lots of false positives. See ?convergence
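
One way to judge whether a convergence warning matters is to refit with a different optimizer and compare the estimates. The sketch below assumes simulated data and, to keep it small, a random-intercept model in place of the random-slope model quoted above; the `metric` column and the `maxfun` value are illustrative choices. lme4 also provides `allFit()` for trying several optimizers at once.

```r
library(lme4)
set.seed(105)

# Simulated stand-in for UHURUnets (hypothetical values)
sim <- expand.grid(Treatment = c("excluded", "open"),
                   Blocks = paste0("B", 1:6))
sim$Location <- factor(ifelse(sim$Blocks %in% paste0("B", 1:3), "low", "high"))
block_eff <- rnorm(6, sd = 0.2)[as.integer(sim$Blocks)]
mu <- exp(1 + 0.3 * (sim$Treatment == "open") + block_eff)
# Kept strictly positive, which the Gaussian log-link starting values require
sim$metric <- pmax(rnorm(nrow(sim), mean = mu, sd = 0.2), 0.1)

# Fit with the default optimizer settings
m_default <- glmer(metric ~ Treatment + Location + (1 | Blocks),
                   data = sim, family = gaussian(link = "log"))

# Refit with bobyqa throughout and a larger evaluation budget
m_bobyqa <- glmer(metric ~ Treatment + Location + (1 | Blocks),
                  data = sim, family = gaussian(link = "log"),
                  control = glmerControl(optimizer = "bobyqa",
                                         optCtrl = list(maxfun = 1e5)))

# Similar estimates and log-likelihoods suggest a warning was a false positive
print(cbind(default = fixef(m_default), bobyqa = fixef(m_bobyqa)))
print(c(default = as.numeric(logLik(m_default)),
        bobyqa = as.numeric(logLik(m_bobyqa))))
```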
> Lastly, is it appropriate to use interaction terms in GLMMs and lmers? I imagine that the rainfall level may interact with the treatment to influence the pollination metric.
>
> metric.model <- glmer(metric ~ Treatment*Location + (1 | Blocks), data = UHURUnets, family = gaussian(link = log))

Definitely OK.
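
A minimal sketch of fitting the interaction and comparing it with the additive model, assuming simulated data and a linear mixed model (the `metric` column and effect sizes are hypothetical). Both models are fitted by maximum likelihood (`REML = FALSE`) so that the likelihood ratio test on the fixed effects is meaningful. With 12 networks that test has little power, so the result should be read cautiously.

```r
library(lme4)
set.seed(106)

# Simulated stand-in for UHURUnets (hypothetical values)
sim <- expand.grid(Treatment = c("excluded", "open"),
                   Blocks = paste0("B", 1:6))
sim$Location <- factor(ifelse(sim$Blocks %in% paste0("B", 1:3), "low", "high"))
block_eff <- rnorm(6, sd = 0.5)[as.integer(sim$Blocks)]
sim$metric <- 5 + 1.0 * (sim$Treatment == "open") +
  0.8 * (sim$Treatment == "open") * (sim$Location == "low") +
  block_eff + rnorm(nrow(sim), sd = 0.5)

# Additive model vs. model with a Treatment x Location interaction
m_add <- lmer(metric ~ Treatment + Location + (1 | Blocks),
              data = sim, REML = FALSE)
m_int <- lmer(metric ~ Treatment * Location + (1 | Blocks),
              data = sim, REML = FALSE)

# Likelihood ratio test for the interaction term
print(anova(m_add, m_int))
```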