Thank you Thierry,
The model below does converge and does not produce any warning messages,
but the random effect variance and std dev are both = 0:
mDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year + age*sex +
(1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
glmerControl(optimizer = "bobyqa"))
As I understand it, and please correct me if I'm wrong, it is possible
(but perhaps unlikely) to have these values = 0. If so, I believe this
implies that either the random effect variable is truly not variable or
that the variance of the random effect is being captured by the other fixed
effects. In my case, that might imply that any variation between birds is
captured by year, age, or sex. So, assuming that logic is correct (and it
may not be), then the following model would most likely show a variance and
std dev > 0:
mDist <- glmer(distance ~ CSs.lat + (1|id), data = birds, family =
Gamma(link = log), nAGQ = 10, control = glmerControl(optimizer = "bobyqa"))
But, it does not, and still shows a variance and std dev of 0. A quick
boxplot of distance grouped by bird id shows both substantial variation
across birds and at times within birds.
Perhaps I'm still missing something? Is 137 observations really too few
for a model with 1 fixed and 1 random effect variable?
Apologies for my ignorance. I do appreciate the guidance while I learn to
swim in the GLMM sea.
scott
Scott LaPoint
Postdoctoral Researcher, Lamont-Doherty Earth Observatory, Columbia
University
Associate Scientist, Max-Planck Institute for Ornithology
skype: scott_lapoint
twitter @sdlapoint
scottlapoint.weebly.com
On Wed, Jul 25, 2018 at 6:29 PM, Thierry Onkelinx <
thierry.onkelinx at inbo.be> wrote:
Dear Scott,
Random effects model only information which is not captured by the fixed
effects. And the random effects are subject to shrinkage. Combine this with
a large number of fixed effect parameters, a small data set and unbalanced
repeated measurements. Then zero variance random effects and convergence
issues don't come as a surprise.
?Bottom line: ?your model is too complex for the data. You'll need to
drop variables or make more observations (often not feasible, I know).
Using a different transformation/link/distribution won't solve any of
these issues.
Best regards,
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88
<https://maps.google.com/?q=Havenlaan+88&entry=gmail&source=g> bus 73,
1000 Brussel
www.inbo.be
////////////////////////////////////////////////////////////
///////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
////////////////////////////////////////////////////////////
///////////////////////////////
<https://www.inbo.be>
2018-07-25 22:05 GMT+02:00 Scott LaPoint <sdlapoint at gmail.com>:
Thank you Paul, I appreciate your time. And, apologies if my
understanding
is often incomplete.
Hi Scott,
1. Is a Gamma distribution best for my distance data? If so, which
function is most appropriate? I explored two link functions:
log. I have concerns and see potential issues with both (see my
in the reproducible example below.
I don?t know (I haven?t run your code) but I?ve always somehow managed
avoid gamma regression for strictly positive data by logging the
and fitting a model with normal errors.
If possible, I'd rather not transform the raw data to facilitate
interpretation of the coefficient estimates. I'm likely naive or
misunderstanding something though. Log transforming the distance data
does
produce a reasonably normal distribution. The following two models have
very similar AIC, BIC, LogLik, etc. estimates and the p-values of the
fixed
effects produce similar interpretations. However, the fixed effects
estimates are quite different.
gammaDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year +
age*sex
+ (1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
glmerControl(optimizer = "bobyqa"))
summary(gammaDist)
logGausDist <- glmer(log(distance) ~ CSs.lat + CSdirect + CSstart + year
+
age*sex + (1|id), data = birds, family = gaussian(link = log), nAGQ = 10,
control = glmerControl(optimizer = "bobyqa"))
summary(logGausDist)
The interpretation from these two models are mostly the same: only
starting
latitude is a marginally significant predictor of bird migration
distance.
Correct?
2. If the log link is the best or most appropriate to use, then the
summary(mDist) produces a sd of the random effect = 0 with the bobyqa
optimizer. Switching to Nelder_Mead gives a reasonable sd, but
(For clarity, I assume that by "sd of the random effect? you mean the
square root of the variance parameter that gauges residual inter-bird
variation in mean distance and not the SD of the estimate of that
parameter, which anyway isn?t output by glmer.)
Why is a random effect variance estimate of zero implausible? I would
trust a converged estimate over a non-converged estimate, regardless of
whether the estimate is zero. Also? you could compare the
using logLik() ? you?d expect the converged fit to have a higher LL.
Yes, I believe your assumption is correct. In case I am wrong, I'm
referring to these estimates from the summary(model) output:
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.00000 0.0000
Residual 0.02879 0.1697
Number of obs: 137, groups: id, 79
The reason I said that a Std.Dev. = 0 is implausible is because the
ecologist in me says that there is no way that individual birds do not
vary
between each other (or even within for birds with multiple migration
route
data). Am I misunderstanding the meaning of the Std.Dev here?
Another quick check I often do is to fit the non-converged model with
glmmTMB (which appears to be more robust than lme4), and compare
likelihoods and estimates with lme4.
A quick and dirty model fit assessment is to simulate from the fitted
model (which is as easy as simulate(my.fit)), and see if the simulated
responses look more or less like the real responses.
Good luck,
Paul
[[alternative HTML version deleted]]