Thank you, Paul. I appreciate your time, and apologies if my understanding
is often incomplete.
Hi Scott,
An incomplete answer...
1. Is a Gamma distribution best for my distance data? If so, which link
function is most appropriate? I explored two link functions: identity and
log. I have concerns and see potential issues with both (see my
annotations in the reproducible example below).
I don't know (I haven't run your code), but I've always somehow managed to
avoid gamma regression for strictly positive data by logging the response
and fitting a model with normal errors.
If possible, I'd rather not transform the raw data, so that the coefficient
estimates stay easy to interpret. I'm likely naive or
misunderstanding something though. Log transforming the distance data does
produce a reasonably normal distribution. The following two models have
very similar AIC, BIC, LogLik, etc. estimates and the p-values of the fixed
effects produce similar interpretations. However, the fixed effects
estimates are quite different.
gammaDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year + age*sex +
                     (1|id),
                   data = birds, family = Gamma(link = log), nAGQ = 10,
                   control = glmerControl(optimizer = "bobyqa"))
summary(gammaDist)
logGausDist <- glmer(log(distance) ~ CSs.lat + CSdirect + CSstart + year +
                       age*sex + (1|id),
                     data = birds, family = gaussian(link = log), nAGQ = 10,
                     control = glmerControl(optimizer = "bobyqa"))
summary(logGausDist)
The interpretations of these two models are mostly the same: only starting
latitude is a marginally significant predictor of bird migration distance.
Correct?
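One possible reason the fixed-effect estimates differ: logGausDist logs the
response and also uses a log link, so its linear predictor describes
log(E[log(distance)]) rather than E[log(distance)]. The "log the response,
normal errors" model Paul describes would use the identity link, which is
what lmer() fits. A minimal sketch on simulated data follows; the data
frame `d`, the covariate `lat`, and all parameter values are made up for
illustration and are not the `birds` data.

```r
library(lme4)

set.seed(1)
# Hypothetical stand-in for the `birds` data: 40 birds, 3 tracks each
n_id <- 40
d <- data.frame(id = factor(rep(seq_len(n_id), each = 3)))
d$lat <- rnorm(nrow(d))                 # made-up standardized covariate
bird <- rnorm(n_id, sd = 0.3)           # bird-level random intercepts
mu <- exp(7 + 0.2 * d$lat + bird[as.integer(d$id)])
d$distance <- rgamma(nrow(d), shape = 25, rate = 25 / mu)

# Gamma GLMM with log link: linear predictor is log(E[distance])
m_gamma <- glmer(distance ~ lat + (1 | id), data = d,
                 family = Gamma(link = "log"))

# Normal errors on the logged response, identity link:
# linear predictor is E[log(distance)]
m_lognorm <- lmer(log(distance) ~ lat + (1 | id), data = d)

# Both sets of coefficients are on the log scale, so they can be compared
cbind(gamma_log = fixef(m_gamma), lognormal = fixef(m_lognorm))
```

In both fits a slope is read as an approximate proportional change in
distance per unit of the covariate, so logging the response does not make
the coefficients harder to interpret than the Gamma log-link fit.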
2. If the log link is the best or most appropriate to use, then the
summary(mDist) produces a sd of the random effect = 0 with the bobyqa
optimizer. Switching to Nelder_Mead gives a reasonable sd, but throws a
convergence warning.
(For clarity, I assume that by "sd of the random effect" you mean the
square root of the variance parameter that gauges residual inter-bird
variation in mean distance and not the SD of the estimate of that
parameter, which anyway isn't output by glmer.)
Why is a random effect variance estimate of zero implausible? I would
trust a converged estimate over a non-converged estimate, regardless of
whether the estimate is zero. Also, you could compare the log-likelihoods
using logLik(); you'd expect the converged fit to have a higher LL. For
more general troubleshooting of convergence warnings:
http://rpubs.com/bbolker/lme4trouble1
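The logLik() comparison could look roughly like the sketch below, which
fits the same model with both optimizers and puts the log-likelihoods and
random-effect SDs side by side. The data are simulated; `d`, `lat`, and
the parameter values are assumptions for illustration, not the `birds`
data.

```r
library(lme4)

set.seed(2)
# Hypothetical stand-in data: 40 birds, 3 tracks each
n_id <- 40
d <- data.frame(id = factor(rep(seq_len(n_id), each = 3)))
d$lat <- rnorm(nrow(d))
bird <- rnorm(n_id, sd = 0.3)
mu <- exp(7 + 0.2 * d$lat + bird[as.integer(d$id)])
d$distance <- rgamma(nrow(d), shape = 25, rate = 25 / mu)

# Same model, two optimizers
fit_bobyqa <- glmer(distance ~ lat + (1 | id), data = d,
                    family = Gamma(link = "log"),
                    control = glmerControl(optimizer = "bobyqa"))
fit_nm <- glmer(distance ~ lat + (1 | id), data = d,
                family = Gamma(link = "log"),
                control = glmerControl(optimizer = "Nelder_Mead"))

# The fit with the higher log-likelihood is the better optimum
ll <- c(bobyqa = as.numeric(logLik(fit_bobyqa)),
        Nelder_Mead = as.numeric(logLik(fit_nm)))
print(ll)

# Random-effect SDs from each fit, for comparison
print(VarCorr(fit_bobyqa))
print(VarCorr(fit_nm))
```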
Yes, I believe your assumption is correct. In case I am wrong, I'm
referring to these estimates from the summary(model) output:
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.00000  0.0000
 Residual             0.02879  0.1697
Number of obs: 137, groups:  id, 79
I called a Std.Dev. of 0 implausible because the ecologist in me says there
is no way that individual birds do not differ from one another (or even
from themselves, for birds with data on multiple migration routes).
Am I misunderstanding the meaning of the Std.Dev. here?
Another quick check I often do is to fit the non-converged model with
glmmTMB (which appears to be more robust than lme4), and compare
likelihoods and estimates with lme4.
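A rough sketch of that cross-check, assuming both lme4 and glmmTMB are
installed. The data are simulated; `d`, `lat`, and the parameter values
are made up for illustration. The two packages do not necessarily handle
the Gamma dispersion parameter the same way, so the fixed effects are
likely the safer quantities to compare first.

```r
library(lme4)
library(glmmTMB)

set.seed(3)
# Hypothetical stand-in data: 40 birds, 3 tracks each
n_id <- 40
d <- data.frame(id = factor(rep(seq_len(n_id), each = 3)))
d$lat <- rnorm(nrow(d))
bird <- rnorm(n_id, sd = 0.3)
mu <- exp(7 + 0.2 * d$lat + bird[as.integer(d$id)])
d$distance <- rgamma(nrow(d), shape = 25, rate = 25 / mu)

# Same model specification in both packages
fit_lme4 <- glmer(distance ~ lat + (1 | id), data = d,
                  family = Gamma(link = "log"))
fit_tmb <- glmmTMB(distance ~ lat + (1 | id), data = d,
                   family = Gamma(link = "log"))

# Fixed effects side by side (glmmTMB stores them in the $cond component)
est <- cbind(lme4 = lme4::fixef(fit_lme4),
             glmmTMB = glmmTMB::fixef(fit_tmb)$cond)
print(est)

# Log-likelihoods from each package
print(c(lme4 = as.numeric(logLik(fit_lme4)),
        glmmTMB = as.numeric(logLik(fit_tmb))))
```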
A quick and dirty model fit assessment is to simulate from the fitted
model (which is as easy as simulate(my.fit)), and see if the simulated
responses look more or less like the real responses.
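As a sketch of that simulation check, the example below fits a model to
made-up data, simulates new responses from the fit, and compares a couple
of summary statistics with the observed ones. It uses an lmer() fit on
the log scale to keep things simple; `d`, `lat`, and the parameter values
are assumptions for illustration only.

```r
library(lme4)

set.seed(4)
# Hypothetical stand-in data: 40 birds, 3 tracks each
n_id <- 40
d <- data.frame(id = factor(rep(seq_len(n_id), each = 3)))
d$lat <- rnorm(nrow(d))
bird <- rnorm(n_id, sd = 0.3)
d$logdist <- 7 + 0.2 * d$lat + bird[as.integer(d$id)] +
  rnorm(nrow(d), sd = 0.2)

fit <- lmer(logdist ~ lat + (1 | id), data = d)

# 200 simulated response vectors, one per column
sims <- simulate(fit, nsim = 200, seed = 42)

# Where do the observed mean and SD fall among the simulated ones?
sim_mean <- sapply(sims, mean)
sim_sd <- sapply(sims, sd)
print(rbind(
  mean = c(observed = mean(d$logdist), quantile(sim_mean, c(0.025, 0.975))),
  sd   = c(observed = sd(d$logdist), quantile(sim_sd, c(0.025, 0.975)))
))
```

Observed values far outside the simulated range would hint that the model
is missing something; overlaying histograms of a few simulated columns on
the observed response is a visual version of the same check.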
Good luck,
Paul