Skip to content

Problems fitting GLMM and getting AIC

3 messages · Mario Garrido, Thierry Onkelinx, Mollie Brooks

#
Dear lme4-users,
I am trying to fit my data to a generalized linear model with repeated
measures. My data consists on the relative amount of bacterial colonies in
hosts' blood provided by qPCR (qpcr.bart), I measure the amount of qpcr.bart at
7 different time points (day) for each of the 19 individuals (exp.ID).
As random
factor I use (1 | exp.ID). At time zero all the values are 0 cause the
indivuals were still not infected.
As a fixed factor, I also include treatment (trtmnt): whether animals were
infected with 1 bacteria (Bartonella) or 2 (coinfected with another)

Attach is my data. I want to fit my data to different model distributions
(even if I know in advance that are not the correct distributionss) then
compare between them using AICc, as some reviewers asked me

#I have no problem to fit the model to *Gaussian*. even if I know that my
distribution is not normal, as analyses of the residuals show.
lmer(qpcr.bart ~ day * trtmnt + (1 | exp.ID)) ->mgauss

#I get an error when trying to fit to *Gamma*, even if I add 1 to
qpcr.bart cause
Gamma not accept zeroes
glmer((qpcr.bart+1) ~ day * trtmnt + ( 1 | exp.ID), family = Gamma) ->mGamma
Error in (function (fr, X, reTrms, family, nAGQ = 1L, verbose = 0L, maxit =
100L,  :
  (maxstephalfit) PIRLS step-halvings failed to reduce deviance in
pwrssUpdate

*Q1:* is this error due to the low variability in the data and the use of
too many parameters?

#By round the numbers and avoid integers (dezimals) I can consider data as
count data and fit to *Poisson*. I also have the following problems and, in
addition, I do not know whether I can extract AIC or any other I-T index
glmer(round(qpcr.bart,digits=0) ~ day * trtmnt + ( 1 | exp.ID), family =
poisson) ->mPoisson
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00263583 (tol = 0.001,
component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

*Q2:* What this error meaning? I know I have overdispersion, but how to
deal with it? is this problem related with overdispersion?

#Lastly, if I accept that I have overdispersion I can go to Quasi approac
(but do not give me AIC) or try to converge to *negative binomial *using
glmer.nb from MASS.
glmer.nb(qpcr.bart ~ day * trtmnt + ( 1 | exp.ID))
Also I have 50 or more warnings
Mainly these ones
 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
  Model failed to converge with max|grad| = 0.00263583 (tol = 0.001,
component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
3: In theta.ml(Y, mu, weights = object at resp$weights, limit = limit,  ... :
  iteration limit reached
4: In sqrt(1/i) : NaNs produced

*Q3:* At this point I am totally lost, what is the problem?

*Q4:* Another question regarding the random factor: What is the meaning of
using (day | exp.ID) instead of (1 | exp.ID) as a random factor?

Thanks, here is my data


*exp ID* *BM* *day* *qpcr bart* *trtmnt *
EA115 35.15 0 0 coinfection
EA115 33.51 11 24100 coinfection
EA115 35.01 21 10700 coinfection
EA115 35.34 31 1580 coinfection
EA115 30.89 42 228 coinfection
EA115 32.98 52 8.75 coinfection
EA115 33.56 63 23.9 coinfection
EA100 40.18 0 0 coinfection
EA100 41.64 11 1710000 coinfection
EA100 42.23 21 2830 coinfection
EA100 42.4 31 245 coinfection
EA100 43.03 42 5.81 coinfection
EA100 43.35 52 0 coinfection
EA100 44.32 63 16 coinfection
EA109 41.12 0 0 coinfection
EA109 39.98 11 1440000 coinfection
EA109 39.9 21 6710 coinfection
EA109 41.23 31 28.9 coinfection
EA109 41.58 42 10.8 coinfection
EA109 42.1 52 0 coinfection
EA109 42.38 63 14.1 coinfection
EA91 33.31 0 0 coinfection
EA91 33.74 11 752000 coinfection
EA91 32.68 21 3850 coinfection
EA91 34.21 31 1460 coinfection
EA91 33.29 42 28.6 coinfection
EA91 34.38 52 0 coinfection
EA91 34.05 63 10.6 coinfection
EA86 38.56 11 18600 coinfection
EA86 38.01 21 11300 coinfection
EA86 37.63 0 0 coinfection
EA86 38.5 31 74.4 coinfection
EA86 38.68 42 7.18 coinfection
EA86 38.93 52 0 coinfection
EA86 38.45 63 20 coinfection
EA90 51.94 0 0 coinfection
EA90 50.59 11 1000000 coinfection
EA90 50.19 21 4030 coinfection
EA90 51.34 31 134 coinfection
EA90 51.17 42 11.7 coinfection
EA90 52.54 52 0 coinfection
EA90 52.48 63 15 coinfection
EA102 36.54 0 0 coinfection
EA102 37.37 11 1270000 coinfection
EA102 37.49 21 12200 coinfection
EA102 38.34 31 2040 coinfection
EA102 36.51 42 35.2 coinfection
EA102 37.23 52 8.19 coinfection
EA102 37.57 63 0 coinfection
EA104 40.37 0 0 coinfection
EA104 38.74 11 1020000 coinfection
EA104 39.48 21 8910 coinfection
EA104 40.81 31 2530 coinfection
EA104 39.37 42 26.9 coinfection
EA104 40.78 52 14.4 coinfection
EA104 40.96 63 0 coinfection
EA116 33.31 0 0 coinfection
EA116 32.85 11 442000 coinfection
EA116 33.29 21 7840 coinfection
EA116 34.06 31 1890 coinfection
EA116 33.92 42 204 coinfection
EA116 34.39 52 14.5 coinfection
EA116 34.42 63 14.5 coinfection
EA118 40.06 0 0 coinfection
EA118 40.78 11 330000 coinfection
EA118 41.66 21 1790 coinfection
EA118 42.06 31 153 coinfection
EA118 42.17 42 7.09 coinfection
EA118 42.28 52 12.5 coinfection
EA118 43.07 63 13.7 coinfection
EA96 34.39 0 0 Bart
EA96 33.94 11 1930000 Bart
EA96 34.65 21 1500 Bart
EA96 35.66 31 16.6 Bart
EA96 32.8 42 0 Bart
EA96 32.22 52 13.3 Bart
EA96 33.01 63 11.8 Bart
EA92 45.53 0 0 Bart
EA92 47.55 11 0 Bart
EA92 47.81 21 13000 Bart
EA92 48.38 31 750 Bart
EA92 47.33 42 18.7 Bart
EA92 47.37 52 0 Bart
EA92 47.27 63 18.6 Bart
EA97 35.42 0 0 Bart
EA97 33.37 11 480000 Bart
EA97 33.02 21 4200 Bart
EA97 34.88 31 17.9 Bart
EA97 34.63 42 0 Bart
EA97 34.88 52 0 Bart
EA97 34.75 63 11.2 Bart
EA119 45.31 0 0 Bart
EA119 42.72 11 468000 Bart
EA119 43.31 21 2320 Bart
EA119 44.06 31 59.4 Bart
EA119 44.6 42 9.56 Bart
EA119 43.51 52 0 Bart
EA119 45.08 63 0 Bart
EA112 40.15 0 0 Bart
EA112 39.54 11 1650000 Bart
EA112 39.43 21 3190 Bart
EA112 40.72 31 15.7 Bart
EA112 38.94 42 0 Bart
EA112 38.8 52 0 Bart
EA112 38.42 63 19.1 Bart
EA93 30.54 0 0 Bart
EA93 29.32 11 892000 Bart
EA93 28.89 21 7940 Bart
EA93 28.77 31 36.7 Bart
EA93 27.92 42 14.8 Bart
EA93 28.71 52 0 Bart
EA93 28 63 14.9 Bart
EA114 46.01 0 0 Bart
EA114 41.42 11 0 Bart
EA114 41.09 21 4780 Bart
EA114 41.7 31 18.6 Bart
EA114 41.78 42 17.4 Bart
EA114 41.91 52 6.24 Bart
EA114 43.01 63 20.7 Bart
EA120 41.86 0 0 Bart
EA120 40.95 11 1110000 Bart
EA120 41.19 21 6450 Bart
EA120 41.96 31 8.2 Bart
EA120 41.66 42 0 Bart
EA120 41.86 52 10.3 Bart
EA120 41.15 63 14.5 Bart
EA105 32.4 0 0 Bart
EA105 34.35 11 1850000 Bart
EA105 34.53 21 13800 Bart
EA105 34.82 31 1290 Bart
EA105 33.8 42 64.4 Bart
EA105 35.13 52 14.7 Bart
EA105 34.44 63 17.7 Bart
EA82 36.86 0 0 Bart
EA82 36.84 11 698000 Bart
EA82 36.24 21 1470 Bart
EA82 37.11 31 10.4 Bart
EA82 36 42 0 Bart
EA82 36.3 52 5.23 Bart
EA82 36.53 63 15.1 Bart
5 days later
#
Dear Mario,

A reproducible example makes it much easier to help you. See
http://reprex.tidyverse.org/articles/reprex.html and
https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

I'm not sure if you can compare AICc when using different
distributions. Rather choose the relevant distribution based on the
dataset.

What is the relevant of the measurements on day 0? Are they 0 by
definition? In that case I would omit then from the dataset.

Q4: see http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-definition

For your other questions: please post a reproducible example.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////


Van 14 tot en met 19 december 2017 verhuizen we uit onze vestiging in
Brussel naar het Herman Teirlinckgebouw op de site Thurn & Taxis.
Vanaf dan ben je welkom op het nieuwe adres: Havenlaan 88 bus 73, 1000 Brussel.

///////////////////////////////////////////////////////////////////////////////////////////



2017-11-15 16:50 GMT+01:00 Mario Garrido <gaiarrido at gmail.com>:
#
Hi Mario,

In general, I believe it?s ok to simultaneously select (using information criteria) both the formula and distribution because the amount of variance and its shape around the mean depends on what predictors are in the formula. However, rounding the response variable to be integers is almost always a bad idea so I would stop considering the Poisson and negative binomial unless there is some reason why qPCR should really be an integer. Is this what the reviewers asked you to do? I would first check what distributions other people use for qPCR. 

You may want to model the zeros separately using a hurdle model with predictors (~ day * trtmnt + (1 | exp.ID)) on the zero vs non-zeros. This could be done in two steps with glmer: 
(1) do logistic regression to model zero vs non-zero qPCR (family=binomial) 
and then (2) for the subset of the data that has a non-zero qPCR, model it with family=Gamma, or the log of this subset of the qPCR could be Gaussian. It?s ok to select between the Gamma and Lognormal for this subset. Do not add 1 to the qPCR.

Based on the snippets of code you sent, it looks like you don?t have your data in a data.frame. Putting the data in a data frame will make the analyses easier and it will be easier to get the right subset for the positive part of the hurdle model. 

Assuming your data frame is called dat?
z1 = glmer((qpcr.bart > 0) ~ day * trtmnt + ( 1 | exp.ID), family=binomial, dat)

c1= glmer(qpcr.bart ~ day * trtmnt + ( 1 | exp.ID), family=Gamma, subset(dat, qpcr.bart > 0))
c2= lmer(log(qpcr.bart) ~ day * trtmnt + ( 1 | exp.ID), subset(dat, qpcr.bart > 0))
Then compare the AICc values of c1 and c2.

cheers,
Mollie