Please advise:

I have a dichotomous outcome on 2500 individuals from 18 geographical areas, with many households nested within areas. I need to assess the association between various predictors and my outcome, adjusting for the correlation within households as well as within areas. The following R functions give dramatically different results:

glmer(CC ~ predictor + (1 | area/household), family = binomial)

and

glmmPQL(CC ~ predictor, random = ~ 1 | area/household, family = binomial)

Why? Which is correct?

Thanks in advance. (I posted this on another site too.)

Lize
nested mixed effects logistic regression (binomial glm) results differ by function.
5 messages · David Duffy, Thierry Onkelinx, Linus Holtermann +1 more
On Fri, 24 Apr 2015, Lize van der Merwe wrote:
I have a dichotomous outcome on 2500 individuals from 18 geographical areas, with many households nested within areas. I need to assess the association between various predictors and my outcome, adjusting for the correlation within households as well as within areas. The following R functions give dramatically different results: glmer(CC ~ predictor + (1 | area/household), family = binomial) and glmmPQL(CC ~ predictor, random = ~ 1 | area/household, family = binomial)
PQL is known to be biased; the amount depends on several things, including the proportion of CC in the sample and the number of levels of the random effects. You could try hglm (package hglm, which uses EQL) and see how different the results are from that ;)

It is also possible that one or both programs ran into numerical problems because of features of your data. If you can send your original data, or simulated data of the same structure (that gives a similar problem!), we could have a look.

Cheers, David.

David Duffy (MBBS PhD)
Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
300 Herston Rd, Brisbane, Queensland 4006, Australia
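As a rough illustration of the "simulated data of the same structure" suggestion, the sketch below generates binary outcomes for individuals in households nested within 18 areas and fits both models side by side. Only the 18 areas and the approximate sample size come from the thread; the number of households, the household-size distribution, the variance components, and the coefficients are all assumptions made up for this example.

```r
library(lme4)  # glmer()
library(MASS)  # glmmPQL(), which fits through nlme

set.seed(42)

# Assumed design: 18 areas (from the post), 1500 households (hypothetical),
# most of them with a single member, giving roughly 2400 individuals.
n_area  <- 18
n_hh    <- 1500
hh_area <- sample(n_area, n_hh, replace = TRUE)
hh_size <- sample(1:4, n_hh, replace = TRUE, prob = c(0.60, 0.25, 0.10, 0.05))

hh_id   <- rep(seq_len(n_hh), times = hh_size)  # household of each individual
area_id <- hh_area[hh_id]                       # area of each individual

# Assumed random-effect standard deviations on the logit scale.
u_area <- rnorm(n_area, sd = 0.5)
u_hh   <- rnorm(n_hh,   sd = 1.0)

dat <- data.frame(
  area      = factor(area_id),
  household = factor(hh_id),
  predictor = rnorm(length(hh_id))
)
eta    <- -1 + 0.5 * dat$predictor + u_area[area_id] + u_hh[hh_id]
dat$CC <- rbinom(nrow(dat), size = 1, prob = plogis(eta))

# Likelihood-based fit (Laplace approximation by default).
fit_ml <- glmer(CC ~ predictor + (1 | area/household),
                data = dat, family = binomial)

# Penalized quasi-likelihood fit of the same nested model.
fit_pql <- glmmPQL(CC ~ predictor, random = ~ 1 | area/household,
                   family = binomial, data = dat, verbose = FALSE)

# Fixed-effect estimates from both fits, side by side.
comparison <- cbind(glmer   = lme4::fixef(fit_ml),
                    glmmPQL = nlme::fixef(fit_pql))
print(comparison)
```

Changing `hh_size` so that households are larger, or changing the intercept so that the outcome is more or less common, gives a feel for how much the two sets of estimates drift apart under different data features.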
Dear Lize,

glmmPQL() uses penalized quasi-likelihood, whereas glmer() uses the likelihood in the case of a binomial family. I prefer methods that use the likelihood.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25, 1070 Anderlecht, Belgium
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dear Lize,

maybe you could give Bayesian methods a try. The excellent MCMCglmm package should be able to handle your model. MCMC often provides more reliable results when group sizes vary widely and the number of observations per group is relatively small.

Best regards,

Linus Holtermann
Hamburgisches WeltWirtschaftsInstitut gemeinnützige GmbH (HWWI)
Heimhuder Straße 71, 20148 Hamburg
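A minimal sketch of what the MCMCglmm suggestion might look like for this kind of model follows. The data are simulated, and the prior, chain length, and every numeric value are assumptions chosen for illustration rather than recommendations. For a binary response the residual variance is not identified, so it is fixed at 1 here; the resulting coefficients are therefore not on exactly the same scale as those from glmer().

```r
library(MCMCglmm)

set.seed(7)

# Small simulated data set with the same nesting (all sizes are assumptions).
n_area  <- 18
n_hh    <- 400
hh_area <- sample(n_area, n_hh, replace = TRUE)
hh_size <- sample(1:4, n_hh, replace = TRUE, prob = c(0.60, 0.25, 0.10, 0.05))
hh_id   <- rep(seq_len(n_hh), times = hh_size)
area_id <- hh_area[hh_id]

dat <- data.frame(
  area      = factor(area_id),
  household = factor(hh_id),   # IDs are unique across areas, so nesting is implicit
  predictor = rnorm(length(hh_id))
)
eta    <- -1 + 0.5 * dat$predictor +
  rnorm(n_area, sd = 0.5)[area_id] + rnorm(n_hh, sd = 1)[hh_id]
dat$CC <- factor(rbinom(nrow(dat), size = 1, prob = plogis(eta)))

# Residual variance fixed at 1; parameter-expanded priors for the two
# random-effect variances (an assumed, commonly used choice).
prior <- list(
  R = list(V = 1, fix = 1),
  G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
           G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000))
)

fit_mcmc <- MCMCglmm(CC ~ predictor,
                     random  = ~ area + household,
                     family  = "categorical",
                     data    = dat,
                     prior   = prior,
                     nitt    = 13000, burnin = 3000, thin = 10,
                     verbose = FALSE)

# Posterior means of the fixed effects.
post_mean <- colMeans(fit_mcmc$Sol)
print(post_mean)
```

Because the household IDs are unique across areas, `~ area + household` encodes the same nesting as `area/household`. In practice the chains would need to be checked for mixing before the summaries are trusted.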
Thank you so very much to everyone who responded to my request. I learnt a lot. You helped me figure out my mistake. I wanted to adjust for the correlation within households. Most of the households, however, contained a single individual. When I combined them into a single cluster, the answers were exactly what I needed.

Regards
Lize
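Lize's follow-up points to households with a single member as the source of the trouble. A quick way to spot this before fitting is to tabulate household sizes. The sketch below uses a hypothetical data frame `dat` with one row per individual and an assumed household-size distribution; only base R is needed.

```r
set.seed(1)

# Hypothetical data: 1500 households, most with one member (assumed shares).
hh_size <- sample(1:4, 1500, replace = TRUE, prob = c(0.70, 0.15, 0.10, 0.05))
dat <- data.frame(household = rep(seq_along(hh_size), times = hh_size))

# Individuals per household, then the number of households of each size.
size_per_hh <- table(dat$household)
size_dist   <- table(size_per_hh)
print(size_dist)

# Share of households with a single member; these carry no information
# about within-household correlation.
prop_singleton <- mean(size_per_hh == 1)
print(prop_singleton)
```

When `prop_singleton` is large, the household-level variance is estimated from few multi-member households, which is one situation in which different fitting methods can disagree markedly.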