very different model estimates in GLM vs. GLMM
Dear Vincenzo, 1. They are conditional on the random effects. Using re.form = ~ 0 gives predictions for the ?average? item, that is the item with random intercept = 0. It is NOT the average over all items. It only takes the fixed effects into account to make the **predictions**. Note that the fixed effects were fitted conditional on the random effects. 2. Re.form = NULL in my example would give a bunch of parallel curves. It would give predictions for the **observed** items. 3. No, that would still be conditional. If you need the marginal distribution, you need to integrate over the random effects. As Henrik pointed out in https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q2/022258.html. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey Van: Vincenzo Ellis [mailto:vincenzoaellis at gmail.com] Verzonden: zaterdag 20 december 2014 16:26 Aan: ONKELINX, Thierry CC: r-sig-mixed-models at r-project.org Onderwerp: Re: [R-sig-ME] very different model estimates in GLM vs. GLMM Dear Thierry, Thank you so much for the example and explanation. I have just a few follow-up questions for you or anyone else in the group if you have time. Or if there is a good reference that I could be directed to, that would also be great, as I clearly have some reading to do. 1. On what are the conditional distributions conditional? I noticed that in the predict function in your code for the glmer model you set re.form equal to ~0, which the help file for predict.merMod leads me to believe means that the random effect in the model was not accounted for. So what is going on there, i.e., what is the conditional distribution based on there? And are the fixed effects estimates from the glmer model conditional on the random effect of the model? 2. If you had let the predict function for the glmer model use the random effect from that model (e.g., re.form = NULL), how would it have worked? Would it have compared observations that are in different levels of the random effect somehow? 3. This last one has been asked before on this listserv (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q2/022258.html), and that is how to make glmer give marginal probabilities instead of conditional probabilities. The suggested answers apparently all ignore random effects in the model. If you ignore the random effects in the glmer model would the marginal probabilities be equivalent to those from a glm model with no random effects? Thanks again for all your help. Vincenzo
On Fri, Dec 19, 2014 at 8:33 PM, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be> wrote:
Dear Vincenzo, You are getting confused by the difference between a marginal and conditinal distribution. The glm returns the marginal distribution, while the glmm returns the conditional distribution. These distributions coindice with the gaussian distribution, but they don't with the binomial distribution. I've put a example together to illustrate the difference between the marginal and conditional distribution in the binomial case. https://gist.github.com/anonymous/b1188a5cd3213132267a Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey ________________________________________ Van: R-sig-mixed-models [r-sig-mixed-models-bounces at r-project.org] namens Vincenzo Ellis [vincenzoaellis at gmail.com] Verzonden: vrijdag 19 december 2014 18:45 Aan: r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] very different model estimates in GLM vs. GLMM Dear all, I'm trying to model parasite prevalence in bird species as a function of the nest height (factor) of those species. When I run a binomial GLM I get estimates for each nest height level that are rather similar to the mean observed prevalence at each level. However when I run a binomial GLMM with taxonomic family as a random effect (intercept) one of the level estimates differs greatly from the GLM estimates. I suspect the problem is that the random effect has only one or two replicates per level (i.e., species), but I want to confirm that I haven't made a mistake in coding. Data and code follow. Thanks for any insights and for taking the time to help. Vincenzo *# the data* data <- structure(list(Species_Code = structure(1:20, .Label = c("AMGO", "AMRO", "BGGN", "BHCO", "BRTH", "CARW", "COYE", "EABL", "EAPH", "FISP", "HOFI", "HOSP", "INBU", "NOBO", "NOCA", "NOMO", "SOSP", "TRES", "WEVI", "YBCH"), class = "factor"), n = c(19L, 5L, 5L, 6L, 8L, 16L, 32L, 4L, 4L, 27L, 4L, 12L, 50L, 4L, 36L, 9L, 5L, 8L, 8L, 29L), taxonomic.family = structure(c(3L, 12L, 10L, 5L, 6L, 11L, 8L, 12L, 13L, 2L, 3L, 9L, 1L, 7L, 1L, 6L, 2L, 4L, 14L, 8L), .Label = c("Cardinalidae", "Emberizidae", "Fringillidae", "Hirundinidae", "Icteridae", "Mimidae", "Odontophoridae", "Parulidae", "Passeridae", "Polioptilidae", "Troglodytidae", "Turdidae", "Tyrannidae", "Vireonidae"), class = "factor"), Total.pos = c(1L, 3L, 2L, 2L, 4L, 1L, 7L, 3L, 0L, 13L, 1L, 1L, 29L, 0L, 32L, 6L, 5L, 0L, 1L, 18L), Total.neg = c(18L, 2L, 3L, 4L, 4L, 15L, 25L, 1L, 4L, 14L, 3L, 11L, 21L, 4L, 4L, 3L, 0L, 8L, 7L, 11L), nest_ht2 = structure(c(3L, 2L, 3L, 3L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L), .Label = c("1", "2", "3"), class = "factor")), .Names = c("Species_Code", "n", "taxonomic.family", "Total.pos", "Total.neg", "nest_ht2"), class = "data.frame", row.names = c(3L, 5L, 7L, 8L, 11L, 13L, 17L, 19L, 20L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 34L, 37L, 40L, 43L)) *# when we look at the total prevalence across nest heights (i.e., nest_ht2) we see:* require(plyr) (observed <- ddply(data, "nest_ht2", summarize, mean.prevalence = mean(Total.pos/n))) *# if we run a normal binomial GLM the estimates for each of the nest heights are* # quick function to convert logit scale values to probabilities probs <- function(x){ exp(x)/(exp(x)+1) } model.basic <- glm(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2, family=binomial, data=data) probs(coef(model.basic)) *# if we run a binomial GLMM with taxonomic.family as a random effect (intercept), the estimates* *# for each of the nest heights are* require(lme4) model.mixed <- glmer(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2 + (1|taxonomic.family), family=binomial, data=data) probs(fixef(model.mixed)) *# if we put all of these together we can see that the model.basic is closer to the observed mean prevalence across* *# nest heights than the model.mixed is. In particular, in the model.mixed the estimate for the first level of nest * *# height is much lower (0.14) than in the model.basic (0.45).* data.frame(observed, model.basic = probs(coef(model.basic)), model.mixed = probs(fixef(model.mixed))) *# Could this difference between the GLM and the GLMM have to do with the low number of replicates in each of the* *# levels of the random effect (taxonomic.family)? Or is there a problem with coding?* arrange(ddply(data, "taxonomic.family", summarize, taxonomic.family_sample_size = length(taxonomic.family)), desc(taxonomic.family_sample_size)) *# You can see that only six of the families have more than one species sampled and none of those have more than two. If the* *# problem really is the mixed effect model not having enough data, how should one respond to reviewers who insist that* *# taxonomy must be controlled for in such an analysis?* _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo> Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>