Hello,
I have a few questions concering the interpretation of a GLMM output table
when the model includes an interaction.
We want to analyse bear presence at feeding sites (bear_pres) related to
the year (two years: 2016, 2017) and the feed supplied at feeding sites
(carrion, maize). So the response is binary (0 = no bear present, 1 = bear
present within 5-min intervals over the whole day) and both predictors are
categorical, we include feeding site ID as random factor.
The model includes some other variables too but for simplicity I just use
those two variables for explanation.
1) As I understand, in a model without interaction, the interpretation of
the results would be as follows:
M1 <- glmer((bear_pres ~ feed + year + (1|Feeding.site), family=binomial,
data=df10)
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.58524 0.08529 -53.76 <2e-16 ***the intercept is bear
presence at maize sites in 2016
feedcarrion 0.39178 0.02139 18.32 <2e-16 ***bear presence at
feeding sites in 2017 compared to 2016
year2017 0.23027 0.01978 11.64 <2e-16 ***bear presence at carrion
feeding sites compared to maize feeding sites
Is this interpretation right?
2) To my knowledge, the output changes when you include an interaction:
M2<- glmer(bear_pres ~ year*feed + (1|Feeding.site), family=binomial,
data=df10)
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.36413 0.10730 -40.67 < 2e-16 ***the intercept is
bear presence at maize sites in 2016 (baseline)
year2017 -0.18010 0.05119 -3.52 0.000434 ***difference in bear
presence in 2017 compared to 2016 for maize
feedcarrion -0.02933 0.05318 -0.55 0.581222 difference in
bear presence at carrion sites compared to maize sites in 2016
year2017:feedcarrion 0.85275 0.09953 8.57 < 2e-16 ***difference in
bear presence at carrion sites 2017 and the sum of ?0+ ?1+ ?2
So to my questions: Is this interpretation right? What is the coding of the
model so it does produce this output, e.g. why is the year not comparing
2016 to 2017 anymore as in the model without the interaction? Or why
doesn?t the model still use the two food types for comparison?
As I understand, when you include an intercation between the two binary
dummy-coded categorical variables, the interpretation of what was main
effects before (year, carrion) changes, and so do the betas (these are
called ?simple effects? afterwards).
In my group, there is a strong believe that in M2, the year still compares
the two years (and so does feed), it?s just the coefficient cannot be
interpreted anymore. Also, there is a believe that the interaction term
compares to feedmaize in the year 2016.
If my interpreation is correct, I need some background on how the algorithm
works, how simple effects evolve and why the interaction should be
interpreted as in the output table of M2.
Thank you for your help in advance!
Mixed model interpretation with interaction
8 messages · Patricia Graf, d@iuedecke m@iii@g oii uke@de, René +1 more
Hello Patricia, I think the interpretation of the fixed part is the same as for a classical glm. See below...
On Sun, Jun 09, 2019 at 09:16:43AM +0200, Patricia Graf wrote:
? Hello,
?
? I have a few questions concering the interpretation of a GLMM output table
? when the model includes an interaction.
?
? We want to analyse bear presence at feeding sites (bear_pres) related to
? the year (two years: 2016, 2017) and the feed supplied at feeding sites
? (carrion, maize). So the response is binary (0 = no bear present, 1 = bear
? present within 5-min intervals over the whole day) and both predictors are
? categorical, we include feeding site ID as random factor.
?
? 1) As I understand, in a model without interaction, the interpretation of
? the results would be as follows:
?
? M1 <- glmer((bear_pres ~ feed + year + (1|Feeding.site), family=binomial,
? data=df10)
?
? Fixed effects:
?
? Estimate Std. Error z value Pr(>|z|)
?
? (Intercept) -4.58524 0.08529 -53.76 <2e-16 ***the intercept is bear
? presence at maize sites in 2016
More exactly, it is the log-odds of bear presence for an average site
where maize is supplied in 2016
? feedcarrion 0.39178 0.02139 18.32 <2e-16 ***bear presence at
? feeding sites in 2017 compared to 2016
?
? year2017 0.23027 0.01978 11.64 <2e-16 ***bear presence at carrion
? feeding sites compared to maize feeding sites
?
You have swapped the two lines. ? feedcarrion ? is the change in the
log-odds [that is, the log-odds ratio] when replacing maize by carrion
for a given site, both in 2017 and in 2016 since there is no
interaction; ? year2017 ?, the same for 2017 vs 2016.
? Is this interpretation right?
?
? 2) To my knowledge, the output changes when you include an
? interaction:
Yes. Unless you have a perfectly null interaction in the sample, it
has to change, at least with default R coding for factors.
? M2<- glmer(bear_pres ~ year*feed + (1|Feeding.site), family=binomial,
? data=df10)
?
? Fixed effects:
?
? Estimate Std. Error z value Pr(>|z|)
?
? (Intercept) -4.36413 0.10730 -40.67 < 2e-16 ***the intercept is
? bear presence at maize sites in 2016 (baseline)
Same remark as above.
?
? year2017 -0.18010 0.05119 -3.52 0.000434 ***difference in bear
? presence in 2017 compared to 2016 for maize
?
? feedcarrion -0.02933 0.05318 -0.55 0.581222 difference in
? bear presence at carrion sites compared to maize sites in 2016
?
? year2017:feedcarrion 0.85275 0.09953 8.57 < 2e-16 ***difference in
? bear presence at carrion sites 2017 and the sum of ?0+ ?1+ ?2
?
? So to my questions: Is this interpretation right?
Yes.
? What is the coding of the
? model so it does produce this output, e.g. why is the year not comparing
? 2016 to 2017 anymore as in the model without the interaction?
The model is, in the log-odds scale,
y = ?0 + ?1 ? 1(year = 2017) + ?2 ? 1(feed = carrion)
+ ?3 ? 1(year = 2017) ? 1(feed = carrion)
where 1(x) is the ? indicatrice ? of x, that is the function equalling
1 if x is true, 0 otherwise.
? Or why
? doesn?t the model still use the two food types for comparison?
Because the interaction says that the comparison between years is
different for each food type, so a global comparison is (roughly
speaking) meaningless.
? As I understand, when you include an intercation between the two binary
? dummy-coded categorical variables, the interpretation of what was main
? effects before (year, carrion) changes, and so do the betas (these are
? called ?simple effects? afterwards).
The interpretation of what is a main effect does not really change,
but coefficients of the model do not correspond anymore to main
effects. To have main effect evaluation, you should use suited
analysis of variance/deviance tables (see Type I, Type II, Type III
sums of squares and all these kind of things) or coefficients linear
combinations.
? In my group, there is a strong believe that in M2, the year still compares
? the two years (and so does feed), it?s just the coefficient cannot be
? interpreted anymore. Also, there is a believe that the interaction term
? compares to feedmaize in the year 2016.
Not sure to understand, but seems wrong.
? If my interpreation is correct, I need some background on how the algorithm
? works, how simple effects evolve and why the interaction should be
? interpreted as in the output table of M2.
Read more about two-ways, 2?2 crossed factors, analysis of variance:
this is the basic and most simple case to understand interaction. Or
may be, as an alternative, analysis of covariance. Interpretation is
basically the same afterthat for glm/logistic regression, and when
adding random effects. Just a little bit more abstract because of the
change of the modeling scale (log-odds for glm/glmer here, for
instance).
Hope this helps,
Best regards,
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html
Dear Patricia, when you include an interaction, your assumption is that the relationship between an independent X1 and the dependent variable Y varies *depending on the values of another independent variable X2*. Indeed, for logistic regression models (as well as for many models with non-Gaussian families), the interpretation of interaction terms can be tricky. In such cases, I would recommend to compute (at least additionally) marginal effects, which give you an intuitive output of your results. You can do so e.g. with the "ggeffects" package (https://strengejacke.github.io/ggeffects/), and there is also an example for a logistic mixed effects model (https://strengejacke.github.io/ggeffects/articles/practical_logisticmixedmodel.html), which might help you. In your case, the code would be ggpredict(M1, c("feed", "year")) for the model with interaction. If you want to plot the results, simply call me <- ggpredict(M1, c("feed", "year")) plot(me) A comment on your model: I'm not sure, but if you compare subjects (or feeding sites) at two time points, you might want to model the auto-correlation of subjects / feeding site ("repeated measure") using your time variable as random slope: M1 <- glmer((bear_pres ~ feed * year + (1 + year | Feeding.site), family = binomial, data = df10) Computing marginal effects than would be the same function call: ggpredict(M1, c("feed", "year")) Best Daniel -----Urspr?ngliche Nachricht----- Von: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> Im Auftrag von Patricia Graf Gesendet: Sonntag, 9. Juni 2019 09:17 An: r-sig-mixed-models at r-project.org Betreff: [R-sig-ME] Mixed model interpretation with interaction Hello, I have a few questions concering the interpretation of a GLMM output table when the model includes an interaction. We want to analyse bear presence at feeding sites (bear_pres) related to the year (two years: 2016, 2017) and the feed supplied at feeding sites (carrion, maize). So the response is binary (0 = no bear present, 1 = bear present within 5-min intervals over the whole day) and both predictors are categorical, we include feeding site ID as random factor. The model includes some other variables too but for simplicity I just use those two variables for explanation. 1) As I understand, in a model without interaction, the interpretation of the results would be as follows: M1 <- glmer((bear_pres ~ feed + year + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.58524 0.08529 -53.76 <2e-16 ***the intercept is bear presence at maize sites in 2016 feedcarrion 0.39178 0.02139 18.32 <2e-16 ***bear presence at feeding sites in 2017 compared to 2016 year2017 0.23027 0.01978 11.64 <2e-16 ***bear presence at carrion feeding sites compared to maize feeding sites Is this interpretation right? 2) To my knowledge, the output changes when you include an interaction: M2<- glmer(bear_pres ~ year*feed + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.36413 0.10730 -40.67 < 2e-16 ***the intercept is bear presence at maize sites in 2016 (baseline) year2017 -0.18010 0.05119 -3.52 0.000434 ***difference in bear presence in 2017 compared to 2016 for maize feedcarrion -0.02933 0.05318 -0.55 0.581222 difference in bear presence at carrion sites compared to maize sites in 2016 year2017:feedcarrion 0.85275 0.09953 8.57 < 2e-16 ***difference in bear presence at carrion sites 2017 and the sum of ?0+ ?1+ ?2 So to my questions: Is this interpretation right? What is the coding of the model so it does produce this output, e.g. why is the year not comparing 2016 to 2017 anymore as in the model without the interaction? Or why doesn?t the model still use the two food types for comparison? As I understand, when you include an intercation between the two binary dummy-coded categorical variables, the interpretation of what was main effects before (year, carrion) changes, and so do the betas (these are called ?simple effects? afterwards). In my group, there is a strong believe that in M2, the year still compares the two years (and so does feed), it?s just the coefficient cannot be interpreted anymore. Also, there is a believe that the interaction term compares to feedmaize in the year 2016. If my interpreation is correct, I need some background on how the algorithm works, how simple effects evolve and why the interaction should be interpreted as in the output table of M2. Thank you for your help in advance! _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models -- _____________________________________________________________________ Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de Vorstandsmitglieder: Prof. Dr. Burkhard G?ke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Pr?l?, Marya Verdel _____________________________________________________________________ SAVE PAPER - THINK BEFORE PRINTING
Hi, I don't know if this adds anything new but the most direct answers that come into my mind would be. 1) It seems you use dummy coding, and this defines the interpretation of the estimated coefficients, which would be different from (often preferred because more easy to interpret) effect / or contrast coding (dummy coding has some 'fitting' advantages which are mainly discussed with respected to centered vs. non-centered likelihood (or least-square-mean) estimation processes, which might be insightful for you to look up in the internet); but any coding design you use will eventually simply try to estimate cell-means (in your case on a log scale), and you need to check how to get these cell means out of your coefficients (via back-transformation). One way of doing this is by using marginal predictions, as Daniel points out. 2) For another (technical) illustration: a test-design matrix as yours with (e.g.) 2 feeding sites and 2 years, then it would be a 2(site 1 vs. site 2) by 2(year 1 vs year 2) independent measures design; or 2 x 2 for short, which could be simply expressed by 4 probabilities or by using means on a log scale, one mean for each of the design-cells, which would be the "centered" variant of estimation; but usually dummy coding implies a non-centered (but mathematically equivalent - standard) coding: If the model is: y = site+year (ignoring random effects now), then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept + site2 + year2 mean(site1) = (2*Model_intercept + year2)/2 mean(site2) = ( 2(Model_intercept + site2)+year2))/2 and so on... (Where intercept in most estimation methods is by default is defined in reference to the first level of the first predictor in the equation; thus site1 (+year1, which is 0 in this type of coding); but the reference point can be changed manually) If the model is: y=site+year+site:year, then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept+site2+ year2 + site2:year2 Where only the fourth equation changes, which nontheless can have a huge impact on the estimation of the other parameters (usually R outputs the reference levels for the intercept and the coefficients, which you can easily identify) In case there are more sites than two... e.g.. 4 of them, then: cellmean(site1:year1) = Model_Intercept cellmean(site2:year1) = Model_Intercept + site2 cellmean(site3:year1) = Model_Intercept + site3 cellmean(site4:year1) = Model_Intercept + site4 You might get the gist :) Finally, if you actually want to test for an overall interaction in this way (or main effects), looking at these coefficients is not meaningful, which you can tell by just looking at the formulas above..., So you might want to do it differently (correctly), namely by using likelihood ratio tests: (in R like coding) Model1<- y=site+year+site:year vs Model2<- y=site+year with anova(Model1,Model2) (I think aov() should work as well) If the interaction of both variables is significant (i.e. the anova() output gives a * for the comparison between Model 1 and Model 2... :))) then the interaction effect explains some 'significant' amount of variance. (If there is no *, you can consider the models as equal in terms of explained variance). Same for other effects (e.g. full model vs. model a specific main effect). Maybe Check whether the "afex::mixed" function which does this for you in a sensible way (there are different ways of doing LRT tests...) ;)) Having done this in the first place, is often viewed as prerequisite for 'digging' into the model estimates (as discussed above) to find out, what significant then actually means in terms of 'mean-changes' :) Hope this helps, Best, Ren? Am So., 9. Juni 2019 um 12:45 Uhr schrieb <d.luedecke at uke.de>:
Dear Patricia, when you include an interaction, your assumption is that the relationship between an independent X1 and the dependent variable Y varies *depending on the values of another independent variable X2*. Indeed, for logistic regression models (as well as for many models with non-Gaussian families), the interpretation of interaction terms can be tricky. In such cases, I would recommend to compute (at least additionally) marginal effects, which give you an intuitive output of your results. You can do so e.g. with the "ggeffects" package ( https://strengejacke.github.io/ggeffects/), and there is also an example for a logistic mixed effects model ( https://strengejacke.github.io/ggeffects/articles/practical_logisticmixedmodel.html), which might help you. In your case, the code would be ggpredict(M1, c("feed", "year")) for the model with interaction. If you want to plot the results, simply call me <- ggpredict(M1, c("feed", "year")) plot(me) A comment on your model: I'm not sure, but if you compare subjects (or feeding sites) at two time points, you might want to model the auto-correlation of subjects / feeding site ("repeated measure") using your time variable as random slope: M1 <- glmer((bear_pres ~ feed * year + (1 + year | Feeding.site), family = binomial, data = df10) Computing marginal effects than would be the same function call: ggpredict(M1, c("feed", "year")) Best Daniel -----Urspr?ngliche Nachricht----- Von: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> Im Auftrag von Patricia Graf Gesendet: Sonntag, 9. Juni 2019 09:17 An: r-sig-mixed-models at r-project.org Betreff: [R-sig-ME] Mixed model interpretation with interaction Hello, I have a few questions concering the interpretation of a GLMM output table when the model includes an interaction. We want to analyse bear presence at feeding sites (bear_pres) related to the year (two years: 2016, 2017) and the feed supplied at feeding sites (carrion, maize). So the response is binary (0 = no bear present, 1 = bear present within 5-min intervals over the whole day) and both predictors are categorical, we include feeding site ID as random factor. The model includes some other variables too but for simplicity I just use those two variables for explanation. 1) As I understand, in a model without interaction, the interpretation of the results would be as follows: M1 <- glmer((bear_pres ~ feed + year + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.58524 0.08529 -53.76 <2e-16 ***the intercept is bear presence at maize sites in 2016 feedcarrion 0.39178 0.02139 18.32 <2e-16 ***bear presence at feeding sites in 2017 compared to 2016 year2017 0.23027 0.01978 11.64 <2e-16 ***bear presence at carrion feeding sites compared to maize feeding sites Is this interpretation right? 2) To my knowledge, the output changes when you include an interaction: M2<- glmer(bear_pres ~ year*feed + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.36413 0.10730 -40.67 < 2e-16 ***the intercept is bear presence at maize sites in 2016 (baseline) year2017 -0.18010 0.05119 -3.52 0.000434 ***difference in bear presence in 2017 compared to 2016 for maize feedcarrion -0.02933 0.05318 -0.55 0.581222 difference in bear presence at carrion sites compared to maize sites in 2016 year2017:feedcarrion 0.85275 0.09953 8.57 < 2e-16 ***difference in bear presence at carrion sites 2017 and the sum of ?0+ ?1+ ?2 So to my questions: Is this interpretation right? What is the coding of the model so it does produce this output, e.g. why is the year not comparing 2016 to 2017 anymore as in the model without the interaction? Or why doesn?t the model still use the two food types for comparison? As I understand, when you include an intercation between the two binary dummy-coded categorical variables, the interpretation of what was main effects before (year, carrion) changes, and so do the betas (these are called ?simple effects? afterwards). In my group, there is a strong believe that in M2, the year still compares the two years (and so does feed), it?s just the coefficient cannot be interpreted anymore. Also, there is a believe that the interaction term compares to feedmaize in the year 2016. If my interpreation is correct, I need some background on how the algorithm works, how simple effects evolve and why the interaction should be interpreted as in the output table of M2. Thank you for your help in advance! [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models -- _____________________________________________________________________ Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de Vorstandsmitglieder: Prof. Dr. Burkhard G?ke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Pr?l?, Marya Verdel _____________________________________________________________________ SAVE PAPER - THINK BEFORE PRINTING _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ps: I also agree with Daniel to take care of repeated measurements of the same bears coming to the sites in both years. However, the main problem I guess, will be that not every bear comes back in the second year. This means, having random slopes for bears that were observed only once, will bias the effect estimate (i.e. the random slopes for year will not be separable from the fixed effect of year). A solution to this, however would be, to use an extra variable (lets call it 'repeat') that codes, whether a bear has there in both years (=1) or not (=0; numeric coding - not factored). Then you the following should work: model<-(y~site*year+(0+repeat*year| bearID)) Which will estimate random slopes for year for all bears that were there at least twice, but not for others, where the term before the | becomes 0 (and nothing happens)). Best, Ren? Pps: You can tell your colleagues that: the Model-intercept is the only direct mean that the model estimates directly (i.e. the reference cell) and all other deviations (including other means) are linear combinations from that intercept (for any factor) ... And ... of course, the parameters still can be interpreted this way (as illustrated above) ... but you need to know some details how to do so :)) (you can impress them now...) The easiest way to let a function reconstruct the model outputs is using emmeans() e.g. emmeans(model1, ~Site) should give the marginal estimates of the - site main effect - (site 1 and site 2 means) on the log scale and emmeans(model1, ~Site, type = "response") will give the estimates on the actual response (probability) scale. I find this often very helpful (also for plotting). Am So., 9. Juni 2019 um 16:57 Uhr schrieb Ren? <bimonosom at gmail.com>:
Hi, I don't know if this adds anything new but the most direct answers that come into my mind would be. 1) It seems you use dummy coding, and this defines the interpretation of the estimated coefficients, which would be different from (often preferred because more easy to interpret) effect / or contrast coding (dummy coding has some 'fitting' advantages which are mainly discussed with respected to centered vs. non-centered likelihood (or least-square-mean) estimation processes, which might be insightful for you to look up in the internet); but any coding design you use will eventually simply try to estimate cell-means (in your case on a log scale), and you need to check how to get these cell means out of your coefficients (via back-transformation). One way of doing this is by using marginal predictions, as Daniel points out. 2) For another (technical) illustration: a test-design matrix as yours with (e.g.) 2 feeding sites and 2 years, then it would be a 2(site 1 vs. site 2) by 2(year 1 vs year 2) independent measures design; or 2 x 2 for short, which could be simply expressed by 4 probabilities or by using means on a log scale, one mean for each of the design-cells, which would be the "centered" variant of estimation; but usually dummy coding implies a non-centered (but mathematically equivalent - standard) coding: If the model is: y = site+year (ignoring random effects now), then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept + site2 + year2 mean(site1) = (2*Model_intercept + year2)/2 mean(site2) = ( 2(Model_intercept + site2)+year2))/2 and so on... (Where intercept in most estimation methods is by default is defined in reference to the first level of the first predictor in the equation; thus site1 (+year1, which is 0 in this type of coding); but the reference point can be changed manually) If the model is: y=site+year+site:year, then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept+site2+ year2 + site2:year2 Where only the fourth equation changes, which nontheless can have a huge impact on the estimation of the other parameters (usually R outputs the reference levels for the intercept and the coefficients, which you can easily identify) In case there are more sites than two... e.g.. 4 of them, then: cellmean(site1:year1) = Model_Intercept cellmean(site2:year1) = Model_Intercept + site2 cellmean(site3:year1) = Model_Intercept + site3 cellmean(site4:year1) = Model_Intercept + site4 You might get the gist :) Finally, if you actually want to test for an overall interaction in this way (or main effects), looking at these coefficients is not meaningful, which you can tell by just looking at the formulas above..., So you might want to do it differently (correctly), namely by using likelihood ratio tests: (in R like coding) Model1<- y=site+year+site:year vs Model2<- y=site+year with anova(Model1,Model2) (I think aov() should work as well) If the interaction of both variables is significant (i.e. the anova() output gives a * for the comparison between Model 1 and Model 2... :))) then the interaction effect explains some 'significant' amount of variance. (If there is no *, you can consider the models as equal in terms of explained variance). Same for other effects (e.g. full model vs. model a specific main effect). Maybe Check whether the "afex::mixed" function which does this for you in a sensible way (there are different ways of doing LRT tests...) ;)) Having done this in the first place, is often viewed as prerequisite for 'digging' into the model estimates (as discussed above) to find out, what significant then actually means in terms of 'mean-changes' :) Hope this helps, Best, Ren? Am So., 9. Juni 2019 um 12:45 Uhr schrieb <d.luedecke at uke.de>:
Dear Patricia, when you include an interaction, your assumption is that the relationship between an independent X1 and the dependent variable Y varies *depending on the values of another independent variable X2*. Indeed, for logistic regression models (as well as for many models with non-Gaussian families), the interpretation of interaction terms can be tricky. In such cases, I would recommend to compute (at least additionally) marginal effects, which give you an intuitive output of your results. You can do so e.g. with the "ggeffects" package ( https://strengejacke.github.io/ggeffects/), and there is also an example for a logistic mixed effects model ( https://strengejacke.github.io/ggeffects/articles/practical_logisticmixedmodel.html), which might help you. In your case, the code would be ggpredict(M1, c("feed", "year")) for the model with interaction. If you want to plot the results, simply call me <- ggpredict(M1, c("feed", "year")) plot(me) A comment on your model: I'm not sure, but if you compare subjects (or feeding sites) at two time points, you might want to model the auto-correlation of subjects / feeding site ("repeated measure") using your time variable as random slope: M1 <- glmer((bear_pres ~ feed * year + (1 + year | Feeding.site), family = binomial, data = df10) Computing marginal effects than would be the same function call: ggpredict(M1, c("feed", "year")) Best Daniel -----Urspr?ngliche Nachricht----- Von: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> Im Auftrag von Patricia Graf Gesendet: Sonntag, 9. Juni 2019 09:17 An: r-sig-mixed-models at r-project.org Betreff: [R-sig-ME] Mixed model interpretation with interaction Hello, I have a few questions concering the interpretation of a GLMM output table when the model includes an interaction. We want to analyse bear presence at feeding sites (bear_pres) related to the year (two years: 2016, 2017) and the feed supplied at feeding sites (carrion, maize). So the response is binary (0 = no bear present, 1 = bear present within 5-min intervals over the whole day) and both predictors are categorical, we include feeding site ID as random factor. The model includes some other variables too but for simplicity I just use those two variables for explanation. 1) As I understand, in a model without interaction, the interpretation of the results would be as follows: M1 <- glmer((bear_pres ~ feed + year + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.58524 0.08529 -53.76 <2e-16 ***the intercept is bear presence at maize sites in 2016 feedcarrion 0.39178 0.02139 18.32 <2e-16 ***bear presence at feeding sites in 2017 compared to 2016 year2017 0.23027 0.01978 11.64 <2e-16 ***bear presence at carrion feeding sites compared to maize feeding sites Is this interpretation right? 2) To my knowledge, the output changes when you include an interaction: M2<- glmer(bear_pres ~ year*feed + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.36413 0.10730 -40.67 < 2e-16 ***the intercept is bear presence at maize sites in 2016 (baseline) year2017 -0.18010 0.05119 -3.52 0.000434 ***difference in bear presence in 2017 compared to 2016 for maize feedcarrion -0.02933 0.05318 -0.55 0.581222 difference in bear presence at carrion sites compared to maize sites in 2016 year2017:feedcarrion 0.85275 0.09953 8.57 < 2e-16 ***difference in bear presence at carrion sites 2017 and the sum of ?0+ ?1+ ?2 So to my questions: Is this interpretation right? What is the coding of the model so it does produce this output, e.g. why is the year not comparing 2016 to 2017 anymore as in the model without the interaction? Or why doesn?t the model still use the two food types for comparison? As I understand, when you include an intercation between the two binary dummy-coded categorical variables, the interpretation of what was main effects before (year, carrion) changes, and so do the betas (these are called ?simple effects? afterwards). In my group, there is a strong believe that in M2, the year still compares the two years (and so does feed), it?s just the coefficient cannot be interpreted anymore. Also, there is a believe that the interaction term compares to feedmaize in the year 2016. If my interpreation is correct, I need some background on how the algorithm works, how simple effects evolve and why the interaction should be interpreted as in the output table of M2. Thank you for your help in advance! [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models -- _____________________________________________________________________ Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de Vorstandsmitglieder: Prof. Dr. Burkhard G?ke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Pr?l?, Marya Verdel _____________________________________________________________________ SAVE PAPER - THINK BEFORE PRINTING _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
A small comment on a specific point of the detailed explaination given by Ren?. Reflecting my understanding, so comments on these are wellcome especially if I'm wrong.
On Sun, Jun 09, 2019 at 04:57:06PM +0200, Ren? wrote:
? 2) For another (technical) illustration: a test-design matrix as yours with ? (e.g.) 2 feeding sites and 2 years, then it would be a 2(site 1 vs. site 2) ? by 2(year 1 vs year 2) independent measures design; or 2 x 2 for short, ? which could be simply expressed by 4 probabilities or by using means on a ? log scale, one mean for each of the design-cells, which would be the ? "centered" variant of estimation; but usually dummy coding implies a ? non-centered (but mathematically equivalent - standard) coding: ? If the model is: ? y = site+year (ignoring random effects now), then ? cellmean(site1:year1) = Model_Intercept ? cellmean(site1:year2) = Model_Intercept + year2 ? cellmean(site2:year1) = Model_Intercept + site2 ? cellmean(site2:year2) = Model_Intercept + site2 + year2 ? ? mean(site1) = (2*Model_intercept + year2)/2 ? mean(site2) = ( 2(Model_intercept + site2)+year2))/2 ? and so on... Note that these formula for means of each site assume either that observed sample sizes are exactly the same for both years (if one wants to obtain the empirical means) or that the two years are equally probable in the population (if one wants to obtain estimations of the theoretical mean ; on this specific context, that may seem a weird way of saying things, but for other experimental designs it should apply). That may, or may not, be sensible assumptions depending on the context. For instance, if the bear population has increased between the two years, it may be expected that more bears are observed on the second year, so a weighted average may be more sounded. Or not... ? (Where intercept in most estimation methods is by default is defined in ? reference to the first level of the first predictor in the equation; thus ? site1 (+year1, which is 0 in this type of coding); but the reference point ? can be changed manually)
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html
If you have multiple (repeated) measurements from both bears and feeding site, you may even have a nested or cross classified design. In such case, bears might be nested within feeding sites, and both bear and feeding site might be modelled as random intercept. Here?s a very short gist showing the difference between nested and cross classified design and how to write this in lme4-notation: http://htmlpreview.github.io/?https://github.com/strengejacke/mixed-models-snippets/blob/master/nested_fully-crossed_cross-classified_models.html Best Daniel Von: Ren? <bimonosom at gmail.com> Gesendet: Sonntag, 9. Juni 2019 17:29 An: d.luedecke at uke.de Cc: Patricia Graf <patricia.graf03 at gmail.com>; r-sig-mixed-models <r-sig-mixed-models at r-project.org> Betreff: Re: [R-sig-ME] Mixed model interpretation with interaction Ps: I also agree with Daniel to take care of repeated measurements of the same bears coming to the sites in both years. However, the main problem I guess, will be that not every bear comes back in the second year. This means, having random slopes for bears that were observed only once, will bias the effect estimate (i.e. the random slopes for year will not be separable from the fixed effect of year). A solution to this, however would be, to use an extra variable (lets call it 'repeat') that codes, whether a bear has there in both years (=1) or not (=0; numeric coding - not factored). Then you the following should work: model<-(y~site*year+(0+repeat*year| bearID)) Which will estimate random slopes for year for all bears that were there at least twice, but not for others, where the term before the | becomes 0 (and nothing happens)). Best, Ren? Pps: You can tell your colleagues that: the Model-intercept is the only direct mean that the model estimates directly (i.e. the reference cell) and all other deviations (including other means) are linear combinations from that intercept (for any factor) ... And ... of course, the parameters still can be interpreted this way (as illustrated above) ... but you need to know some details how to do so :)) (you can impress them now...) The easiest way to let a function reconstruct the model outputs is using emmeans() e.g. emmeans(model1, ~Site) should give the marginal estimates of the - site main effect - (site 1 and site 2 means) on the log scale and emmeans(model1, ~Site, type = "response") will give the estimates on the actual response (probability) scale. I find this often very helpful (also for plotting). Am So., 9. Juni 2019 um 16:57 Uhr schrieb Ren? <bimonosom at gmail.com <mailto:bimonosom at gmail.com> >: Hi, I don't know if this adds anything new but the most direct answers that come into my mind would be. 1) It seems you use dummy coding, and this defines the interpretation of the estimated coefficients, which would be different from (often preferred because more easy to interpret) effect / or contrast coding (dummy coding has some 'fitting' advantages which are mainly discussed with respected to centered vs. non-centered likelihood (or least-square-mean) estimation processes, which might be insightful for you to look up in the internet); but any coding design you use will eventually simply try to estimate cell-means (in your case on a log scale), and you need to check how to get these cell means out of your coefficients (via back-transformation). One way of doing this is by using marginal predictions, as Daniel points out. 2) For another (technical) illustration: a test-design matrix as yours with (e.g.) 2 feeding sites and 2 years, then it would be a 2(site 1 vs. site 2) by 2(year 1 vs year 2) independent measures design; or 2 x 2 for short, which could be simply expressed by 4 probabilities or by using means on a log scale, one mean for each of the design-cells, which would be the "centered" variant of estimation; but usually dummy coding implies a non-centered (but mathematically equivalent - standard) coding: If the model is: y = site+year (ignoring random effects now), then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept + site2 + year2 mean(site1) = (2*Model_intercept + year2)/2 mean(site2) = ( 2(Model_intercept + site2)+year2))/2 and so on... (Where intercept in most estimation methods is by default is defined in reference to the first level of the first predictor in the equation; thus site1 (+year1, which is 0 in this type of coding); but the reference point can be changed manually) If the model is: y=site+year+site:year, then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept+site2+ year2 + site2:year2 Where only the fourth equation changes, which nontheless can have a huge impact on the estimation of the other parameters (usually R outputs the reference levels for the intercept and the coefficients, which you can easily identify) In case there are more sites than two... e.g.. 4 of them, then: cellmean(site1:year1) = Model_Intercept cellmean(site2:year1) = Model_Intercept + site2 cellmean(site3:year1) = Model_Intercept + site3 cellmean(site4:year1) = Model_Intercept + site4 You might get the gist :) Finally, if you actually want to test for an overall interaction in this way (or main effects), looking at these coefficients is not meaningful, which you can tell by just looking at the formulas above..., So you might want to do it differently (correctly), namely by using likelihood ratio tests: (in R like coding) Model1<- y=site+year+site:year vs Model2<- y=site+year with anova(Model1,Model2) (I think aov() should work as well) If the interaction of both variables is significant (i.e. the anova() output gives a * for the comparison between Model 1 and Model 2... :))) then the interaction effect explains some 'significant' amount of variance. (If there is no *, you can consider the models as equal in terms of explained variance). Same for other effects (e.g. full model vs. model a specific main effect). Maybe Check whether the "afex::mixed" function which does this for you in a sensible way (there are different ways of doing LRT tests...) ;)) Having done this in the first place, is often viewed as prerequisite for 'digging' into the model estimates (as discussed above) to find out, what significant then actually means in terms of 'mean-changes' :) Hope this helps, Best, Ren? Am So., 9. Juni 2019 um 12:45 Uhr schrieb <d.luedecke at uke.de <mailto:d.luedecke at uke.de> >: Dear Patricia, when you include an interaction, your assumption is that the relationship between an independent X1 and the dependent variable Y varies *depending on the values of another independent variable X2*. Indeed, for logistic regression models (as well as for many models with non-Gaussian families), the interpretation of interaction terms can be tricky. In such cases, I would recommend to compute (at least additionally) marginal effects, which give you an intuitive output of your results. You can do so e.g. with the "ggeffects" package (https://strengejacke.github.io/ggeffects/), and there is also an example for a logistic mixed effects model (https://strengejacke.github.io/ggeffects/articles/practical_logisticmixedmodel.html), which might help you. In your case, the code would be ggpredict(M1, c("feed", "year")) for the model with interaction. If you want to plot the results, simply call me <- ggpredict(M1, c("feed", "year")) plot(me) A comment on your model: I'm not sure, but if you compare subjects (or feeding sites) at two time points, you might want to model the auto-correlation of subjects / feeding site ("repeated measure") using your time variable as random slope: M1 <- glmer((bear_pres ~ feed * year + (1 + year | Feeding.site), family = binomial, data = df10) Computing marginal effects than would be the same function call: ggpredict(M1, c("feed", "year")) Best Daniel -----Urspr?ngliche Nachricht----- Von: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org <mailto:r-sig-mixed-models-bounces at r-project.org> > Im Auftrag von Patricia Graf Gesendet: Sonntag, 9. Juni 2019 09:17 An: r-sig-mixed-models at r-project.org <mailto:r-sig-mixed-models at r-project.org> Betreff: [R-sig-ME] Mixed model interpretation with interaction Hello, I have a few questions concering the interpretation of a GLMM output table when the model includes an interaction. We want to analyse bear presence at feeding sites (bear_pres) related to the year (two years: 2016, 2017) and the feed supplied at feeding sites (carrion, maize). So the response is binary (0 = no bear present, 1 = bear present within 5-min intervals over the whole day) and both predictors are categorical, we include feeding site ID as random factor. The model includes some other variables too but for simplicity I just use those two variables for explanation. 1) As I understand, in a model without interaction, the interpretation of the results would be as follows: M1 <- glmer((bear_pres ~ feed + year + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.58524 0.08529 -53.76 <2e-16 ***the intercept is bear presence at maize sites in 2016 feedcarrion 0.39178 0.02139 18.32 <2e-16 ***bear presence at feeding sites in 2017 compared to 2016 year2017 0.23027 0.01978 11.64 <2e-16 ***bear presence at carrion feeding sites compared to maize feeding sites Is this interpretation right? 2) To my knowledge, the output changes when you include an interaction: M2<- glmer(bear_pres ~ year*feed + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.36413 0.10730 -40.67 < 2e-16 ***the intercept is bear presence at maize sites in 2016 (baseline) year2017 -0.18010 0.05119 -3.52 0.000434 ***difference in bear presence in 2017 compared to 2016 for maize feedcarrion -0.02933 0.05318 -0.55 0.581222 difference in bear presence at carrion sites compared to maize sites in 2016 year2017:feedcarrion 0.85275 0.09953 8.57 < 2e-16 ***difference in bear presence at carrion sites 2017 and the sum of ?0+ ?1+ ?2 So to my questions: Is this interpretation right? What is the coding of the model so it does produce this output, e.g. why is the year not comparing 2016 to 2017 anymore as in the model without the interaction? Or why doesn?t the model still use the two food types for comparison? As I understand, when you include an intercation between the two binary dummy-coded categorical variables, the interpretation of what was main effects before (year, carrion) changes, and so do the betas (these are called ?simple effects? afterwards). In my group, there is a strong believe that in M2, the year still compares the two years (and so does feed), it?s just the coefficient cannot be interpreted anymore. Also, there is a believe that the interaction term compares to feedmaize in the year 2016. If my interpreation is correct, I need some background on how the algorithm works, how simple effects evolve and why the interaction should be interpreted as in the output table of M2. Thank you for your help in advance! _______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models -- _____________________________________________________________________ Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de <http://www.uke.de> Vorstandsmitglieder: Prof. Dr. Burkhard G?ke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Pr?l?, Marya Verdel _____________________________________________________________________ SAVE PAPER - THINK BEFORE PRINTING _______________________________________________ R-sig-mixed-models at r-project.org <mailto:R-sig-mixed-models at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models -- _____________________________________________________________________ Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de Vorstandsmitglieder: Prof. Dr. Burkhard G?ke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Pr?l?, Marya Verdel _____________________________________________________________________ SAVE PAPER - THINK BEFORE PRINTING
@Emmanuel Good point, and just to add some clarity from my side: these mean formulas are, indeed, not a descriptive statistic, but estimated marginal means, which reflect the assumption that there is a "true" population mean regardless of the number of observations in the design cells (if they are not weighted within the model, which might be worth thinking about in terms of related problems of variance assumptions in likelihood ratio testing; see internet discussions about type 1 vs. type 3 LRT's with unbalanced sample sizes). Am So., 9. Juni 2019 um 20:26 Uhr schrieb <d.luedecke at uke.de>:
If you have multiple (repeated) measurements from both bears and feeding site, you may even have a nested or cross classified design. In such case, bears might be nested within feeding sites, and both bear and feeding site might be modelled as random intercept. Here?s a very short gist showing the difference between nested and cross classified design and how to write this in lme4-notation: http://htmlpreview.github.io/?https://github.com/strengejacke/mixed-models-snippets/blob/master/nested_fully-crossed_cross-classified_models.html Best Daniel *Von:* Ren? <bimonosom at gmail.com> *Gesendet:* Sonntag, 9. Juni 2019 17:29 *An:* d.luedecke at uke.de *Cc:* Patricia Graf <patricia.graf03 at gmail.com>; r-sig-mixed-models < r-sig-mixed-models at r-project.org> *Betreff:* Re: [R-sig-ME] Mixed model interpretation with interaction Ps: I also agree with Daniel to take care of repeated measurements of the same bears coming to the sites in both years. However, the main problem I guess, will be that not every bear comes back in the second year. This means, having random slopes for bears that were observed only once, will bias the effect estimate (i.e. the random slopes for year will not be separable from the fixed effect of year). A solution to this, however would be, to use an extra variable (lets call it 'repeat') that codes, whether a bear has there in both years (=1) or not (=0; numeric coding - not factored). Then you the following should work: model<-(y~site*year+(0+repeat*year| bearID)) Which will estimate random slopes for year for all bears that were there at least twice, but not for others, where the term before the | becomes 0 (and nothing happens)). Best, Ren? Pps: You can tell your colleagues that: the Model-intercept is the only direct mean that the model estimates directly (i.e. the reference cell) and all other deviations (including other means) are linear combinations from that intercept (for any factor) ... And ... of course, the parameters still can be interpreted this way (as illustrated above) ... but you need to know some details how to do so :)) (you can impress them now...) The easiest way to let a function reconstruct the model outputs is using emmeans() e.g. emmeans(model1, ~Site) should give the marginal estimates of the - site main effect - (site 1 and site 2 means) on the log scale and emmeans(model1, ~Site, type = "response") will give the estimates on the actual response (probability) scale. I find this often very helpful (also for plotting). Am So., 9. Juni 2019 um 16:57 Uhr schrieb Ren? <bimonosom at gmail.com>: Hi, I don't know if this adds anything new but the most direct answers that come into my mind would be. 1) It seems you use dummy coding, and this defines the interpretation of the estimated coefficients, which would be different from (often preferred because more easy to interpret) effect / or contrast coding (dummy coding has some 'fitting' advantages which are mainly discussed with respected to centered vs. non-centered likelihood (or least-square-mean) estimation processes, which might be insightful for you to look up in the internet); but any coding design you use will eventually simply try to estimate cell-means (in your case on a log scale), and you need to check how to get these cell means out of your coefficients (via back-transformation). One way of doing this is by using marginal predictions, as Daniel points out. 2) For another (technical) illustration: a test-design matrix as yours with (e.g.) 2 feeding sites and 2 years, then it would be a 2(site 1 vs. site 2) by 2(year 1 vs year 2) independent measures design; or 2 x 2 for short, which could be simply expressed by 4 probabilities or by using means on a log scale, one mean for each of the design-cells, which would be the "centered" variant of estimation; but usually dummy coding implies a non-centered (but mathematically equivalent - standard) coding: If the model is: y = site+year (ignoring random effects now), then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept + site2 + year2 mean(site1) = (2*Model_intercept + year2)/2 mean(site2) = ( 2(Model_intercept + site2)+year2))/2 and so on... (Where intercept in most estimation methods is by default is defined in reference to the first level of the first predictor in the equation; thus site1 (+year1, which is 0 in this type of coding); but the reference point can be changed manually) If the model is: y=site+year+site:year, then cellmean(site1:year1) = Model_Intercept cellmean(site1:year2) = Model_Intercept + year2 cellmean(site2:year1) = Model_Intercept + site2 cellmean(site2:year2) = Model_Intercept+site2+ year2 + site2:year2 Where only the fourth equation changes, which nontheless can have a huge impact on the estimation of the other parameters (usually R outputs the reference levels for the intercept and the coefficients, which you can easily identify) In case there are more sites than two... e.g.. 4 of them, then: cellmean(site1:year1) = Model_Intercept cellmean(site2:year1) = Model_Intercept + site2 cellmean(site3:year1) = Model_Intercept + site3 cellmean(site4:year1) = Model_Intercept + site4 You might get the gist :) Finally, if you actually want to test for an overall interaction in this way (or main effects), looking at these coefficients is not meaningful, which you can tell by just looking at the formulas above..., So you might want to do it differently (correctly), namely by using likelihood ratio tests: (in R like coding) Model1<- y=site+year+site:year vs Model2<- y=site+year with anova(Model1,Model2) (I think aov() should work as well) If the interaction of both variables is significant (i.e. the anova() output gives a * for the comparison between Model 1 and Model 2... :))) then the interaction effect explains some 'significant' amount of variance. (If there is no *, you can consider the models as equal in terms of explained variance). Same for other effects (e.g. full model vs. model a specific main effect). Maybe Check whether the "afex::mixed" function which does this for you in a sensible way (there are different ways of doing LRT tests...) ;)) Having done this in the first place, is often viewed as prerequisite for 'digging' into the model estimates (as discussed above) to find out, what significant then actually means in terms of 'mean-changes' :) Hope this helps, Best, Ren? Am So., 9. Juni 2019 um 12:45 Uhr schrieb <d.luedecke at uke.de>: Dear Patricia, when you include an interaction, your assumption is that the relationship between an independent X1 and the dependent variable Y varies *depending on the values of another independent variable X2*. Indeed, for logistic regression models (as well as for many models with non-Gaussian families), the interpretation of interaction terms can be tricky. In such cases, I would recommend to compute (at least additionally) marginal effects, which give you an intuitive output of your results. You can do so e.g. with the "ggeffects" package ( https://strengejacke.github.io/ggeffects/), and there is also an example for a logistic mixed effects model ( https://strengejacke.github.io/ggeffects/articles/practical_logisticmixedmodel.html), which might help you. In your case, the code would be ggpredict(M1, c("feed", "year")) for the model with interaction. If you want to plot the results, simply call me <- ggpredict(M1, c("feed", "year")) plot(me) A comment on your model: I'm not sure, but if you compare subjects (or feeding sites) at two time points, you might want to model the auto-correlation of subjects / feeding site ("repeated measure") using your time variable as random slope: M1 <- glmer((bear_pres ~ feed * year + (1 + year | Feeding.site), family = binomial, data = df10) Computing marginal effects than would be the same function call: ggpredict(M1, c("feed", "year")) Best Daniel -----Urspr?ngliche Nachricht----- Von: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> Im Auftrag von Patricia Graf Gesendet: Sonntag, 9. Juni 2019 09:17 An: r-sig-mixed-models at r-project.org Betreff: [R-sig-ME] Mixed model interpretation with interaction Hello, I have a few questions concering the interpretation of a GLMM output table when the model includes an interaction. We want to analyse bear presence at feeding sites (bear_pres) related to the year (two years: 2016, 2017) and the feed supplied at feeding sites (carrion, maize). So the response is binary (0 = no bear present, 1 = bear present within 5-min intervals over the whole day) and both predictors are categorical, we include feeding site ID as random factor. The model includes some other variables too but for simplicity I just use those two variables for explanation. 1) As I understand, in a model without interaction, the interpretation of the results would be as follows: M1 <- glmer((bear_pres ~ feed + year + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.58524 0.08529 -53.76 <2e-16 ***the intercept is bear presence at maize sites in 2016 feedcarrion 0.39178 0.02139 18.32 <2e-16 ***bear presence at feeding sites in 2017 compared to 2016 year2017 0.23027 0.01978 11.64 <2e-16 ***bear presence at carrion feeding sites compared to maize feeding sites Is this interpretation right? 2) To my knowledge, the output changes when you include an interaction: M2<- glmer(bear_pres ~ year*feed + (1|Feeding.site), family=binomial, data=df10) Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.36413 0.10730 -40.67 < 2e-16 ***the intercept is bear presence at maize sites in 2016 (baseline) year2017 -0.18010 0.05119 -3.52 0.000434 ***difference in bear presence in 2017 compared to 2016 for maize feedcarrion -0.02933 0.05318 -0.55 0.581222 difference in bear presence at carrion sites compared to maize sites in 2016 year2017:feedcarrion 0.85275 0.09953 8.57 < 2e-16 ***difference in bear presence at carrion sites 2017 and the sum of ?0+ ?1+ ?2 So to my questions: Is this interpretation right? What is the coding of the model so it does produce this output, e.g. why is the year not comparing 2016 to 2017 anymore as in the model without the interaction? Or why doesn?t the model still use the two food types for comparison? As I understand, when you include an intercation between the two binary dummy-coded categorical variables, the interpretation of what was main effects before (year, carrion) changes, and so do the betas (these are called ?simple effects? afterwards). In my group, there is a strong believe that in M2, the year still compares the two years (and so does feed), it?s just the coefficient cannot be interpreted anymore. Also, there is a believe that the interaction term compares to feedmaize in the year 2016. If my interpreation is correct, I need some background on how the algorithm works, how simple effects evolve and why the interaction should be interpreted as in the output table of M2. Thank you for your help in advance! [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models -- _____________________________________________________________________ Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de Vorstandsmitglieder: Prof. Dr. Burkhard G?ke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Pr?l?, Marya Verdel _____________________________________________________________________ SAVE PAPER - THINK BEFORE PRINTING _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ------------------------------ Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de Vorstandsmitglieder: Prof. Dr. Burkhard G?ke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Pr?l?, Marya Verdel ------------------------------ SAVE PAPER - THINK BEFORE PRINTING