Dear all, I run into something I don't understand: I update a model with some terms; none of the terms is significant; but the model suddenly fits A LOT better . . . The background: I am running a model to test a relatively simple hypothesis: that an intervention aiming to reduce cannabis use is effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools). This is the datafile: <R CODE> ### Load data dat.long <- read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20data%20in%20long%20format.tsv", header=TRUE, sep = "\t"); ### Set 'participant' as factor dat.long$participant <- factor(dat.long$id); head(dat.long); </R CODE> This is what the head looks like: id moment school cannabisShow gender age usedCannabis_bi participant 1 1 before Zuidoost Intervention 2 NA NA 1 2 2 before Zuidoost Intervention 2 NA 0 2 3 3 before Zuidoost Intervention 1 NA 1 3 4 4 before Noord Intervention NA NA NA 4 5 5 before Noord Intervention NA NA 1 5 6 6 before Noord Intervention 1 NA NA 6 'school' has 8 levels; 'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels, 'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and participants is the participant identifyer. I run a null model and a 'real' model, comparing the fit. These are the formulations I use: <R CODE> rep_measures.1.null <- lmer(formula = usedCannabis_bi ~ 1 + moment + (1 + moment | school / participant), family=binomial(link = "logit"), data=dat.long); rep_measures.1.model <- update(rep_measures.1.null, .~. + moment*cannabisShow); rep_measures.1.null; rep_measures.1.model; anova(rep_measures.1.null, rep_measures.1.model); </R CODE> The second model, where I introduce the interaction between measurement moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better. This seems to be a paradox. Could anybody maybe explain how this is possible? I also looked at the situation where I impose fixed intercepts and slopes on the participant level (so intercepts and slopes could only vary per school): <R CODE> rep_measures.2.null <- lmer(formula = usedCannabis_bi ~ 1 + moment + (1 + moment | school), family=binomial(link = "logit"), data=dat.long); rep_measures.2.model <- update(rep_measures.2.null, .~. + moment*cannabisShow); rep_measures.2.null; rep_measures.2.model; anova(rep_measures.2.null, rep_measures.2.model); </R CODE> Now the interaction between 'measurement moment' and 'intervention' is significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller. This is very counter-intuitive to me - I have the feeling I'm missing something basic, but I have no idea what. Any help is much appreciated! Thank you very much in advance, kind regards, Gjalt-Jorn PS: the file with the analyses is at http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20for%20mailing%20list.r
lmer: effects of forcing fixed intercepts and slopes
7 messages · ONKELINX, Thierry, Steven J. Pierce, Seth Bigelow +1 more
Dear Gjalt-Jorn, Your null model is too complex for your data. Having only one measurement per participant per moment, you cannot fit a random 'slope' along moment per participant. Note the perfect correlation in your null model for the nested random effect. Even at the school levels, the amount of data is not that larger and you end up with near perfect correlations in this random effect. So I would advise to drop moment as a random slope. Don't forget that the summary of a model is testing different hypotheses than an LRT between two models! You might do some reading on that topic or get some local statistical advise. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters Verzonden: dinsdag 6 november 2012 16:23 Aan: r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes Dear all, I run into something I don't understand: I update a model with some terms; none of the terms is significant; but the model suddenly fits A LOT better . . . The background: I am running a model to test a relatively simple hypothesis: that an intervention aiming to reduce cannabis use is effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools). This is the datafile: <R CODE> ### Load data dat.long <- read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20data%20in%20long%20format.tsv", header=TRUE, sep = "\t"); ### Set 'participant' as factor dat.long$participant <- factor(dat.long$id); head(dat.long); </R CODE> This is what the head looks like: id moment school cannabisShow gender age usedCannabis_bi participant 1 1 before Zuidoost Intervention 2 NA NA 1 2 2 before Zuidoost Intervention 2 NA 0 2 3 3 before Zuidoost Intervention 1 NA 1 3 4 4 before Noord Intervention NA NA NA 4 5 5 before Noord Intervention NA NA 1 5 6 6 before Noord Intervention 1 NA NA 6 'school' has 8 levels; 'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels, 'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and participants is the participant identifyer. I run a null model and a 'real' model, comparing the fit. These are the formulations I use: <R CODE> rep_measures.1.null <- lmer(formula = usedCannabis_bi ~ 1 + moment + (1 + moment | school / participant), family=binomial(link = "logit"), data=dat.long); rep_measures.1.model <- update(rep_measures.1.null, .~. + moment*cannabisShow); rep_measures.1.null; rep_measures.1.model; anova(rep_measures.1.null, rep_measures.1.model); </R CODE> The second model, where I introduce the interaction between measurement moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better. This seems to be a paradox. Could anybody maybe explain how this is possible? I also looked at the situation where I impose fixed intercepts and slopes on the participant level (so intercepts and slopes could only vary per school): <R CODE> rep_measures.2.null <- lmer(formula = usedCannabis_bi ~ 1 + moment + (1 + moment | school), family=binomial(link = "logit"), data=dat.long); rep_measures.2.model <- update(rep_measures.2.null, .~. + moment*cannabisShow); rep_measures.2.null; rep_measures.2.model; anova(rep_measures.2.null, rep_measures.2.model); </R CODE> Now the interaction between 'measurement moment' and 'intervention' is significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller. This is very counter-intuitive to me - I have the feeling I'm missing something basic, but I have no idea what. Any help is much appreciated! Thank you very much in advance, kind regards, Gjalt-Jorn PS: the file with the analyses is at http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20for%20mailing%20list.r _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20121106/71edece5/attachment.pl>
Mixed models are not that scary. I would recommend to read Zuur et al (2009). It was written with 'mainstream researchers' (in ecology) in mind. It start with simple linear models and gradually adds complexity (glm, gam, lmm, glmm, gamm, ...)
@BOOK{ZuurMixedModels,
title = {{M}ixed {E}ffects {M}odels and {E}xtensions in {E}cology with {R}},
publisher = {Springer New York},
year = {2009},
author = {Zuur, Alain F. and Ieno, Elena N. and Walker, Neil J. and Saveliev, Anatoly A. and Smith, Graham M.},
doi = {10.1007/978-0-387-87458-6}
}
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters
Verzonden: dinsdag 6 november 2012 21:42
Aan: r-sig-mixed-models at r-project.org
Onderwerp: Re: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes
Dear list,
Thierry, great, thank you very much for your quick reply! I will drop moment as a random slope, and read up on the different hypotheses that are being tested.
I have one more question. Basically, I have no background in multilevel (as you may have guessed :-)). The reason I'm 'in over my head' like this, is because I basically want to 'use the proper analysis' for my data, and the only method is apparently mixed models. "All I want" is the simplest' statistically decent, way to test whether cannabis use at the second measurement moment is different in the group that received that intervention as compared to the group that didn't.
However, when I try to learn about mixed models, the sources I encounter approach the modelling practice very differently. They seem to be about much more advanced issues; whether random intercepts and slopes should be included, and for which variables, etc (to stick to those issues that I at least kind of understand). Apparently, either mixed models are only used by people who are statistically much more advanced (i.e. there's a gap between 'mainstream researchers' and the people who understand and use mixed models), or in fact these sources _do_ discuss the same things, but in mixed models the terminology just differs a lot from what you encounter in more basic statistical textbooks.
I basically have the idea that although my requirements are very basic, I have to learn lots of dark arcane issues to be able to do this properly. This is kind of 'scary', as, for example, matrix algebra is, well, scary :-)
What do people here think of this? Is mixed models just something you should avoid unless you're able & willing to really delve into its statistical innards?
Again, thank you very much, kind regards,
Gjalt-Jorn
On 06-11-2012 17:25, ONKELINX, Thierry wrote:
Dear Gjalt-Jorn, Your null model is too complex for your data. Having only one measurement per participant per moment, you cannot fit a random 'slope' along moment per participant. Note the perfect correlation in your null model for the nested random effect. Even at the school levels, the amount of data is not that larger and you end up with near perfect correlations in this random effect. So I would advise to drop moment as a random slope. Don't forget that the summary of a model is testing different hypotheses than an LRT between two models! You might do some reading on that topic or get some local statistical advise. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters Verzonden: dinsdag 6 november 2012 16:23 Aan: r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes Dear all, I run into something I don't understand: I update a model with some terms; none of the terms is significant; but the model suddenly fits A LOT better . . . The background: I am running a model to test a relatively simple hypothesis: that an intervention aiming to reduce cannabis use is effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools). This is the datafile: <R CODE> ### Load data dat.long <- read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20 data%20in%20long%20format.tsv", header=TRUE, sep = "\t"); ### Set 'participant' as factor dat.long$participant <- factor(dat.long$id); head(dat.long); </R CODE> This is what the head looks like: id moment school cannabisShow gender age usedCannabis_bi participant 1 1 before Zuidoost Intervention 2 NA NA 1 2 2 before Zuidoost Intervention 2 NA 0 2 3 3 before Zuidoost Intervention 1 NA 1 3 4 4 before Noord Intervention NA NA NA 4 5 5 before Noord Intervention NA NA 1 5 6 6 before Noord Intervention 1 NA NA 6 'school' has 8 levels; 'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels, 'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and participants is the participant identifyer. I run a null model and a 'real' model, comparing the fit. These are the formulations I use: <R CODE> rep_measures.1.null <- lmer(formula = usedCannabis_bi ~ 1 + moment + (1 + moment | school / participant), family=binomial(link = "logit"), data=dat.long); rep_measures.1.model <- update(rep_measures.1.null, .~. + moment*cannabisShow); rep_measures.1.null; rep_measures.1.model; anova(rep_measures.1.null, rep_measures.1.model); </R CODE> The second model, where I introduce the interaction between measurement moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better. This seems to be a paradox. Could anybody maybe explain how this is possible? I also looked at the situation where I impose fixed intercepts and slopes on the participant level (so intercepts and slopes could only vary per school): <R CODE> rep_measures.2.null <- lmer(formula = usedCannabis_bi ~ 1 + moment + (1 + moment | school), family=binomial(link = "logit"), data=dat.long); rep_measures.2.model <- update(rep_measures.2.null, .~. + moment*cannabisShow); rep_measures.2.null; rep_measures.2.model; anova(rep_measures.2.null, rep_measures.2.model); </R CODE> Now the interaction between 'measurement moment' and 'intervention' is significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller. This is very counter-intuitive to me - I have the feeling I'm missing something basic, but I have no idea what. Any help is much appreciated! Thank you very much in advance, kind regards, Gjalt-Jorn PS: the file with the analyses is at http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20f or%20mailing%20list.r
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
For good intro coverage of how to use mixed models specifically for longitudinal data, I recommend Singer & Willett (2003). It's a pretty reader-friendly book. It doesn?t specifically cover the case of a binary dependent variable, but that extension isn?t too hard if you already understand the link between regression and logistic regression. Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York, NY: Oxford University Press. Below is series of tutorial articles on multilevel modeling (i.e., one application of mixed models). It's best to read them in the order listed because each builds on the previous paper in the list. These are written about cross-sectional analyses, but the foundations for cross-sectional and longitudinal analyses with mixed models are pretty much the same. Just remember that observations nested within person is very similar conceptually to persons nested within neighborhood. The last paper in the series specifically focuses on logistic variations of these models. Merlo, J. (2003). Multilevel analytical approaches in social epidemiology: Measures of health variation compared with traditional measures of association. Journal of Epidemiology and Community Health, 57(8), 550-552. http://jech.bmj.com/cgi/content/extract/57/8/550 Merlo, J., Chaix, B., Yang, M., Lynch, J., & R?stam, L. (2005). A brief conceptual tutorial of multilevel analysis in social epidemiology: linking the statistical concept of clustering to the idea of contextual phenomenon. Journal of Epidemiology and Community Health, 59(6), 443-449. http://jech.bmj.com/cgi/content/abstract/59/6/443 Merlo, J., Yang, M., Chaix, B., Lynch, J., & R?stam, L. (2005). A brief conceptual tutorial on multilevel analysis in social epidemiology: investigating contextual phenomena in different groups of people. Journal of Epidemiology and Community Health, 59(9), 729-736. http://jech.bmj.com/cgi/content/abstract/59/9/729 Merlo, J., Chaix, B., Yang, M., Lynch, J., & R?stam, L. (2005). A brief conceptual tutorial on multilevel analysis in social epidemiology: interpreting neighbourhood differences and the effect of neighbourhood characteristics on individual health. Journal of Epidemiology and Community Health, 59(12), 1022-1029. http://jech.bmj.com/cgi/content/abstract/59/12/1022 Merlo, J., Chaix, B., Ohlsson, H., Beckman, A., Johnell, K., Hjerpe, P., et al. (2006). A brief conceptual tutorial of multilevel analysis in social epidemiology: using measures of clustering in multilevel logistic regression to investigate contextual phenomena. Journal of Epidemiology and Community Health, 60(4), 290-297. http://jech.bmj.com/cgi/content/abstract/60/4/290 As an alternative to a mixed model, you could also consider using GEE: Hanley, J. A., Negassa, A., deB. Edwardes, M. D., & Forrester, J. E. (2003). Statistical analysis of correlated data using generalized estimating equations: An orientation. American Journal of Epidemiology, 157(4), 364-375. doi: 10.1093/aje/kwf215 http://aje.oxfordjournals.org/cgi/content/full/157/4/364 Ballinger, G. A. (2004). Using generalized estimating equations for longitudinal data analysis. Organizational Research Methods, 7(2), 127-150. doi: 10.1177/1094428104263672 http://orm.sagepub.com/content/7/2/127 Steven J. Pierce, Ph.D. Associate Director Center for Statistical Training & Consulting (CSTAT) Michigan State University Giltner Hall 293 Farm Lane, Room 178 East Lansing, MI 48824 Office Phone: (517) 353-9288 Office Fax: (517) 353-9307 E-mail: pierces1 at msu.edu Web: http://www.cstat.msu.edu -----Original Message----- From: Gjalt-Jorn Peters [mailto:gjalt-jorn at behaviorchange.eu] Sent: Tuesday, November 06, 2012 3:42 PM To: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes Dear list, Thierry, great, thank you very much for your quick reply! I will drop moment as a random slope, and read up on the different hypotheses that are being tested. I have one more question. Basically, I have no background in multilevel (as you may have guessed :-)). The reason I'm 'in over my head' like this, is because I basically want to 'use the proper analysis' for my data, and the only method is apparently mixed models. "All I want" is the simplest' statistically decent, way to test whether cannabis use at the second measurement moment is different in the group that received that intervention as compared to the group that didn't. However, when I try to learn about mixed models, the sources I encounter approach the modelling practice very differently. They seem to be about much more advanced issues; whether random intercepts and slopes should be included, and for which variables, etc (to stick to those issues that I at least kind of understand). Apparently, either mixed models are only used by people who are statistically much more advanced (i.e. there's a gap between 'mainstream researchers' and the people who understand and use mixed models), or in fact these sources _do_ discuss the same things, but in mixed models the terminology just differs a lot from what you encounter in more basic statistical textbooks. I basically have the idea that although my requirements are very basic, I have to learn lots of dark arcane issues to be able to do this properly. This is kind of 'scary', as, for example, matrix algebra is, well, scary :-) What do people here think of this? Is mixed models just something you should avoid unless you're able & willing to really delve into its statistical innards? Again, thank you very much, kind regards, Gjalt-Jorn
On 06-11-2012 17:25, ONKELINX, Thierry wrote:
Dear Gjalt-Jorn, Your null model is too complex for your data. Having only one measurement
per participant per moment, you cannot fit a random 'slope' along moment per participant. Note the perfect correlation in your null model for the nested random effect.
Even at the school levels, the amount of data is not that larger and you
end up with near perfect correlations in this random effect. So I would advise to drop moment as a random slope.
Don't forget that the summary of a model is testing different hypotheses
than an LRT between two models! You might do some reading on that topic or get some local statistical advise.
Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters
Verzonden: dinsdag 6 november 2012 16:23 Aan: r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes Dear all, I run into something I don't understand: I update a model with some terms;
none of the terms is significant; but the model suddenly fits A LOT better . . .
The background: I am running a model to test a relatively simple hypothesis: that an intervention aiming to reduce cannabis use is
effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools).
This is the datafile:
<R CODE>
### Load data
dat.long <-
read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20data%2
0in%20long%20format.tsv",
header=TRUE, sep = "\t");
### Set 'participant' as factor
dat.long$participant <- factor(dat.long$id);
head(dat.long);
</R CODE>
This is what the head looks like:
id moment school cannabisShow gender age usedCannabis_bi participant
1 1 before Zuidoost Intervention 2 NA NA 1
2 2 before Zuidoost Intervention 2 NA 0 2
3 3 before Zuidoost Intervention 1 NA 1 3
4 4 before Noord Intervention NA NA NA 4
5 5 before Noord Intervention NA NA 1 5
6 6 before Noord Intervention 1 NA NA 6
'school' has 8 levels;
'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels,
'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and participants is the participant identifyer.
I run a null model and a 'real' model, comparing the fit. These are the
formulations I use:
<R CODE>
rep_measures.1.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school /
participant),
family=binomial(link = "logit"),
data=dat.long);
rep_measures.1.model <- update(rep_measures.1.null, .~. +
moment*cannabisShow);
rep_measures.1.null;
rep_measures.1.model;
anova(rep_measures.1.null, rep_measures.1.model); </R CODE>
The second model, where I introduce the interaction between measurement
moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better.
This seems to be a paradox. Could anybody maybe explain how this is
possible?
I also looked at the situation where I impose fixed intercepts and slopes
on the participant level (so intercepts and slopes could only vary per school):
<R CODE>
rep_measures.2.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school),
family=binomial(link = "logit"),
data=dat.long);
rep_measures.2.model <- update(rep_measures.2.null, .~. +
moment*cannabisShow);
rep_measures.2.null;
rep_measures.2.model;
anova(rep_measures.2.null, rep_measures.2.model); </R CODE>
Now the interaction between 'measurement moment' and 'intervention' is
significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller.
This is very counter-intuitive to me - I have the feeling I'm missing
something basic, but I have no idea what. Any help is much appreciated!
Thank you very much in advance, kind regards, Gjalt-Jorn PS: the file with the analyses is at
http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20for%20m ailing%20list.r
_______________________________________________ R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver
weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Let me slip in a word of praise for Simon Wood's book, 'Generalized Additive
Models', particularly chapter 6 on mixed models. The man is a genius for
explaining statistics, and his introduction to mixed models is the clearest
I've come across. It's canonical for me!
--Seth
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of ONKELINX,
Thierry
Sent: Wednesday, November 07, 2012 4:01 AM
To: Gjalt-Jorn Peters; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes
Mixed models are not that scary. I would recommend to read Zuur et al
(2009). It was written with 'mainstream researchers' (in ecology) in mind.
It start with simple linear models and gradually adds complexity (glm, gam,
lmm, glmm, gamm, ...)
@BOOK{ZuurMixedModels,
title = {{M}ixed {E}ffects {M}odels and {E}xtensions in {E}cology with
{R}},
publisher = {Springer New York},
year = {2009},
author = {Zuur, Alain F. and Ieno, Elena N. and Walker, Neil J. and
Saveliev, Anatoly A. and Smith, Graham M.},
doi = {10.1007/978-0-387-87458-6}
}
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than
asking him to perform a post-mortem examination: he may be able to say what
the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters
Verzonden: dinsdag 6 november 2012 21:42
Aan: r-sig-mixed-models at r-project.org
Onderwerp: Re: [R-sig-ME] lmer: effects of forcing fixed intercepts and
slopes
Dear list,
Thierry, great, thank you very much for your quick reply! I will drop moment
as a random slope, and read up on the different hypotheses that are being
tested.
I have one more question. Basically, I have no background in multilevel (as
you may have guessed :-)). The reason I'm 'in over my head' like this, is
because I basically want to 'use the proper analysis' for my data, and the
only method is apparently mixed models. "All I want" is the simplest'
statistically decent, way to test whether cannabis use at the second
measurement moment is different in the group that received that intervention
as compared to the group that didn't.
However, when I try to learn about mixed models, the sources I encounter
approach the modelling practice very differently. They seem to be about much
more advanced issues; whether random intercepts and slopes should be
included, and for which variables, etc (to stick to those issues that I at
least kind of understand). Apparently, either mixed models are only used by
people who are statistically much more advanced (i.e. there's a gap between
'mainstream researchers' and the people who understand and use mixed
models), or in fact these sources _do_ discuss the same things, but in mixed
models the terminology just differs a lot from what you encounter in more
basic statistical textbooks.
I basically have the idea that although my requirements are very basic, I
have to learn lots of dark arcane issues to be able to do this properly.
This is kind of 'scary', as, for example, matrix algebra is, well, scary :-)
What do people here think of this? Is mixed models just something you should
avoid unless you're able & willing to really delve into its statistical
innards?
Again, thank you very much, kind regards,
Gjalt-Jorn
On 06-11-2012 17:25, ONKELINX, Thierry wrote:
Dear Gjalt-Jorn, Your null model is too complex for your data. Having only one measurement
per participant per moment, you cannot fit a random 'slope' along moment per participant. Note the perfect correlation in your null model for the nested random effect.
Even at the school levels, the amount of data is not that larger and you
end up with near perfect correlations in this random effect. So I would advise to drop moment as a random slope.
Don't forget that the summary of a model is testing different hypotheses
than an LRT between two models! You might do some reading on that topic or get some local statistical advise.
Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters Verzonden: dinsdag 6 november 2012 16:23 Aan: r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes Dear all, I run into something I don't understand: I update a model with some terms;
none of the terms is significant; but the model suddenly fits A LOT better . . .
The background: I am running a model to test a relatively simple hypothesis: that an intervention aiming to reduce cannabis use is
effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools).
This is the datafile:
<R CODE>
### Load data
dat.long <-
read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20
data%20in%20long%20format.tsv",
header=TRUE, sep = "\t");
### Set 'participant' as factor
dat.long$participant <- factor(dat.long$id);
head(dat.long);
</R CODE>
This is what the head looks like:
id moment school cannabisShow gender age usedCannabis_bi participant
1 1 before Zuidoost Intervention 2 NA NA 1
2 2 before Zuidoost Intervention 2 NA 0 2
3 3 before Zuidoost Intervention 1 NA 1 3
4 4 before Noord Intervention NA NA NA 4
5 5 before Noord Intervention NA NA 1 5
6 6 before Noord Intervention 1 NA NA 6
'school' has 8 levels;
'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels,
'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and participants is the participant identifyer.
I run a null model and a 'real' model, comparing the fit. These are the
formulations I use:
<R CODE>
rep_measures.1.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school /
participant),
family=binomial(link = "logit"),
data=dat.long);
rep_measures.1.model <- update(rep_measures.1.null, .~. +
moment*cannabisShow);
rep_measures.1.null;
rep_measures.1.model;
anova(rep_measures.1.null, rep_measures.1.model); </R CODE>
The second model, where I introduce the interaction between measurement
moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better.
This seems to be a paradox. Could anybody maybe explain how this is
possible?
I also looked at the situation where I impose fixed intercepts and slopes
on the participant level (so intercepts and slopes could only vary per school):
<R CODE>
rep_measures.2.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school),
family=binomial(link = "logit"),
data=dat.long);
rep_measures.2.model <- update(rep_measures.2.null, .~. +
moment*cannabisShow);
rep_measures.2.null;
rep_measures.2.model;
anova(rep_measures.2.null, rep_measures.2.model); </R CODE>
Now the interaction between 'measurement moment' and 'intervention' is
significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller.
This is very counter-intuitive to me - I have the feeling I'm missing
something basic, but I have no idea what. Any help is much appreciated!
Thank you very much in advance, kind regards, Gjalt-Jorn PS: the file with the analyses is at http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20f or%20mailing%20list.r
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver
weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dear list, Thierry, Steven and Seth: thank you so much for your references! These provide an excellent starting point. Steven, the articles references are great to get started until I obtained one or more of the books. Regardless of how hard mixed models will prove, let it never be said that the community isn't helpful :-) Again, thank you very much, kind regards, Gjalt-Jorn
On 07-11-2012 15:22, Seth W. Bigelow wrote:
Let me slip in a word of praise for Simon Wood's book, 'Generalized Additive
Models', particularly chapter 6 on mixed models. The man is a genius for
explaining statistics, and his introduction to mixed models is the clearest
I've come across. It's canonical for me!
--Seth
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of ONKELINX,
Thierry
Sent: Wednesday, November 07, 2012 4:01 AM
To: Gjalt-Jorn Peters; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes
Mixed models are not that scary. I would recommend to read Zuur et al
(2009). It was written with 'mainstream researchers' (in ecology) in mind.
It start with simple linear models and gradually adds complexity (glm, gam,
lmm, glmm, gamm, ...)
@BOOK{ZuurMixedModels,
title = {{M}ixed {E}ffects {M}odels and {E}xtensions in {E}cology with
{R}},
publisher = {Springer New York},
year = {2009},
author = {Zuur, Alain F. and Ieno, Elena N. and Walker, Neil J. and
Saveliev, Anatoly A. and Smith, Graham M.},
doi = {10.1007/978-0-387-87458-6}
}
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than
asking him to perform a post-mortem examination: he may be able to say what
the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters
Verzonden: dinsdag 6 november 2012 21:42
Aan: r-sig-mixed-models at r-project.org
Onderwerp: Re: [R-sig-ME] lmer: effects of forcing fixed intercepts and
slopes
Dear list,
Thierry, great, thank you very much for your quick reply! I will drop moment
as a random slope, and read up on the different hypotheses that are being
tested.
I have one more question. Basically, I have no background in multilevel (as
you may have guessed :-)). The reason I'm 'in over my head' like this, is
because I basically want to 'use the proper analysis' for my data, and the
only method is apparently mixed models. "All I want" is the simplest'
statistically decent, way to test whether cannabis use at the second
measurement moment is different in the group that received that intervention
as compared to the group that didn't.
However, when I try to learn about mixed models, the sources I encounter
approach the modelling practice very differently. They seem to be about much
more advanced issues; whether random intercepts and slopes should be
included, and for which variables, etc (to stick to those issues that I at
least kind of understand). Apparently, either mixed models are only used by
people who are statistically much more advanced (i.e. there's a gap between
'mainstream researchers' and the people who understand and use mixed
models), or in fact these sources _do_ discuss the same things, but in mixed
models the terminology just differs a lot from what you encounter in more
basic statistical textbooks.
I basically have the idea that although my requirements are very basic, I
have to learn lots of dark arcane issues to be able to do this properly.
This is kind of 'scary', as, for example, matrix algebra is, well, scary :-)
What do people here think of this? Is mixed models just something you should
avoid unless you're able & willing to really delve into its statistical
innards?
Again, thank you very much, kind regards,
Gjalt-Jorn
On 06-11-2012 17:25, ONKELINX, Thierry wrote:
Dear Gjalt-Jorn, Your null model is too complex for your data. Having only one measurement
per participant per moment, you cannot fit a random 'slope' along moment per participant. Note the perfect correlation in your null model for the nested random effect.
Even at the school levels, the amount of data is not that larger and you
end up with near perfect correlations in this random effect. So I would advise to drop moment as a random slope.
Don't forget that the summary of a model is testing different hypotheses
than an LRT between two models! You might do some reading on that topic or get some local statistical advise.
Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Gjalt-Jorn Peters Verzonden: dinsdag 6 november 2012 16:23 Aan: r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] lmer: effects of forcing fixed intercepts and slopes Dear all, I run into something I don't understand: I update a model with some terms;
none of the terms is significant; but the model suddenly fits A LOT better . . .
The background: I am running a model to test a relatively simple hypothesis: that an intervention aiming to reduce cannabis use is
effective. It's a repeated measures design where we measured cannabis use of each student before and after the intervention. In addition to having repeated measures, students are nested in schools. A simple plot of the percentage of cannabis users before and after the intervention, in the control and the intervention groups, is at http://sciencerep.org/files/7/plot.png (this plot ignores the schools).
This is the datafile:
<R CODE>
### Load data
dat.long <-
read.table("http://sciencerep.org/files/7/the%20cannabis%20show%20-%20
data%20in%20long%20format.tsv",
header=TRUE, sep = "\t");
### Set 'participant' as factor
dat.long$participant <- factor(dat.long$id);
head(dat.long);
</R CODE>
This is what the head looks like:
id moment school cannabisShow gender age usedCannabis_bi participant
1 1 before Zuidoost Intervention 2 NA NA 1
2 2 before Zuidoost Intervention 2 NA 0 2
3 3 before Zuidoost Intervention 1 NA 1 3
4 4 before Noord Intervention NA NA NA 4
5 5 before Noord Intervention NA NA 1 5
6 6 before Noord Intervention 1 NA NA 6
'school' has 8 levels;
'moment' has 2 levels ('before' and 'after'); 'cannabisShow' has 2 levels,
'Intervention' and 'Control'; 'usedCannabis_bi' has 2 levels, 0 and 1; and participants is the participant identifyer.
I run a null model and a 'real' model, comparing the fit. These are the
formulations I use:
<R CODE>
rep_measures.1.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school /
participant),
family=binomial(link = "logit"),
data=dat.long);
rep_measures.1.model <- update(rep_measures.1.null, .~. +
moment*cannabisShow);
rep_measures.1.null;
rep_measures.1.model;
anova(rep_measures.1.null, rep_measures.1.model); </R CODE>
The second model, where I introduce the interaction between measurement
moment and whether participants received the intervention (this should reflect an effect of the intervention), fits considerably better than the original model. But, the interaction is not significant. In fact, none of the fixed effects is - so I added terms to the model, none of these terms significantly contributes to the prediction of cannabis use, yet the model fits a lot better.
This seems to be a paradox. Could anybody maybe explain how this is
possible?
I also looked at the situation where I impose fixed intercepts and slopes
on the participant level (so intercepts and slopes could only vary per school):
<R CODE>
rep_measures.2.null <- lmer(formula = usedCannabis_bi ~
1 + moment + (1 + moment | school),
family=binomial(link = "logit"),
data=dat.long);
rep_measures.2.model <- update(rep_measures.2.null, .~. +
moment*cannabisShow);
rep_measures.2.null;
rep_measures.2.model;
anova(rep_measures.2.null, rep_measures.2.model); </R CODE>
Now the interaction between 'measurement moment' and 'intervention' is
significant, as I expected; but the improvement in fit between the null model and the 'full model' is much, much smaller.
This is very counter-intuitive to me - I have the feeling I'm missing
something basic, but I have no idea what. Any help is much appreciated!
Thank you very much in advance, kind regards, Gjalt-Jorn PS: the file with the analyses is at http://sciencerep.org/files/7/the%20cannabis%20show%20-%20analyses%20f or%20mailing%20list.r
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver
weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models