Hi,

I am wondering whether it is still meaningful to run a mixed model if a longitudinal dataset has only 2 time points (baseline and week 4). Would it be more appropriate to simply take the difference between the 2 time points and run ANOVA (ANCOVA) on the difference? What about still running a mixed model on the difference of the 2 time points, but adding the baseline measurement as a random factor?

Thanks for sharing your thoughts.

John
longitudinal with 2 time points
15 messages · John Maindonald, Charles E. (Ted) Wright, array chip +2 more
All these are possibilities, except maybe making baseline measurement a random factor. This would make sense only if the data divide into groups and you want the baseline effect to vary randomly from group to group. That may limit your ability to estimate parameters that are of interest. In most circumstances that I am familiar with, it makes better sense to treat the baseline effect as fixed.

John.

John Maindonald
email: john.maindonald at anu.edu.au
phone: +61 2 (6125)3473  fax: +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Keep in mind that running an ANOVA on the difference is not the same thing as using the baseline data as a covariate in an ANOVA on the Week 4 data. Essentially the ANOVA on the differences is like the ANCOVA with the slope constrained to be 1. Ted Wright
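The point about the constrained slope can be checked numerically. The sketch below is only an illustration on simulated data: the variable names (`group`, `baseline`, `wk4`) and all numeric values are made up, not taken from the study. It fits the change-score ANOVA, the same model written as an ANCOVA with the baseline slope fixed at 1 through `offset()`, and the usual ANCOVA with the slope estimated.

```r
# Simulated two-arm data; every value here is invented for illustration.
set.seed(1)
n <- 40
group    <- factor(rep(c("control", "treatment"), each = n / 2))
baseline <- rnorm(n, mean = 150, sd = 30)
wk4      <- 60 + 0.6 * baseline - 20 * (group == "treatment") + rnorm(n, sd = 15)

# ANOVA on the change score
fit_diff <- lm(I(wk4 - baseline) ~ group)

# ANCOVA on week 4 with the baseline slope fixed at 1 via offset()
fit_off <- lm(wk4 ~ offset(baseline) + group)

# ANCOVA on week 4 with the baseline slope estimated from the data
fit_anc <- lm(wk4 ~ baseline + group)

# The first two fits give identical coefficients; the third generally differs
print(coef(fit_diff))
print(coef(fit_off))
print(coef(fit_anc))
```

The first two sets of coefficients coincide because subtracting the baseline from the response is the same as forcing its coefficient to 1; the third fit lets the data choose the slope.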
Hi,

I'll throw in a reference that covers some of these issues:

Statistics Notes: Analysing controlled trials with baseline and follow up measurements. Vickers and Altman. BMJ. 2001 November 10; 323(7321): 1123-1124.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1121605/

The basic model specification would of course be:

  lm(Wk4 ~ Baseline + Group)

You will also want to test for an interaction between the baseline score and your grouping factor, in case the observed group (e.g. treatment) effect depends on the value of the baseline measurement. In that case, unlike in the above paper, you end up with crossing fitted regression lines rather than parallel lines.

HTH,

Marc Schwartz
Thank you John. I agree that making baseline a random factor is not a good idea.

The data have treatment groups, plus age and gender for each subject. The purpose of the study is to investigate the treatment effect on the change in the study endpoint (glucose level) between baseline and week 4. I am thinking of several models/methods to analyze the data:

1. Mixed model with fixed time and random intercept, where time = 0 or 4:
   lmer(y ~ treatment + gender + age + time + (1|subject))

2. Mixed model with random intercept and random slope:
   lmer(y ~ treatment + gender + age + time + (time|subject))

3. Mixed model with random intercept but no fixed time factor:
   lmer(y ~ treatment + gender + age + (1|subject))

4. Calculate delta.y = difference of y between week 4 and baseline:
   lm(delta.y ~ treatment + gender + age)

5. Same as 4, but add baseline as a covariate:
   lm(delta.y ~ baseline.y + treatment + gender + age)

My thinking on these 5 models: models 1 and 2 have the limitation that they impose a linear relationship of y versus time, which may not be sensible with 2 time points. Model 3 simply treats baseline and week 4 as repeated measures, without imposing a linear relationship. Models 4 and 5 are based on the difference between baseline and week 4, except that model 5 adds baseline as a covariate. The reason for adding baseline as a covariate is the assumption that the extent of the change in y between week 4 and baseline depends on the level of baseline.

Does anyone have suggestions on which one to use?

Thanks!

John
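On the linearity concern raised for models 1 and 2: with exactly two time points, a numeric time variable (0 or 4) and a two-level time factor describe the same pair of means, so a straight line through two points is not an additional restriction on the fixed-effect part. The sketch below illustrates this with ordinary `lm()` on simulated data; the data frame `d` and all values in it are invented for illustration, and the random-effects part of the mixed models is deliberately left out.

```r
# Simulated long-format data with two visits per subject (values invented).
set.seed(42)
n <- 30
d <- data.frame(
  subject = factor(rep(seq_len(n), each = 2)),
  time    = rep(c(0, 4), times = n)
)
d$y <- 100 - 2 * d$time + rnorm(2 * n, sd = 5)

# Time as a number (a slope per week) versus time as a two-level factor
fit_num <- lm(y ~ time, data = d)
fit_fac <- lm(y ~ factor(time), data = d)

# Same fitted values; the factor contrast equals 4 times the weekly slope
same_fit <- isTRUE(all.equal(unname(fitted(fit_num)), unname(fitted(fit_fac))))
slope_x4 <- unname(coef(fit_num)["time"]) * 4
fac_diff <- unname(coef(fit_fac)["factor(time)4"])
print(c(same_fit = same_fit, slope_x4 = slope_x4, fac_diff = fac_diff))
```

Only the scaling of the time coefficient changes between the two codings, which is why the choice between them does not affect the comparison of week 4 with baseline.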
Thank you Ted for pointing this out. See my response to John's reply. What would you think of model 5, where I used ANCOVA on the difference between week 4 and baseline and also included baseline as a covariate?

Thanks

John
Hi Marc,

Thanks for the reference. I will definitely read it. Please see my response to John's reply. Your model is another one I should add to the 5 models I proposed in that email. What are your overall thoughts on these different models?

Thank you for sharing.

John
Hi John,

If there are only two time points per subject I think model 2 should throw an error, because the residual variance and the (time|subject) (co)variances cannot be uniquely estimated. You can get around this problem by moving the (time|subject) term into the residual term and dropping it from the random terms, using MCMCglmm or ASReml:

  MCMCglmm(y ~ treatment + gender + age + time, rcov=~ us(as.factor(time)):subject, ...

This route was also suggested by Ben Bolker and John Maindonald for coping with negative variances. However, when I try:

  library(lme4)
  set.seed(1)
  subject <- gl(50, 2)
  time <- gl(2, 1, 100)
  y <- rnorm(100)
  summary(lmer(y ~ time + (time | subject)))

I get estimates of all terms, so maybe they can be uniquely estimated (although it would surprise me a lot)?

Jarrod
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Hi John,

If you read that article, you will see that your use of delta.y as the dependent variable does not make sense. Thus, I would re-express your model 5 as:

  lm(wk4.glucose ~ baseline.glucose + treatment + gender + age)

and, as noted, check for the interaction between baseline glucose and treatment:

  lm(wk4.glucose ~ baseline.glucose * treatment + gender + age)

You might also want to consider using a spline function on age, presuming that age is measured as a continuous variable (e.g. not ordinal groups). Since the ANCOVA-based approach described in the paper is essentially an OLS linear regression, you can of course include the additional covariates for adjustment.

If the interaction term p value is >0.1 (a common threshold), you can remove the interaction, and the beta coefficient for the treatment factor, together with its CI, is your estimated treatment effect relative to your control.

For the presentation of the results, besides the obvious tabular summaries and the scatter/regression lines plot, include a series of plots showing selected baseline values and the treatment versus control predicted follow-up values and CIs for the same baseline value in each plot. This visually shows the common estimated treatment effect for each baseline value, and will also tend to reveal regression to the mean. This presentation is especially helpful if the interaction term is retained, because it then shows how the treatment effect varies, and may reverse, over the range of the baseline values. You can select a series of clinically relevant values over the range of the observed baseline values, and/or by default select a five-number-plus-mean series over the observed baseline values.

I don't see a role for a mixed effects model here, given that this is a pretty straightforward "change from baseline" type design, but there are many here with greater expertise than I. If this were a cross-over design, or you had multiple measures of glucose for each patient at each time point, more than two time points, or a multi-center study, then a mixed effects model would make more sense to me.

HTH,

Marc
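The workflow described above (fit the ANCOVA, test the baseline-by-treatment interaction, then tabulate predicted follow-up values at selected baseline values) could be sketched roughly as follows. This is a sketch under stated assumptions: the data frame `dat` is simulated, all numeric values are invented, the natural spline with `df = 3` is an arbitrary choice, and predictions are made for one hypothetical reference patient (first gender level, median age).

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

# Simulated trial data; names follow the thread, values are invented.
set.seed(2010)
n <- 120
dat <- data.frame(
  treatment        = factor(rep(c("control", "active"), each = n / 2),
                            levels = c("control", "active")),
  gender           = factor(sample(c("F", "M"), n, replace = TRUE)),
  age              = round(runif(n, 30, 75)),
  baseline.glucose = rnorm(n, mean = 160, sd = 40)
)
dat$wk4.glucose <- 70 + 0.5 * dat$baseline.glucose -
  20 * (dat$treatment == "active") + rnorm(n, sd = 15)

# ANCOVA without and with the baseline-by-treatment interaction
fit_main <- lm(wk4.glucose ~ baseline.glucose + treatment + gender +
                 ns(age, df = 3), data = dat)
fit_int  <- lm(wk4.glucose ~ baseline.glucose * treatment + gender +
                 ns(age, df = 3), data = dat)

# F test for the interaction term
int_test <- anova(fit_main, fit_int)

# Treatment effect and 95% CI from the no-interaction model
trt_ci <- confint(fit_main)["treatmentactive", ]

# Predicted week 4 values at the five-number summary plus the mean of baseline
base_vals <- sort(c(fivenum(dat$baseline.glucose), mean(dat$baseline.glucose)))
newd <- expand.grid(
  baseline.glucose = base_vals,
  treatment        = levels(dat$treatment),
  gender           = levels(dat$gender)[1],
  age              = median(dat$age),
  stringsAsFactors = FALSE
)
pred <- cbind(newd, predict(fit_main, newdata = newd, interval = "confidence"))

print(int_test)
print(trt_ci)
print(pred)
```

In the no-interaction model the predicted gap between the two arms is the same at every baseline value, which is what the suggested series of plots would show; with the interaction retained, `fit_int` can be passed to `predict()` in the same way.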
Marc,

Thanks for sharing your insights. Let's take this model as an example:

  lm(wk4.glucose ~ baseline.glucose + treatment + gender + age)

The investigator is interested in knowing whether the CHANGE in glucose from baseline to week 4 differs between treatment and control. Is it still legitimate to ask whether, and HOW, we can test this hypothesis? I think the coefficient of the treatment factor only tests whether the week 4 glucose level differs between treatment and control, not whether the CHANGE in glucose level from baseline to week 4 differs between treatment and control.

Thanks again for your suggestion.

Yi
John,

That you are asking this question indicates that either you have yet to read the article or that you need to re-read it, as you have not comprehended the content.

The beta coefficient for treatment IS the difference in mean glucose change between baseline and 4 weeks **attributable to treatment**, after adjusting for any baseline differences in glucose between the two groups. That also presumes that there is no interaction between baseline glucose and treatment.

For example, let's say that the beta for treatment is -20. Then, at 4 weeks, given the same baseline glucose level, we would predict that, on average, the treatment group will have a glucose level 20 mg/dl less than the control group.

In the absence of an interaction, we would estimate the same average treatment difference at 4 weeks of 20 mg/dl whether the baseline glucose was 300 mg/dl or 100 mg/dl. However, given regression to the mean, we might reasonably expect the patient with a 300 mg/dl baseline level to have a greater mean reduction at 4 weeks than the patient with a 100 mg/dl baseline level. We might also expect a patient with a glucose level at the low end of the baseline range (e.g. 50 mg/dl) to experience an average increase in glucose level at 4 weeks, presuming that your inclusion/exclusion criteria permitted patients with below-normal glucose levels. But the difference will still be, on average, 20 mg/dl between the two treatment groups.

So the patient with a 300 mg/dl baseline level might have an average reduction to 200 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 180 mg/dl (a difference of -20).

The patient with a 100 mg/dl baseline level might have an average reduction to 90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 70 mg/dl (again, a difference of -20).

The patient with a 50 mg/dl baseline level might have an average increase to 90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average increase to 70 mg/dl (yet again, a difference of -20).

So your conclusion would be that on average, between baseline and 4 weeks, glucose levels were reduced by 20 mg/dl more in the active treatment group relative to control. This difference is the vertical separation between the two parallel fitted regression lines, as shown in the figure in the paper.

So the method is answering exactly the question the investigator is asking.

Marc
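One way to see that the treatment coefficient already refers to change from baseline is to fit the ANCOVA twice, once with the week 4 value as the response and once with the change score as the response, keeping baseline as a covariate in both. The sketch below uses simulated data with an assumed true treatment effect of -20 mg/dl; the names follow the thread, but every value is invented for illustration.

```r
# Simulated data with a true treatment effect of -20 (all values invented).
set.seed(123)
n <- 200
treatment <- factor(rep(c("control", "active"), each = n / 2),
                    levels = c("control", "active"))
baseline.glucose <- rnorm(n, mean = 150, sd = 50)
wk4.glucose <- 75 + 0.5 * baseline.glucose -
  20 * (treatment == "active") + rnorm(n, sd = 10)
change <- wk4.glucose - baseline.glucose

# Same covariates, two different responses
fit_wk4    <- lm(wk4.glucose ~ baseline.glucose + treatment)
fit_change <- lm(change ~ baseline.glucose + treatment)

b_wk4    <- coef(fit_wk4)
b_change <- coef(fit_change)

# Treatment coefficients are identical; the baseline slopes differ by exactly 1
print(rbind(wk4 = b_wk4, change = b_change))
```

Because the change score is the week 4 value minus a term that is already in the model, switching the response only shifts the baseline slope by 1; the estimated treatment effect, its standard error, and its test are unchanged.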
On Aug 13, 2010, at 1:02 AM, array chip wrote:
Marc, Thanks for sharing your insights. Let's take this model as an example: lm(wk4.glucose ~ baseline.glucose + treatment + gender + age) Because the investigator is interested in knowing whether the CHANGE of glucose in week 4 from baseline is different between treatment and control, Is it still legitimate to ask whether and HOW can we test this hypothesis? I think the coefficient of the treatment factor is only testing whether the week 4 glucose level is different between treatment and control, but not testing whether the CHANGE of week 4 glucose level with respect to baseline is different between treatment and control. Thanks again for your suggestion. Yi ----- Original Message ---- From: Marc Schwartz <marc_schwartz at me.com> To: array chip <arrayprofile at yahoo.com> Cc: Charles E. (Ted) Wright <cewright at uci.edu>; John Maindonald <john.maindonald at anu.edu.au>; r-sig-mixed-models at r-project.org Sent: Thu, August 12, 2010 6:02:29 AM Subject: Re: [R-sig-ME] longitudinal with 2 time points Hi John, If you read that article, you will see that your use of delta.y as the dependent variable does not make sense. Thus, I would re-express your model 5 as: lm(wk4.glucose ~ baseline.glucose + treatment + gender + age) and as noted, check for the interaction between baseline glucose and treatment: lm(wk4.glucose ~ baseline.glucose * treatment + gender + age) You might also want to consider using a spline function on age, presuming that age is hopefully measured as a continuous variable (eg. not ordinal groups). Since the ANCOVA based approach described in the paper is essentially an OLS linear regression, you can of course include the additional covariates for adjustment. If the interaction term p value is >0.1 (a common threshold), you can remove it and the beta coefficient and its CIs for the treatment factor is your estimated treatment effect relative to your control. 
For the presentation of the results, besides the obvious tabular summaries and the scatter/regression lines plot, include a series of plots showing selected baseline values and the treatment versus control predicted follow up values and CIs for the same baseline value in each plot. This visually shows the common estimated treatment effect for each baseline value, which will also tend to reveal regression to the mean. This presentation is especially helpful if the interaction term is retained, which therefore shows how the treatment effect varies and will reverse, over the range of the baseline values. You can select a series of clinically relevant values over the range of the observed baseline values, and/or by default, select a five number plus mean series over the observed baseline values. I don't see a role for a mixed effects model here, given that this is a pretty straightforward "change from baseline" type design, but there are many here with greater expertise than I. If this was a cross-over design, you have multiple measures of glucose for each patient at each time point, more than two time points, or a multi-center study, then a mixed effects model would make more sense to me. HTH, Marc On Aug 12, 2010, at 12:39 AM, array chip wrote:
Hi Marc,
Thanks for the reference. I will definitely read it. Please see my response to John's reply. Your model is another model I should add to the 5 models I proposed in that email. What are your overall thoughts on these different models?
Thank you for sharing.
John
----- Original Message ----
From: Marc Schwartz <marc_schwartz at me.com>
To: Charles E. (Ted) Wright <cewright at uci.edu>; array chip <arrayprofile at yahoo.com>
Cc: John Maindonald <john.maindonald at anu.edu.au>; r-sig-mixed-models at r-project.org
Sent: Wed, August 11, 2010 6:20:13 AM
Subject: Re: [R-sig-ME] longitudinal with 2 time points
Hi,
I'll throw in a reference that covers some of these issues:
Statistics Notes: Analysing controlled trials with baseline and follow up measurements. Vickers and Altman. BMJ. 2001 November 10; 323(7321): 1123-1124. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1121605/
The basic model specification would of course be:
lm(`4Wks` ~ Baseline + Group)
You will also want to test for an interaction between the baseline score and your grouping factor, in case the observed group (e.g. treatment) effect is dependent upon the value of the baseline measurement. In this case, unlike in the above paper, you of course end up with crossing fitted regression lines, rather than parallel lines.
HTH,
Marc Schwartz
On Aug 11, 2010, at 7:34 AM, Charles E. (Ted) Wright wrote:
Keep in mind that running an ANOVA on the difference is not the same thing as using the baseline data as a covariate in an ANOVA on the Week 4 data. Essentially the ANOVA on the differences is like the ANCOVA with the slope constrained to be 1.
Ted Wright
On Wed, 11 Aug 2010, John Maindonald wrote:
All these are possibilities, except maybe making baseline measurement a random factor. This would make sense only if data divide into groups, and you want the baseline effect to vary randomly from group to group. That may limit your ability to estimate parameters that are of interest. In most circumstances that I am familiar with, it makes better sense to treat baseline effect as fixed.
John.
On 11/08/2010, at 8:11 AM, array chip wrote:
Hi, I am wondering if it is still meaningful to run a mixed model if a longitudinal dataset has only 2 time points (baseline and week 4)? Would it be more appropriate to simply take the difference between the 2 time points and run ANOVA (ANCOVA) on the difference? What about still running a mixed model on the difference of the 2 time points, but adding baseline measurement as a random factor? Thanks for sharing your thoughts.
John
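The interaction check and the "predicted follow up values at selected baseline values" presentation that Marc describes in the quoted message can be sketched roughly as follows. This is only an illustration on simulated data: the data frame `dat`, the effect sizes, and the object names `fit.add`, `fit.int` and `pred` are assumptions, and gender and age are left out to keep it short.

```r
set.seed(1)
n <- 200

# Simulated (hypothetical) trial data: two groups, baseline and week 4 glucose
dat <- data.frame(
  baseline.glucose = rnorm(n, mean = 150, sd = 40),
  treatment = factor(rep(c("control", "active"), each = n / 2),
                     levels = c("control", "active"))
)
# Assumed truth: slope below 1 (regression to the mean), -20 mg/dl treatment effect
dat$wk4.glucose <- 60 + 0.6 * dat$baseline.glucose -
  20 * (dat$treatment == "active") + rnorm(n, sd = 15)

# ANCOVA without and with the baseline-by-treatment interaction
fit.add <- lm(wk4.glucose ~ baseline.glucose + treatment, data = dat)
fit.int <- lm(wk4.glucose ~ baseline.glucose * treatment, data = dat)

# F test of the interaction term (compare nested models)
int.test <- anova(fit.add, fit.int)
print(int.test)

# Five-number summary plus the mean of the observed baseline values
base.vals <- sort(c(fivenum(dat$baseline.glucose), mean(dat$baseline.glucose)))
newdat <- expand.grid(baseline.glucose = base.vals,
                      treatment = levels(dat$treatment))

# Predicted week 4 values with 95% confidence intervals, per group and baseline
pred <- cbind(newdat,
              predict(fit.add, newdata = newdat, interval = "confidence"))
print(pred)
```

If the interaction were retained, the same `predict()` call on `fit.int` would give group differences that change with the baseline value, which is what the per-baseline plots are meant to show.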
10 days later
Hi Marc,
I have to admit that I didn't get a chance to read the article carefully before my previous reply, so I waited to respond until I had finally read it. Thanks for your excellent explanation below. I agree that the coefficient for treatment is estimating the extent of the difference between treatment and control in the CHANGE of glucose in week 4 from baseline.
Now my dataset has become a little bit more complicated: each glucose test was done twice (blood was drawn from the left arm and the right arm and tested separately). So for each patient, at each time point, there are 2 measurements (from left and right arm separately). So I think I should now include the factor "arm" as a random effect:
lmer(wk4.glucose ~ baseline.glucose + treatment + gender + age + (1|subject/time))
What do you think of this model specification?
Additionally, since I am using a mixed model now, if I code a new variable "time" (either 0 or 4) and a new response variable "y", how do I specify a mixed model with 2 random effects, one with respect to the "time" variable (2 time points per subject per arm), the other with respect to the "arm" variable (2 arms per subject per time point)?
Thanks a lot!
John
----- Original Message ----
From: Marc Schwartz <marc_schwartz at me.com>
To: array chip <arrayprofile at yahoo.com>
Cc: r-sig-mixed-models at r-project.org
Sent: Fri, August 13, 2010 7:24:59 AM
Subject: Re: [R-sig-ME] longitudinal with 2 time points
John,
That you are asking this question indicates that either you have yet to read the article or that you need to re-read it, as you have not comprehended the content.
The beta coefficient for treatment IS the difference in mean glucose change between baseline and 4 weeks **attributable to treatment**, after adjusting for any baseline differences in glucose between the two groups. That is also presuming that there is no baseline-by-treatment interaction.
For example, let's say that the beta for treatment is -20. Then, at 4 weeks, given the same baseline glucose level, we would predict that, on average, the treatment group will have a glucose level 20 mg/dl less than the control group.
In the absence of an interaction, we would estimate the same average treatment difference at 4 weeks of 20 mg/dl whether the baseline glucose was 300 mg/dl or 100 mg/dl.
However, given regression to the mean, we might reasonably expect the patient with a 300 mg/dl baseline level to have a greater mean reduction at 4 weeks as compared to the patient with a 100 mg/dl baseline level. We might also expect a patient with a glucose level at the low end of the baseline range (e.g. 50 mg/dl) to experience an average increase in glucose level at 4 weeks, presuming that your inclusion/exclusion criteria permitted patients with below normal glucose levels. But the difference will still be, on average, 20 mg/dl between the two treatment groups.
So the patient with a 300 mg/dl baseline level might have an average reduction to 200 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 180 mg/dl (a difference of -20).
The patient with a 100 mg/dl baseline level might have an average reduction to 90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 70 mg/dl (again, a difference of -20).
The patient with a 50 mg/dl baseline level might have an average increase to 90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average increase to 70 mg/dl (yet again, a difference of -20).
So your conclusion would be that on average, between baseline and 4 weeks, glucose levels were reduced by 20 mg/dl more in the active treatment group relative to control. This difference is the vertical separation between the two parallel fitted regression lines, as shown in the figure in the paper.
So the method is answering exactly the question the investigator is asking.
Marc
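The point that the ANCOVA treatment coefficient already answers the "change from baseline" question, and Ted Wright's remark that analysing the difference amounts to fixing the baseline slope at 1, can be checked numerically. The sketch below uses simulated data; the vectors `baseline`, `wk4`, `change`, the helper `b()` and the assumed effect of -20 mg/dl are all made up for illustration.

```r
set.seed(2)
n <- 100
baseline  <- rnorm(n, mean = 150, sd = 40)
treatment <- factor(rep(c("control", "active"), each = n / 2),
                    levels = c("control", "active"))
# Assumed truth: baseline slope 0.6, treatment effect -20 mg/dl
wk4    <- 60 + 0.6 * baseline - 20 * (treatment == "active") + rnorm(n, sd = 15)
change <- wk4 - baseline

fit.ancova     <- lm(wk4 ~ baseline + treatment)          # week 4, adjusted for baseline
fit.change.adj <- lm(change ~ baseline + treatment)       # change, adjusted for baseline
fit.change     <- lm(change ~ treatment)                  # plain analysis of differences
fit.offset     <- lm(wk4 ~ offset(baseline) + treatment)  # ANCOVA with slope fixed at 1

# Helper (hypothetical name): extract the treatment coefficient
b <- function(fit) unname(coef(fit)["treatmentactive"])

# Same treatment estimate whether the response is week 4 or the change
print(c(ancova = b(fit.ancova), change.adj = b(fit.change.adj)))
# Analysis of differences equals ANCOVA with the baseline slope constrained to 1
print(c(change = b(fit.change), offset = b(fit.offset)))

# Vertical separation of the parallel lines at three baseline values
at <- c(50, 100, 300)
p.ctl <- predict(fit.ancova, newdata = data.frame(
  baseline = at, treatment = factor("control", levels = levels(treatment))))
p.act <- predict(fit.ancova, newdata = data.frame(
  baseline = at, treatment = factor("active", levels = levels(treatment))))
sep <- unname(p.act - p.ctl)
print(sep)
```

The first pair of estimates agrees because subtracting the baseline from the response only shifts the baseline slope by 1 and leaves the treatment coefficient untouched; the separation `sep` is the same at every baseline value because the additive model fits parallel lines.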
Hi John,
Since we have crossed the threshold into mixed models, I am going to provide some comments, but (notably because I have not used lmer, although I attended Doug's class a few years ago at useR), will defer to and solicit comments from the lmer experts on the list.
First, I am not sure, unless we restate the model so that Glucose is the response variable and Time is a covariate, that using Time in the random effects term makes sense. But I could be wrong.
If we stay with and extend the ANCOVA style approach, then I might envision something like:
lmer(wk4.glucose ~ baseline.glucose + treatment + gender + age +
(1 | arm / subject))
where the random effect term expresses the nesting of arm within subject. I am also presuming that you are not interested in arm as a main effect. So we are still concerned with the other main effects as before, but now consider the variation in the multiple measurements of glucose from each arm within each subject.
If you restate the model as I noted above, then perhaps:
lmer(glucose ~ time + treatment + gender + age +
(1 | arm / subject / time))
might make sense. From a review of the archives, it would seem that multi-level nesting is permitted in the random effects terms of lmer formulae, so this would reflect the nesting of arm, within subject, within time. The interpretation of this model is, of course, going to be different from that of the ANCOVA based approach above.
Hopefully, this might at least provide a starting point for further discussion and others with greater expertise will chime in.
Regards,
Marc
P.S. Note that I trimmed some of the thread below, to conserve space...
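For the restated model, with glucose as the response and time as a covariate, the data need one row per subject, arm and time point. A minimal base R sketch of that reshaping follows. The wide layout, the column names (`left.0`, `right.4`, and so on) and the glucose values are invented purely for illustration.

```r
# Hypothetical wide data: one row per subject, one column per arm/time cell
wide <- data.frame(
  subject   = factor(1:4),
  treatment = factor(c("control", "control", "active", "active"),
                     levels = c("control", "active")),
  left.0  = c(150, 210, 160, 190),
  right.0 = c(152, 205, 158, 195),
  left.4  = c(140, 180, 120, 150),
  right.4 = c(143, 178, 125, 148)
)

# All arm-by-time combinations that have a column in `wide`
cells <- expand.grid(arm = c("left", "right"), time = c(0, 4),
                     stringsAsFactors = FALSE)

# Stack one block of rows per cell: response `y`, plus `arm` and `time` labels
long <- do.call(rbind, lapply(seq_len(nrow(cells)), function(i) {
  data.frame(
    subject   = wide$subject,
    treatment = wide$treatment,
    arm       = cells$arm[i],
    time      = cells$time[i],
    y         = wide[[paste(cells$arm[i], cells$time[i], sep = ".")]]
  )
}))
long$arm <- factor(long$arm)

# Order by subject, arm, time for readability
long <- long[order(long$subject, long$arm, long$time), ]
rownames(long) <- NULL
print(long)
```

Each subject contributes four rows here (2 arms by 2 time points), which is the layout a formula such as `y ~ time + ...` with subject and arm grouping terms would expect.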
Hi Marc, thanks for your comments. Yes, I am debating between the 2 models as
well.
The first model does not have a "time" variable, and there is only 1 level of
nesting: arm within subject. I think the syntax for nesting is (1|subject /
arm) instead of (1|arm/subject).
The 2nd model certainly has a different data layout, with a "time" variable.
It has 2-level nesting: arm within subject within time, so the syntax should
be (1|time/subject/arm)?
Now I am a little confused about how to define the nesting here. Can I define it
as arm within time within subject instead? Then the syntax would be
(1|subject/time/arm)? The reason I am thinking this way is: each subject was
measured at 2 time points (0 & 4), and at each time point was measured twice, at
2 arms (left & right).
What is the simplest way to define the nesting structure? Are there any principles
that we should follow? Sometimes I feel I can use different nesting structures, as
they all sound reasonable to me.
I really wish someone would chime in and share their thoughts.
John
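One way to approach the "which nesting is right?" question is to let the data layout decide: a factor is nested in another when each of its levels occurs with exactly one level of the other, and the two are crossed when levels of one occur with several levels of the other. The sketch below checks this with cross-tabulations on a made-up balanced layout; the function `is_nested()` and the data frame `d` are hypothetical names introduced for illustration (lme4 also provides an `isNested()` function for the same purpose).

```r
# TRUE when every level of `inner` occurs with exactly one level of `outer`
is_nested <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

# Hypothetical layout: 5 subjects, 2 arms, 2 time points, fully balanced
d <- expand.grid(subject = factor(1:5),
                 arm     = c("left", "right"),
                 time    = factor(c(0, 4)))

# Arm labels are reused across subjects, so the raw label is not nested...
arm.raw <- is_nested(d$arm, d$subject)
# ...but the unique arm (subject-by-arm combination) is nested within subject
arm.id  <- interaction(d$subject, d$arm, drop = TRUE)
arm.unique <- is_nested(arm.id, d$subject)

# Every subject, and every arm, is observed at both times: crossed, not nested
subj.in.time <- is_nested(d$subject, d$time)
arm.in.time  <- is_nested(arm.id, d$time)

print(c(arm.raw = arm.raw, arm.unique = arm.unique,
        subj.in.time = subj.in.time, arm.in.time = arm.in.time))
```

On this layout only the subject-by-arm combination is nested (within subject). Time is crossed with both subject and arm, which suggests treating it as a covariate rather than as a level of the nesting hierarchy.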
John,
I'll throw out one more reply, since as I look at what I wrote earlier, I was suffering from some transient cerebral flatulence...
For the first model, indeed it should be:
(1 | Subject / Arm)
For the second model, it should be:
(Time | Subject / Arm)
So I had the nesting hierarchy reversed, and of course Time should not be nested.
From further searching, it would seem that in actuality, the preferred expression of the random effects term above for lmer() would be:
(Time | Subject : Arm) + (Time | Subject)
The initial expression would be largely equivalent to:
random = ~Time | Subject / Arm
in lme().
But I'll await clarification from someone who will not risk steering you further astray.
Regards,
Marc
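A hedged sketch of the corrected nesting in lmer(), using random intercepts only and simulated long-format data. Everything here is an assumption made for illustration: the data frame `d`, the variance components, and the `time * treatment` fixed effects (in the long layout the treatment effect on change appears as the time-by-treatment interaction, which is not part of the formulas quoted in the thread). The block runs only if the lme4 package is installed.

```r
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(3)
  n.subj <- 40

  # One row per subject, arm and time point (hypothetical balanced design)
  d <- expand.grid(arm = c("left", "right"), time = c(0, 4),
                   subject = factor(seq_len(n.subj)))
  d$treatment <- factor(ifelse(as.integer(d$subject) <= n.subj / 2,
                               "control", "active"),
                        levels = c("control", "active"))

  # Assumed random intercepts for subject and for arm within subject
  arm.id <- interaction(d$subject, d$arm, drop = TRUE)
  u.subj <- rnorm(n.subj, sd = 30)[as.integer(d$subject)]
  u.arm  <- rnorm(nlevels(arm.id), sd = 10)[as.integer(arm.id)]

  # Assumed truth: extra 20 mg/dl reduction at week 4 in the active group
  d$glucose <- 150 + u.subj + u.arm - 5 * (d$time == 4) -
    20 * (d$time == 4) * (d$treatment == "active") + rnorm(nrow(d), sd = 8)

  # Arm nested within subject, written with the `/` shorthand...
  fit.slash <- lme4::lmer(glucose ~ time * treatment + (1 | subject / arm),
                          data = d)
  # ...and with the nesting spelled out as two separate terms
  fit.long <- lme4::lmer(glucose ~ time * treatment +
                           (1 | subject) + (1 | subject:arm), data = d)

  print(lme4::fixef(fit.slash))
  print(lme4::VarCorr(fit.slash))
}
```

The two formulas describe the same model, so the fits should agree. With only two observations per subject-arm combination, a random Time slope at that level, as in `(Time | Subject : Arm)`, would have as many random effects as there are observations, so I would expect it to be unidentifiable in this design; random intercepts seem the safer starting point.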
On Aug 24, 2010, at 3:21 PM, array chip wrote:
Hi Marc, thanks for your comments. Yes, I am debating between the 2 models as
well.
The first model doesnot have a "time" variable, and there is only 1 level of
nesting: arm within subject. I think the syntax for nesting is (1|subject /
arm) instead of (1|arm/subject).
The 2nd model certainly have a different layout of data with a "time" variable.
It has 2-level nesting: arm within subject within time, so the syntax should
(1|time/subject/arm)?
Now I have a little confusion on how to define the nesting here. Can I define it
as arm within time within subject instead? so the syntax would be
(1|arm/time/subject)? The reason I am thinking of this way is: each subject was
measured at 2 time points (0 & 4), at each time point, measured twice at 2 arms
(left & right).
What is the simplest way to define nesting structure? any principles that we
should follow? Sometimes I feel I can use different nesting structures as they
all sound reasonable to me.
Really wish someone can chime in and share their thoughts.
John
----- Original Message ----
From: Marc Schwartz <marc_schwartz at me.com>
To: array chip <arrayprofile at yahoo.com>
Cc: r-sig-mixed-models at r-project.org
Sent: Tue, August 24, 2010 11:55:36 AM
Subject: Re: [R-sig-ME] longitudinal with 2 time points
Hi John,
Since we have crossed the threshold into mixed models, I am going to provide
some comments, but (notably because I have not used lmer, although I attended
Doug's class a few years ago at useR), will defer to and solicit comments from
the lmer experts on the list.
First, I am not sure, unless we restate the model where Glucose is the response
variable and Time is a covariate, that using Time in the random effects term
make sense. But I could be wrong.
If we stay with and extend the ANCOVA style approach, then I might envision
something like:
lmer(wk4.glucose ~ baseline.glucose + treatment + gender + age +
(1 | arm / subject))
where the random effect term expresses the nesting of arm within subject. I am
also presuming that you are not interested in arm as a main effect. So we are
still concerned with the other main effects as before, but now consider the
variation in the multiple measurements of glucose from each arm within each
subject.
If you restate the model as I noted above, then perhaps:
lmer(glucose ~ time + treatment + gender + age +
(1 | arm / subject / time))
might make sense. From a review of the archives, it would seem that a
multi-level nesting is permitted in lmer formulae random effects terms, so this
would reflect the nesting of arm, within subject, within time. The
interpretation of this model is of course, going to be different than the ANCOVA
based approach above.
Hopefully, this might at least provide a starting point for further discussion
and others with greater expertise will chime in.
Regards,
Marc
P.S. Note that I trimmed some of the thread below, to conserve space...
On Aug 24, 2010, at 3:02 AM, array chip wrote:
Hi Marc, I have to admit that I didn't get a chance to carefully read the article before my previous reply. So I want to wait till now to respond after finally I got a
chance to read the article. Thanks for your excellent explanation below. I agree that the coefficient for treatment is estimating the extent of the difference between treatment and control in the CHANGE of glucose in week 4 from
baseline.
Now my dataset becomes a little bt more complicated: each glucose testing was done twice (blood was draw from left arm and right arm and tested separately. So for each patient, on each time point, there are 2 measurements (from left and right arm separately). So I think I should now include factor "arm" as a random effect: lmer(wk4.glucose ~ baseline.glucose + treatment + gender + age+ (1|subject/time)) What do you think of this model specification? Adiitionally, since I am using mixed model now, if I code a new variable ?time? (either 0 or 4) and new response variable ?y?, how do I specify a mixed model with 2 random effects, one with respect to ?time? variable (2 time points per subject per arm), the other with respect to ?arm? variable (2 arms per subject
per time point)? Thanks a lot! John ----- Original Message ---- From: Marc Schwartz <marc_schwartz at me.com> To: array chip <arrayprofile at yahoo.com> Cc: r-sig-mixed-models at r-project.org Sent: Fri, August 13, 2010 7:24:59 AM Subject: Re: [R-sig-ME] longitudinal with 2 time points John, That you are asking this question indicates that either you have yet to read the article or that you need to re-read it, as you have not comprehended the content. The beta coefficient for treatment IS the difference in mean glucose change between baseline and 4 weeks **attributable to treatment**, after adjusting for any baseline differences in glucose between the two groups. That is also presuming that there is no interaction at baseline. For example, let's say that the beta for treatment is -20. Then, at 4 weeks, given the same baseline glucose level, we would predict that, on average, the treatment group will have a glucose level 20 mg/dl less than the control group. In the absence of an interaction, we would estimate the same average treatment
difference at 4 weeks of 20 mg/dl whether the baseline glucose was 300 mg/dl or 100 mg/dl. However, given regression to the mean, we might reasonably expect the patient with a 300 mg/dl baseline level to have a greater mean reduction at 4 weeks as
compared to the patient with a 100 mg/dl baseline level. We might also expect a patient with a glucose level at the low end of the baseline range (e.g. 50 mg/dl) to experience an average increase in glucose level at 4 weeks, presuming that your inclusion/exclusion criteria permitted patients with below-normal glucose levels. But the difference will still be, on average, 20 mg/dl between the two treatment groups. So the patient with a 300 mg/dl baseline level might have an average reduction
to 200 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 180 mg/dl (a difference of -20). The patient with a 100 mg/dl baseline level might have an average reduction to
90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 70 mg/dl (again, a difference of -20). The patient with a 50 mg/dl baseline level might have an average increase to 90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average increase to 70 mg/dl (yet again, a difference of -20). So your conclusion would be that on average, between baseline and 4 weeks, glucose levels were reduced by 20 mg/dl more in the active treatment group relative to control. This difference is the vertical separation in the two parallel fitted regression lines as shown in the figure in the paper. So the method is answering exactly the question the investigator is asking. Marc
On Aug 13, 2010, at 1:02 AM, array chip wrote:
Marc, Thanks for sharing your insights. Let's take this model as an example: lm(wk4.glucose ~ baseline.glucose + treatment + gender + age) Because the investigator is interested in knowing whether the CHANGE of glucose in week 4 from baseline is different between treatment and control, is it still legitimate to ask whether and HOW we can test this hypothesis? I think the coefficient of the treatment factor is only testing whether the week 4 glucose
level is different between treatment and control, but not testing whether the CHANGE of week 4 glucose level with respect to baseline is different between treatment and control. Thanks again for your suggestion. Yi
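
A quick way to check the question raised in this message is to fit the model both ways on simulated data. The sketch below is only an illustration: the sample size, the true treatment effect of -20 mg/dl, the baseline slope of 0.5 and the helper variable `change` are all assumptions made up for the example, and gender and age are left out for brevity. It fits the ANCOVA on the week 4 value and then the same model on the change from baseline. As long as baseline is kept as a covariate, the two fits should give the same treatment coefficient and standard error, and only the baseline slope shifts by exactly 1.

```r
# Simulated data only; every number below is an illustrative assumption.
set.seed(1)
n <- 200
treatment <- factor(rep(c("control", "active"), each = n / 2),
                    levels = c("control", "active"))
baseline.glucose <- rnorm(n, mean = 150, sd = 40)

# Assumed truth: baseline slope below 1 (regression to the mean) and a
# treatment effect of -20 mg/dl at week 4.
wk4.glucose <- 60 + 0.5 * baseline.glucose -
  20 * (treatment == "active") + rnorm(n, sd = 15)

# Hypothetical helper variable: change from baseline.
change <- wk4.glucose - baseline.glucose

# ANCOVA on the week 4 value, adjusting for baseline.
fit.wk4 <- lm(wk4.glucose ~ baseline.glucose + treatment)

# The same model with the change from baseline as the response.
fit.change <- lm(change ~ baseline.glucose + treatment)

# Treatment coefficient and its standard error under each response.
summary(fit.wk4)$coefficients["treatmentactive", ]
summary(fit.change)$coefficients["treatmentactive", ]

# The baseline slopes of the two fits differ by 1.
coef(fit.wk4)["baseline.glucose"] - coef(fit.change)["baseline.glucose"]
```

This equivalence holds only when baseline stays in the model. Dropping it from the change-score model gives the plain analysis of differences, which corresponds to fixing the baseline slope at 1 and generally gives a different answer.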