longitudinal with 2 time points

15 messages · John Maindonald, Charles E. (Ted) Wright, array chip +2 more

#
Hi, I am wondering whether it is still meaningful to run a mixed model if a 
longitudinal dataset has only 2 time points (baseline and week 4). Would it be 
more appropriate to simply take the difference between the 2 time points and run 
ANOVA (ANCOVA) on the difference? What about still running a mixed model on the 
difference of the 2 time points, but adding the baseline measurement as a random 
factor?

Thanks for sharing your thoughts.

John
#
All these are possibilities, except maybe making baseline measurement
a random factor.  This would make sense only if data divide into groups,
and you want the baseline effect to vary randomly from group to group.  
That may limit your ability to estimate parameters that are of interest.
In most circumstances that I am familiar with, it makes better sense to 
treat baseline effect as fixed.

John.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 11/08/2010, at 8:11 AM, array chip wrote:

            
#
Keep in mind that running an ANOVA on the difference is not the same thing 
as using the baseline data as a covariate in an ANOVA on the Week 4 data. 
Essentially the ANOVA on the differences is like the ANCOVA with the slope 
constrained to be 1.
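
This can be checked on simulated data. The sketch below is illustrative only: the data frame `d`, its column names, and the simulated effect sizes are assumptions, not taken from the study under discussion. It fits the ANOVA on the difference, then an ANCOVA on the week 4 value with the baseline slope forced to 1 through `offset()`, and finally the usual ANCOVA with a freely estimated slope.

```r
# Simulated two-group trial; all names and numbers here are made up.
set.seed(42)
n <- 60
d <- data.frame(group = gl(2, n / 2, labels = c("control", "treated")))
d$baseline <- rnorm(n, mean = 150, sd = 25)
d$wk4 <- 60 + 0.5 * d$baseline - 15 * (d$group == "treated") + rnorm(n, sd = 10)

# ANOVA on the change score
fit_diff <- lm(I(wk4 - baseline) ~ group, data = d)

# ANCOVA on week 4 with the baseline slope constrained to 1
fit_offset <- lm(wk4 ~ offset(baseline) + group, data = d)

# ANCOVA on week 4 with the baseline slope estimated from the data
fit_ancova <- lm(wk4 ~ baseline + group, data = d)

coef(fit_diff)
coef(fit_offset)   # same estimates as fit_diff
coef(fit_ancova)   # the slope on baseline is estimated rather than fixed at 1
```

The first two fits agree exactly; the third differs whenever the estimated baseline slope is not 1.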

Ted Wright
On Wed, 11 Aug 2010, John Maindonald wrote:

            
#
Hi,

I'll throw in a reference that covers some of these issues:

Statistics Notes
Analysing controlled trials with baseline and follow up measurements
Vickers and Altman
BMJ. 2001 November 10; 323(7321): 1123-1124.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1121605/


The basic model specification would of course be:

  lm(Wk4 ~ Baseline + Group)

You will also want to test for an interaction between the baseline score and your grouping factor, in case the observed group (eg. treatment) effect is dependent upon the value of the baseline measurement. In this case, unlike in the above paper, you of course end up with crossing fitted regression lines, rather than parallel lines.

HTH,

Marc Schwartz
On Aug 11, 2010, at 7:34 AM, Charles E. (Ted) Wright wrote:

            
#
Thank you John. I agree making baseline as a random factor is not a good idea. 

The data have treatment groups and age and gender for each subject. The purpose 
of the study is to investigate the treatment effect on the change of the study 
endpoint (glucose level) between week 4 and baseline. I am thinking of several 
models/methods to analyze the data:

1. mixed model with fixed time and random intercept:
lmer(y ~ treatment + gender + age + time + (1|subject))   # where time = 0 or 4

2. mixed model with random intercept and random slope:
lmer(y ~ treatment + gender + age + time + (time|subject))

3. mixed model with random intercept but no fixed time factor:
lmer(y ~ treatment + gender + age + (1|subject))

4. calculate delta.y = difference of y between week 4 & baseline:
lm(delta.y ~ treatment + gender + age)

5. same as 4, but add baseline as a covariate:
lm(delta.y ~ baseline.y + treatment + gender + age)

My thinking on these 5 models is: models 1 and 2 have the limitation that they 
impose a linear relationship of y versus time, which may not be sensible with 2 
time points. Model 3 simply treats baseline and week 4 as repeated measures, not 
imposing a linear relationship. Models 4 & 5 are based on the difference between 
baseline and week 4, except that model 5 adds baseline as a covariate. The 
reason for adding baseline as a covariate is the assumption that the extent of 
the change of y between week 4 and baseline depends on the level of baseline.
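
One algebraic property is worth knowing when comparing model 5 with an ANCOVA on the week 4 value itself. The sketch below uses simulated data; the data frame `d`, the column `wk4.y`, and every effect size are assumptions made up for illustration. Because delta.y is just the week 4 value minus baseline.y, least squares gives the same coefficients in both fits except that the baseline coefficient is shifted by exactly 1.

```r
# Simulated data; names other than delta.y and baseline.y are hypothetical.
set.seed(1)
n <- 80
d <- data.frame(
  treatment = gl(2, n / 2, labels = c("control", "active")),
  gender    = factor(sample(c("F", "M"), n, replace = TRUE)),
  age       = round(runif(n, 30, 70))
)
d$baseline.y <- rnorm(n, 150, 30)
d$wk4.y <- 50 + 0.6 * d$baseline.y - 20 * (d$treatment == "active") +
  0.2 * d$age + rnorm(n, sd = 12)
d$delta.y <- d$wk4.y - d$baseline.y

# Model 5: change score as response, baseline as covariate
m5 <- lm(delta.y ~ baseline.y + treatment + gender + age, data = d)

# ANCOVA: week 4 value as response, baseline as covariate
ancova <- lm(wk4.y ~ baseline.y + treatment + gender + age, data = d)

# Difference between the two coefficient vectors:
# -1 for baseline.y, 0 (up to rounding error) everywhere else
round(coef(m5) - coef(ancova), 10)
```

So the two specifications give the same treatment estimate; what changes is how the baseline coefficient is read.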

Does anyone have any suggestions on which one to use?

Thanks!

John


#
Thank you Ted for pointing this out. See my response to John's reply. What would 
you think of model 5, where I used ANCOVA on the difference between week 4 & 
baseline and also included baseline as a covariate?

Thanks

John



#
Hi Marc,

Thanks for the reference. I will definitely read it. Please see my response to 
John's reply. Your model is another model I should add to the 5 models I 
proposed in that email. What are your overall thoughts on these different models?

Thank you for sharing.

John



#
Hi John,

If there are only two time points per subject I think model 2 should 
throw an error because the residual variance and (time|Subject) 
(co)variances cannot be uniquely estimated. You can get around this 
problem by moving the (time|Subject) term into the residual term and 
dropping it from the random terms using MCMCglmm or ASReml:

MCMCglmm(y ~ treatment + gender + age + time, rcov=~  
us(as.factor(time)):subject,  ...

This route was also suggested by Ben Bolker and John Maindonald for  
coping with negative variances.


However, when I try:

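library(lme4)            # provides lmer()
set.seed(1)
subject<-gl(50,2)        # 50 subjects, 2 observations each
time<-gl(2,1,100)        # time point 1 or 2, alternating within subject
y<-rnorm(100)            # pure noise response
summary(lmer(y~time+(time|subject)))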

I get estimates of all terms, so maybe they can be uniquely 
estimated (although it would surprise me a lot)?
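
A sketch of the parameter count behind that surprise, assuming time is coded 0/1 within subject. The notation is not from the original post: g_11 and g_22 are the random intercept and random slope variances, g_12 their covariance, and sigma^2 the residual variance. The two observations on subject i then have marginal covariance

```latex
\[
\operatorname{Var}\begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix}
  = Z G Z^{\top} + \sigma^{2} I_{2},
\qquad
Z = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},
\quad
G = \begin{pmatrix} g_{11} & g_{12} \\ g_{12} & g_{22} \end{pmatrix},
\]
\[
\begin{aligned}
\operatorname{Var}(y_{i1}) &= g_{11} + \sigma^{2}, \\
\operatorname{Cov}(y_{i1}, y_{i2}) &= g_{11} + g_{12}, \\
\operatorname{Var}(y_{i2}) &= g_{11} + 2 g_{12} + g_{22} + \sigma^{2}.
\end{aligned}
\]
```

That is three distinct quantities determined by four parameters, so different combinations of (g_11, g_12, g_22, sigma^2) give the same Gaussian likelihood. A fitting routine that does not check for this can still return one such combination, which would be consistent with getting "estimates of all terms" without those estimates being unique.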

Jarrod
On 12 Aug 2010, at 06:33, array chip wrote:

            

  
    
#
Hi John,

If you read that article, you will see that your use of delta.y as the dependent variable does not make sense.

Thus, I would re-express your model 5 as:

  lm(wk4.glucose ~ baseline.glucose + treatment + gender + age)

and as noted, check for the interaction between baseline glucose and treatment:

  lm(wk4.glucose ~ baseline.glucose * treatment + gender + age)


You might also want to consider using a spline function on age, presuming that age is hopefully measured as a continuous variable (eg. not ordinal groups).

Since the ANCOVA based approach described in the paper is essentially an OLS linear regression, you can of course include the additional covariates for adjustment. If the interaction term p value is >0.1 (a common threshold), you can remove it, and the beta coefficient for the treatment factor, with its CI, is then your estimated treatment effect relative to your control.
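
A minimal sketch of that workflow on simulated data follows. The data frame `d` and all effect sizes are assumptions for illustration; only the formula terms come from the models above. It compares the additive and interaction models with an F test and then extracts the treatment estimate and its 95% CI from the additive model.

```r
# Simulated data with made-up effect sizes; no interaction is built in.
set.seed(7)
n <- 100
d <- data.frame(
  treatment = gl(2, n / 2, labels = c("control", "active")),
  gender    = factor(sample(c("F", "M"), n, replace = TRUE)),
  age       = round(runif(n, 30, 70))
)
d$baseline.glucose <- rnorm(n, 160, 35)
d$wk4.glucose <- 70 + 0.5 * d$baseline.glucose -
  20 * (d$treatment == "active") + rnorm(n, sd = 15)

fit_main <- lm(wk4.glucose ~ baseline.glucose + treatment + gender + age, data = d)
fit_int  <- lm(wk4.glucose ~ baseline.glucose * treatment + gender + age, data = d)

# F test for the single baseline-by-treatment interaction term
cmp   <- anova(fit_main, fit_int)
p_int <- cmp[["Pr(>F)"]][2]
p_int

# Treatment effect and 95% CI from the additive model
est <- coef(fit_main)["treatmentactive"]
ci  <- confint(fit_main, "treatmentactive")
est
ci
```

Whether 0.1 is the right cut-off for dropping the interaction is a judgment call for the analyst; the code only reports the p value.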

For the presentation of the results, besides the obvious tabular summaries and the scatter/regression lines plot, include a series of plots showing selected baseline values and the treatment versus control predicted follow up values and CIs for the same baseline value in each plot. This visually shows the common estimated treatment effect for each baseline value, which will also tend to reveal regression to the mean. This presentation is especially helpful if the interaction term is retained, which therefore shows how the treatment effect varies and will reverse, over the range of the baseline values. You can select a series of clinically relevant values over the range of the observed baseline values, and/or by default, select a five number plus mean series over the observed baseline values.
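
The table behind such plots could be built along these lines. This is a sketch on simulated data: the data frame `d` and the effect sizes are made up, and gender and age are left out only to keep the example short (with them in the model, they would need fixed values in `newd`). It predicts the week 4 value with confidence limits for each treatment at the five-number summary plus the mean of the observed baseline values.

```r
# Simulated data; names follow the models above, numbers are hypothetical.
set.seed(7)
n <- 100
d <- data.frame(treatment = gl(2, n / 2, labels = c("control", "active")))
d$baseline.glucose <- rnorm(n, 160, 35)
d$wk4.glucose <- 70 + 0.5 * d$baseline.glucose -
  20 * (d$treatment == "active") + rnorm(n, sd = 15)

fit <- lm(wk4.glucose ~ baseline.glucose + treatment, data = d)

# Five-number summary plus the mean of the observed baseline values
base_vals <- sort(c(fivenum(d$baseline.glucose), mean(d$baseline.glucose)))

# One row per baseline value and treatment
newd <- expand.grid(baseline.glucose = base_vals,
                    treatment = levels(d$treatment))
pred <- cbind(newd, predict(fit, newdata = newd, interval = "confidence"))
pred

# Predicted active-minus-control difference at each baseline value;
# without an interaction term this is the same at every baseline value
diffs <- pred$fit[pred$treatment == "active"] -
  pred$fit[pred$treatment == "control"]
diffs
```

The columns `fit`, `lwr` and `upr` in `pred` can then be passed to whatever plotting function is preferred.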

I don't see a role for a mixed effects model here, given that this is a pretty straightforward "change from baseline" type design, but there are many here with greater expertise than I. If this were a cross-over design, or if you had multiple measures of glucose for each patient at each time point, more than two time points, or a multi-center study, then a mixed effects model would make more sense to me.

HTH,

Marc
On Aug 12, 2010, at 12:39 AM, array chip wrote:

            
#
Marc,

Thanks for sharing your insights. Let's take this model as an example:

  lm(wk4.glucose ~ baseline.glucose + treatment + gender + age)

Because the investigator is interested in knowing whether the CHANGE of glucose 
in week 4 from baseline is different between treatment and control, is it still 
legitimate to ask whether and HOW we can test this hypothesis? I think the 
coefficient of the treatment factor is only testing whether the week 4 glucose 
level is different between treatment and control, but not testing whether 
the CHANGE of week 4 glucose level with respect to baseline is different between 
treatment and control.

Thanks again for your suggestion.

Yi





#
John,

That you are asking this question indicates that either you have yet to read the article or that you need to re-read it, as you have not comprehended the content.

The beta coefficient for treatment IS the difference in mean glucose change between baseline and 4 weeks **attributable to treatment**, after adjusting for any baseline differences in glucose between the two groups. That also presumes that there is no interaction between treatment and baseline glucose.

For example, let's say that the beta for treatment is -20. Then, at 4 weeks, given the same baseline glucose level, we would predict that, on average, the treatment group will have a glucose level 20 mg/dl less than the control group. 

In the absence of an interaction, we would estimate the same average treatment difference at 4 weeks of 20 mg/dl whether the baseline glucose was 300 mg/dl or 100 mg/dl. 

However, given regression to the mean, we might reasonably expect the patient with a 300 mg/dl baseline level to have a greater mean reduction at 4 weeks as compared to the patient with a 100 mg/dl baseline level. 

We might also expect a patient with a glucose level at the low end of the baseline range (eg. 50 mg/dl) to experience an average increase in glucose level at 4 weeks, presuming that your inclusion/exclusion criteria permitted patients with below normal glucose levels. But the difference will still be, on average, 20 mg/dl between the two treatment groups.

So the patient with a 300 mg/dl baseline level might have an average reduction to 200 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 180 mg/dl (a difference of -20).

The patient with a 100 mg/dl baseline level might have an average reduction to 90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average reduction to 70 mg/dl (again, a difference of -20).

The patient with a 50 mg/dl baseline level might have an average increase to 90 mg/dl at 4 weeks on the control treatment, whereas the same patient on the active treatment would have an average increase to 70 mg/dl (yet again, a difference of -20).

So your conclusion would be that on average, between baseline and 4 weeks, glucose levels were reduced by 20 mg/dl more in the active treatment group relative to control.

This difference is the vertical separation in the two parallel fitted regression lines as shown in the figure in the paper.
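
The same reasoning can be reproduced numerically. The sketch below uses simulated data with made-up coefficients (75, 0.5 and -20), chosen so that high baselines tend to fall and low baselines tend to rise; they are not the figures quoted above and not estimates from any real study. It shows the predicted change from baseline for each treatment at baseline values of 300, 100 and 50 mg/dl.

```r
# Simulated data; all numbers are hypothetical.
set.seed(123)
n <- 200
d <- data.frame(treatment = gl(2, n / 2, labels = c("control", "active")))
d$baseline.glucose <- runif(n, 50, 300)
d$wk4.glucose <- 75 + 0.5 * d$baseline.glucose -
  20 * (d$treatment == "active") + rnorm(n, sd = 15)

fit <- lm(wk4.glucose ~ baseline.glucose + treatment, data = d)

# Predicted week 4 value and change from baseline at three baseline levels
newd <- expand.grid(baseline.glucose = c(300, 100, 50),
                    treatment = levels(d$treatment))
newd$pred.wk4    <- predict(fit, newdata = newd)
newd$pred.change <- newd$pred.wk4 - newd$baseline.glucose

wide <- data.frame(
  baseline       = c(300, 100, 50),
  change.control = newd$pred.change[newd$treatment == "control"],
  change.active  = newd$pred.change[newd$treatment == "active"]
)
wide$active.minus.control <- wide$change.active - wide$change.control
wide
```

The predicted change differs a lot across baseline levels, but the `active.minus.control` column is constant and equals the treatment coefficient of `fit`.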

So the method is answering exactly the question the investigator is asking.

Marc
On Aug 13, 2010, at 1:02 AM, array chip wrote:

            
10 days later
#
Hi Marc,

I have to admit that I didn't get a chance to carefully read the article before 
my previous reply, so I waited to respond until I had finally had a 
chance to read it. Thanks for your excellent explanation below. I agree 
that the coefficient for treatment is estimating the extent of the difference 
between treatment and control in the CHANGE of glucose in week 4 from baseline.

Now my dataset has become a little bit more complicated: each glucose test was 
done twice (blood was drawn from the left arm and the right arm and tested 
separately). So for each patient, at each time point, there are 2 measurements 
(from left and right arm separately). So I think I should now include factor 
"arm" as a random effect:

lmer(wk4.glucose ~ baseline.glucose + treatment + gender + age+ 
(1|subject/time))

What do you think of this model specification?

Additionally, since I am using a mixed model now, if I code a new variable "time" 
(either 0 or 4) and a new response variable "y", how do I specify a mixed model 
with 2 random effects, one with respect to the "time" variable (2 time points per 
subject per arm), the other with respect to the "arm" variable (2 arms per subject 
per time point)?

Thanks a lot!
John




#
Hi John,

Since we have crossed the threshold into mixed models, I am going to provide some comments, but (notably because I have not used lmer, although I attended Doug's class a few years ago at useR), will defer to and solicit comments from the lmer experts on the list.

First, I am not sure, unless we restate the model where Glucose is the response variable and Time is a covariate, that using Time in the random effects term makes sense. But I could be wrong.

If we stay with and extend the ANCOVA style approach, then I might envision something like:
  
  lmer(wk4.glucose ~ baseline.glucose + treatment + gender + age + 
       (1 | arm / subject))

where the random effect term expresses the nesting of arm within subject. I am also presuming that you are not interested in arm as a main effect. So we are still concerned with the other main effects as before, but now consider the variation in the multiple measurements of glucose from each arm within each subject.

If you restate the model as I noted above, then perhaps:

  lmer(glucose ~ time + treatment + gender + age + 
       (1 | arm / subject / time))

might make sense. From a review of the archives, it would seem that a multi-level nesting is permitted in lmer formulae random effects terms, so this would reflect the nesting of arm, within subject, within time. The interpretation of this model is of course, going to be different than the ANCOVA based approach above.

Hopefully, this might at least provide a starting point for further discussion and others with greater expertise will chime in.

Regards,

Marc

P.S. Note that I trimmed some of the thread below, to conserve space...
On Aug 24, 2010, at 3:02 AM, array chip wrote:

            
#
Hi Marc, thanks for your comments. Yes, I am debating between the 2 models as 
well. 


The first model does not have a "time" variable, and there is only 1 level of 
nesting: arm within subject. I think the syntax for nesting is (1|subject/arm) 
instead of (1|arm/subject).

The 2nd model certainly has a different layout of data, with a "time" variable. 
It has 2-level nesting: arm within subject within time, so the syntax should be 
(1|time/subject/arm)?

Now I am a little confused about how to define the nesting here. Can I define it 
as arm within time within subject instead, so that the syntax would be 
(1|arm/time/subject)? The reason I am thinking of it this way is: each subject 
was measured at 2 time points (0 & 4), and at each time point was measured twice, 
once per arm (left & right).

What is the simplest way to define the nesting structure? Are there any 
principles that we should follow? Sometimes I feel I can use different nesting 
structures, as they all sound reasonable to me.

I really hope someone can chime in and share their thoughts.

John




#
John,

I'll throw out one more reply, since as I look at what I had below, I was suffering from some transient cerebral flatulence...

For the first model, indeed it should be:

  (1 | Subject / Arm)

For the second model, it should be:

  (Time | Subject / Arm)

So I had the nesting hierarchy reversed and of course Time should not be nested. From further searching, it would seem that in actuality, the preferred expression of the random effects term above for lmer(), would be:

  (Time | Subject : Arm) + (Time | Subject)

The initial expression would be largely equivalent to:

  random = ~Time | Subject / Arm

in lme().
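
To make the direction of the nesting concrete, here is a small sketch on a made-up layout (20 subjects, 2 arms, 2 time points; the column names and simulated variances are assumptions). It uses random intercepts only, for simplicity, rather than the Time slopes written above. The point is that "left" and "right" are labels shared by every subject, so Arm only identifies a distinct sampling unit once it is combined with Subject, which is what Subject/Arm expresses.

```r
# Hypothetical long-format layout: 20 subjects x 2 arms x 2 time points
d <- expand.grid(Time = c(0, 4),
                 Arm = c("left", "right"),
                 Subject = factor(1:20))

# Arm on its own has only two labels, shared by all subjects
nlevels(d$Arm)        # 2

# Arm within Subject: one level per subject-arm combination
d$SubjArm <- interaction(d$Subject, d$Arm, drop = TRUE)
nlevels(d$SubjArm)    # 40

# Simulated response with subject and arm-within-subject effects (made up)
set.seed(2)
subj_eff <- rnorm(20, sd = 20)[as.integer(d$Subject)]
arm_eff  <- rnorm(40, sd = 5)[as.integer(d$SubjArm)]
d$Glucose <- 150 - 3 * d$Time + subj_eff + arm_eff + rnorm(nrow(d), sd = 8)

# If lme4 is installed: the slash shorthand and its expanded form
# describe the same random-intercept structure
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_short <- lme4::lmer(Glucose ~ Time + (1 | Subject / Arm), data = d)
  fit_long  <- lme4::lmer(Glucose ~ Time + (1 | Subject) + (1 | Subject:Arm),
                          data = d)
  print(all.equal(lme4::fixef(fit_short), lme4::fixef(fit_long),
                  tolerance = 1e-4))
}
```

Writing (1 | Arm / Subject) instead would treat the two arm labels as the top-level groups, which is not how the measurements were taken.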

But I'll await clarification from someone who will not risk steering you further astray.

Regards,

Marc
On Aug 24, 2010, at 3:21 PM, array chip wrote: