Skip to content

Fitting linear mixed model to longitudinal data with very few data points

7 messages · Steven J. Pierce, David Westergaard, Ben Bolker

#
Hello everyone,

First off, I've posted a similar question to StackExchange
(http://stats.stackexchange.com/questions/76980/analysis-of-longitudinal-data-with-very-few-points),
but I received no answers.

To summarise the data: From 2 subjects, 8 response values were
measured at time points T0, T1, T2, T3. At T1, subject 1 underwent
treatment. Subject 1 received no further treatment after T1.

I've reasoned that this is a repeated measures mixed model kind of
design, so I tried to fit a linear model with random effects, using
lme4:

lm1 <- lmer(Response ~ Treatment * Timepoint + (1|Subject),
data=my_data,REML=FALSE)

However, I am not sure if this model is "correct," I have entered time
points as factorial values, but I am ensure if they should instead be
numerical values. They are quite spread. On a side note, if I don't
set REML=FALSE, I get an error "Computed variance-covariance matrix is
not positive definite" when I try to run "summary(lm1)". I'm guessing
this may have something to do with my sample size.

I am a bit unsure of how to evaluate the model. The number of data
points is extremely low. My naive approach was to make an alternative
model, which does not include treatment:

lm2 <- lmer(Response ~  Timepoint + (1|subject_id), data=test,REML=FALSE)

And do an ANOVA to see which one fits the data better. This is the output:
anova(lm1,lm2)
        Df     AIC     BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
lm2  6   87.12   87.60 -37.561    75.12
lm1 10 -453.72 -452.93 236.860  -473.72 548.84      4  < 2.2e-16 ***
and is a reliable model?

What I'm trying to investigate, is:

1. Is there any observable effect after administering the drug (i.e.
is the difference between response values significantly greater than
zero)
2. If there is an effect, what is the effect size at each time point
(i.e. what is the difference between response values)
3. How does the effect vary over time
4. If there is an effect, is the effect observed from the drug at T1
still persistant at T3

Any help on this matter is much appreciated.

Regards,
David
#
You probably need data from a lot more subjects to get good estimates of the
parameters in that model. 


Steven J. Pierce, Ph.D.
Associate Director
Center for Statistical Training & Consulting (CSTAT)
Michigan State University
E-mail: pierces1 at msu.edu
Web: http://www.cstat.msu.edu 

-----Original Message-----
From: David Westergaard [mailto:david at harsk.dk] 
Sent: Sunday, November 24, 2013 2:55 AM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Fitting linear mixed model to longitudinal data with
very few data points

Hello everyone,

First off, I've posted a similar question to StackExchange
(http://stats.stackexchange.com/questions/76980/analysis-of-longitudinal-dat
a-with-very-few-points),
but I received no answers.

To summarise the data: From 2 subjects, 8 response values were
measured at time points T0, T1, T2, T3. At T1, subject 1 underwent
treatment. Subject 1 received no further treatment after T1.

I've reasoned that this is a repeated measures mixed model kind of
design, so I tried to fit a linear model with random effects, using
lme4:

lm1 <- lmer(Response ~ Treatment * Timepoint + (1|Subject),
data=my_data,REML=FALSE)

However, I am not sure if this model is "correct," I have entered time
points as factorial values, but I am ensure if they should instead be
numerical values. They are quite spread. On a side note, if I don't
set REML=FALSE, I get an error "Computed variance-covariance matrix is
not positive definite" when I try to run "summary(lm1)". I'm guessing
this may have something to do with my sample size.

I am a bit unsure of how to evaluate the model. The number of data
points is extremely low. My naive approach was to make an alternative
model, which does not include treatment:

lm2 <- lmer(Response ~  Timepoint + (1|subject_id), data=test,REML=FALSE)

And do an ANOVA to see which one fits the data better. This is the output:
anova(lm1,lm2)
        Df     AIC     BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
lm2  6   87.12   87.60 -37.561    75.12
lm1 10 -453.72 -452.93 236.860  -473.72 548.84      4  < 2.2e-16 ***
and is a reliable model?

What I'm trying to investigate, is:

1. Is there any observable effect after administering the drug (i.e.
is the difference between response values significantly greater than
zero)
2. If there is an effect, what is the effect size at each time point
(i.e. what is the difference between response values)
3. How does the effect vary over time
4. If there is an effect, is the effect observed from the drug at T1
still persistant at T3

Any help on this matter is much appreciated.

Regards,
David
#
I agree, but I won't be getting any more data, so I'm trying to find
the least-worst solution, so to speak.

Any suggestions/ideas are most welcome.

Regards,
David

2013/11/24 Steven J. Pierce <pierces1 at msu.edu>:
#
I don't see any hope of drawing trustworthy conclusions from a dataset this
small given the complexity of the model you want to use and the long list of
things you want to know. 

Maybe the least-worst approach is to accept that this data should not be
analyzed and go search the literature for previously published evidence
pertaining to your question instead, or to advocate for obtaining the
resources required to plan a study with an appropriate sample size and
research design. 

For most of your questions (e.g., pairwise comparisons at each time point),
you have two relevant data points, one with and one without treatment. It
would take a pretty extraordinary set of circumstances to convince me that
this sample is the best evidence one can acquire to answer your questions.
Barring that, doing statistics on data this sparse and using them to support
any serious decision-making seems unethical to me.


Steven J. Pierce, Ph.D.
Associate Director
Center for Statistical Training & Consulting (CSTAT)
Michigan State University
E-mail: pierces1 at msu.edu
Web: http://www.cstat.msu.edu 

-----Original Message-----
From: David Westergaard [mailto:david at harsk.dk] 
Sent: Sunday, November 24, 2013 9:44 AM
To: Steven J. Pierce
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Fitting linear mixed model to longitudinal data with
very few data points

I agree, but I won't be getting any more data, so I'm trying to find
the least-worst solution, so to speak.

Any suggestions/ideas are most welcome.

Regards,
David

2013/11/24 Steven J. Pierce <pierces1 at msu.edu>:
the
(http://stats.stackexchange.com/questions/76980/analysis-of-longitudinal-dat
#
On 13-11-24 09:43 AM, David Westergaard wrote:

            
So you have a total of 64 (2 subjects * 4 times * 8 obs) observations?
Overlooking the problem of extrapolating from two individuals to the
whole population that might get treated, it seems to me it would be
perfectly reasonable to treat this as a regular linear model problem --
specifically, ecologists would call this a "before-after-control-impact"
design.  If the individuals have different underlying time courses then
you won't be able to detect it -- it will be confounded with the
treatment effect.  Most of your questions can be set up as contrasts:
for example, the effect of the drug is represented by the interaction
between (subject) and (T0 vs. {T1,T2,T3}).  (The main effect of subject
gives the difference between subjects: the main effect of (T0 vs.
{T1,T2,T3}) gives the before-after difference for the untreated subject;
the interaction gives the estimated effect size.

  And so on.  (This is a reasonable question, but I don't think it can
be framed as a mixed model question with this design.)

  Ben Bolker
1 day later
#
Thank you both for your valuable input. Just to clarify, this is NOT a
clinical study. This is a first of its kind study, and we are
interested in generating hypothesis for further investigation and
experimental evaluation. We accept the limitations of our study, but
we have a need to estimate these things, given data. Especially
because the pattern that we observe makes absolutely perfect sense
biologically.

@Ben, you could put it like that, I guess. In truth, what we have
measured is the total gene abundance. We have then binned the
abundance of individual genes into categories, and its those
categories that we term Response values.

Are there any good introductions to working with contrasts in R? When
I search google, I just get hit by a massive amount of hits, and its a
bit overwhelming. Also, how would you suggest making the design?

Best,
David

2013/11/25 Ben Bolker <bbolker at gmail.com>:
#
On 13-11-25 09:45 PM, David Westergaard wrote:
Maybe

http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
http://ms.mcmaster.ca/~bolker/classes/s4c03/notes/week2B.pdf

  (I'd welcome other suggestions from the list)