I am trying to figure out mixed models using a simulated example.
I get essentially the same parameter values and SEs whether I do:
1. lm() ignoring the within subjects nature of the expt
2. lmer()
3. fit a line to each subject's data and find mean and SE of parameters
across subjects
At this point, my feeling is I should just do (1) or possibly (3) above.
Please tell me why I should do anything else. Maybe my example is somehow
weird? It is typical of expts I deal with.
- cond is a factor with levels a, b, and c
- x is a continuous indep var
- each condition has slope and intercept
- each subject receives all conditions; each subject has own slope
and intercept randomly varying about population value
==========================
set.seed(1234)
nsubj<-30
z<-expand.grid(x=1:5,cond=c("a","b","c"),subj=as.factor(1:nsubj))
x<-z$x
cond<-z$cond
subj<-z$subj
a1<- rep(1+rnorm(nsubj,mean=0,sd=.1),5) #inta=1
a2<- rep(1+rnorm(nsubj,mean=0,sd=.1),5) #intb-inta=2-1=1
a3<- rep(2+rnorm(nsubj,mean=0,sd=.1),5) #intc-inta=3-1=2
a4<- rep(3+rnorm(nsubj,mean=0,sd=.1),5) #slopea=3
a5<- rep(-1+rnorm(nsubj,mean=0,sd=.1),5) #slopeb-slopea
a6<- rep(-2+rnorm(nsubj,mean=0,sd=.1),5) #slopec-slopea
y<- (cond=="a")*(a1 + a4*x) + (cond=="b")*(a1+a2 + (a4+a5)*x) +
(cond=="c")*(a1+a3 + (a4+a6)*x) + rnorm(5*3*nsubj,mean=0,sd=.5)
#### ignore within subjects design and treat as independent errors
fit1<-lm(y~cond*x)
summary(fit1)
## mixed model using lmer()
library(lme4)
fit2<-lmer(y~x*cond + (x*cond|subj))
summary(fit2)
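[Editor's note, not in the original post: summary() on an lmer fit reports no p-values by design, but intervals are available. A minimal sketch, assuming the `fit2` object from above:]

```r
# Sketch, assuming the lmer fit `fit2` from above.
# Wald CIs are quick; profile CIs are slower but more accurate.
confint(fit2, method = "Wald")      # normal-approximation intervals
# confint(fit2, method = "profile") # likelihood-profile intervals
# For Satterthwaite-df p-values, refit the model with lmerTest::lmer()
```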
## lme from nlme package
library(nlme)
fit3<-lme(y~x*cond, random=~(x*cond)|subj)
#does not converge
##fit line to each subject; get mean and se of params across subjects
beta<-matrix(0,ncol=6,nrow=nsubj)
for(i in 1:nsubj)
{
f<-lm(y[subj==i]~cond[subj==i]*x[subj==i])
beta[i,]<-coef(f)
}
colMeans(beta)
apply(beta,2,sd)/sqrt(nsubj)
# 0.9730283 0.3577780 1.8709841 3.0270420 -0.8013505 -1.9840602
# 0.10121264 0.17860228 0.15895837 0.02820369 0.04765495 0.03834946
======================
This last method makes the most sense to me. Getting 95% CIs and p-values
for the params would be straightforward.
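[Editor's sketch of that "straightforward" step, reusing the `beta` matrix and `nsubj` from the code above; t-based rather than normal-based, given only 30 subjects:]

```r
# Sketch: t-based 95% CIs and two-tailed p-values from the
# per-subject coefficient matrix `beta` (nsubj rows, 6 columns).
m  <- colMeans(beta)
se <- apply(beta, 2, sd) / sqrt(nsubj)
tc <- qt(0.975, df = nsubj - 1)       # t critical value, 29 df
cbind(estimate = m,
      lower = m - tc * se,
      upper = m + tc * se,
      p = 2 * pt(abs(m / se), df = nsubj - 1, lower.tail = FALSE))
```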
Thanks very much for your help.
Stan
within subjects, 3 conditions, 3 lines
5 messages · Stanislav Aggerwal, marKo, ONKELINX, Thierry
1 day later
I will attempt an answer to my own question. Possibly the reason I got the same answer with the 3 methods is that my simulation was wrong. It simulated the responses as:

y<- (cond=="a")*(a1 + a4*x) + (cond=="b")*(a1+a2 + (a4+a5)*x) +
(cond=="c")*(a1+a3 + (a4+a6)*x) + rnorm(5*3*nsubj,mean=0,sd=.5)

This contains the intercepts and slopes which are unique to each subject and randomly sampled from some population values. However the term rnorm(5*3*nsubj,mean=0,sd=.5) is wrong because it says the errors are independent. Instead they should be correlated within each subject. I will give that a go.

Stan
I think that you have correctly identified the problem.
Cheers,
Marko
On 01/21/2015 09:16 AM, Stanislav Aggerwal wrote:
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
I have revised the simulation. Now each subject has autocorrelated errors.
It is a substantial autocorrelation. Still, the 3 methods, including the
obviously "wrong" and simple lm() method, all give the same parameter
estimates and SEs.
Thanks for any insights. Should I really just ignore the repeated measures
design and use lm() for problems like this? (Funny how lme() cannot fit
these data)
Cheers,
Stan
The simulation:
######### previous example assumed indep errors. Now use errors that
# are correlated within each subject
set.seed(1234)
nsubj<-30
z<-expand.grid(x=1:5,cond=c("a","b","c"),subj=as.factor(1:nsubj))
x<-z$x
cond<-z$cond
subj<-z$subj
y<-rep(0,nrow(z))
a1<- rep(1+rnorm(nsubj,mean=0,sd=.1),each=5) #inta=1
a2<- rep(1+rnorm(nsubj,mean=0,sd=.1),each=5) #intb-inta=2-1=1
a3<- rep(2+rnorm(nsubj,mean=0,sd=.1),each=5) #intc-inta=3-1=2
a4<- rep(3+rnorm(nsubj,mean=0,sd=.1),each=5) #slopea=3
a5<- rep(-1+rnorm(nsubj,mean=0,sd=.1),each=5) #slopeb-slopea
a6<- rep(-2+rnorm(nsubj,mean=0,sd=.1),each=5) #slopec-slopea
y[cond=="a"]<-a1 + a4*x[cond=="a"]
y[cond=="b"]<-a1+a2 + (a4+a5)*x[cond=="b"]
y[cond=="c"]<-a1+a3 + (a4+a6)*x[cond=="c"]
# autocorrelated errors within each subject
for(i in 1:nsubj)
{
y[subj==i]<-y[subj==i] +
as.numeric(filter(rnorm(5*3,mean=0,sd=.5),filter=0.5,method="recursive"))
}
plot(x[cond=="a"],y[cond=="a"],xlab="x", ylab="y")
points(x[cond=="b"],y[cond=="b"],col='red')
points(x[cond=="c"],y[cond=="c"],col='green')
## ignore within subjects design and treat as independent errors
fit<-lm(y~cond*x)
summary(fit)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 1.00355 0.13628 7.364 8.75e-13 ***
#condb 0.84320 0.19273 4.375 1.51e-05 ***
#condc 2.03207 0.19273 10.544 < 2e-16 ***
#x 2.99975 0.04109 73.005 < 2e-16 ***
#condb:x -0.95972 0.05811 -16.516 < 2e-16 ***
#condc:x -1.98477 0.05811 -34.156 < 2e-16 ***
b<-coef(fit)
abline(b[1],b[4]) #line for cond a
abline(b[1]+b[2],b[4]+b[5],col='red') #line for cond b
abline(b[1]+b[3],b[4]+b[6],col='green') #line for cond c
## use Linear Mixed Effects to take account of repeated measures
# lmer() from lme4 package
library(lme4)
fit2<-lmer(y~x*cond + (x*cond|subj))
summary(fit2)
# no p-values. Same param estimates and SEs as fit above
#Fixed effects:
# Estimate Std. Error t value
#(Intercept) 1.00355 0.12632 7.94
#x 2.99975 0.03463 86.62
#condb 0.84320 0.21619 3.90
#condc 2.03207 0.18783 10.82
#x:condb -0.95972 0.05558 -17.27
#x:condc -1.98477 0.04671 -42.50
## lme from nlme package
#library(nlme)
#fit3<-lme(y~x*cond, random=~(x*cond)|subj)
##does not converge
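[Editor's suggestion, not from the thread: the full random structure asks lme to estimate a 6x6 random-effects covariance matrix from 30 subjects with 15 points each, which is a lot. A reduced random structure plus an explicit AR(1) residual correlation may converge. A sketch, assuming the simulated y, x, cond, subj above:]

```r
# Sketch only: simpler random structure + AR(1) residual correlation.
library(nlme)
d <- data.frame(y, x, cond, subj)
fit3 <- lme(y ~ x * cond, data = d,
            random = ~ 1 | subj,                 # random intercept only
            correlation = corAR1(form = ~ 1 | subj))  # serial correlation
summary(fit3)
```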
##fit line to each subject; get mean and se of params across subjects
beta<-matrix(0,ncol=6,nrow=nsubj)
for(i in 1:nsubj)
{
f<-lm(y[subj==i]~cond[subj==i]*x[subj==i])
beta[i,]<-coef(f)
}
m<-colMeans(beta)
s<-apply(beta,2,sd)/sqrt(nsubj)
z<-m/s  # note: this overwrites the data frame z built by expand.grid above
pval<-pnorm(abs(z),lower.tail=F)*2 #two-tailed
#pval<-pt(abs(z),df=nsubj-1,lower.tail=F)*2 #two-tailed
cbind(m,s,z,pval)
# m s z pval
#[1,] 1.0035461 0.12250006 8.192209 2.564748e-16
#[2,] 0.8432030 0.21703074 3.885178 1.022551e-04
#[3,] 2.0320664 0.19155457 10.608290 2.726940e-26
#[4,] 2.9997546 0.03330906 90.058219 0.000000e+00
#[5,] -0.9597171 0.05593306 -17.158318 5.446869e-66
#[6,] -1.9847668 0.05045750 -39.335421 0.000000e+00
Dear Stan,
You get the right answer to the wrong question, because there is a bug in the code that simulates the data: you added a second noise term instead of a random intercept. Hint: set all parameters not related to the random intercept to zero in a1 to a6; then a1 to a6 should be equal.
Best regards,
Thierry
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
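[Editor's sketch of Thierry's hint (my reading, not his code): give each subject a single random intercept shared across all 15 of that subject's observations, keep everything else fixed, and with a sizeable intercept SD the lm() and lmer() standard errors now diverge:]

```r
# Sketch: one shared random intercept per subject, all other effects
# fixed at the population values used in the thread; independent noise.
library(lme4)
set.seed(1234)
nsubj <- 30
d <- expand.grid(x = 1:5, cond = c("a", "b", "c"), subj = factor(1:nsubj))
u <- rnorm(nsubj, sd = 1)                      # per-subject intercept shift
d$y <- with(d, 1 + 3 * x +
               (cond == "b") * (1 - 1 * x) +
               (cond == "c") * (2 - 2 * x)) +
       u[d$subj] + rnorm(nrow(d), sd = 0.5)
summary(lm(y ~ cond * x, data = d))            # intercept SE too small
summary(lmer(y ~ cond * x + (1 | subj), data = d))
```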
-----Original message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Stanislav Aggerwal
Sent: Wednesday, 21 January 2015 11:07
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] within subjects, 3 conditions, 3 lines