Repeated measures mixed model with AR(1) correlation structure in nlme vs SAS Proc Mixed
This makes a really nice comparison, thank you. It makes me think that there is some difference that we haven't yet figured out in the way that lme and PROC MIXED are defining the model, although nothing springs to mind, although I can't see what it would be. As far as I can tell, both are fitting a fixed effect of Response ~ Day where Day is a categorical predictor; both are treating Subject as a random effect (affecting the intercept only); both are assuming AR(1) autocorrelation in the residuals, applying only within-subject; both are using REML ... ? I agree with your points about taking account of the gap between time=3 and time=6 -- but I think the main point here is the narrower one of "what are SAS and R doing differently for this particular (even if sub-optimal) model"? What is the meaning of "Columns in X 5; Columns in Z 0" in the SAS output; I would have thought X (fixed effect design matrix) would have 4 columns, Z (random effect design matrix) would have 10 ... ?
On 11-03-27 03:13 PM, Jim Baldwin wrote:
I've added equivalent loop in SAS to obtain the equivalent output as
Prof. Bolker has below in R. The SAS code is as follows:
*%macro**/loop/*;
%doi=*0*%to*100*;
%letrho = %SYSEVALF(0.2 + 0.2*&i/100);
proc mixed data=sasdata method=REML itdetails;
class Subject Day;
model Response = Day / solution;
** Set initial values for variance components and*
* hold the correlation coefficient to a constant;*
parms (&rho) (*0.06330649*) / hold=*1*;
repeated / subject = Subject type=ar(*1*);
** Keep a copy of the fit statistics;*
ods output fitstatistics=fs1;
run;
** Grab just what is needed from the fit statistics dataset;*
data fs1;
set fs1;
keep rho M2LogL;
if descr = "-2 Res Log Likelihood";
M2LogL = value;
rho = ρ
run;
** Merge with overall summary;*
data fs;
set fs fs1;
keep M2LogL rho;
if rho = *.*then delete;
run;
%end;
*%mend*loop;
** Initialize dataset to hold rho and -2*logL;*
*data*fs; *run*;
** Run through loop and print out results;*
%*/loop/*;
*proc**print*data=fs;
*run*;
The resulting values are
M2LogL = c(12.1794,12.1641,12.1491,12.1342,12.1196,12.1051,12.0909,12.0769,
12.0631,12.0494,12.036,12.0229,12.0099,11.9971,11.9845,11.9722,11.96,11.9481,
11.9364,11.9249,11.9136,11.9025,11.8916,11.881,11.8706,11.8603,11.8503,11.8405,
11.831,11.8216,11.8125,11.8036,11.7949,11.7864,11.7781,11.7701,11.7623,11.7547,
11.7473,11.7402,11.7332,11.7265,11.72,11.7138,11.7078,11.7019,11.6964,11.691,
11.6859,11.681,11.6763,11.6719,11.6677,11.6637,11.6599,11.6564,11.6531,11.6501,
11.6473,11.6447,11.6423,11.6402,11.6383,11.6366,11.6352,11.634,11.6331,11.6324,
11.6319,11.6317,11.6317,11.632,11.6325,11.6332,11.6342,11.6354,11.6369,11.6386,
11.6405,11.6427,11.6452,11.6478,11.6508,11.654,11.6574,11.6611,11.665,11.6692,
11.6736,11.6783,11.6832,11.6884,11.6939,11.6996,11.7055,11.7117,11.7182,11.7249,
11.7319,11.7391,11.7466)
rho = 0.2 + 0.2*c(0:100)/100
and these can be added to the last plot from below as
lines(rho,M2LogL,col="red")
One can then see after adding in this curve that the -2 log L values are
nearly idential above the estimate given by PROC MIXED but become more
and more different when farther from that value.
Jim
Jim Baldwin
Station Statistician
Pacific Southwest Research Station
USDA Forest Service
Albany, California
510-559-6332
P.S. (This is probably more SAS code than some (all?) of you would have
ever
wanted to see in your life but such adversity maybe builds character?)
*Ben Bolker <bbolker at gmail.com>*
Sent by: r-sig-mixed-models-bounces at r-project.org
03/27/2011 09:25 AM
To
r-sig-mixed-models at r-project.org
cc
Subject
Re: [R-sig-ME] Repeated measures mixed model with AR(1) correlation
structure in nlme vs SAS Proc Mixed
Simon Bate <simontbate at ...> writes:
Hi all, I don't know if anyone has any thoughts on this, I suspect it is a local convergence issue but am not sure. I have been trying to move from SAS Proc Mixed to R nlme and have an unusual result. I have several subjects measured at four time points. I want to model the within-subject correlation using an autoregressive structure. I've attached the R and SAS code I'm using along with the results from SAS. With R lme I get an estimate of the autoregressive parameter phi = 0.2782601, whereas SAS gives me an estimate of 0.3389 Intriguingly if I include a between subject factor, a covariate or delete one of the observations, then the results appear to agree. I'm surprised the seemingly simpler model if different between the two implementations whereas the more complex models agree. Any ideas would be most welcome! Regards, Simon
Hmmm. I do agree it could be a convergence problem of some sort.
I agree that it's worth trying to figure out what the difference
is, but I think it's going to be hard. I went through some basic
sanity checking with R (see below) and don't see an obvious problem.
library(nlme)
sasdata <- data.frame(Response=c(0.55,0.86,0.21,0.36,0.46,
0.32,0.11,0.24,0.36,0.29,0.48,0.93,
0.56,0.67,0.36,0.55,0.51,0.4,0.34,0.51,
1,0.61,0.65,0.41,0.99,0.86,0.64,0.86,0.31,
0.19,0.21,0.36,0.41,0.47,0.16,0.81,0.9,0.72,0.87,0.02),
Subject=c(1,1,1,1,2,2,2,2,3,3,3,3,
4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,
8,8,8,8,9,9,9,9,10,10,10,10),
## or: rep(1:10,each=4)
Day=c(1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,
1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6))
## or: rep(c(1,2,4,6),4)
sasdata$Time<-factor(sasdata$Day)
fit0<-lme(Response~Time, random=~1|Subject,
data=sasdata, na.action = (na.omit), method = "REML")
plot(ACF(fit0),alpha=0.05,grid=TRUE)
## it actually looks like AR1 is not a very good model here
## anyway?
AR1<- update(fit0,
correlation=corAR1(form=~as.numeric(Time)|Subject,
fixed =FALSE))
plot(ACF(AR1),alpha=0.05,grid=TRUE)
## ACF(AR1) shows that the raw rho(1) is about 0.2
## identical (based on order of observations within groups)
AR1B<- update(fit0,
correlation=corAR1(form=~1|Subject,
fixed =FALSE))
## construct likelihood profile on rho
rhovec <- seq(-0.99,0.99,by=0.01)
Lvec <- sapply(rhovec,
function(r) {
logLik(update(fit0,
correlation=corAR1(form=~1|Subject,
value=r,
fixed =TRUE)))
})
plot(rhovec,-2*Lvec,type="b")
abline(h=-2*logLik(AR1)+3.84,lty=2) ## LRT cutoff
abline(v=c(-0.53,0.82),lty=2)
## CIs on rho from intervals() -- reasonable ballpark
## zoom in
plot(rhovec,-2*Lvec,type="b",ylim=c(11.5,15),xlim=c(-0.5,0.7))
abline(v=c(0.278,0.339)) ## SAS/R answers
## zoom in further
plot(rhovec,-2*Lvec,type="b",ylim=c(11.6,11.8),xlim=c(0.2,0.4),
las=1,bty="l")
abline(v=c(0.278,0.339),lty=2)
text(c(0.278,0.339),rep(par("usr")[4],2),c("lme","SAS"),
pos=3,xpd=NA)
abline(h=11.63168,lty=2)
text(par("usr")[2],11.63168,"SAS",pos=1,xpd=NA)
So ... I don't see anything obviously wrong or weird about
what R is doing. You can try
update(AR1,control=lmeControl(msVerbose=TRUE))
to see more detail on the optimization, but you will have
to dig into Pinheiro & Bates 2000 and/or the code and
documentation for details on the internal parameterizations
of rho in order to interpret the results ...
###
R Code:
SAS Code:
proc mixed;
class Subject Day;
model Response = Day / outp=pout;
repeated Day / subject = Subject type=AR(1);
run;
SAS Results:
Model Information
Data Set WORK.ALLDATA
Dependent Variable Response
Covariance Structure Autoregressive
Subject Effect Subject
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Between-Within
Class Level Information
Class Levels Values
Subject 10 1 10 2 3 4 5 6 7 8 9
Day 4 1 2 3 4
Dimensions
Covariance Parameters 2
Columns in X 5
Columns in Z 0
Subjects 10
Max Obs Per Subject 4
Number of Observations
Number of Observations Read 40
Number of Observations Used 40
Number of Observations Not Used 0
Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 14.67045653
1 2 11.63168913 0.00000018
2 1 11.63168429 0.00000000
Convergence criteria met.
Covariance Parameter Estimates
Cov Parm Subject Estimate
AR(1) Animal1 0.3389
Residual 0.06862
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models