Dear R users,
We are working on a data set in which we have measured repeatedly a physiological response variable (y)
every 20 min for 12 h (time variable; 'x') in subjects ('id') beloning to one of five groups ('group'; 'A' to 'E'). Data are located at:https://www.dropbox.com/s/hf455aev3teb5e0/data.csv?dl=0
We are interested to model if the response in y differences with time (i.e. 'x') for the two groups. Thus:
require(nlme)
m1<-lme(y~group*x+group*I(x^2),random=~x|id,data=data.df,na.action=na.omit)
But because data are collected repeatedly over short time intervals for each subject, it seemed prudent to consider an autoregressive covariance structure. Thus:
m2<-update(m1,~.,corr=corCAR1(form=~x|id))
AIC values indicate the latter (i.e. m2) as more appropriate:
anova(m1,m2)
# Model df AIC BIC logLik Test L.Ratio p-value
#m1 1 19 2155.996 2260.767 -1058.9981
#m2 2 20 2021.944 2132.229 -990.9718 1 vs 2 136.0525 <.0001
Fixed effects and test statistics differ between models. A look at marginal ANOVA tables suggest inference might differ somewhat between models:
anova.lme(m1,type="m")
# numDF denDF F-value p-value
#(Intercept) 1 1789 63384.80 <.0001
#group 4 45 1.29 0.2893
#x 1 1789 0.05 0.8226
#I(x^2) 1 1789 4.02 0.0451
#group:x 4 1789 2.61 0.0341
#group:I(x^2) 4 1789 4.37 0.0016
anova.lme(m2,type="m")
# numDF denDF F-value p-value
#(Intercept) 1 1789 59395.79 <.0001
#group 4 45 1.33 0.2725
#x 1 1789 0.04 0.8379
#I(x^2) 1 1789 2.28 0.1312
#group:x 4 1789 2.09 0.0802
#group:I(x^2) 4 1789 2.81 0.0244
Now, this is all well. But: my colleagues have been running the same data set using PROC MIXED in SAS and come up with substantially different results when comparing SAS default covariance structure (variance components) and AR1. Specifically, there is virtually no change in either test statistics or fitted values when using AR1 instead of Variance Components in SAS, which fits the observation that AIC values (in SAS) indicate both covariance structures fit data equally well.
This is not very satisfactory to me, and I would be interesting to know what is happening here. Realizing
this might not be the correct forum for this question, I would like to ask you all if anyone would have any
input as to what is going on here, e.g. am I setting up my model erroneously, etc.?
N.b. I have no desire to replicate SAS results, but I would most certainly be interested to know what could possibly explain such a large discrepancy between the two platforms. Any suggestions greatly welcomed.
(Data are located at: https://www.dropbox.com/s/hf455aev3teb5e0/data.csv?dl=0)
With all best wishes,
Andreas
Autoregressive covariance structure for lme object and R/SAS differences in model output
2 messages · anord, Thierry Onkelinx
Dear Andreas, Write down the equation of the lme model and the SAS model and look for differences in that. It is likely that both models look at a different kind of correlation. lme() models the correlation in the residuals WITHIN the same group as defined by the random effect levels. If I recall correctly SAS has correlation structures for what they call the G-side and the R-side. You need to ask your colleagues how SAS handles the correlation. Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-02-17 15:51 GMT+01:00 Andreas Nord <andreas.nord at biol.lu.se>:
Dear R users,
We are working on a data set in which we have measured repeatedly a
physiological response variable (y)
every 20 min for 12 h (time variable; 'x') in subjects ('id') beloning to
one of five groups ('group'; 'A' to 'E'). Data are located at:
https://www.dropbox.com/s/hf455aev3teb5e0/data.csv?dl=0
We are interested to model if the response in y differences with time
(i.e. 'x') for the two groups. Thus:
require(nlme)
m1<-lme(y~group*x+group*I(x^2),random=~x|id,data=data.df,na.action=na.omit)
But because data are collected repeatedly over short time intervals for
each subject, it seemed prudent to consider an autoregressive covariance
structure. Thus:
m2<-update(m1,~.,corr=corCAR1(form=~x|id))
AIC values indicate the latter (i.e. m2) as more appropriate:
anova(m1,m2)
# Model df AIC BIC logLik Test L.Ratio
p-value
#m1 1 19 2155.996 2260.767 -1058.9981
#m2 2 20 2021.944 2132.229 -990.9718 1 vs 2 136.0525 <.0001
Fixed effects and test statistics differ between models. A look at
marginal ANOVA tables suggest inference might differ somewhat between
models:
anova.lme(m1,type="m")
# numDF denDF F-value p-value
#(Intercept) 1 1789 63384.80 <.0001
#group 4 45 1.29 0.2893
#x 1 1789 0.05 0.8226
#I(x^2) 1 1789 4.02 0.0451
#group:x 4 1789 2.61 0.0341
#group:I(x^2) 4 1789 4.37 0.0016
anova.lme(m2,type="m")
# numDF denDF F-value p-value
#(Intercept) 1 1789 59395.79 <.0001
#group 4 45 1.33 0.2725
#x 1 1789 0.04 0.8379
#I(x^2) 1 1789 2.28 0.1312
#group:x 4 1789 2.09 0.0802
#group:I(x^2) 4 1789 2.81 0.0244
Now, this is all well. But: my colleagues have been running the same data
set using PROC MIXED in SAS and come up with substantially different
results when comparing SAS default covariance structure (variance
components) and AR1. Specifically, there is virtually no change in either
test statistics or fitted values when using AR1 instead of Variance
Components in SAS, which fits the observation that AIC values (in SAS)
indicate both covariance structures fit data equally well.
This is not very satisfactory to me, and I would be interesting to know
what is happening here. Realizing
this might not be the correct forum for this question, I would like to ask
you all if anyone would have any
input as to what is going on here, e.g. am I setting up my model
erroneously, etc.?
N.b. I have no desire to replicate SAS results, but I would most certainly
be interested to know what could possibly explain such a large discrepancy
between the two platforms. Any suggestions greatly welcomed.
(Data are located at:
https://www.dropbox.com/s/hf455aev3teb5e0/data.csv?dl=0)
With all best wishes,
Andreas
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models