Trouble Replicating Unstructured Mixed Procedure in R
I've twigged that "unstructured" means a variance-covariance matrix that ignores the time structure, i.e., each element is estimated separately. One can do this in lme4, thus: Orthodont$Age <- factor(Orthodont$age)
orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject),
+ data=Orthodont)
anova(orth.lmer)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
Sex 1 9.591 9.5905 33.9645
Age 3 34.876 11.6252 41.1703
Sex:Age 3 2.930 0.9768 3.4594
The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic. One
has, in agreement with the SAS output:
1-pf(2.93, 3, 25)
[1] 0.05318945 The main effects SS and F's are of course not expected to agree and indeed, in my strong view, true SAS type III SS's are inappropriate. I regard this R analysis however as an inelegant use of the data, with low power. My book with John Braun (3rd edn and if I recall correctly, 2nd) has an analysis that I am prepared to defend that uses lmer() from lme4. Several years ago, I placed on the web a version of the analysis that uses lme() from nlme: http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf I'd forgotten that it was there. This is a list for R users. You will not necessarily find folk here with a high level of SAS expertise, maybe not even enough to understand what the SAS documentation means when it uses the term "unstructured". Charles, I think you've done pretty well in the amount of free advice that you have received. I think too that Steven McKinney's point is well made. Kevin's interpretation of the EULA is probably more or less correct, but the document comes across rather more ferociously than he suggests, and it is not a good starting point for scientific assessment of the respective methodologies. Many of us have other reasons why we are not very interested in SAS does. The implied threat in the EULA provides, even if we are not party to any such agreement, just one more reason why SAS is not to any large extent in our path ahead to the future. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 4:03 PM, Kevin Wright wrote:
Dear Charles,
First, I hope you are not put off by the tone of some the responses in this
email thread. Sometimes people have strong opinions and it might come
across aggressively.
Second, be sure to understand that reproducing a SAS analysis with lme in
no way violates any legal agreements that SAS may have, if for no other
reason than you never signed an agreement with SAS! That bit in the EULA
about decompiling and reverse engineering means that people are prohibited
from creating a new version of PROC MIXED that does the same thing. The
nlme package uses different methods than SAS. E.g. different optimizers,
even uses a log-parameterization deep in the code so that negative variance
components cannot happen.
Third, by now you've probably figured out that PROC MIXED and lme have very
different ideas about degrees of freedom. Also, the loglikelihoods are on
different scales. For that reason, when I try to reproduce an analysis, I
find the best way to compare is to look at the variance components.
Here is what PROC MIXED says:
Cov Parm Estimate Std Error Z Pr > |Z|
UN(1,1) 5.41545455 1.53172185 3.54 0.0004
UN(2,1) 2.71681818 1.09623989 2.48 0.0132
UN(2,2) 4.18477273 1.18363247 3.54 0.0004
UN(3,1) 3.91022727 1.41775367 2.76 0.0058
UN(3,2) 2.92715909 1.19304751 2.45 0.0141
UN(3,3) 6.45573864 1.82595863 3.54 0.0004
UN(4,1) 2.71022727 1.17209851 2.31 0.0208
UN(4,2) 3.31715909 1.12903016 2.94 0.0033
UN(4,3) 4.13073864 1.40356157 2.94 0.0033
UN(4,4) 4.98573864 1.41017984 3.54 0.0004
Residual 1.00000000 . . .
That's our target, and here is the lme code to get us there.
library(nlme)
orth <- as.data.frame(Orthodont)
m1 <- gls(distance ~ Sex*age,
correlation=corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age),
data = orth)
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
cors*covs
Now, some explanations will surely help, so let's step through the code
with comments.
The Orthodont data is a groupedData object, which can be helpful, but
sometimes confusing, so coercing to a data.frame removes the formula--you
can see it with 'formula(Orthodont)'.
orth <- as.data.frame(Orthodont)
Since there is no "random" line in the PROC MIXED code, we don't want to
use 'lme' (also why I removed the formula from the data), but instead use
'gls' for generalized least squares. The 'corSymm' part specifies a
symmetric correlation matrix with 1 on the diagonal. Looking at the MIXED
parameters, the results are given a covariances, not correlations. Note
how the variances UN(1,1), UN(2,2,), etc are different, not constant along
the diagonal. We use the 'weights' statement to specify different stratum
variances.
m1 <- gls(distance ~ Sex*age,
correlation=corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age),
data = orth)
Now we have to convert the correlations and standard deviations of lme into
covariance parameters of MIXED.
First, extract the correlation matrix from lme.
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
Now, hard-code the stratum std deviations and square them to get
variances. Multiply by the square of the residual std dev. (There's a way
to extract all these values from the fitted model instead of hand-typing
them, but I can't find them at the moment.)
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
Now create a matrix of variances and covariances.
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
Finally, multiply the correlations by the covariances. Let's compare
results:
PROC MIXED:
Row Col1 Col2 Col3 Col4
1 5.415 2.716 3.910 2.710
2 2.716 4.184 2.927 3.317
3 3.910 2.927 6.455 4.130
4 2.710 3.317 4.130 4.985
R> round(cors*covs,3)
[,1] [,2] [,3] [,4]
[1,] 5.425 2.709 3.841 2.715
[2,] 2.709 4.191 2.975 3.314
[3,] 3.841 2.975 6.263 4.133
[4,] 2.715 3.314 4.133 4.986
In my experience, the small differences in the results are typical for
unstructured models.
Hope this helps.
Kevin Wright
On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu>wrote:
Greetings,
I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R. I have scoured the forums but
haven't been able to find an answer. I hope one of you could be so kind as
to enlighten me.
I am attempting to replicate a repeated measures experiment using some
standard data. I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it. My main issue
comes with the 'unstructured' component.
The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
5 21.5 8 M02 Male
6 22.5 10 M02 Male
7 23.0 12 M02 Male
8 26.5 14 M02 Male
9 23.0 8 M03 Male
10 22.5 10 M03 Male
11 24.0 12 M03 Male
12 27.5 14 M03 Male
13 25.5 8 M04 Male
14 27.5 10 M04 Male
15 26.5 12 M04 Male
16 27.0 14 M04 Male
17 20.0 8 M05 Male
18 23.5 10 M05 Male
19 22.5 12 M05 Male
20 26.0 14 M05 Male
21 24.5 8 M06 Male
22 25.5 10 M06 Male
23 27.0 12 M06 Male
24 28.5 14 M06 Male
25 22.0 8 M07 Male
26 22.0 10 M07 Male
27 24.5 12 M07 Male
28 26.5 14 M07 Male
29 24.0 8 M08 Male
30 21.5 10 M08 Male
31 24.5 12 M08 Male
32 25.5 14 M08 Male
33 23.0 8 M09 Male
34 20.5 10 M09 Male
35 31.0 12 M09 Male
36 26.0 14 M09 Male
37 27.5 8 M10 Male
38 28.0 10 M10 Male
39 31.0 12 M10 Male
40 31.5 14 M10 Male
41 23.0 8 M11 Male
42 23.0 10 M11 Male
43 23.5 12 M11 Male
44 25.0 14 M11 Male
45 21.5 8 M12 Male
46 23.5 10 M12 Male
47 24.0 12 M12 Male
48 28.0 14 M12 Male
49 17.0 8 M13 Male
50 24.5 10 M13 Male
51 26.0 12 M13 Male
52 29.5 14 M13 Male
53 22.5 8 M14 Male
54 25.5 10 M14 Male
55 25.5 12 M14 Male
56 26.0 14 M14 Male
57 23.0 8 M15 Male
58 24.5 10 M15 Male
59 26.0 12 M15 Male
60 30.0 14 M15 Male
61 22.0 8 M16 Male
62 21.5 10 M16 Male
63 23.5 12 M16 Male
64 25.0 14 M16 Male
65 21.0 8 F01 Female
66 20.0 10 F01 Female
67 21.5 12 F01 Female
68 23.0 14 F01 Female
69 21.0 8 F02 Female
70 21.5 10 F02 Female
71 24.0 12 F02 Female
72 25.5 14 F02 Female
73 20.5 8 F03 Female
74 24.0 10 F03 Female
75 24.5 12 F03 Female
76 26.0 14 F03 Female
77 23.5 8 F04 Female
78 24.5 10 F04 Female
79 25.0 12 F04 Female
80 26.5 14 F04 Female
81 21.5 8 F05 Female
82 23.0 10 F05 Female
83 22.5 12 F05 Female
84 23.5 14 F05 Female
85 20.0 8 F06 Female
86 21.0 10 F06 Female
87 21.0 12 F06 Female
88 22.5 14 F06 Female
89 21.5 8 F07 Female
90 22.5 10 F07 Female
91 23.0 12 F07 Female
92 25.0 14 F07 Female
93 23.0 8 F08 Female
94 23.0 10 F08 Female
95 23.5 12 F08 Female
96 24.0 14 F08 Female
97 20.0 8 F09 Female
98 21.0 10 F09 Female
99 22.0 12 F09 Female
100 21.5 14 F09 Female
101 16.5 8 F10 Female
102 19.0 10 F10 Female
103 19.0 12 F10 Female
104 19.5 14 F10 Female
105 24.5 8 F11 Female
106 25.0 10 F11 Female
107 28.0 12 F11 Female
108 28.0 14 F11 Female
*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;
output of interest to me
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 75 35.35 0.0001
GENDER*AGE 3 75 2.36 0.0781
R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)
numDF denDF F-value p-value
(Intercept) 1 75 4123.156 <.0001
Sex 1 25 9.292 0.0054
age 3 75 40.032 <.0001
Sex:age 3 75 2.362 0.0781
Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 25 34.45 0.0001
GENDER*AGE 3 25 2.93 0.0532
R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)
gives the output
numDF denDF F-value p-value
(Intercept) 1 75 4052.028 <.0001
Sex 1 25 8.462 0.0075
age 3 75 39.022 <.0001
Sex:age 3 75 2.868 0.0421
What am I doing wrong to replicate the unstructured linear mixed model from
SAS?
Regards,
Charles
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Kevin Wright [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models