Prev 7409 / 20628 Next

Trouble Replicating Unstructured Mixed Procedure in R

John Maindonald

Thu, Jan 26, 2012 10:10 PM

I've twigged that "unstructured" means a variance-covariance matrix 
that ignores the time structure, i.e., each element is estimated 
separately.  One can do this in lme4, thus:

Orthodont$Age <- factor(Orthodont$age)

+ data=Orthodont)

Analysis of Variance Table
        Df Sum Sq Mean Sq F value
Sex      1  9.591  9.5905 33.9645
Age      3 34.876 11.6252 41.1703
Sex:Age  3  2.930  0.9768  3.4594

The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic.  One
has, in agreement with the SAS output:

[1] 0.05318945

The main effects SS and F's are of course not expected to agree and 
indeed, in my strong view, true SAS type III SS's are inappropriate.

I regard this R analysis however as an inelegant use of the data, with low
power.  My book with John Braun (3rd edn and if I recall correctly, 2nd)
has an analysis that I am prepared to defend that uses lmer() from lme4.
Several years ago, I placed on the web a version of the analysis that uses
 lme() from nlme:
   http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf
I'd forgotten that it was there.

This is a list for R users.  You will not necessarily find folk here with a high
level of SAS expertise, maybe not even enough to understand what the
SAS documentation means when it uses the term "unstructured".  Charles,
I think you've done pretty well in the amount of free advice that you have
received.  I think too that Steven McKinney's point is well made.  Kevin's
interpretation of the EULA is probably more or less correct, but the 
document comes across rather more ferociously than he suggests, and it
is not a good starting point for scientific assessment of the respective 
methodologies.  Many of us have other reasons why we are not very
interested in SAS does.  The implied threat in the EULA provides, even
if we are not party to any such agreement, just one more reason why SAS
is not to any large extent in our path ahead to the future.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 27/01/2012, at 4:03 PM, Kevin Wright wrote:

Dear Charles,

First, I hope you are not put off by the tone of some the responses in this
email thread.  Sometimes people have strong opinions and it might come
across aggressively.

Second, be sure to understand that reproducing a SAS analysis with lme in
no way violates any legal agreements that SAS may have, if for no other
reason than you never signed an agreement with SAS!  That bit in the EULA
about decompiling and reverse engineering means that people are prohibited
from creating a new version of PROC MIXED that does the same thing.  The
nlme package uses different methods than SAS. E.g. different optimizers,
even uses a log-parameterization deep in the code so that negative variance
components cannot happen.

Third, by now you've probably figured out that PROC MIXED and lme have very
different ideas about degrees of freedom.  Also, the loglikelihoods are on
different scales.  For that reason, when I try to reproduce an analysis, I
find the best way to compare is to look at the variance components.

Here is what PROC MIXED says:

   Cov Parm Estimate     Std Error       Z  Pr > |Z|
UN(1,1)     5.41545455    1.53172185    3.54    0.0004
UN(2,1)     2.71681818    1.09623989    2.48    0.0132
UN(2,2)     4.18477273    1.18363247    3.54    0.0004
UN(3,1)     3.91022727    1.41775367    2.76    0.0058
UN(3,2)     2.92715909    1.19304751    2.45    0.0141
UN(3,3)     6.45573864    1.82595863    3.54    0.0004
UN(4,1)     2.71022727    1.17209851    2.31    0.0208
UN(4,2)     3.31715909    1.12903016    2.94    0.0033
UN(4,3)     4.13073864    1.40356157    2.94    0.0033
UN(4,4)     4.98573864    1.41017984    3.54    0.0004
Residual         1.00000000 .       .         .

That's our target, and here is the lme code to get us there.

library(nlme)
orth <- as.data.frame(Orthodont)
m1 <- gls(distance ~ Sex*age,
         correlation=corSymm(form = ~ 1 | Subject),
         weights = varIdent(form = ~ 1 | age),
         data = orth)
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
cors*covs

Now, some explanations will surely help, so let's step through the code
with comments.

The Orthodont data is a groupedData object, which can be helpful, but
sometimes confusing, so coercing to a data.frame removes the formula--you
can see it with 'formula(Orthodont)'.

orth <- as.data.frame(Orthodont)

Since there is no "random" line in the PROC MIXED code, we don't want to
use 'lme' (also why I removed the formula from the data), but instead use
'gls' for generalized least squares.  The 'corSymm' part specifies a
symmetric correlation matrix with 1 on the diagonal.  Looking at the MIXED
parameters, the results are given a covariances, not correlations.  Note
how the variances UN(1,1), UN(2,2,), etc are different, not constant along
the diagonal.  We use the 'weights' statement to specify different stratum
variances.

m1 <- gls(distance ~ Sex*age,
         correlation=corSymm(form = ~ 1 | Subject),
         weights = varIdent(form = ~ 1 | age),
         data = orth)

Now we have to convert the correlations and standard deviations of lme into
covariance parameters of MIXED.

First, extract the correlation matrix from lme.

cors <- corMatrix(m1$modelStruct$corStruct)[[1]]

Now, hard-code the stratum std deviations and square them to get
variances.  Multiply by the square of the residual std dev.  (There's a way
to extract all these values from the fitted model instead of hand-typing
them, but I can't find them at the moment.)

vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2

Now create a matrix of variances and covariances.

covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))

Finally, multiply the correlations by the covariances.  Let's compare
results:

PROC MIXED:

Row Col1 Col2 Col3 Col4
1 5.415 2.716 3.910 2.710
2 2.716 4.184 2.927 3.317
3 3.910 2.927 6.455 4.130
4 2.710 3.317 4.130 4.985

R> round(cors*covs,3)
     [,1]  [,2]  [,3]  [,4]
[1,] 5.425 2.709 3.841 2.715
[2,] 2.709 4.191 2.975 3.314
[3,] 3.841 2.975 6.263 4.133
[4,] 2.715 3.314 4.133 4.986

In my experience, the small differences in the results are typical for
unstructured models.

Hope this helps.

Kevin Wright


On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu>wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but
haven't been able to find an answer.  I hope one of you could be so kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data.  I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it.  My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

  distance age Subject    Sex
1       26.0   8     M01   Male
2       25.0  10     M01   Male
3       29.0  12     M01   Male
4       31.0  14     M01   Male
5       21.5   8     M02   Male
6       22.5  10     M02   Male
7       23.0  12     M02   Male
8       26.5  14     M02   Male
9       23.0   8     M03   Male
10      22.5  10     M03   Male
11      24.0  12     M03   Male
12      27.5  14     M03   Male
13      25.5   8     M04   Male
14      27.5  10     M04   Male
15      26.5  12     M04   Male
16      27.0  14     M04   Male
17      20.0   8     M05   Male
18      23.5  10     M05   Male
19      22.5  12     M05   Male
20      26.0  14     M05   Male
21      24.5   8     M06   Male
22      25.5  10     M06   Male
23      27.0  12     M06   Male
24      28.5  14     M06   Male
25      22.0   8     M07   Male
26      22.0  10     M07   Male
27      24.5  12     M07   Male
28      26.5  14     M07   Male
29      24.0   8     M08   Male
30      21.5  10     M08   Male
31      24.5  12     M08   Male
32      25.5  14     M08   Male
33      23.0   8     M09   Male
34      20.5  10     M09   Male
35      31.0  12     M09   Male
36      26.0  14     M09   Male
37      27.5   8     M10   Male
38      28.0  10     M10   Male
39      31.0  12     M10   Male
40      31.5  14     M10   Male
41      23.0   8     M11   Male
42      23.0  10     M11   Male
43      23.5  12     M11   Male
44      25.0  14     M11   Male
45      21.5   8     M12   Male
46      23.5  10     M12   Male
47      24.0  12     M12   Male
48      28.0  14     M12   Male
49      17.0   8     M13   Male
50      24.5  10     M13   Male
51      26.0  12     M13   Male
52      29.5  14     M13   Male
53      22.5   8     M14   Male
54      25.5  10     M14   Male
55      25.5  12     M14   Male
56      26.0  14     M14   Male
57      23.0   8     M15   Male
58      24.5  10     M15   Male
59      26.0  12     M15   Male
60      30.0  14     M15   Male
61      22.0   8     M16   Male
62      21.5  10     M16   Male
63      23.5  12     M16   Male
64      25.0  14     M16   Male
65      21.0   8     F01 Female
66      20.0  10     F01 Female
67      21.5  12     F01 Female
68      23.0  14     F01 Female
69      21.0   8     F02 Female
70      21.5  10     F02 Female
71      24.0  12     F02 Female
72      25.5  14     F02 Female
73      20.5   8     F03 Female
74      24.0  10     F03 Female
75      24.5  12     F03 Female
76      26.0  14     F03 Female
77      23.5   8     F04 Female
78      24.5  10     F04 Female
79      25.0  12     F04 Female
80      26.5  14     F04 Female
81      21.5   8     F05 Female
82      23.0  10     F05 Female
83      22.5  12     F05 Female
84      23.5  14     F05 Female
85      20.0   8     F06 Female
86      21.0  10     F06 Female
87      21.0  12     F06 Female
88      22.5  14     F06 Female
89      21.5   8     F07 Female
90      22.5  10     F07 Female
91      23.0  12     F07 Female
92      25.0  14     F07 Female
93      23.0   8     F08 Female
94      23.0  10     F08 Female
95      23.5  12     F08 Female
96      24.0  14     F08 Female
97      20.0   8     F09 Female
98      21.0  10     F09 Female
99      22.0  12     F09 Female
100     21.5  14     F09 Female
101     16.5   8     F10 Female
102     19.0  10     F10 Female
103     19.0  12     F10 Female
104     19.5  14     F10 Female
105     24.5   8     F11 Female
106     25.0  10     F11 Female
107     28.0  12     F11 Female
108     28.0  14     F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
        Tests of Fixed Effects
Source             NDF   DDF    Type III F    Pr > F
GENDER           1        25        9.29        0.0054
AGE                  3        75       35.35       0.0001
GENDER*AGE   3        75        2.36        0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

          numDF denDF  F-value p-value
(Intercept)     1    75 4123.156  <.0001
Sex              1    25    9.292  0.0054
age               3    75   40.032  <.0001
Sex:age        3    75    2.362  0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

           Tests of Fixed Effects
Source          NDF DDF Type III F Pr > F
GENDER         1    25     9.29    0.0054
AGE                3    25    34.45   0.0001
GENDER*AGE 3    25     2.93    0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)

gives the output

          numDF denDF  F-value    p-value
(Intercept)     1    75     4052.028  <.0001
Sex              1    25       8.462      0.0075
age               3    75      39.022    <.0001
Sex:age        3    75       2.868      0.0421

What am I doing wrong to replicate the unstructured linear mixed model from
SAS?

Regards,

Charles

      [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Thread (19 messages)

Charles Determan Jr Trouble Replicating Unstructured Mixed Procedure in R Jan 24 Thompson,Paul Trouble Replicating Unstructured Mixed Procedure in R Jan 24 Charles Determan Jr Trouble Replicating Unstructured Mixed Procedure in R Jan 25 David Atkins Trouble Replicating Unstructured Mixed Procedure in R Jan 25 David Duffy Trouble Replicating Unstructured Mixed Procedure in R Jan 25 John Maindonald Trouble Replicating Unstructured Mixed Procedure in R Jan 25 Charles Determan Jr Trouble Replicating Unstructured Mixed Procedure in R Jan 26 Thompson,Paul Trouble Replicating Unstructured Mixed Procedure in R Jan 26 Luca Borger Trouble Replicating Unstructured Mixed Procedure in R Jan 26 Thompson,Paul Trouble Replicating Unstructured Mixed Procedure in R Jan 26 John Maindonald Trouble Replicating Unstructured Mixed Procedure in R Jan 26 Charles Determan Jr Trouble Replicating Unstructured Mixed Procedure in R Jan 26 Kevin Wright Trouble Replicating Unstructured Mixed Procedure in R Jan 26 John Maindonald Trouble Replicating Unstructured Mixed Procedure in R Jan 26 Charles Determan Jr Trouble Replicating Unstructured Mixed Procedure in R Jan 27 Charles Determan Jr Trouble Replicating Unstructured Mixed Procedure in R Jan 27 iwhite@staffmail.ed.ac.uk Trouble Replicating Unstructured Mixed Procedure in R Jan 27 Steven J. Pierce Trouble Replicating Unstructured Mixed Procedure in R Jan 27 Juliet Hannah Trouble Replicating Unstructured Mixed Procedure in R Jan 27