Trouble Replicating Unstructured Mixed Procedure in R

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120124/c00c0e1d/attachment-0001.pl>
In the CS model, the F values for Gender and Gender*age are really close, but age is quite discrepant. That seems problematic.

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Charles Determan Jr
Sent: Tuesday, January 24, 2012 8:32 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but
haven't been able to find an answer.  I hope one of you could be so kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data.  I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it.  My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

    distance age Subject    Sex
1       26.0   8     M01   Male
2       25.0  10     M01   Male
3       29.0  12     M01   Male
4       31.0  14     M01   Male
5       21.5   8     M02   Male
6       22.5  10     M02   Male
7       23.0  12     M02   Male
8       26.5  14     M02   Male
9       23.0   8     M03   Male
10      22.5  10     M03   Male
11      24.0  12     M03   Male
12      27.5  14     M03   Male
13      25.5   8     M04   Male
14      27.5  10     M04   Male
15      26.5  12     M04   Male
16      27.0  14     M04   Male
17      20.0   8     M05   Male
18      23.5  10     M05   Male
19      22.5  12     M05   Male
20      26.0  14     M05   Male
21      24.5   8     M06   Male
22      25.5  10     M06   Male
23      27.0  12     M06   Male
24      28.5  14     M06   Male
25      22.0   8     M07   Male
26      22.0  10     M07   Male
27      24.5  12     M07   Male
28      26.5  14     M07   Male
29      24.0   8     M08   Male
30      21.5  10     M08   Male
31      24.5  12     M08   Male
32      25.5  14     M08   Male
33      23.0   8     M09   Male
34      20.5  10     M09   Male
35      31.0  12     M09   Male
36      26.0  14     M09   Male
37      27.5   8     M10   Male
38      28.0  10     M10   Male
39      31.0  12     M10   Male
40      31.5  14     M10   Male
41      23.0   8     M11   Male
42      23.0  10     M11   Male
43      23.5  12     M11   Male
44      25.0  14     M11   Male
45      21.5   8     M12   Male
46      23.5  10     M12   Male
47      24.0  12     M12   Male
48      28.0  14     M12   Male
49      17.0   8     M13   Male
50      24.5  10     M13   Male
51      26.0  12     M13   Male
52      29.5  14     M13   Male
53      22.5   8     M14   Male
54      25.5  10     M14   Male
55      25.5  12     M14   Male
56      26.0  14     M14   Male
57      23.0   8     M15   Male
58      24.5  10     M15   Male
59      26.0  12     M15   Male
60      30.0  14     M15   Male
61      22.0   8     M16   Male
62      21.5  10     M16   Male
63      23.5  12     M16   Male
64      25.0  14     M16   Male
65      21.0   8     F01 Female
66      20.0  10     F01 Female
67      21.5  12     F01 Female
68      23.0  14     F01 Female
69      21.0   8     F02 Female
70      21.5  10     F02 Female
71      24.0  12     F02 Female
72      25.5  14     F02 Female
73      20.5   8     F03 Female
74      24.0  10     F03 Female
75      24.5  12     F03 Female
76      26.0  14     F03 Female
77      23.5   8     F04 Female
78      24.5  10     F04 Female
79      25.0  12     F04 Female
80      26.5  14     F04 Female
81      21.5   8     F05 Female
82      23.0  10     F05 Female
83      22.5  12     F05 Female
84      23.5  14     F05 Female
85      20.0   8     F06 Female
86      21.0  10     F06 Female
87      21.0  12     F06 Female
88      22.5  14     F06 Female
89      21.5   8     F07 Female
90      22.5  10     F07 Female
91      23.0  12     F07 Female
92      25.0  14     F07 Female
93      23.0   8     F08 Female
94      23.0  10     F08 Female
95      23.5  12     F08 Female
96      24.0  14     F08 Female
97      20.0   8     F09 Female
98      21.0  10     F09 Female
99      22.0  12     F09 Female
100     21.5  14     F09 Female
101     16.5   8     F10 Female
102     19.0  10     F10 Female
103     19.0  12     F10 Female
104     19.5  14     F10 Female
105     24.5   8     F11 Female
106     25.0  10     F11 Female
107     28.0  12     F11 Female
108     28.0  14     F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
          Tests of Fixed Effects
Source             NDF   DDF    Type III F    Pr > F
GENDER           1        25        9.29        0.0054
AGE                  3        75       35.35       0.0001
GENDER*AGE   3        75        2.36        0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

            numDF denDF  F-value p-value
(Intercept)     1    75 4123.156  <.0001
Sex              1    25    9.292  0.0054
age               3    75   40.032  <.0001
Sex:age        3    75    2.362  0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

             Tests of Fixed Effects
Source          NDF DDF Type III F Pr > F
GENDER         1    25     9.29    0.0054
AGE                3    25    34.45   0.0001
GENDER*AGE 3    25     2.93    0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)

gives the output

            numDF denDF  F-value    p-value
(Intercept)     1    75     4052.028  <.0001
Sex              1    25       8.462      0.0075
age               3    75      39.022    <.0001
Sex:age        3    75       2.868      0.0421

What am I doing wrong to replicate the unstructured linear mixed model from
SAS?

Regards,

Charles

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120125/bbae2d1f/attachment-0001.pl>
Hi Charles--

So, I don't know SAS at all, so just providing the SAS code and anova 
table don't tell me (at least) all that much about what SAS is doing. 
Also keep in mind that the anova table summary will depend on the "type" 
of sums of squares (and there have been some diatribes about this and 
type III sums of squares in the past on this list).

I find it a bit more informative to look at the summary output from lme(r):

 > y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
 >
 > summary(y)
Linear mixed-effects model fit by REML
  Data: dental
        AIC      BIC    logLik
   445.7572 461.6236 -216.8786

Random effects:
  Formula: ~1 | Subject
         (Intercept) Residual
StdDev:    1.816214 1.386382

Fixed effects: distance ~ Sex * age
                   Value Std.Error DF   t-value p-value
(Intercept)   16.340625 0.9813122 79 16.651810  0.0000
SexFemale      1.032102 1.5374208 25  0.671321  0.5082
age            0.784375 0.0775011 79 10.120823  0.0000
SexFemale:age -0.304830 0.1214209 79 -2.510520  0.0141
  Correlation:
               (Intr) SexFml age
SexFemale     -0.638
age           -0.869  0.555
SexFemale:age  0.555 -0.869 -0.638

Standardized Within-Group Residuals:
         Min          Q1         Med          Q3         Max
-3.59804400 -0.45461690  0.01578365  0.50244658  3.68620792

Number of Observations: 108
Number of Groups: 27

DAVE: so, we fit a random-intercept, residual error, and 4 fixed-effects 
(N = 27 people and 108 observations)

Here is your second model:

 > y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
+ data=dental)
 > summary(y)
Linear mixed-effects model fit by REML
  Data: dental
        AIC      BIC    logLik
   450.1706 481.9033 -213.0853

Random effects:
  Formula: ~1 | Subject
         (Intercept) Residual
StdDev:    1.827112 1.375353

Correlation Structure: General
  Formula: ~1 | Subject
  Parameter estimate(s):
  Correlation:
   1      2      3
2 -0.174
3 -0.002 -0.177
4 -0.342  0.306  0.227
Fixed effects: distance ~ Sex * age
                   Value Std.Error DF   t-value p-value
(Intercept)   15.932927 0.9978753 79 15.966852  0.0000
SexFemale      1.473662 1.5633701 25  0.942619  0.3549
age            0.824339 0.0824260 79 10.000962  0.0000
SexFemale:age -0.348100 0.1291367 79 -2.695594  0.0086
  Correlation:
               (Intr) SexFml age
SexFemale     -0.638
age           -0.874  0.558
SexFemale:age  0.558 -0.874 -0.638

Standardized Within-Group Residuals:
          Min           Q1          Med           Q3          Max
-3.180968424 -0.544002662  0.001195789  0.495264398  3.730242930

Number of Observations: 108
Number of Groups: 27

DAVE: Now, in addition to earlier parameters, we have an unstructured 
covariance matrix for repeated measures.  By all indications this is a 
*worse* fitting model...

Finally, I wonder whether SAS is fitting random-effects at all (no idea, 
just a guess).  If so, you might check it relative to a gls() fit, a la:

 > y2=gls(distance~Sex*age, corr=corSymm(,~1|Subject),
+ data=dental)
 > summary(y2)
Generalized least squares fit by REML
   Model: distance ~ Sex * age
   Data: dental
        AIC      BIC    logLik
   448.1706 477.2589 -213.0853

Correlation Structure: General
  Formula: ~1 | Subject
  Parameter estimate(s):
  Correlation:
   1     2     3
2 0.575
3 0.638 0.574
4 0.515 0.749 0.721

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)   15.932911 0.9978729 15.966874  0.0000
SexFemale      1.473679 1.5633664  0.942632  0.3481
age            0.824340 0.0824258 10.001000  0.0000
SexFemale:age -0.348102 0.1291364 -2.695611  0.0082

  Correlation:
               (Intr) SexFml age
SexFemale     -0.638
age           -0.874  0.558
SexFemale:age  0.558 -0.874 -0.638

Standardized residuals:
         Min          Q1         Med          Q3         Max
-2.41707752 -0.64439706 -0.07388955  0.58480683  2.26288228

Residual standard error: 2.286908
Degrees of freedom: 108 total; 104 residual

So, not sure if this is helpful, but perhaps you could provide 
additional output / commentary on what exactly SAS is doing (beyond 
anova table).

cheers, Dave

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but
haven't been able to find an answer.  I hope one of you could be so kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data.  I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it.  My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

     distance age Subject    Sex
1       26.0   8     M01   Male
2       25.0  10     M01   Male
3       29.0  12     M01   Male
4       31.0  14     M01   Male
5       21.5   8     M02   Male
6       22.5  10     M02   Male
7       23.0  12     M02   Male
8       26.5  14     M02   Male
9       23.0   8     M03   Male
10      22.5  10     M03   Male
11      24.0  12     M03   Male
12      27.5  14     M03   Male
13      25.5   8     M04   Male
14      27.5  10     M04   Male
15      26.5  12     M04   Male
16      27.0  14     M04   Male
17      20.0   8     M05   Male
18      23.5  10     M05   Male
19      22.5  12     M05   Male
20      26.0  14     M05   Male
21      24.5   8     M06   Male
22      25.5  10     M06   Male
23      27.0  12     M06   Male
24      28.5  14     M06   Male
25      22.0   8     M07   Male
26      22.0  10     M07   Male
27      24.5  12     M07   Male
28      26.5  14     M07   Male
29      24.0   8     M08   Male
30      21.5  10     M08   Male
31      24.5  12     M08   Male
32      25.5  14     M08   Male
33      23.0   8     M09   Male
34      20.5  10     M09   Male
35      31.0  12     M09   Male
36      26.0  14     M09   Male
37      27.5   8     M10   Male
38      28.0  10     M10   Male
39      31.0  12     M10   Male
40      31.5  14     M10   Male
41      23.0   8     M11   Male
42      23.0  10     M11   Male
43      23.5  12     M11   Male
44      25.0  14     M11   Male
45      21.5   8     M12   Male
46      23.5  10     M12   Male
47      24.0  12     M12   Male
48      28.0  14     M12   Male
49      17.0   8     M13   Male
50      24.5  10     M13   Male
51      26.0  12     M13   Male
52      29.5  14     M13   Male
53      22.5   8     M14   Male
54      25.5  10     M14   Male
55      25.5  12     M14   Male
56      26.0  14     M14   Male
57      23.0   8     M15   Male
58      24.5  10     M15   Male
59      26.0  12     M15   Male
60      30.0  14     M15   Male
61      22.0   8     M16   Male
62      21.5  10     M16   Male
63      23.5  12     M16   Male
64      25.0  14     M16   Male
65      21.0   8     F01 Female
66      20.0  10     F01 Female
67      21.5  12     F01 Female
68      23.0  14     F01 Female
69      21.0   8     F02 Female
70      21.5  10     F02 Female
71      24.0  12     F02 Female
72      25.5  14     F02 Female
73      20.5   8     F03 Female
74      24.0  10     F03 Female
75      24.5  12     F03 Female
76      26.0  14     F03 Female
77      23.5   8     F04 Female
78      24.5  10     F04 Female
79      25.0  12     F04 Female
80      26.5  14     F04 Female
81      21.5   8     F05 Female
82      23.0  10     F05 Female
83      22.5  12     F05 Female
84      23.5  14     F05 Female
85      20.0   8     F06 Female
86      21.0  10     F06 Female
87      21.0  12     F06 Female
88      22.5  14     F06 Female
89      21.5   8     F07 Female
90      22.5  10     F07 Female
91      23.0  12     F07 Female
92      25.0  14     F07 Female
93      23.0   8     F08 Female
94      23.0  10     F08 Female
95      23.5  12     F08 Female
96      24.0  14     F08 Female
97      20.0   8     F09 Female
98      21.0  10     F09 Female
99      22.0  12     F09 Female
100     21.5  14     F09 Female
101     16.5   8     F10 Female
102     19.0  10     F10 Female
103     19.0  12     F10 Female
104     19.5  14     F10 Female
105     24.5   8     F11 Female
106     25.0  10     F11 Female
107     28.0  12     F11 Female
108     28.0  14     F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
           Tests of Fixed Effects
Source             NDF   DDF    Type III F    Pr > F
GENDER           1        25        9.29        0.0054
AGE                  3        75       35.35       0.0001
GENDER*AGE   3        75        2.36        0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

             numDF denDF  F-value p-value
(Intercept)     1    75 4123.156  <.0001
Sex              1    25    9.292  0.0054
age               3    75   40.032  <.0001
Sex:age        3    75    2.362  0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

              Tests of Fixed Effects
Source          NDF DDF Type III F Pr > F
GENDER         1    25     9.29    0.0054
AGE                3    25    34.45   0.0001
GENDER*AGE 3    25     2.93    0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)

gives the output

             numDF denDF  F-value    p-value
(Intercept)     1    75     4052.028  <.0001
Sex              1    25       8.462      0.0075
age               3    75      39.022    <.0001
Sex:age        3    75       2.868      0.0421

What am I doing wrong to replicate the unstructured linear mixed model from
SAS?

Regards,

Charles
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datkins at u.washington.edu

Center for the Study of Health and Risk Behaviors (CSHRB)		
1100 NE 45th Street, Suite 300 	
Seattle, WA  98105 	
206-616-3879 	
http://depts.washington.edu/cshrb/
(Mon-Wed)	

Center for Healthcare Improvement, for Addictions, Mental Illness,
   Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104
http://www.chammp.org
(Thurs)

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but

This is also the Orthodont dataset, distributed with nlme.

As David Atkins pointed out, R defaults to Type I SS. so you would need to 
use, for example, the Anova() command from the car package.  The other 
thing is that the SAS F statistics are only approximate, depending on 
which covariance structure is chosen (perhaps John Maindonald or someone 
clever could comment), so SAS offers different possibilities for ddf eg

http://www2.sas.com/proceedings/sugi26/p262-26.pdf

while lme and lmer offer one or none.
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is constraining
the main effect in a manner that depends on the parameterisation.

There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual.  For lines that 
are separate or all through the one point, one does not need 
type III sums of squares.

Analyses often or frequently have enough genuine complications 
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a 
type III sum of squares.  

I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation.  They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem 
more or less right, pretty much equivalent to calculating slopes 
for females and slopes for males and using a t-test to compare 
them.  (Sure, in the analyses presented, age has been treated 
as a categorical variable, but the comment still applies.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On Tue, 24 Jan 2012, Charles Determan Jr wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but

This is also the Orthodont dataset, distributed with nlme.

As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package.  The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg

http://www2.sas.com/proceedings/sugi26/p262-26.pdf

while lme and lmer offer one or none.

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120126/f2c20fbd/attachment-0001.pl>
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS?

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is constraining
the main effect in a manner that depends on the parameterisation.

There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual.  For lines that 
are separate or all through the one point, one does not need 
type III sums of squares.

Analyses often or frequently have enough genuine complications 
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a 
type III sum of squares.  

I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation.  They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem 
more or less right, pretty much equivalent to calculating slopes 
for females and slopes for males and using a t-test to compare 
them.  (Sure, in the analyses presented, age has been treated 
as a categorical variable, but the comment still applies.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On Tue, 24 Jan 2012, Charles Determan Jr wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but

This is also the Orthodont dataset, distributed with nlme.

As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package.  The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg

http://www2.sas.com/proceedings/sugi26/p262-26.pdf

while lme and lmer offer one or none.

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
I think:

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

HTH
Luca

Le 26/01/2012 15:52, Thompson,Paul a ?crit :
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS?

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is constraining
the main effect in a manner that depends on the parameterisation.

There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual.  For lines that
are separate or all through the one point, one does not need
type III sums of squares.

Analyses often or frequently have enough genuine complications
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a
type III sum of squares.

I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation.  They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem
more or less right, pretty much equivalent to calculating slopes
for females and slopes for males and using a t-test to compare
them.  (Sure, in the analyses presented, age has been treated
as a categorical variable, but the comment still applies.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics&  Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 26/01/2012, at 1:54 PM, David Duffy wrote:

On Tue, 24 Jan 2012, Charles Determan Jr wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but

This is also the Orthodont dataset, distributed with nlme.

As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package.  The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg

http://www2.sas.com/proceedings/sugi26/p262-26.pdf

while lme and lmer offer one or none.

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

OK, I've looked at that reference. 

There are 2 aspects of an estimate like a SS. The first is the stability of the estimate, and the second is the interpretation of the estimate. The issues with the interpretation of the different estimates go back to 1970, and they are simply a matter of interpretation. The point of the Venables discussion is that he does not like Type III SS, not that they are wrong. He does not agree with the interpretation.

The issue here is the accuracy of the Type III or Type I or Type II or whatever. Accuracy comes before interpretation. If the r module and SAS do not arrive at the same estimates, that is an important thing. 

Once we agree upon computation, we can argue about interpretation. Charles Determan is inquiring as to computational accuracy. The use and interpretation of the various Type I, II, III, IV, LVX SS are secondary.

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger
Sent: Thursday, January 26, 2012 9:03 AM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

I think:

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

HTH
Luca

Le 26/01/2012 15:52, Thompson,Paul a ?crit :
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS?

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is constraining
the main effect in a manner that depends on the parameterisation.

There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual.  For lines that
are separate or all through the one point, one does not need
type III sums of squares.

Analyses often or frequently have enough genuine complications
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a
type III sum of squares.

I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation.  They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem
more or less right, pretty much equivalent to calculating slopes
for females and slopes for males and using a t-test to compare
them.  (Sure, in the analyses presented, age has been treated
as a categorical variable, but the comment still applies.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics&  Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 26/01/2012, at 1:54 PM, David Duffy wrote:

On Tue, 24 Jan 2012, Charles Determan Jr wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but

This is also the Orthodont dataset, distributed with nlme.

As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package.  The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg

http://www2.sas.com/proceedings/sugi26/p262-26.pdf

while lme and lmer offer one or none.

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
It is not really a matter of computational accuracy.  One can get highly
accurate values for an inappropriate statistic.  

Or if there is insistence on using the word, accuracy, what is the 
meaning?

i) the wrong formula is used?  Then in what sense is it 'wrong'?

ii) there is a numerical inaccuracy in the calculation?  This is almost
never an issue in a relatively simple calculation such as this, given
the care taken by the code writers in such matters.

iii) where an approximation is used, as in using an F-distribution
approximation, is the best choice of degrees of freedom  made to
for use of this approximation?  I judge that the degrees of freedom 
for lme's F-statistic for the interaction are not well chosen.  Users
really have to sort this out for themselves, rather than relying on
what may be a fairly wild approximation that appears in lm's
output.  Using 75df rather than 25df does not however make the 
difference that a choice between (e.g.) 5df and 25df would.

A further and more basic issue is whether the statistic that is 
provided is appropriate to the intended generalisation.  I'd take
this to be generalisation to another sample of youths from the
same population.  In order to understand why R and SAS are 
giving different F-statistics for the interaction, one needs to
understand just what variance-covariance structure is assumed
in each case.  One might extract the two estimates of the 
var-cov structure and compare them. Look for terms in one that
do not appear, or maybe that are zero, in the other.

Finally, it is not just that Venables does not like type III SS.
He is saying that they almost never correspond to a null
hypothesis that makes any sense.  Those who disagree try to 
write down the model to which the null hypothesis corresponds
in testing for the main effect of factor1 with a factor1:factor2
interaction.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

OK, I've looked at that reference. 

There are 2 aspects of an estimate like a SS. The first is the stability of the estimate, and the second is the interpretation of the estimate. The issues with the interpretation of the different estimates go back to 1970, and they are simply a matter of interpretation. The point of the Venables discussion is that he does not like Type III SS, not that they are wrong. He does not agree with the interpretation.

The issue here is the accuracy of the Type III or Type I or Type II or whatever. Accuracy comes before interpretation. If the r module and SAS do not arrive at the same estimates, that is an important thing. 

Once we agree upon computation, we can argue about interpretation. Charles Determan is inquiring as to computational accuracy. The use and interpretation of the various Type I, II, III, IV, LVX SS are secondary.

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger
Sent: Thursday, January 26, 2012 9:03 AM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

I think:

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

HTH
Luca

Le 26/01/2012 15:52, Thompson,Paul a ?crit :
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS?

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is constraining
the main effect in a manner that depends on the parameterisation.

There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual.  For lines that
are separate or all through the one point, one does not need
type III sums of squares.

Analyses often or frequently have enough genuine complications
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a
type III sum of squares.

I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation.  They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem
more or less right, pretty much equivalent to calculating slopes
for females and slopes for males and using a t-test to compare
them.  (Sure, in the analyses presented, age has been treated
as a categorical variable, but the comment still applies.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics&  Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 26/01/2012, at 1:54 PM, David Duffy wrote:

On Tue, 24 Jan 2012, Charles Determan Jr wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but

This is also the Orthodont dataset, distributed with nlme.

As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package.  The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg

http://www2.sas.com/proceedings/sugi26/p262-26.pdf

while lme and lmer offer one or none.

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120126/26eaf876/attachment-0001.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120126/75056ae5/attachment-0001.pl>
I've twigged that "unstructured" means a variance-covariance matrix 
that ignores the time structure, i.e., each element is estimated 
separately.  One can do this in lme4, thus:

Orthodont$Age <- factor(Orthodont$age)
orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject), 
+ data=Orthodont)
anova(orth.lmer)
Analysis of Variance Table
        Df Sum Sq Mean Sq F value
Sex      1  9.591  9.5905 33.9645
Age      3 34.876 11.6252 41.1703
Sex:Age  3  2.930  0.9768  3.4594

The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic.  One
has, in agreement with the SAS output:
1-pf(2.93, 3, 25)
[1] 0.05318945

The main effects SS and F's are of course not expected to agree and 
indeed, in my strong view, true SAS type III SS's are inappropriate.

I regard this R analysis however as an inelegant use of the data, with low
power.  My book with John Braun (3rd edn and if I recall correctly, 2nd)
has an analysis that I am prepared to defend that uses lmer() from lme4.
Several years ago, I placed on the web a version of the analysis that uses
 lme() from nlme:
   http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf
I'd forgotten that it was there.

This is a list for R users.  You will not necessarily find folk here with a high
level of SAS expertise, maybe not even enough to understand what the
SAS documentation means when it uses the term "unstructured".  Charles,
I think you've done pretty well in the amount of free advice that you have
received.  I think too that Steven McKinney's point is well made.  Kevin's
interpretation of the EULA is probably more or less correct, but the 
document comes across rather more ferociously than he suggests, and it
is not a good starting point for scientific assessment of the respective 
methodologies.  Many of us have other reasons why we are not very
interested in SAS does.  The implied threat in the EULA provides, even
if we are not party to any such agreement, just one more reason why SAS
is not to any large extent in our path ahead to the future.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

Dear Charles,

First, I hope you are not put off by the tone of some the responses in this
email thread.  Sometimes people have strong opinions and it might come
across aggressively.

Second, be sure to understand that reproducing a SAS analysis with lme in
no way violates any legal agreements that SAS may have, if for no other
reason than you never signed an agreement with SAS!  That bit in the EULA
about decompiling and reverse engineering means that people are prohibited
from creating a new version of PROC MIXED that does the same thing.  The
nlme package uses different methods than SAS. E.g. different optimizers,
even uses a log-parameterization deep in the code so that negative variance
components cannot happen.

Third, by now you've probably figured out that PROC MIXED and lme have very
different ideas about degrees of freedom.  Also, the loglikelihoods are on
different scales.  For that reason, when I try to reproduce an analysis, I
find the best way to compare is to look at the variance components.

Here is what PROC MIXED says:

   Cov Parm Estimate     Std Error       Z  Pr > |Z|
UN(1,1)     5.41545455    1.53172185    3.54    0.0004
UN(2,1)     2.71681818    1.09623989    2.48    0.0132
UN(2,2)     4.18477273    1.18363247    3.54    0.0004
UN(3,1)     3.91022727    1.41775367    2.76    0.0058
UN(3,2)     2.92715909    1.19304751    2.45    0.0141
UN(3,3)     6.45573864    1.82595863    3.54    0.0004
UN(4,1)     2.71022727    1.17209851    2.31    0.0208
UN(4,2)     3.31715909    1.12903016    2.94    0.0033
UN(4,3)     4.13073864    1.40356157    2.94    0.0033
UN(4,4)     4.98573864    1.41017984    3.54    0.0004
Residual         1.00000000 .       .         .

That's our target, and here is the lme code to get us there.

library(nlme)
orth <- as.data.frame(Orthodont)
m1 <- gls(distance ~ Sex*age,
         correlation=corSymm(form = ~ 1 | Subject),
         weights = varIdent(form = ~ 1 | age),
         data = orth)
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
cors*covs

Now, some explanations will surely help, so let's step through the code
with comments.

The Orthodont data is a groupedData object, which can be helpful, but
sometimes confusing, so coercing to a data.frame removes the formula--you
can see it with 'formula(Orthodont)'.

orth <- as.data.frame(Orthodont)

Since there is no "random" line in the PROC MIXED code, we don't want to
use 'lme' (also why I removed the formula from the data), but instead use
'gls' for generalized least squares.  The 'corSymm' part specifies a
symmetric correlation matrix with 1 on the diagonal.  Looking at the MIXED
parameters, the results are given a covariances, not correlations.  Note
how the variances UN(1,1), UN(2,2,), etc are different, not constant along
the diagonal.  We use the 'weights' statement to specify different stratum
variances.

m1 <- gls(distance ~ Sex*age,
         correlation=corSymm(form = ~ 1 | Subject),
         weights = varIdent(form = ~ 1 | age),
         data = orth)

Now we have to convert the correlations and standard deviations of lme into
covariance parameters of MIXED.

First, extract the correlation matrix from lme.

cors <- corMatrix(m1$modelStruct$corStruct)[[1]]

Now, hard-code the stratum std deviations and square them to get
variances.  Multiply by the square of the residual std dev.  (There's a way
to extract all these values from the fitted model instead of hand-typing
them, but I can't find them at the moment.)

vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2

Now create a matrix of variances and covariances.

covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))

Finally, multiply the correlations by the covariances.  Let's compare
results:

PROC MIXED:

Row Col1 Col2 Col3 Col4
1 5.415 2.716 3.910 2.710
2 2.716 4.184 2.927 3.317
3 3.910 2.927 6.455 4.130
4 2.710 3.317 4.130 4.985

R> round(cors*covs,3)
     [,1]  [,2]  [,3]  [,4]
[1,] 5.425 2.709 3.841 2.715
[2,] 2.709 4.191 2.975 3.314
[3,] 3.841 2.975 6.263 4.133
[4,] 2.715 3.314 4.133 4.986

In my experience, the small differences in the results are typical for
unstructured models.

Hope this helps.

Kevin Wright

On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu>wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but
haven't been able to find an answer.  I hope one of you could be so kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data.  I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it.  My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

  distance age Subject    Sex
1       26.0   8     M01   Male
2       25.0  10     M01   Male
3       29.0  12     M01   Male
4       31.0  14     M01   Male
5       21.5   8     M02   Male
6       22.5  10     M02   Male
7       23.0  12     M02   Male
8       26.5  14     M02   Male
9       23.0   8     M03   Male
10      22.5  10     M03   Male
11      24.0  12     M03   Male
12      27.5  14     M03   Male
13      25.5   8     M04   Male
14      27.5  10     M04   Male
15      26.5  12     M04   Male
16      27.0  14     M04   Male
17      20.0   8     M05   Male
18      23.5  10     M05   Male
19      22.5  12     M05   Male
20      26.0  14     M05   Male
21      24.5   8     M06   Male
22      25.5  10     M06   Male
23      27.0  12     M06   Male
24      28.5  14     M06   Male
25      22.0   8     M07   Male
26      22.0  10     M07   Male
27      24.5  12     M07   Male
28      26.5  14     M07   Male
29      24.0   8     M08   Male
30      21.5  10     M08   Male
31      24.5  12     M08   Male
32      25.5  14     M08   Male
33      23.0   8     M09   Male
34      20.5  10     M09   Male
35      31.0  12     M09   Male
36      26.0  14     M09   Male
37      27.5   8     M10   Male
38      28.0  10     M10   Male
39      31.0  12     M10   Male
40      31.5  14     M10   Male
41      23.0   8     M11   Male
42      23.0  10     M11   Male
43      23.5  12     M11   Male
44      25.0  14     M11   Male
45      21.5   8     M12   Male
46      23.5  10     M12   Male
47      24.0  12     M12   Male
48      28.0  14     M12   Male
49      17.0   8     M13   Male
50      24.5  10     M13   Male
51      26.0  12     M13   Male
52      29.5  14     M13   Male
53      22.5   8     M14   Male
54      25.5  10     M14   Male
55      25.5  12     M14   Male
56      26.0  14     M14   Male
57      23.0   8     M15   Male
58      24.5  10     M15   Male
59      26.0  12     M15   Male
60      30.0  14     M15   Male
61      22.0   8     M16   Male
62      21.5  10     M16   Male
63      23.5  12     M16   Male
64      25.0  14     M16   Male
65      21.0   8     F01 Female
66      20.0  10     F01 Female
67      21.5  12     F01 Female
68      23.0  14     F01 Female
69      21.0   8     F02 Female
70      21.5  10     F02 Female
71      24.0  12     F02 Female
72      25.5  14     F02 Female
73      20.5   8     F03 Female
74      24.0  10     F03 Female
75      24.5  12     F03 Female
76      26.0  14     F03 Female
77      23.5   8     F04 Female
78      24.5  10     F04 Female
79      25.0  12     F04 Female
80      26.5  14     F04 Female
81      21.5   8     F05 Female
82      23.0  10     F05 Female
83      22.5  12     F05 Female
84      23.5  14     F05 Female
85      20.0   8     F06 Female
86      21.0  10     F06 Female
87      21.0  12     F06 Female
88      22.5  14     F06 Female
89      21.5   8     F07 Female
90      22.5  10     F07 Female
91      23.0  12     F07 Female
92      25.0  14     F07 Female
93      23.0   8     F08 Female
94      23.0  10     F08 Female
95      23.5  12     F08 Female
96      24.0  14     F08 Female
97      20.0   8     F09 Female
98      21.0  10     F09 Female
99      22.0  12     F09 Female
100     21.5  14     F09 Female
101     16.5   8     F10 Female
102     19.0  10     F10 Female
103     19.0  12     F10 Female
104     19.5  14     F10 Female
105     24.5   8     F11 Female
106     25.0  10     F11 Female
107     28.0  12     F11 Female
108     28.0  14     F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
        Tests of Fixed Effects
Source             NDF   DDF    Type III F    Pr > F
GENDER           1        25        9.29        0.0054
AGE                  3        75       35.35       0.0001
GENDER*AGE   3        75        2.36        0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

          numDF denDF  F-value p-value
(Intercept)     1    75 4123.156  <.0001
Sex              1    25    9.292  0.0054
age               3    75   40.032  <.0001
Sex:age        3    75    2.362  0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

           Tests of Fixed Effects
Source          NDF DDF Type III F Pr > F
GENDER         1    25     9.29    0.0054
AGE                3    25    34.45   0.0001
GENDER*AGE 3    25     2.93    0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)

gives the output

          numDF denDF  F-value    p-value
(Intercept)     1    75     4052.028  <.0001
Sex              1    25       8.462      0.0075
age               3    75      39.022    <.0001
Sex:age        3    75       2.868      0.0421

What am I doing wrong to replicate the unstructured linear mixed model from
SAS?

Regards,

Charles

      [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Kevin Wright

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120127/b326ed53/attachment-0001.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120127/ebcca832/attachment-0001.pl>
Charles,

I guess that SAS is using the Satterthwaite formula to approximate 
denominator degrees of freedom (google 'Satterthwaite'). See Kenward and 
Roger (1997) Biometrics 53 983-997 for another approach.
Thank you John, I truly appreciate all the advice I have received.  At no
point to I feel entitled for someone to provide this for me.  I do hope
that others may find this thread useful as well.  If I may just as one more
question.  Is there an accepted way to calculate the degrees of freedom in
this case to come to the value of 25?  This has been something I have been
trying to determine.

Again, thank you to all for providing me with all the information you have,

Charles

On Fri, Jan 27, 2012 at 12:10 AM, John Maindonald <
john.maindonald at anu.edu.au> wrote:

I've twigged that "unstructured" means a variance-covariance matrix
that ignores the time structure, i.e., each element is estimated
separately.  One can do this in lme4, thus:

Orthodont$Age <- factor(Orthodont$age)
orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject),
+ data=Orthodont)
anova(orth.lmer)
Analysis of Variance Table
       Df Sum Sq Mean Sq F value
Sex      1  9.591  9.5905 33.9645
Age      3 34.876 11.6252 41.1703
Sex:Age  3  2.930  0.9768  3.4594

The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic.
 One
has, in agreement with the SAS output:
1-pf(2.93, 3, 25)
[1] 0.05318945

The main effects SS and F's are of course not expected to agree and
indeed, in my strong view, true SAS type III SS's are inappropriate.

I regard this R analysis however as an inelegant use of the data, with low
power.  My book with John Braun (3rd edn and if I recall correctly, 2nd)
has an analysis that I am prepared to defend that uses lmer() from lme4.
Several years ago, I placed on the web a version of the analysis that uses
 lme() from nlme:
  http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf
I'd forgotten that it was there.

This is a list for R users.  You will not necessarily find folk here with
a high
level of SAS expertise, maybe not even enough to understand what the
SAS documentation means when it uses the term "unstructured".  Charles,
I think you've done pretty well in the amount of free advice that you have
received.  I think too that Steven McKinney's point is well made.  Kevin's
interpretation of the EULA is probably more or less correct, but the
document comes across rather more ferociously than he suggests, and it
is not a good starting point for scientific assessment of the respective
methodologies.  Many of us have other reasons why we are not very
interested in SAS does.  The implied threat in the EULA provides, even
if we are not party to any such agreement, just one more reason why SAS
is not to any large extent in our path ahead to the future.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 27/01/2012, at 4:03 PM, Kevin Wright wrote:

Dear Charles,

First, I hope you are not put off by the tone of some the responses in
this
email thread.  Sometimes people have strong opinions and it might come
across aggressively.

Second, be sure to understand that reproducing a SAS analysis with lme in
no way violates any legal agreements that SAS may have, if for no other
reason than you never signed an agreement with SAS!  That bit in the EULA
about decompiling and reverse engineering means that people are
prohibited
from creating a new version of PROC MIXED that does the same thing.  The
nlme package uses different methods than SAS. E.g. different optimizers,
even uses a log-parameterization deep in the code so that negative
variance
components cannot happen.

Third, by now you've probably figured out that PROC MIXED and lme have
very
different ideas about degrees of freedom.  Also, the loglikelihoods are
on
different scales.  For that reason, when I try to reproduce an analysis,
I
find the best way to compare is to look at the variance components.

Here is what PROC MIXED says:

   Cov Parm Estimate     Std Error       Z  Pr > |Z|
UN(1,1)     5.41545455    1.53172185    3.54    0.0004
UN(2,1)     2.71681818    1.09623989    2.48    0.0132
UN(2,2)     4.18477273    1.18363247    3.54    0.0004
UN(3,1)     3.91022727    1.41775367    2.76    0.0058
UN(3,2)     2.92715909    1.19304751    2.45    0.0141
UN(3,3)     6.45573864    1.82595863    3.54    0.0004
UN(4,1)     2.71022727    1.17209851    2.31    0.0208
UN(4,2)     3.31715909    1.12903016    2.94    0.0033
UN(4,3)     4.13073864    1.40356157    2.94    0.0033
UN(4,4)     4.98573864    1.41017984    3.54    0.0004
Residual         1.00000000 .       .         .

That's our target, and here is the lme code to get us there.

library(nlme)
orth <- as.data.frame(Orthodont)
m1 <- gls(distance ~ Sex*age,
         correlation=corSymm(form = ~ 1 | Subject),
         weights = varIdent(form = ~ 1 | age),
         data = orth)
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
cors*covs

Now, some explanations will surely help, so let's step through the code
with comments.

The Orthodont data is a groupedData object, which can be helpful, but
sometimes confusing, so coercing to a data.frame removes the formula--you
can see it with 'formula(Orthodont)'.

orth <- as.data.frame(Orthodont)

Since there is no "random" line in the PROC MIXED code, we don't want to
use 'lme' (also why I removed the formula from the data), but instead use
'gls' for generalized least squares.  The 'corSymm' part specifies a
symmetric correlation matrix with 1 on the diagonal.  Looking at the
MIXED
parameters, the results are given a covariances, not correlations.  Note
how the variances UN(1,1), UN(2,2,), etc are different, not constant
along
the diagonal.  We use the 'weights' statement to specify different
stratum
variances.

m1 <- gls(distance ~ Sex*age,
         correlation=corSymm(form = ~ 1 | Subject),
         weights = varIdent(form = ~ 1 | age),
         data = orth)

Now we have to convert the correlations and standard deviations of lme
into
covariance parameters of MIXED.

First, extract the correlation matrix from lme.

cors <- corMatrix(m1$modelStruct$corStruct)[[1]]

Now, hard-code the stratum std deviations and square them to get
variances.  Multiply by the square of the residual std dev.  (There's a
way
to extract all these values from the fitted model instead of hand-typing
them, but I can't find them at the moment.)

vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2

Now create a matrix of variances and covariances.

covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))

Finally, multiply the correlations by the covariances.  Let's compare
results:

PROC MIXED:

Row Col1 Col2 Col3 Col4
1 5.415 2.716 3.910 2.710
2 2.716 4.184 2.927 3.317
3 3.910 2.927 6.455 4.130
4 2.710 3.317 4.130 4.985

R> round(cors*covs,3)
     [,1]  [,2]  [,3]  [,4]
[1,] 5.425 2.709 3.841 2.715
[2,] 2.709 4.191 2.975 3.314
[3,] 3.841 2.975 6.263 4.133
[4,] 2.715 3.314 4.133 4.986

In my experience, the small differences in the results are typical for
unstructured models.

Hope this helps.

Kevin Wright

On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu
wrote:

Greetings,

I have been working on R for some time now and I have begun the
endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but
haven't been able to find an answer.  I hope one of you could be so
kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data.  I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it.  My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

  distance age Subject    Sex
1       26.0   8     M01   Male
2       25.0  10     M01   Male
3       29.0  12     M01   Male
4       31.0  14     M01   Male
5       21.5   8     M02   Male
6       22.5  10     M02   Male
7       23.0  12     M02   Male
8       26.5  14     M02   Male
9       23.0   8     M03   Male
10      22.5  10     M03   Male
11      24.0  12     M03   Male
12      27.5  14     M03   Male
13      25.5   8     M04   Male
14      27.5  10     M04   Male
15      26.5  12     M04   Male
16      27.0  14     M04   Male
17      20.0   8     M05   Male
18      23.5  10     M05   Male
19      22.5  12     M05   Male
20      26.0  14     M05   Male
21      24.5   8     M06   Male
22      25.5  10     M06   Male
23      27.0  12     M06   Male
24      28.5  14     M06   Male
25      22.0   8     M07   Male
26      22.0  10     M07   Male
27      24.5  12     M07   Male
28      26.5  14     M07   Male
29      24.0   8     M08   Male
30      21.5  10     M08   Male
31      24.5  12     M08   Male
32      25.5  14     M08   Male
33      23.0   8     M09   Male
34      20.5  10     M09   Male
35      31.0  12     M09   Male
36      26.0  14     M09   Male
37      27.5   8     M10   Male
38      28.0  10     M10   Male
39      31.0  12     M10   Male
40      31.5  14     M10   Male
41      23.0   8     M11   Male
42      23.0  10     M11   Male
43      23.5  12     M11   Male
44      25.0  14     M11   Male
45      21.5   8     M12   Male
46      23.5  10     M12   Male
47      24.0  12     M12   Male
48      28.0  14     M12   Male
49      17.0   8     M13   Male
50      24.5  10     M13   Male
51      26.0  12     M13   Male
52      29.5  14     M13   Male
53      22.5   8     M14   Male
54      25.5  10     M14   Male
55      25.5  12     M14   Male
56      26.0  14     M14   Male
57      23.0   8     M15   Male
58      24.5  10     M15   Male
59      26.0  12     M15   Male
60      30.0  14     M15   Male
61      22.0   8     M16   Male
62      21.5  10     M16   Male
63      23.5  12     M16   Male
64      25.0  14     M16   Male
65      21.0   8     F01 Female
66      20.0  10     F01 Female
67      21.5  12     F01 Female
68      23.0  14     F01 Female
69      21.0   8     F02 Female
70      21.5  10     F02 Female
71      24.0  12     F02 Female
72      25.5  14     F02 Female
73      20.5   8     F03 Female
74      24.0  10     F03 Female
75      24.5  12     F03 Female
76      26.0  14     F03 Female
77      23.5   8     F04 Female
78      24.5  10     F04 Female
79      25.0  12     F04 Female
80      26.5  14     F04 Female
81      21.5   8     F05 Female
82      23.0  10     F05 Female
83      22.5  12     F05 Female
84      23.5  14     F05 Female
85      20.0   8     F06 Female
86      21.0  10     F06 Female
87      21.0  12     F06 Female
88      22.5  14     F06 Female
89      21.5   8     F07 Female
90      22.5  10     F07 Female
91      23.0  12     F07 Female
92      25.0  14     F07 Female
93      23.0   8     F08 Female
94      23.0  10     F08 Female
95      23.5  12     F08 Female
96      24.0  14     F08 Female
97      20.0   8     F09 Female
98      21.0  10     F09 Female
99      22.0  12     F09 Female
100     21.5  14     F09 Female
101     16.5   8     F10 Female
102     19.0  10     F10 Female
103     19.0  12     F10 Female
104     19.5  14     F10 Female
105     24.5   8     F11 Female
106     25.0  10     F11 Female
107     28.0  12     F11 Female
108     28.0  14     F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
        Tests of Fixed Effects
Source             NDF   DDF    Type III F    Pr > F
GENDER           1        25        9.29        0.0054
AGE                  3        75       35.35       0.0001
GENDER*AGE   3        75        2.36        0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

          numDF denDF  F-value p-value
(Intercept)     1    75 4123.156  <.0001
Sex              1    25    9.292  0.0054
age               3    75   40.032  <.0001
Sex:age        3    75    2.362  0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

           Tests of Fixed Effects
Source          NDF DDF Type III F Pr > F
GENDER         1    25     9.29    0.0054
AGE                3    25    34.45   0.0001
GENDER*AGE 3    25     2.93    0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(),
data=dental)
anova(z)

gives the output

          numDF denDF  F-value    p-value
(Intercept)     1    75     4052.028  <.0001
Sex              1    25       8.462      0.0075
age               3    75      39.022    <.0001
Sex:age        3    75       2.868      0.0421

What am I doing wrong to replicate the unstructured linear mixed model
from
SAS?

Regards,

Charles

      [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
Kevin Wright

      [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
It seems to me that the SAS documentation ought to tell you what algorithm
it uses to compute the degrees of freedom. Perhaps they do that by
referencing a published paper. If so, just read the original source to get
the relevant formulas. The only reliable way to get the same DF that SAS
provides is to use the same formula for calculating it. There may or may not
be existing R code that implements the required formulas, but you need to
know what you're looking for in order to find it. 

Steven J. Pierce, Ph.D. 
Associate Director 
Center for Statistical Training & Consulting (CSTAT) 
Michigan State University 
E-mail: pierces1 at msu.edu 
Web: http://www.cstat.msu.edu 

-----Original Message-----
From: Charles Determan Jr [mailto:deter088 at umn.edu] 
Sent: Friday, January 27, 2012 7:08 AM
To: John Maindonald
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in
R

Thank you John, I truly appreciate all the advice I have received.  At no
point to I feel entitled for someone to provide this for me.  I do hope
that others may find this thread useful as well.  If I may just as one more
question.  Is there an accepted way to calculate the degrees of freedom in
this case to come to the value of 25?  This has been something I have been
trying to determine.

Again, thank you to all for providing me with all the information you have,

Charles
My post may be redundant, but just in case I wanted to comment.  I
think Kevin's post
addressed a few things that are helpful to start with.

The original proc mixed syntax did not have a random statement. This
is similar to what
Pinheiro and Bates (Chapter 5) describe as an extended linear model.
Therefore, it seems we should not be able
to reproduce this with  lme and nlme. Instead, we use gls. I think
some people do not like to call this
a mixed effect model because the intercept or slope are not random,
but I can see why SAS would include
it with proc mixed because of the similarities. The SAS reference for
this is Littell (SAS for mixed models) Chapter 5.

proc mixed data=formixed;
class gender person;
model y = gender|age/solution;
repeated / type=un sub=person;
run;

gives me a similar result to

 gls( y ~ gender + age + gender*age , formixed, correlation =
corSymm(form = ~ 1 | person))

SAS output:

Intercept 15.84
gender 1.58
age 0.82
age*gender -.35

nlme output:

Coefficients:

(Intercept)      15.93
genderFemale      1.47
age               0.82
genderFemale:age -0.35
Dear Charles,

First, I hope you are not put off by the tone of some the responses in this
email thread. ?Sometimes people have strong opinions and it might come
across aggressively.

Second, be sure to understand that reproducing a SAS analysis with lme in
no way violates any legal agreements that SAS may have, if for no other
reason than you never signed an agreement with SAS! ?That bit in the EULA
about decompiling and reverse engineering means that people are prohibited
from creating a new version of PROC MIXED that does the same thing. ?The
nlme package uses different methods than SAS. E.g. different optimizers,
even uses a log-parameterization deep in the code so that negative variance
components cannot happen.

Third, by now you've probably figured out that PROC MIXED and lme have very
different ideas about degrees of freedom. ?Also, the loglikelihoods are on
different scales. ?For that reason, when I try to reproduce an analysis, I
find the best way to compare is to look at the variance components.

Here is what PROC MIXED says:

? ?Cov Parm Estimate ? ? Std Error ? ? ? Z ?Pr > |Z|
UN(1,1) ? ? 5.41545455 ? ?1.53172185 ? ?3.54 ? ?0.0004
UN(2,1) ? ? 2.71681818 ? ?1.09623989 ? ?2.48 ? ?0.0132
UN(2,2) ? ? 4.18477273 ? ?1.18363247 ? ?3.54 ? ?0.0004
UN(3,1) ? ? 3.91022727 ? ?1.41775367 ? ?2.76 ? ?0.0058
UN(3,2) ? ? 2.92715909 ? ?1.19304751 ? ?2.45 ? ?0.0141
UN(3,3) ? ? 6.45573864 ? ?1.82595863 ? ?3.54 ? ?0.0004
UN(4,1) ? ? 2.71022727 ? ?1.17209851 ? ?2.31 ? ?0.0208
UN(4,2) ? ? 3.31715909 ? ?1.12903016 ? ?2.94 ? ?0.0033
UN(4,3) ? ? 4.13073864 ? ?1.40356157 ? ?2.94 ? ?0.0033
UN(4,4) ? ? 4.98573864 ? ?1.41017984 ? ?3.54 ? ?0.0004
Residual ? ? ? ? 1.00000000 . ? ? ? . ? ? ? ? .

That's our target, and here is the lme code to get us there.

library(nlme)
orth <- as.data.frame(Orthodont)
m1 <- gls(distance ~ Sex*age,
? ? ? ? ?correlation=corSymm(form = ~ 1 | Subject),
? ? ? ? ?weights = varIdent(form = ~ 1 | age),
? ? ? ? ?data = orth)
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
cors*covs

Now, some explanations will surely help, so let's step through the code
with comments.

The Orthodont data is a groupedData object, which can be helpful, but
sometimes confusing, so coercing to a data.frame removes the formula--you
can see it with 'formula(Orthodont)'.

orth <- as.data.frame(Orthodont)

Since there is no "random" line in the PROC MIXED code, we don't want to
use 'lme' (also why I removed the formula from the data), but instead use
'gls' for generalized least squares. ?The 'corSymm' part specifies a
symmetric correlation matrix with 1 on the diagonal. ?Looking at the MIXED
parameters, the results are given a covariances, not correlations. ?Note
how the variances UN(1,1), UN(2,2,), etc are different, not constant along
the diagonal. ?We use the 'weights' statement to specify different stratum
variances.

m1 <- gls(distance ~ Sex*age,
? ? ? ? ?correlation=corSymm(form = ~ 1 | Subject),
? ? ? ? ?weights = varIdent(form = ~ 1 | age),
? ? ? ? ?data = orth)

Now we have to convert the correlations and standard deviations of lme into
covariance parameters of MIXED.

First, extract the correlation matrix from lme.

cors <- corMatrix(m1$modelStruct$corStruct)[[1]]

Now, hard-code the stratum std deviations and square them to get
variances. ?Multiply by the square of the residual std dev. ?(There's a way
to extract all these values from the fitted model instead of hand-typing
them, but I can't find them at the moment.)

vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2

Now create a matrix of variances and covariances.

covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))

Finally, multiply the correlations by the covariances. ?Let's compare
results:

PROC MIXED:

Row Col1 Col2 Col3 Col4
1 5.415 2.716 3.910 2.710
2 2.716 4.184 2.927 3.317
3 3.910 2.927 6.455 4.130
4 2.710 3.317 4.130 4.985

R> round(cors*covs,3)
? ? ?[,1] ?[,2] ?[,3] ?[,4]
[1,] 5.425 2.709 3.841 2.715
[2,] 2.709 4.191 2.975 3.314
[3,] 3.841 2.975 6.263 4.133
[4,] 2.715 3.314 4.133 4.986

In my experience, the small differences in the results are typical for
unstructured models.

Hope this helps.

Kevin Wright

On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu>wrote:

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R. ?I have scoured the forums but
haven't been able to find an answer. ?I hope one of you could be so kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data. ?I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it. ?My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

? ?distance age Subject ? ?Sex
1 ? ? ? 26.0 ? 8 ? ? M01 ? Male
2 ? ? ? 25.0 ?10 ? ? M01 ? Male
3 ? ? ? 29.0 ?12 ? ? M01 ? Male
4 ? ? ? 31.0 ?14 ? ? M01 ? Male
5 ? ? ? 21.5 ? 8 ? ? M02 ? Male
6 ? ? ? 22.5 ?10 ? ? M02 ? Male
7 ? ? ? 23.0 ?12 ? ? M02 ? Male
8 ? ? ? 26.5 ?14 ? ? M02 ? Male
9 ? ? ? 23.0 ? 8 ? ? M03 ? Male
10 ? ? ?22.5 ?10 ? ? M03 ? Male
11 ? ? ?24.0 ?12 ? ? M03 ? Male
12 ? ? ?27.5 ?14 ? ? M03 ? Male
13 ? ? ?25.5 ? 8 ? ? M04 ? Male
14 ? ? ?27.5 ?10 ? ? M04 ? Male
15 ? ? ?26.5 ?12 ? ? M04 ? Male
16 ? ? ?27.0 ?14 ? ? M04 ? Male
17 ? ? ?20.0 ? 8 ? ? M05 ? Male
18 ? ? ?23.5 ?10 ? ? M05 ? Male
19 ? ? ?22.5 ?12 ? ? M05 ? Male
20 ? ? ?26.0 ?14 ? ? M05 ? Male
21 ? ? ?24.5 ? 8 ? ? M06 ? Male
22 ? ? ?25.5 ?10 ? ? M06 ? Male
23 ? ? ?27.0 ?12 ? ? M06 ? Male
24 ? ? ?28.5 ?14 ? ? M06 ? Male
25 ? ? ?22.0 ? 8 ? ? M07 ? Male
26 ? ? ?22.0 ?10 ? ? M07 ? Male
27 ? ? ?24.5 ?12 ? ? M07 ? Male
28 ? ? ?26.5 ?14 ? ? M07 ? Male
29 ? ? ?24.0 ? 8 ? ? M08 ? Male
30 ? ? ?21.5 ?10 ? ? M08 ? Male
31 ? ? ?24.5 ?12 ? ? M08 ? Male
32 ? ? ?25.5 ?14 ? ? M08 ? Male
33 ? ? ?23.0 ? 8 ? ? M09 ? Male
34 ? ? ?20.5 ?10 ? ? M09 ? Male
35 ? ? ?31.0 ?12 ? ? M09 ? Male
36 ? ? ?26.0 ?14 ? ? M09 ? Male
37 ? ? ?27.5 ? 8 ? ? M10 ? Male
38 ? ? ?28.0 ?10 ? ? M10 ? Male
39 ? ? ?31.0 ?12 ? ? M10 ? Male
40 ? ? ?31.5 ?14 ? ? M10 ? Male
41 ? ? ?23.0 ? 8 ? ? M11 ? Male
42 ? ? ?23.0 ?10 ? ? M11 ? Male
43 ? ? ?23.5 ?12 ? ? M11 ? Male
44 ? ? ?25.0 ?14 ? ? M11 ? Male
45 ? ? ?21.5 ? 8 ? ? M12 ? Male
46 ? ? ?23.5 ?10 ? ? M12 ? Male
47 ? ? ?24.0 ?12 ? ? M12 ? Male
48 ? ? ?28.0 ?14 ? ? M12 ? Male
49 ? ? ?17.0 ? 8 ? ? M13 ? Male
50 ? ? ?24.5 ?10 ? ? M13 ? Male
51 ? ? ?26.0 ?12 ? ? M13 ? Male
52 ? ? ?29.5 ?14 ? ? M13 ? Male
53 ? ? ?22.5 ? 8 ? ? M14 ? Male
54 ? ? ?25.5 ?10 ? ? M14 ? Male
55 ? ? ?25.5 ?12 ? ? M14 ? Male
56 ? ? ?26.0 ?14 ? ? M14 ? Male
57 ? ? ?23.0 ? 8 ? ? M15 ? Male
58 ? ? ?24.5 ?10 ? ? M15 ? Male
59 ? ? ?26.0 ?12 ? ? M15 ? Male
60 ? ? ?30.0 ?14 ? ? M15 ? Male
61 ? ? ?22.0 ? 8 ? ? M16 ? Male
62 ? ? ?21.5 ?10 ? ? M16 ? Male
63 ? ? ?23.5 ?12 ? ? M16 ? Male
64 ? ? ?25.0 ?14 ? ? M16 ? Male
65 ? ? ?21.0 ? 8 ? ? F01 Female
66 ? ? ?20.0 ?10 ? ? F01 Female
67 ? ? ?21.5 ?12 ? ? F01 Female
68 ? ? ?23.0 ?14 ? ? F01 Female
69 ? ? ?21.0 ? 8 ? ? F02 Female
70 ? ? ?21.5 ?10 ? ? F02 Female
71 ? ? ?24.0 ?12 ? ? F02 Female
72 ? ? ?25.5 ?14 ? ? F02 Female
73 ? ? ?20.5 ? 8 ? ? F03 Female
74 ? ? ?24.0 ?10 ? ? F03 Female
75 ? ? ?24.5 ?12 ? ? F03 Female
76 ? ? ?26.0 ?14 ? ? F03 Female
77 ? ? ?23.5 ? 8 ? ? F04 Female
78 ? ? ?24.5 ?10 ? ? F04 Female
79 ? ? ?25.0 ?12 ? ? F04 Female
80 ? ? ?26.5 ?14 ? ? F04 Female
81 ? ? ?21.5 ? 8 ? ? F05 Female
82 ? ? ?23.0 ?10 ? ? F05 Female
83 ? ? ?22.5 ?12 ? ? F05 Female
84 ? ? ?23.5 ?14 ? ? F05 Female
85 ? ? ?20.0 ? 8 ? ? F06 Female
86 ? ? ?21.0 ?10 ? ? F06 Female
87 ? ? ?21.0 ?12 ? ? F06 Female
88 ? ? ?22.5 ?14 ? ? F06 Female
89 ? ? ?21.5 ? 8 ? ? F07 Female
90 ? ? ?22.5 ?10 ? ? F07 Female
91 ? ? ?23.0 ?12 ? ? F07 Female
92 ? ? ?25.0 ?14 ? ? F07 Female
93 ? ? ?23.0 ? 8 ? ? F08 Female
94 ? ? ?23.0 ?10 ? ? F08 Female
95 ? ? ?23.5 ?12 ? ? F08 Female
96 ? ? ?24.0 ?14 ? ? F08 Female
97 ? ? ?20.0 ? 8 ? ? F09 Female
98 ? ? ?21.0 ?10 ? ? F09 Female
99 ? ? ?22.0 ?12 ? ? F09 Female
100 ? ? 21.5 ?14 ? ? F09 Female
101 ? ? 16.5 ? 8 ? ? F10 Female
102 ? ? 19.0 ?10 ? ? F10 Female
103 ? ? 19.0 ?12 ? ? F10 Female
104 ? ? 19.5 ?14 ? ? F10 Female
105 ? ? 24.5 ? 8 ? ? F11 Female
106 ? ? 25.0 ?10 ? ? F11 Female
107 ? ? 28.0 ?12 ? ? F11 Female
108 ? ? 28.0 ?14 ? ? F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
? ? ? ? ?Tests of Fixed Effects
Source ? ? ? ? ? ? NDF ? DDF ? ?Type III F ? ?Pr > F
GENDER ? ? ? ? ? 1 ? ? ? ?25 ? ? ? ?9.29 ? ? ? ?0.0054
AGE ? ? ? ? ? ? ? ? ?3 ? ? ? ?75 ? ? ? 35.35 ? ? ? 0.0001
GENDER*AGE ? 3 ? ? ? ?75 ? ? ? ?2.36 ? ? ? ?0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

? ? ? ? ? ?numDF denDF ?F-value p-value
(Intercept) ? ? 1 ? ?75 4123.156 ?<.0001
Sex ? ? ? ? ? ? ?1 ? ?25 ? ?9.292 ?0.0054
age ? ? ? ? ? ? ? 3 ? ?75 ? 40.032 ?<.0001
Sex:age ? ? ? ?3 ? ?75 ? ?2.362 ?0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

? ? ? ? ? ? Tests of Fixed Effects
Source ? ? ? ? ?NDF DDF Type III F Pr > F
GENDER ? ? ? ? 1 ? ?25 ? ? 9.29 ? ?0.0054
AGE ? ? ? ? ? ? ? ?3 ? ?25 ? ?34.45 ? 0.0001
GENDER*AGE 3 ? ?25 ? ? 2.93 ? ?0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)

gives the output

? ? ? ? ? ?numDF denDF ?F-value ? ?p-value
(Intercept) ? ? 1 ? ?75 ? ? 4052.028 ?<.0001
Sex ? ? ? ? ? ? ?1 ? ?25 ? ? ? 8.462 ? ? ?0.0075
age ? ? ? ? ? ? ? 3 ? ?75 ? ? ?39.022 ? ?<.0001
Sex:age ? ? ? ?3 ? ?75 ? ? ? 2.868 ? ? ?0.0421

What am I doing wrong to replicate the unstructured linear mixed model from
SAS?

Regards,

Charles

? ? ? ?[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
Kevin Wright

? ? ? ?[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models