Skip to content

Trouble Replicating Unstructured Mixed Procedure in R

19 messages · Charles Determan Jr, David Atkins, David Duffy +7 more

#
In the CS model, the F values for Gender and Gender*age are really close, but age is quite discrepant. That seems problematic.

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Charles Determan Jr
Sent: Tuesday, January 24, 2012 8:32 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but
haven't been able to find an answer.  I hope one of you could be so kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data.  I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it.  My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

    distance age Subject    Sex
1       26.0   8     M01   Male
2       25.0  10     M01   Male
3       29.0  12     M01   Male
4       31.0  14     M01   Male
5       21.5   8     M02   Male
6       22.5  10     M02   Male
7       23.0  12     M02   Male
8       26.5  14     M02   Male
9       23.0   8     M03   Male
10      22.5  10     M03   Male
11      24.0  12     M03   Male
12      27.5  14     M03   Male
13      25.5   8     M04   Male
14      27.5  10     M04   Male
15      26.5  12     M04   Male
16      27.0  14     M04   Male
17      20.0   8     M05   Male
18      23.5  10     M05   Male
19      22.5  12     M05   Male
20      26.0  14     M05   Male
21      24.5   8     M06   Male
22      25.5  10     M06   Male
23      27.0  12     M06   Male
24      28.5  14     M06   Male
25      22.0   8     M07   Male
26      22.0  10     M07   Male
27      24.5  12     M07   Male
28      26.5  14     M07   Male
29      24.0   8     M08   Male
30      21.5  10     M08   Male
31      24.5  12     M08   Male
32      25.5  14     M08   Male
33      23.0   8     M09   Male
34      20.5  10     M09   Male
35      31.0  12     M09   Male
36      26.0  14     M09   Male
37      27.5   8     M10   Male
38      28.0  10     M10   Male
39      31.0  12     M10   Male
40      31.5  14     M10   Male
41      23.0   8     M11   Male
42      23.0  10     M11   Male
43      23.5  12     M11   Male
44      25.0  14     M11   Male
45      21.5   8     M12   Male
46      23.5  10     M12   Male
47      24.0  12     M12   Male
48      28.0  14     M12   Male
49      17.0   8     M13   Male
50      24.5  10     M13   Male
51      26.0  12     M13   Male
52      29.5  14     M13   Male
53      22.5   8     M14   Male
54      25.5  10     M14   Male
55      25.5  12     M14   Male
56      26.0  14     M14   Male
57      23.0   8     M15   Male
58      24.5  10     M15   Male
59      26.0  12     M15   Male
60      30.0  14     M15   Male
61      22.0   8     M16   Male
62      21.5  10     M16   Male
63      23.5  12     M16   Male
64      25.0  14     M16   Male
65      21.0   8     F01 Female
66      20.0  10     F01 Female
67      21.5  12     F01 Female
68      23.0  14     F01 Female
69      21.0   8     F02 Female
70      21.5  10     F02 Female
71      24.0  12     F02 Female
72      25.5  14     F02 Female
73      20.5   8     F03 Female
74      24.0  10     F03 Female
75      24.5  12     F03 Female
76      26.0  14     F03 Female
77      23.5   8     F04 Female
78      24.5  10     F04 Female
79      25.0  12     F04 Female
80      26.5  14     F04 Female
81      21.5   8     F05 Female
82      23.0  10     F05 Female
83      22.5  12     F05 Female
84      23.5  14     F05 Female
85      20.0   8     F06 Female
86      21.0  10     F06 Female
87      21.0  12     F06 Female
88      22.5  14     F06 Female
89      21.5   8     F07 Female
90      22.5  10     F07 Female
91      23.0  12     F07 Female
92      25.0  14     F07 Female
93      23.0   8     F08 Female
94      23.0  10     F08 Female
95      23.5  12     F08 Female
96      24.0  14     F08 Female
97      20.0   8     F09 Female
98      21.0  10     F09 Female
99      22.0  12     F09 Female
100     21.5  14     F09 Female
101     16.5   8     F10 Female
102     19.0  10     F10 Female
103     19.0  12     F10 Female
104     19.5  14     F10 Female
105     24.5   8     F11 Female
106     25.0  10     F11 Female
107     28.0  12     F11 Female
108     28.0  14     F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
          Tests of Fixed Effects
Source             NDF   DDF    Type III F    Pr > F
GENDER           1        25        9.29        0.0054
AGE                  3        75       35.35       0.0001
GENDER*AGE   3        75        2.36        0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

            numDF denDF  F-value p-value
(Intercept)     1    75 4123.156  <.0001
Sex              1    25    9.292  0.0054
age               3    75   40.032  <.0001
Sex:age        3    75    2.362  0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

             Tests of Fixed Effects
Source          NDF DDF Type III F Pr > F
GENDER         1    25     9.29    0.0054
AGE                3    25    34.45   0.0001
GENDER*AGE 3    25     2.93    0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)

gives the output

            numDF denDF  F-value    p-value
(Intercept)     1    75     4052.028  <.0001
Sex              1    25       8.462      0.0075
age               3    75      39.022    <.0001
Sex:age        3    75       2.868      0.0421

What am I doing wrong to replicate the unstructured linear mixed model from
SAS?

Regards,

Charles


_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
#
Hi Charles--

So, I don't know SAS at all, so just providing the SAS code and anova 
table don't tell me (at least) all that much about what SAS is doing. 
Also keep in mind that the anova table summary will depend on the "type" 
of sums of squares (and there have been some diatribes about this and 
type III sums of squares in the past on this list).

I find it a bit more informative to look at the summary output from lme(r):

 > y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
 >
 > summary(y)
Linear mixed-effects model fit by REML
  Data: dental
        AIC      BIC    logLik
   445.7572 461.6236 -216.8786

Random effects:
  Formula: ~1 | Subject
         (Intercept) Residual
StdDev:    1.816214 1.386382

Fixed effects: distance ~ Sex * age
                   Value Std.Error DF   t-value p-value
(Intercept)   16.340625 0.9813122 79 16.651810  0.0000
SexFemale      1.032102 1.5374208 25  0.671321  0.5082
age            0.784375 0.0775011 79 10.120823  0.0000
SexFemale:age -0.304830 0.1214209 79 -2.510520  0.0141
  Correlation:
               (Intr) SexFml age
SexFemale     -0.638
age           -0.869  0.555
SexFemale:age  0.555 -0.869 -0.638

Standardized Within-Group Residuals:
         Min          Q1         Med          Q3         Max
-3.59804400 -0.45461690  0.01578365  0.50244658  3.68620792

Number of Observations: 108
Number of Groups: 27

DAVE: so, we fit a random-intercept, residual error, and 4 fixed-effects 
(N = 27 people and 108 observations)

Here is your second model:

 > y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
+ data=dental)
 > summary(y)
Linear mixed-effects model fit by REML
  Data: dental
        AIC      BIC    logLik
   450.1706 481.9033 -213.0853

Random effects:
  Formula: ~1 | Subject
         (Intercept) Residual
StdDev:    1.827112 1.375353

Correlation Structure: General
  Formula: ~1 | Subject
  Parameter estimate(s):
  Correlation:
   1      2      3
2 -0.174
3 -0.002 -0.177
4 -0.342  0.306  0.227
Fixed effects: distance ~ Sex * age
                   Value Std.Error DF   t-value p-value
(Intercept)   15.932927 0.9978753 79 15.966852  0.0000
SexFemale      1.473662 1.5633701 25  0.942619  0.3549
age            0.824339 0.0824260 79 10.000962  0.0000
SexFemale:age -0.348100 0.1291367 79 -2.695594  0.0086
  Correlation:
               (Intr) SexFml age
SexFemale     -0.638
age           -0.874  0.558
SexFemale:age  0.558 -0.874 -0.638

Standardized Within-Group Residuals:
          Min           Q1          Med           Q3          Max
-3.180968424 -0.544002662  0.001195789  0.495264398  3.730242930

Number of Observations: 108
Number of Groups: 27

DAVE: Now, in addition to earlier parameters, we have an unstructured 
covariance matrix for repeated measures.  By all indications this is a 
*worse* fitting model...

Finally, I wonder whether SAS is fitting random-effects at all (no idea, 
just a guess).  If so, you might check it relative to a gls() fit, a la:

 > y2=gls(distance~Sex*age, corr=corSymm(,~1|Subject),
+ data=dental)
 > summary(y2)
Generalized least squares fit by REML
   Model: distance ~ Sex * age
   Data: dental
        AIC      BIC    logLik
   448.1706 477.2589 -213.0853

Correlation Structure: General
  Formula: ~1 | Subject
  Parameter estimate(s):
  Correlation:
   1     2     3
2 0.575
3 0.638 0.574
4 0.515 0.749 0.721

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)   15.932911 0.9978729 15.966874  0.0000
SexFemale      1.473679 1.5633664  0.942632  0.3481
age            0.824340 0.0824258 10.001000  0.0000
SexFemale:age -0.348102 0.1291364 -2.695611  0.0082

  Correlation:
               (Intr) SexFml age
SexFemale     -0.638
age           -0.874  0.558
SexFemale:age  0.558 -0.874 -0.638

Standardized residuals:
         Min          Q1         Med          Q3         Max
-2.41707752 -0.64439706 -0.07388955  0.58480683  2.26288228

Residual standard error: 2.286908
Degrees of freedom: 108 total; 104 residual

So, not sure if this is helpful, but perhaps you could provide 
additional output / commentary on what exactly SAS is doing (beyond 
anova table).

cheers, Dave


Greetings,

I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R.  I have scoured the forums but
haven't been able to find an answer.  I hope one of you could be so kind as
to enlighten me.

I am attempting to replicate a repeated measures experiment using some
standard data.  I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it.  My main issue
comes with the 'unstructured' component.

The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS

     distance age Subject    Sex
1       26.0   8     M01   Male
2       25.0  10     M01   Male
3       29.0  12     M01   Male
4       31.0  14     M01   Male
5       21.5   8     M02   Male
6       22.5  10     M02   Male
7       23.0  12     M02   Male
8       26.5  14     M02   Male
9       23.0   8     M03   Male
10      22.5  10     M03   Male
11      24.0  12     M03   Male
12      27.5  14     M03   Male
13      25.5   8     M04   Male
14      27.5  10     M04   Male
15      26.5  12     M04   Male
16      27.0  14     M04   Male
17      20.0   8     M05   Male
18      23.5  10     M05   Male
19      22.5  12     M05   Male
20      26.0  14     M05   Male
21      24.5   8     M06   Male
22      25.5  10     M06   Male
23      27.0  12     M06   Male
24      28.5  14     M06   Male
25      22.0   8     M07   Male
26      22.0  10     M07   Male
27      24.5  12     M07   Male
28      26.5  14     M07   Male
29      24.0   8     M08   Male
30      21.5  10     M08   Male
31      24.5  12     M08   Male
32      25.5  14     M08   Male
33      23.0   8     M09   Male
34      20.5  10     M09   Male
35      31.0  12     M09   Male
36      26.0  14     M09   Male
37      27.5   8     M10   Male
38      28.0  10     M10   Male
39      31.0  12     M10   Male
40      31.5  14     M10   Male
41      23.0   8     M11   Male
42      23.0  10     M11   Male
43      23.5  12     M11   Male
44      25.0  14     M11   Male
45      21.5   8     M12   Male
46      23.5  10     M12   Male
47      24.0  12     M12   Male
48      28.0  14     M12   Male
49      17.0   8     M13   Male
50      24.5  10     M13   Male
51      26.0  12     M13   Male
52      29.5  14     M13   Male
53      22.5   8     M14   Male
54      25.5  10     M14   Male
55      25.5  12     M14   Male
56      26.0  14     M14   Male
57      23.0   8     M15   Male
58      24.5  10     M15   Male
59      26.0  12     M15   Male
60      30.0  14     M15   Male
61      22.0   8     M16   Male
62      21.5  10     M16   Male
63      23.5  12     M16   Male
64      25.0  14     M16   Male
65      21.0   8     F01 Female
66      20.0  10     F01 Female
67      21.5  12     F01 Female
68      23.0  14     F01 Female
69      21.0   8     F02 Female
70      21.5  10     F02 Female
71      24.0  12     F02 Female
72      25.5  14     F02 Female
73      20.5   8     F03 Female
74      24.0  10     F03 Female
75      24.5  12     F03 Female
76      26.0  14     F03 Female
77      23.5   8     F04 Female
78      24.5  10     F04 Female
79      25.0  12     F04 Female
80      26.5  14     F04 Female
81      21.5   8     F05 Female
82      23.0  10     F05 Female
83      22.5  12     F05 Female
84      23.5  14     F05 Female
85      20.0   8     F06 Female
86      21.0  10     F06 Female
87      21.0  12     F06 Female
88      22.5  14     F06 Female
89      21.5   8     F07 Female
90      22.5  10     F07 Female
91      23.0  12     F07 Female
92      25.0  14     F07 Female
93      23.0   8     F08 Female
94      23.0  10     F08 Female
95      23.5  12     F08 Female
96      24.0  14     F08 Female
97      20.0   8     F09 Female
98      21.0  10     F09 Female
99      22.0  12     F09 Female
100     21.5  14     F09 Female
101     16.5   8     F10 Female
102     19.0  10     F10 Female
103     19.0  12     F10 Female
104     19.5  14     F10 Female
105     24.5   8     F11 Female
106     25.0  10     F11 Female
107     28.0  12     F11 Female
108     28.0  14     F11 Female

*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;

output of interest to me
           Tests of Fixed Effects
Source             NDF   DDF    Type III F    Pr > F
GENDER           1        25        9.29        0.0054
AGE                  3        75       35.35       0.0001
GENDER*AGE   3        75        2.36        0.0781

R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)

             numDF denDF  F-value p-value
(Intercept)     1    75 4123.156  <.0001
Sex              1    25    9.292  0.0054
age               3    75   40.032  <.0001
Sex:age        3    75    2.362  0.0781

Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,

SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;

              Tests of Fixed Effects
Source          NDF DDF Type III F Pr > F
GENDER         1    25     9.29    0.0054
AGE                3    25    34.45   0.0001
GENDER*AGE 3    25     2.93    0.0532

R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)

gives the output

             numDF denDF  F-value    p-value
(Intercept)     1    75     4052.028  <.0001
Sex              1    25       8.462      0.0075
age               3    75      39.022    <.0001
Sex:age        3    75       2.868      0.0421

What am I doing wrong to replicate the unstructured linear mixed model from
SAS?

Regards,

Charles
#
On Tue, 24 Jan 2012, Charles Determan Jr wrote:

            
This is also the Orthodont dataset, distributed with nlme.

As David Atkins pointed out, R defaults to Type I SS. so you would need to 
use, for example, the Anova() command from the car package.  The other 
thing is that the SAS F statistics are only approximate, depending on 
which covariance structure is chosen (perhaps John Maindonald or someone 
clever could comment), so SAS offers different possibilities for ddf eg

http://www2.sas.com/proceedings/sugi26/p262-26.pdf

while lme and lmer offer one or none.
#
It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is constraining
the main effect in a manner that depends on the parameterisation.

There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual.  For lines that 
are separate or all through the one point, one does not need 
type III sums of squares.

Analyses often or frequently have enough genuine complications 
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a 
type III sum of squares.  

I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation.  They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem 
more or less right, pretty much equivalent to calculating slopes 
for females and slopes for males and using a t-test to compare 
them.  (Sure, in the analyses presented, age has been treated 
as a categorical variable, but the comment still applies.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 26/01/2012, at 1:54 PM, David Duffy wrote:

            
#
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS?

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is constraining
the main effect in a manner that depends on the parameterisation.

There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual.  For lines that 
are separate or all through the one point, one does not need 
type III sums of squares.

Analyses often or frequently have enough genuine complications 
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a 
type III sum of squares.  

I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation.  They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem 
more or less right, pretty much equivalent to calculating slopes 
for females and slopes for males and using a t-test to compare 
them.  (Sure, in the analyses presented, age has been treated 
as a categorical variable, but the comment still applies.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 26/01/2012, at 1:54 PM, David Duffy wrote:

            
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
#
I think:

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

HTH
Luca



Le 26/01/2012 15:52, Thompson,Paul a ?crit :
#
OK, I've looked at that reference. 

There are 2 aspects of an estimate like a SS. The first is the stability of the estimate, and the second is the interpretation of the estimate. The issues with the interpretation of the different estimates go back to 1970, and they are simply a matter of interpretation. The point of the Venables discussion is that he does not like Type III SS, not that they are wrong. He does not agree with the interpretation.

The issue here is the accuracy of the Type III or Type I or Type II or whatever. Accuracy comes before interpretation. If the r module and SAS do not arrive at the same estimates, that is an important thing. 

Once we agree upon computation, we can argue about interpretation. Charles Determan is inquiring as to computational accuracy. The use and interpretation of the various Type I, II, III, IV, LVX SS are secondary.

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger
Sent: Thursday, January 26, 2012 9:03 AM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

I think:

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

HTH
Luca



Le 26/01/2012 15:52, Thompson,Paul a ?crit :
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information.  Any unauthorized review, use,
disclosure or distribution is prohibited.  If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
#
It is not really a matter of computational accuracy.  One can get highly
accurate values for an inappropriate statistic.  

Or if there is insistence on using the word, accuracy, what is the 
meaning?

i) the wrong formula is used?  Then in what sense is it 'wrong'?

ii) there is a numerical inaccuracy in the calculation?  This is almost
never an issue in a relatively simple calculation such as this, given
the care taken by the code writers in such matters.

iii) where an approximation is used, as in using an F-distribution
approximation, is the best choice of degrees of freedom  made to
for use of this approximation?  I judge that the degrees of freedom 
for lme's F-statistic for the interaction are not well chosen.  Users
really have to sort this out for themselves, rather than relying on
what may be a fairly wild approximation that appears in lm's
output.  Using 75df rather than 25df does not however make the 
difference that a choice between (e.g.) 5df and 25df would.

A further and more basic issue is whether the statistic that is 
provided is appropriate to the intended generalisation.  I'd take
this to be generalisation to another sample of youths from the
same population.  In order to understand why R and SAS are 
giving different F-statistics for the interaction, one needs to
understand just what variance-covariance structure is assumed
in each case.  One might extract the two estimates of the 
var-cov structure and compare them. Look for terms in one that
do not appear, or maybe that are zero, in the other.

Finally, it is not just that Venables does not like type III SS.
He is saying that they almost never correspond to a null
hypothesis that makes any sense.  Those who disagree try to 
write down the model to which the null hypothesis corresponds
in testing for the main effect of factor1 with a factor1:factor2
interaction.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 2:41 AM, Thompson,Paul wrote:

            
#
I've twigged that "unstructured" means a variance-covariance matrix 
that ignores the time structure, i.e., each element is estimated 
separately.  One can do this in lme4, thus:

Orthodont$Age <- factor(Orthodont$age)
+ data=Orthodont)
Analysis of Variance Table
        Df Sum Sq Mean Sq F value
Sex      1  9.591  9.5905 33.9645
Age      3 34.876 11.6252 41.1703
Sex:Age  3  2.930  0.9768  3.4594

The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic.  One
has, in agreement with the SAS output:
[1] 0.05318945

The main effects SS and F's are of course not expected to agree and 
indeed, in my strong view, true SAS type III SS's are inappropriate.

I regard this R analysis however as an inelegant use of the data, with low
power.  My book with John Braun (3rd edn and if I recall correctly, 2nd)
has an analysis that I am prepared to defend that uses lmer() from lme4.
Several years ago, I placed on the web a version of the analysis that uses
 lme() from nlme:
   http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf
I'd forgotten that it was there.

This is a list for R users.  You will not necessarily find folk here with a high
level of SAS expertise, maybe not even enough to understand what the
SAS documentation means when it uses the term "unstructured".  Charles,
I think you've done pretty well in the amount of free advice that you have
received.  I think too that Steven McKinney's point is well made.  Kevin's
interpretation of the EULA is probably more or less correct, but the 
document comes across rather more ferociously than he suggests, and it
is not a good starting point for scientific assessment of the respective 
methodologies.  Many of us have other reasons why we are not very
interested in SAS does.  The implied threat in the EULA provides, even
if we are not party to any such agreement, just one more reason why SAS
is not to any large extent in our path ahead to the future.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 4:03 PM, Kevin Wright wrote:

            
#
Charles,

I guess that SAS is using the Satterthwaite formula to approximate 
denominator degrees of freedom (google 'Satterthwaite'). See Kenward and 
Roger (1997) Biometrics 53 983-997 for another approach.
Charles Determan Jr wrote:

  
    
#
It seems to me that the SAS documentation ought to tell you what algorithm
it uses to compute the degrees of freedom. Perhaps they do that by
referencing a published paper. If so, just read the original source to get
the relevant formulas. The only reliable way to get the same DF that SAS
provides is to use the same formula for calculating it. There may or may not
be existing R code that implements the required formulas, but you need to
know what you're looking for in order to find it. 

Steven J. Pierce, Ph.D. 
Associate Director 
Center for Statistical Training & Consulting (CSTAT) 
Michigan State University 
E-mail: pierces1 at msu.edu 
Web: http://www.cstat.msu.edu 


-----Original Message-----
From: Charles Determan Jr [mailto:deter088 at umn.edu] 
Sent: Friday, January 27, 2012 7:08 AM
To: John Maindonald
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in
R

Thank you John, I truly appreciate all the advice I have received.  At no
point to I feel entitled for someone to provide this for me.  I do hope
that others may find this thread useful as well.  If I may just as one more
question.  Is there an accepted way to calculate the degrees of freedom in
this case to come to the value of 25?  This has been something I have been
trying to determine.

Again, thank you to all for providing me with all the information you have,

Charles
#
My post may be redundant, but just in case I wanted to comment.  I
think Kevin's post
addressed a few things that are helpful to start with.

The original proc mixed syntax did not have a random statement. This
is similar to what
Pinheiro and Bates (Chapter 5) describe as an extended linear model.
Therefore, it seems we should not be able
to reproduce this with  lme and nlme. Instead, we use gls. I think
some people do not like to call this
a mixed effect model because the intercept or slope are not random,
but I can see why SAS would include
it with proc mixed because of the similarities. The SAS reference for
this is Littell (SAS for mixed models) Chapter 5.

proc mixed data=formixed;
class gender person;
model y = gender|age/solution;
repeated / type=un sub=person;
run;

gives me a similar result to

 gls( y ~ gender + age + gender*age , formixed, correlation =
corSymm(form = ~ 1 | person))

SAS output:

Intercept 15.84
gender 1.58
age 0.82
age*gender -.35

nlme output:

Coefficients:

(Intercept)      15.93
genderFemale      1.47
age               0.82
genderFemale:age -0.35
On Fri, Jan 27, 2012 at 12:03 AM, Kevin Wright <kw.stat at gmail.com> wrote: