An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120124/c00c0e1d/attachment-0001.pl>
Trouble Replicating Unstructured Mixed Procedure in R
19 messages · Charles Determan Jr, David Atkins, David Duffy +7 more
In the CS model, the F values for Gender and Gender*age are really close, but age is quite discrepant. That seems problematic.
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Charles Determan Jr
Sent: Tuesday, January 24, 2012 8:32 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R
Greetings,
I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R. I have scoured the forums but
haven't been able to find an answer. I hope one of you could be so kind as
to enlighten me.
I am attempting to replicate a repeated measures experiment using some
standard data. I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it. My main issue
comes with the 'unstructured' component.
The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
5 21.5 8 M02 Male
6 22.5 10 M02 Male
7 23.0 12 M02 Male
8 26.5 14 M02 Male
9 23.0 8 M03 Male
10 22.5 10 M03 Male
11 24.0 12 M03 Male
12 27.5 14 M03 Male
13 25.5 8 M04 Male
14 27.5 10 M04 Male
15 26.5 12 M04 Male
16 27.0 14 M04 Male
17 20.0 8 M05 Male
18 23.5 10 M05 Male
19 22.5 12 M05 Male
20 26.0 14 M05 Male
21 24.5 8 M06 Male
22 25.5 10 M06 Male
23 27.0 12 M06 Male
24 28.5 14 M06 Male
25 22.0 8 M07 Male
26 22.0 10 M07 Male
27 24.5 12 M07 Male
28 26.5 14 M07 Male
29 24.0 8 M08 Male
30 21.5 10 M08 Male
31 24.5 12 M08 Male
32 25.5 14 M08 Male
33 23.0 8 M09 Male
34 20.5 10 M09 Male
35 31.0 12 M09 Male
36 26.0 14 M09 Male
37 27.5 8 M10 Male
38 28.0 10 M10 Male
39 31.0 12 M10 Male
40 31.5 14 M10 Male
41 23.0 8 M11 Male
42 23.0 10 M11 Male
43 23.5 12 M11 Male
44 25.0 14 M11 Male
45 21.5 8 M12 Male
46 23.5 10 M12 Male
47 24.0 12 M12 Male
48 28.0 14 M12 Male
49 17.0 8 M13 Male
50 24.5 10 M13 Male
51 26.0 12 M13 Male
52 29.5 14 M13 Male
53 22.5 8 M14 Male
54 25.5 10 M14 Male
55 25.5 12 M14 Male
56 26.0 14 M14 Male
57 23.0 8 M15 Male
58 24.5 10 M15 Male
59 26.0 12 M15 Male
60 30.0 14 M15 Male
61 22.0 8 M16 Male
62 21.5 10 M16 Male
63 23.5 12 M16 Male
64 25.0 14 M16 Male
65 21.0 8 F01 Female
66 20.0 10 F01 Female
67 21.5 12 F01 Female
68 23.0 14 F01 Female
69 21.0 8 F02 Female
70 21.5 10 F02 Female
71 24.0 12 F02 Female
72 25.5 14 F02 Female
73 20.5 8 F03 Female
74 24.0 10 F03 Female
75 24.5 12 F03 Female
76 26.0 14 F03 Female
77 23.5 8 F04 Female
78 24.5 10 F04 Female
79 25.0 12 F04 Female
80 26.5 14 F04 Female
81 21.5 8 F05 Female
82 23.0 10 F05 Female
83 22.5 12 F05 Female
84 23.5 14 F05 Female
85 20.0 8 F06 Female
86 21.0 10 F06 Female
87 21.0 12 F06 Female
88 22.5 14 F06 Female
89 21.5 8 F07 Female
90 22.5 10 F07 Female
91 23.0 12 F07 Female
92 25.0 14 F07 Female
93 23.0 8 F08 Female
94 23.0 10 F08 Female
95 23.5 12 F08 Female
96 24.0 14 F08 Female
97 20.0 8 F09 Female
98 21.0 10 F09 Female
99 22.0 12 F09 Female
100 21.5 14 F09 Female
101 16.5 8 F10 Female
102 19.0 10 F10 Female
103 19.0 12 F10 Female
104 19.5 14 F10 Female
105 24.5 8 F11 Female
106 25.0 10 F11 Female
107 28.0 12 F11 Female
108 28.0 14 F11 Female
*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;
output of interest to me
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 75 35.35 0.0001
GENDER*AGE 3 75 2.36 0.0781
R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)
numDF denDF F-value p-value
(Intercept) 1 75 4123.156 <.0001
Sex 1 25 9.292 0.0054
age 3 75 40.032 <.0001
Sex:age 3 75 2.362 0.0781
Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 25 34.45 0.0001
GENDER*AGE 3 25 2.93 0.0532
R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)
gives the output
numDF denDF F-value p-value
(Intercept) 1 75 4052.028 <.0001
Sex 1 25 8.462 0.0075
age 3 75 39.022 <.0001
Sex:age 3 75 2.868 0.0421
What am I doing wrong to replicate the unstructured linear mixed model from
SAS?
Regards,
Charles
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120125/bbae2d1f/attachment-0001.pl>
Hi Charles--
So, I don't know SAS at all, so just providing the SAS code and anova
table don't tell me (at least) all that much about what SAS is doing.
Also keep in mind that the anova table summary will depend on the "type"
of sums of squares (and there have been some diatribes about this and
type III sums of squares in the past on this list).
I find it a bit more informative to look at the summary output from lme(r):
> y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
>
> summary(y)
Linear mixed-effects model fit by REML
Data: dental
AIC BIC logLik
445.7572 461.6236 -216.8786
Random effects:
Formula: ~1 | Subject
(Intercept) Residual
StdDev: 1.816214 1.386382
Fixed effects: distance ~ Sex * age
Value Std.Error DF t-value p-value
(Intercept) 16.340625 0.9813122 79 16.651810 0.0000
SexFemale 1.032102 1.5374208 25 0.671321 0.5082
age 0.784375 0.0775011 79 10.120823 0.0000
SexFemale:age -0.304830 0.1214209 79 -2.510520 0.0141
Correlation:
(Intr) SexFml age
SexFemale -0.638
age -0.869 0.555
SexFemale:age 0.555 -0.869 -0.638
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.59804400 -0.45461690 0.01578365 0.50244658 3.68620792
Number of Observations: 108
Number of Groups: 27
DAVE: so, we fit a random-intercept, residual error, and 4 fixed-effects
(N = 27 people and 108 observations)
Here is your second model:
> y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
+ data=dental)
> summary(y)
Linear mixed-effects model fit by REML
Data: dental
AIC BIC logLik
450.1706 481.9033 -213.0853
Random effects:
Formula: ~1 | Subject
(Intercept) Residual
StdDev: 1.827112 1.375353
Correlation Structure: General
Formula: ~1 | Subject
Parameter estimate(s):
Correlation:
1 2 3
2 -0.174
3 -0.002 -0.177
4 -0.342 0.306 0.227
Fixed effects: distance ~ Sex * age
Value Std.Error DF t-value p-value
(Intercept) 15.932927 0.9978753 79 15.966852 0.0000
SexFemale 1.473662 1.5633701 25 0.942619 0.3549
age 0.824339 0.0824260 79 10.000962 0.0000
SexFemale:age -0.348100 0.1291367 79 -2.695594 0.0086
Correlation:
(Intr) SexFml age
SexFemale -0.638
age -0.874 0.558
SexFemale:age 0.558 -0.874 -0.638
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.180968424 -0.544002662 0.001195789 0.495264398 3.730242930
Number of Observations: 108
Number of Groups: 27
DAVE: Now, in addition to earlier parameters, we have an unstructured
covariance matrix for repeated measures. By all indications this is a
*worse* fitting model...
Finally, I wonder whether SAS is fitting random-effects at all (no idea,
just a guess). If so, you might check it relative to a gls() fit, a la:
> y2=gls(distance~Sex*age, corr=corSymm(,~1|Subject),
+ data=dental)
> summary(y2)
Generalized least squares fit by REML
Model: distance ~ Sex * age
Data: dental
AIC BIC logLik
448.1706 477.2589 -213.0853
Correlation Structure: General
Formula: ~1 | Subject
Parameter estimate(s):
Correlation:
1 2 3
2 0.575
3 0.638 0.574
4 0.515 0.749 0.721
Coefficients:
Value Std.Error t-value p-value
(Intercept) 15.932911 0.9978729 15.966874 0.0000
SexFemale 1.473679 1.5633664 0.942632 0.3481
age 0.824340 0.0824258 10.001000 0.0000
SexFemale:age -0.348102 0.1291364 -2.695611 0.0082
Correlation:
(Intr) SexFml age
SexFemale -0.638
age -0.874 0.558
SexFemale:age 0.558 -0.874 -0.638
Standardized residuals:
Min Q1 Med Q3 Max
-2.41707752 -0.64439706 -0.07388955 0.58480683 2.26288228
Residual standard error: 2.286908
Degrees of freedom: 108 total; 104 residual
So, not sure if this is helpful, but perhaps you could provide
additional output / commentary on what exactly SAS is doing (beyond
anova table).
cheers, Dave
Greetings,
I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R. I have scoured the forums but
haven't been able to find an answer. I hope one of you could be so kind as
to enlighten me.
I am attempting to replicate a repeated measures experiment using some
standard data. I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it. My main issue
comes with the 'unstructured' component.
The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
5 21.5 8 M02 Male
6 22.5 10 M02 Male
7 23.0 12 M02 Male
8 26.5 14 M02 Male
9 23.0 8 M03 Male
10 22.5 10 M03 Male
11 24.0 12 M03 Male
12 27.5 14 M03 Male
13 25.5 8 M04 Male
14 27.5 10 M04 Male
15 26.5 12 M04 Male
16 27.0 14 M04 Male
17 20.0 8 M05 Male
18 23.5 10 M05 Male
19 22.5 12 M05 Male
20 26.0 14 M05 Male
21 24.5 8 M06 Male
22 25.5 10 M06 Male
23 27.0 12 M06 Male
24 28.5 14 M06 Male
25 22.0 8 M07 Male
26 22.0 10 M07 Male
27 24.5 12 M07 Male
28 26.5 14 M07 Male
29 24.0 8 M08 Male
30 21.5 10 M08 Male
31 24.5 12 M08 Male
32 25.5 14 M08 Male
33 23.0 8 M09 Male
34 20.5 10 M09 Male
35 31.0 12 M09 Male
36 26.0 14 M09 Male
37 27.5 8 M10 Male
38 28.0 10 M10 Male
39 31.0 12 M10 Male
40 31.5 14 M10 Male
41 23.0 8 M11 Male
42 23.0 10 M11 Male
43 23.5 12 M11 Male
44 25.0 14 M11 Male
45 21.5 8 M12 Male
46 23.5 10 M12 Male
47 24.0 12 M12 Male
48 28.0 14 M12 Male
49 17.0 8 M13 Male
50 24.5 10 M13 Male
51 26.0 12 M13 Male
52 29.5 14 M13 Male
53 22.5 8 M14 Male
54 25.5 10 M14 Male
55 25.5 12 M14 Male
56 26.0 14 M14 Male
57 23.0 8 M15 Male
58 24.5 10 M15 Male
59 26.0 12 M15 Male
60 30.0 14 M15 Male
61 22.0 8 M16 Male
62 21.5 10 M16 Male
63 23.5 12 M16 Male
64 25.0 14 M16 Male
65 21.0 8 F01 Female
66 20.0 10 F01 Female
67 21.5 12 F01 Female
68 23.0 14 F01 Female
69 21.0 8 F02 Female
70 21.5 10 F02 Female
71 24.0 12 F02 Female
72 25.5 14 F02 Female
73 20.5 8 F03 Female
74 24.0 10 F03 Female
75 24.5 12 F03 Female
76 26.0 14 F03 Female
77 23.5 8 F04 Female
78 24.5 10 F04 Female
79 25.0 12 F04 Female
80 26.5 14 F04 Female
81 21.5 8 F05 Female
82 23.0 10 F05 Female
83 22.5 12 F05 Female
84 23.5 14 F05 Female
85 20.0 8 F06 Female
86 21.0 10 F06 Female
87 21.0 12 F06 Female
88 22.5 14 F06 Female
89 21.5 8 F07 Female
90 22.5 10 F07 Female
91 23.0 12 F07 Female
92 25.0 14 F07 Female
93 23.0 8 F08 Female
94 23.0 10 F08 Female
95 23.5 12 F08 Female
96 24.0 14 F08 Female
97 20.0 8 F09 Female
98 21.0 10 F09 Female
99 22.0 12 F09 Female
100 21.5 14 F09 Female
101 16.5 8 F10 Female
102 19.0 10 F10 Female
103 19.0 12 F10 Female
104 19.5 14 F10 Female
105 24.5 8 F11 Female
106 25.0 10 F11 Female
107 28.0 12 F11 Female
108 28.0 14 F11 Female
*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;
output of interest to me
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 75 35.35 0.0001
GENDER*AGE 3 75 2.36 0.0781
R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)
numDF denDF F-value p-value
(Intercept) 1 75 4123.156 <.0001
Sex 1 25 9.292 0.0054
age 3 75 40.032 <.0001
Sex:age 3 75 2.362 0.0781
Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 25 34.45 0.0001
GENDER*AGE 3 25 2.93 0.0532
R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)
gives the output
numDF denDF F-value p-value
(Intercept) 1 75 4052.028 <.0001
Sex 1 25 8.462 0.0075
age 3 75 39.022 <.0001
Sex:age 3 75 2.868 0.0421
What am I doing wrong to replicate the unstructured linear mixed model from
SAS?
Regards,
Charles
Dave Atkins, PhD Research Associate Professor Department of Psychiatry and Behavioral Science University of Washington datkins at u.washington.edu Center for the Study of Health and Risk Behaviors (CSHRB) 1100 NE 45th Street, Suite 300 Seattle, WA 98105 206-616-3879 http://depts.washington.edu/cshrb/ (Mon-Wed) Center for Healthcare Improvement, for Addictions, Mental Illness, Medically Vulnerable Populations (CHAMMP) 325 9th Avenue, 2HH-15 Box 359911 Seattle, WA 98104 http://www.chammp.org (Thurs)
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings, I have been working on R for some time now and I have begun the endeavor of trying to replicate some SAS code in R. I have scoured the forums but
This is also the Orthodont dataset, distributed with nlme. As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package. The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg http://www2.sas.com/proceedings/sugi26/p262-26.pdf while lme and lmer offer one or none.
| David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
It is well to note that type III sums of squares are problematic. For testing the effects of a main effect, the null model is constraining the main effect in a manner that depends on the parameterisation. There are situations where it makes sense to fit interactions without main effects, and it is clear what constraint on the main effect is the relevant null (with an interaction between a factor and a variable, does one want all lines to go though the same point, or through perhaps the origin?), but that situation is unusual. For lines that are separate or all through the one point, one does not need type III sums of squares. Analyses often or frequently have enough genuine complications worrying (unless it is blindingly obvious that one ought to worry about it) without the rarely relevant complication of attending to a type III sum of squares. I'd guess that SAS and lme are, effectively, making different assumptions about the intended generalisation. They are clearly using different denominator degrees of freedom for F. As one is looking for consistency across the 27 different youths, SAS's denominator degrees of freedom for the interaction seem more or less right, pretty much equivalent to calculating slopes for females and slopes for males and using a t-test to compare them. (Sure, in the analyses presented, age has been treated as a categorical variable, but the comment still applies.) John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings, I have been working on R for some time now and I have begun the endeavor of trying to replicate some SAS code in R. I have scoured the forums but
This is also the Orthodont dataset, distributed with nlme. As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package. The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg http://www2.sas.com/proceedings/sugi26/p262-26.pdf while lme and lmer offer one or none. -- | David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120126/f2c20fbd/attachment-0001.pl>
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS? -----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald Sent: Wednesday, January 25, 2012 11:19 PM To: David Duffy Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R It is well to note that type III sums of squares are problematic. For testing the effects of a main effect, the null model is constraining the main effect in a manner that depends on the parameterisation. There are situations where it makes sense to fit interactions without main effects, and it is clear what constraint on the main effect is the relevant null (with an interaction between a factor and a variable, does one want all lines to go though the same point, or through perhaps the origin?), but that situation is unusual. For lines that are separate or all through the one point, one does not need type III sums of squares. Analyses often or frequently have enough genuine complications worrying (unless it is blindingly obvious that one ought to worry about it) without the rarely relevant complication of attending to a type III sum of squares. I'd guess that SAS and lme are, effectively, making different assumptions about the intended generalisation. They are clearly using different denominator degrees of freedom for F. As one is looking for consistency across the 27 different youths, SAS's denominator degrees of freedom for the interaction seem more or less right, pretty much equivalent to calculating slopes for females and slopes for males and using a t-test to compare them. (Sure, in the analyses presented, age has been treated as a categorical variable, but the comment still applies.) John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings, I have been working on R for some time now and I have begun the endeavor of trying to replicate some SAS code in R. I have scoured the forums but
This is also the Orthodont dataset, distributed with nlme. As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package. The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg http://www2.sas.com/proceedings/sugi26/p262-26.pdf while lme and lmer offer one or none. -- | David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ----------------------------------------------------------------------- Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain privileged and confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
I think: http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf HTH Luca Le 26/01/2012 15:52, Thompson,Paul a ?crit :
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS? -----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald Sent: Wednesday, January 25, 2012 11:19 PM To: David Duffy Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R It is well to note that type III sums of squares are problematic. For testing the effects of a main effect, the null model is constraining the main effect in a manner that depends on the parameterisation. There are situations where it makes sense to fit interactions without main effects, and it is clear what constraint on the main effect is the relevant null (with an interaction between a factor and a variable, does one want all lines to go though the same point, or through perhaps the origin?), but that situation is unusual. For lines that are separate or all through the one point, one does not need type III sums of squares. Analyses often or frequently have enough genuine complications worrying (unless it is blindingly obvious that one ought to worry about it) without the rarely relevant complication of attending to a type III sum of squares. I'd guess that SAS and lme are, effectively, making different assumptions about the intended generalisation. They are clearly using different denominator degrees of freedom for F. As one is looking for consistency across the 27 different youths, SAS's denominator degrees of freedom for the interaction seem more or less right, pretty much equivalent to calculating slopes for females and slopes for males and using a t-test to compare them. (Sure, in the analyses presented, age has been treated as a categorical variable, but the comment still applies.) John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics& Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings, I have been working on R for some time now and I have begun the endeavor of trying to replicate some SAS code in R. I have scoured the forums but
This is also the Orthodont dataset, distributed with nlme. As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package. The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg http://www2.sas.com/proceedings/sugi26/p262-26.pdf while lme and lmer offer one or none. -- | David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ----------------------------------------------------------------------- Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain privileged and confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
OK, I've looked at that reference. There are 2 aspects of an estimate like a SS. The first is the stability of the estimate, and the second is the interpretation of the estimate. The issues with the interpretation of the different estimates go back to 1970, and they are simply a matter of interpretation. The point of the Venables discussion is that he does not like Type III SS, not that they are wrong. He does not agree with the interpretation. The issue here is the accuracy of the Type III or Type I or Type II or whatever. Accuracy comes before interpretation. If the r module and SAS do not arrive at the same estimates, that is an important thing. Once we agree upon computation, we can argue about interpretation. Charles Determan is inquiring as to computational accuracy. The use and interpretation of the various Type I, II, III, IV, LVX SS are secondary. -----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger Sent: Thursday, January 26, 2012 9:03 AM To: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R I think: http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf HTH Luca Le 26/01/2012 15:52, Thompson,Paul a ?crit :
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS? -----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald Sent: Wednesday, January 25, 2012 11:19 PM To: David Duffy Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R It is well to note that type III sums of squares are problematic. For testing the effects of a main effect, the null model is constraining the main effect in a manner that depends on the parameterisation. There are situations where it makes sense to fit interactions without main effects, and it is clear what constraint on the main effect is the relevant null (with an interaction between a factor and a variable, does one want all lines to go though the same point, or through perhaps the origin?), but that situation is unusual. For lines that are separate or all through the one point, one does not need type III sums of squares. Analyses often or frequently have enough genuine complications worrying (unless it is blindingly obvious that one ought to worry about it) without the rarely relevant complication of attending to a type III sum of squares. I'd guess that SAS and lme are, effectively, making different assumptions about the intended generalisation. They are clearly using different denominator degrees of freedom for F. As one is looking for consistency across the 27 different youths, SAS's denominator degrees of freedom for the interaction seem more or less right, pretty much equivalent to calculating slopes for females and slopes for males and using a t-test to compare them. (Sure, in the analyses presented, age has been treated as a categorical variable, but the comment still applies.) John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics& Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings, I have been working on R for some time now and I have begun the endeavor of trying to replicate some SAS code in R. I have scoured the forums but
This is also the Orthodont dataset, distributed with nlme. As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package. The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg http://www2.sas.com/proceedings/sugi26/p262-26.pdf while lme and lmer offer one or none. -- | David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ----------------------------------------------------------------------- Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain privileged and confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ----------------------------------------------------------------------- Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain privileged and confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
It is not really a matter of computational accuracy. One can get highly accurate values for an inappropriate statistic. Or if there is insistence on using the word, accuracy, what is the meaning? i) the wrong formula is used? Then in what sense is it 'wrong'? ii) there is a numerical inaccuracy in the calculation? This is almost never an issue in a relatively simple calculation such as this, given the care taken by the code writers in such matters. iii) where an approximation is used, as in using an F-distribution approximation, is the best choice of degrees of freedom made to for use of this approximation? I judge that the degrees of freedom for lme's F-statistic for the interaction are not well chosen. Users really have to sort this out for themselves, rather than relying on what may be a fairly wild approximation that appears in lm's output. Using 75df rather than 25df does not however make the difference that a choice between (e.g.) 5df and 25df would. A further and more basic issue is whether the statistic that is provided is appropriate to the intended generalisation. I'd take this to be generalisation to another sample of youths from the same population. In order to understand why R and SAS are giving different F-statistics for the interaction, one needs to understand just what variance-covariance structure is assumed in each case. One might extract the two estimates of the var-cov structure and compare them. Look for terms in one that do not appear, or maybe that are zero, in the other. Finally, it is not just that Venables does not like type III SS. He is saying that they almost never correspond to a null hypothesis that makes any sense. Those who disagree try to write down the model to which the null hypothesis corresponds in testing for the main effect of factor1 with a factor1:factor2 interaction. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 2:41 AM, Thompson,Paul wrote:
OK, I've looked at that reference. There are 2 aspects of an estimate like a SS. The first is the stability of the estimate, and the second is the interpretation of the estimate. The issues with the interpretation of the different estimates go back to 1970, and they are simply a matter of interpretation. The point of the Venables discussion is that he does not like Type III SS, not that they are wrong. He does not agree with the interpretation. The issue here is the accuracy of the Type III or Type I or Type II or whatever. Accuracy comes before interpretation. If the r module and SAS do not arrive at the same estimates, that is an important thing. Once we agree upon computation, we can argue about interpretation. Charles Determan is inquiring as to computational accuracy. The use and interpretation of the various Type I, II, III, IV, LVX SS are secondary. -----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger Sent: Thursday, January 26, 2012 9:03 AM To: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R I think: http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf HTH Luca Le 26/01/2012 15:52, Thompson,Paul a ?crit :
I am unfamiliar with this critique of Type III SS. Can you point me to a reference discussing the difficulties with Type III SS? -----Original Message----- From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald Sent: Wednesday, January 25, 2012 11:19 PM To: David Duffy Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R It is well to note that type III sums of squares are problematic. For testing the effects of a main effect, the null model is constraining the main effect in a manner that depends on the parameterisation. There are situations where it makes sense to fit interactions without main effects, and it is clear what constraint on the main effect is the relevant null (with an interaction between a factor and a variable, does one want all lines to go though the same point, or through perhaps the origin?), but that situation is unusual. For lines that are separate or all through the one point, one does not need type III sums of squares. Analyses often or frequently have enough genuine complications worrying (unless it is blindingly obvious that one ought to worry about it) without the rarely relevant complication of attending to a type III sum of squares. I'd guess that SAS and lme are, effectively, making different assumptions about the intended generalisation. They are clearly using different denominator degrees of freedom for F. As one is looking for consistency across the 27 different youths, SAS's denominator degrees of freedom for the interaction seem more or less right, pretty much equivalent to calculating slopes for females and slopes for males and using a t-test to compare them. (Sure, in the analyses presented, age has been treated as a categorical variable, but the comment still applies.) John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics& Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings, I have been working on R for some time now and I have begun the endeavor of trying to replicate some SAS code in R. I have scoured the forums but
This is also the Orthodont dataset, distributed with nlme. As David Atkins pointed out, R defaults to Type I SS. so you would need to use, for example, the Anova() command from the car package. The other thing is that the SAS F statistics are only approximate, depending on which covariance structure is chosen (perhaps John Maindonald or someone clever could comment), so SAS offers different possibilities for ddf eg http://www2.sas.com/proceedings/sugi26/p262-26.pdf while lme and lmer offer one or none. -- | David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ----------------------------------------------------------------------- Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain privileged and confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ----------------------------------------------------------------------- Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain privileged and confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120126/26eaf876/attachment-0001.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120126/75056ae5/attachment-0001.pl>
I've twigged that "unstructured" means a variance-covariance matrix that ignores the time structure, i.e., each element is estimated separately. One can do this in lme4, thus: Orthodont$Age <- factor(Orthodont$age)
orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject),
+ data=Orthodont)
anova(orth.lmer)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
Sex 1 9.591 9.5905 33.9645
Age 3 34.876 11.6252 41.1703
Sex:Age 3 2.930 0.9768 3.4594
The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic. One
has, in agreement with the SAS output:
1-pf(2.93, 3, 25)
[1] 0.05318945 The main effects SS and F's are of course not expected to agree and indeed, in my strong view, true SAS type III SS's are inappropriate. I regard this R analysis however as an inelegant use of the data, with low power. My book with John Braun (3rd edn and if I recall correctly, 2nd) has an analysis that I am prepared to defend that uses lmer() from lme4. Several years ago, I placed on the web a version of the analysis that uses lme() from nlme: http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf I'd forgotten that it was there. This is a list for R users. You will not necessarily find folk here with a high level of SAS expertise, maybe not even enough to understand what the SAS documentation means when it uses the term "unstructured". Charles, I think you've done pretty well in the amount of free advice that you have received. I think too that Steven McKinney's point is well made. Kevin's interpretation of the EULA is probably more or less correct, but the document comes across rather more ferociously than he suggests, and it is not a good starting point for scientific assessment of the respective methodologies. Many of us have other reasons why we are not very interested in SAS does. The implied threat in the EULA provides, even if we are not party to any such agreement, just one more reason why SAS is not to any large extent in our path ahead to the future. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 4:03 PM, Kevin Wright wrote:
Dear Charles,
First, I hope you are not put off by the tone of some the responses in this
email thread. Sometimes people have strong opinions and it might come
across aggressively.
Second, be sure to understand that reproducing a SAS analysis with lme in
no way violates any legal agreements that SAS may have, if for no other
reason than you never signed an agreement with SAS! That bit in the EULA
about decompiling and reverse engineering means that people are prohibited
from creating a new version of PROC MIXED that does the same thing. The
nlme package uses different methods than SAS. E.g. different optimizers,
even uses a log-parameterization deep in the code so that negative variance
components cannot happen.
Third, by now you've probably figured out that PROC MIXED and lme have very
different ideas about degrees of freedom. Also, the loglikelihoods are on
different scales. For that reason, when I try to reproduce an analysis, I
find the best way to compare is to look at the variance components.
Here is what PROC MIXED says:
Cov Parm Estimate Std Error Z Pr > |Z|
UN(1,1) 5.41545455 1.53172185 3.54 0.0004
UN(2,1) 2.71681818 1.09623989 2.48 0.0132
UN(2,2) 4.18477273 1.18363247 3.54 0.0004
UN(3,1) 3.91022727 1.41775367 2.76 0.0058
UN(3,2) 2.92715909 1.19304751 2.45 0.0141
UN(3,3) 6.45573864 1.82595863 3.54 0.0004
UN(4,1) 2.71022727 1.17209851 2.31 0.0208
UN(4,2) 3.31715909 1.12903016 2.94 0.0033
UN(4,3) 4.13073864 1.40356157 2.94 0.0033
UN(4,4) 4.98573864 1.41017984 3.54 0.0004
Residual 1.00000000 . . .
That's our target, and here is the lme code to get us there.
library(nlme)
orth <- as.data.frame(Orthodont)
m1 <- gls(distance ~ Sex*age,
correlation=corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age),
data = orth)
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
cors*covs
Now, some explanations will surely help, so let's step through the code
with comments.
The Orthodont data is a groupedData object, which can be helpful, but
sometimes confusing, so coercing to a data.frame removes the formula--you
can see it with 'formula(Orthodont)'.
orth <- as.data.frame(Orthodont)
Since there is no "random" line in the PROC MIXED code, we don't want to
use 'lme' (also why I removed the formula from the data), but instead use
'gls' for generalized least squares. The 'corSymm' part specifies a
symmetric correlation matrix with 1 on the diagonal. Looking at the MIXED
parameters, the results are given a covariances, not correlations. Note
how the variances UN(1,1), UN(2,2,), etc are different, not constant along
the diagonal. We use the 'weights' statement to specify different stratum
variances.
m1 <- gls(distance ~ Sex*age,
correlation=corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age),
data = orth)
Now we have to convert the correlations and standard deviations of lme into
covariance parameters of MIXED.
First, extract the correlation matrix from lme.
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
Now, hard-code the stratum std deviations and square them to get
variances. Multiply by the square of the residual std dev. (There's a way
to extract all these values from the fitted model instead of hand-typing
them, but I can't find them at the moment.)
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
Now create a matrix of variances and covariances.
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
Finally, multiply the correlations by the covariances. Let's compare
results:
PROC MIXED:
Row Col1 Col2 Col3 Col4
1 5.415 2.716 3.910 2.710
2 2.716 4.184 2.927 3.317
3 3.910 2.927 6.455 4.130
4 2.710 3.317 4.130 4.985
R> round(cors*covs,3)
[,1] [,2] [,3] [,4]
[1,] 5.425 2.709 3.841 2.715
[2,] 2.709 4.191 2.975 3.314
[3,] 3.841 2.975 6.263 4.133
[4,] 2.715 3.314 4.133 4.986
In my experience, the small differences in the results are typical for
unstructured models.
Hope this helps.
Kevin Wright
On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu>wrote:
Greetings,
I have been working on R for some time now and I have begun the endeavor of
trying to replicate some SAS code in R. I have scoured the forums but
haven't been able to find an answer. I hope one of you could be so kind as
to enlighten me.
I am attempting to replicate a repeated measures experiment using some
standard data. I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it. My main issue
comes with the 'unstructured' component.
The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
5 21.5 8 M02 Male
6 22.5 10 M02 Male
7 23.0 12 M02 Male
8 26.5 14 M02 Male
9 23.0 8 M03 Male
10 22.5 10 M03 Male
11 24.0 12 M03 Male
12 27.5 14 M03 Male
13 25.5 8 M04 Male
14 27.5 10 M04 Male
15 26.5 12 M04 Male
16 27.0 14 M04 Male
17 20.0 8 M05 Male
18 23.5 10 M05 Male
19 22.5 12 M05 Male
20 26.0 14 M05 Male
21 24.5 8 M06 Male
22 25.5 10 M06 Male
23 27.0 12 M06 Male
24 28.5 14 M06 Male
25 22.0 8 M07 Male
26 22.0 10 M07 Male
27 24.5 12 M07 Male
28 26.5 14 M07 Male
29 24.0 8 M08 Male
30 21.5 10 M08 Male
31 24.5 12 M08 Male
32 25.5 14 M08 Male
33 23.0 8 M09 Male
34 20.5 10 M09 Male
35 31.0 12 M09 Male
36 26.0 14 M09 Male
37 27.5 8 M10 Male
38 28.0 10 M10 Male
39 31.0 12 M10 Male
40 31.5 14 M10 Male
41 23.0 8 M11 Male
42 23.0 10 M11 Male
43 23.5 12 M11 Male
44 25.0 14 M11 Male
45 21.5 8 M12 Male
46 23.5 10 M12 Male
47 24.0 12 M12 Male
48 28.0 14 M12 Male
49 17.0 8 M13 Male
50 24.5 10 M13 Male
51 26.0 12 M13 Male
52 29.5 14 M13 Male
53 22.5 8 M14 Male
54 25.5 10 M14 Male
55 25.5 12 M14 Male
56 26.0 14 M14 Male
57 23.0 8 M15 Male
58 24.5 10 M15 Male
59 26.0 12 M15 Male
60 30.0 14 M15 Male
61 22.0 8 M16 Male
62 21.5 10 M16 Male
63 23.5 12 M16 Male
64 25.0 14 M16 Male
65 21.0 8 F01 Female
66 20.0 10 F01 Female
67 21.5 12 F01 Female
68 23.0 14 F01 Female
69 21.0 8 F02 Female
70 21.5 10 F02 Female
71 24.0 12 F02 Female
72 25.5 14 F02 Female
73 20.5 8 F03 Female
74 24.0 10 F03 Female
75 24.5 12 F03 Female
76 26.0 14 F03 Female
77 23.5 8 F04 Female
78 24.5 10 F04 Female
79 25.0 12 F04 Female
80 26.5 14 F04 Female
81 21.5 8 F05 Female
82 23.0 10 F05 Female
83 22.5 12 F05 Female
84 23.5 14 F05 Female
85 20.0 8 F06 Female
86 21.0 10 F06 Female
87 21.0 12 F06 Female
88 22.5 14 F06 Female
89 21.5 8 F07 Female
90 22.5 10 F07 Female
91 23.0 12 F07 Female
92 25.0 14 F07 Female
93 23.0 8 F08 Female
94 23.0 10 F08 Female
95 23.5 12 F08 Female
96 24.0 14 F08 Female
97 20.0 8 F09 Female
98 21.0 10 F09 Female
99 22.0 12 F09 Female
100 21.5 14 F09 Female
101 16.5 8 F10 Female
102 19.0 10 F10 Female
103 19.0 12 F10 Female
104 19.5 14 F10 Female
105 24.5 8 F11 Female
106 25.0 10 F11 Female
107 28.0 12 F11 Female
108 28.0 14 F11 Female
*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;
output of interest to me
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 75 35.35 0.0001
GENDER*AGE 3 75 2.36 0.0781
R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)
numDF denDF F-value p-value
(Intercept) 1 75 4123.156 <.0001
Sex 1 25 9.292 0.0054
age 3 75 40.032 <.0001
Sex:age 3 75 2.362 0.0781
Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 25 34.45 0.0001
GENDER*AGE 3 25 2.93 0.0532
R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
anova(z)
gives the output
numDF denDF F-value p-value
(Intercept) 1 75 4052.028 <.0001
Sex 1 25 8.462 0.0075
age 3 75 39.022 <.0001
Sex:age 3 75 2.868 0.0421
What am I doing wrong to replicate the unstructured linear mixed model from
SAS?
Regards,
Charles
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Kevin Wright [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120127/b326ed53/attachment-0001.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20120127/ebcca832/attachment-0001.pl>
Charles, I guess that SAS is using the Satterthwaite formula to approximate denominator degrees of freedom (google 'Satterthwaite'). See Kenward and Roger (1997) Biometrics 53 983-997 for another approach.
Charles Determan Jr wrote:
Thank you John, I truly appreciate all the advice I have received. At no point to I feel entitled for someone to provide this for me. I do hope that others may find this thread useful as well. If I may just as one more question. Is there an accepted way to calculate the degrees of freedom in this case to come to the value of 25? This has been something I have been trying to determine. Again, thank you to all for providing me with all the information you have, Charles On Fri, Jan 27, 2012 at 12:10 AM, John Maindonald < john.maindonald at anu.edu.au> wrote:
I've twigged that "unstructured" means a variance-covariance matrix that ignores the time structure, i.e., each element is estimated separately. One can do this in lme4, thus: Orthodont$Age <- factor(Orthodont$age)
orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject),
+ data=Orthodont)
anova(orth.lmer)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
Sex 1 9.591 9.5905 33.9645
Age 3 34.876 11.6252 41.1703
Sex:Age 3 2.930 0.9768 3.4594
The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic.
One
has, in agreement with the SAS output:
1-pf(2.93, 3, 25)
[1] 0.05318945 The main effects SS and F's are of course not expected to agree and indeed, in my strong view, true SAS type III SS's are inappropriate. I regard this R analysis however as an inelegant use of the data, with low power. My book with John Braun (3rd edn and if I recall correctly, 2nd) has an analysis that I am prepared to defend that uses lmer() from lme4. Several years ago, I placed on the web a version of the analysis that uses lme() from nlme: http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf I'd forgotten that it was there. This is a list for R users. You will not necessarily find folk here with a high level of SAS expertise, maybe not even enough to understand what the SAS documentation means when it uses the term "unstructured". Charles, I think you've done pretty well in the amount of free advice that you have received. I think too that Steven McKinney's point is well made. Kevin's interpretation of the EULA is probably more or less correct, but the document comes across rather more ferociously than he suggests, and it is not a good starting point for scientific assessment of the respective methodologies. Many of us have other reasons why we are not very interested in SAS does. The implied threat in the EULA provides, even if we are not party to any such agreement, just one more reason why SAS is not to any large extent in our path ahead to the future. John Maindonald email: john.maindonald at anu.edu.au phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. http://www.maths.anu.edu.au/~johnm On 27/01/2012, at 4:03 PM, Kevin Wright wrote:
Dear Charles, First, I hope you are not put off by the tone of some the responses in
this
email thread. Sometimes people have strong opinions and it might come across aggressively. Second, be sure to understand that reproducing a SAS analysis with lme in no way violates any legal agreements that SAS may have, if for no other reason than you never signed an agreement with SAS! That bit in the EULA about decompiling and reverse engineering means that people are
prohibited
from creating a new version of PROC MIXED that does the same thing. The nlme package uses different methods than SAS. E.g. different optimizers, even uses a log-parameterization deep in the code so that negative
variance
components cannot happen. Third, by now you've probably figured out that PROC MIXED and lme have
very
different ideas about degrees of freedom. Also, the loglikelihoods are
on
different scales. For that reason, when I try to reproduce an analysis,
I
find the best way to compare is to look at the variance components.
Here is what PROC MIXED says:
Cov Parm Estimate Std Error Z Pr > |Z|
UN(1,1) 5.41545455 1.53172185 3.54 0.0004
UN(2,1) 2.71681818 1.09623989 2.48 0.0132
UN(2,2) 4.18477273 1.18363247 3.54 0.0004
UN(3,1) 3.91022727 1.41775367 2.76 0.0058
UN(3,2) 2.92715909 1.19304751 2.45 0.0141
UN(3,3) 6.45573864 1.82595863 3.54 0.0004
UN(4,1) 2.71022727 1.17209851 2.31 0.0208
UN(4,2) 3.31715909 1.12903016 2.94 0.0033
UN(4,3) 4.13073864 1.40356157 2.94 0.0033
UN(4,4) 4.98573864 1.41017984 3.54 0.0004
Residual 1.00000000 . . .
That's our target, and here is the lme code to get us there.
library(nlme)
orth <- as.data.frame(Orthodont)
m1 <- gls(distance ~ Sex*age,
correlation=corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age),
data = orth)
cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
cors*covs
Now, some explanations will surely help, so let's step through the code
with comments.
The Orthodont data is a groupedData object, which can be helpful, but
sometimes confusing, so coercing to a data.frame removes the formula--you
can see it with 'formula(Orthodont)'.
orth <- as.data.frame(Orthodont)
Since there is no "random" line in the PROC MIXED code, we don't want to
use 'lme' (also why I removed the formula from the data), but instead use
'gls' for generalized least squares. The 'corSymm' part specifies a
symmetric correlation matrix with 1 on the diagonal. Looking at the
MIXED
parameters, the results are given a covariances, not correlations. Note how the variances UN(1,1), UN(2,2,), etc are different, not constant
along
the diagonal. We use the 'weights' statement to specify different
stratum
variances.
m1 <- gls(distance ~ Sex*age,
correlation=corSymm(form = ~ 1 | Subject),
weights = varIdent(form = ~ 1 | age),
data = orth)
Now we have to convert the correlations and standard deviations of lme
into
covariance parameters of MIXED. First, extract the correlation matrix from lme. cors <- corMatrix(m1$modelStruct$corStruct)[[1]] Now, hard-code the stratum std deviations and square them to get variances. Multiply by the square of the residual std dev. (There's a
way
to extract all these values from the fitted model instead of hand-typing
them, but I can't find them at the moment.)
vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
Now create a matrix of variances and covariances.
covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
Finally, multiply the correlations by the covariances. Let's compare
results:
PROC MIXED:
Row Col1 Col2 Col3 Col4
1 5.415 2.716 3.910 2.710
2 2.716 4.184 2.927 3.317
3 3.910 2.927 6.455 4.130
4 2.710 3.317 4.130 4.985
R> round(cors*covs,3)
[,1] [,2] [,3] [,4]
[1,] 5.425 2.709 3.841 2.715
[2,] 2.709 4.191 2.975 3.314
[3,] 3.841 2.975 6.263 4.133
[4,] 2.715 3.314 4.133 4.986
In my experience, the small differences in the results are typical for
unstructured models.
Hope this helps.
Kevin Wright
On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu
wrote:
Greetings, I have been working on R for some time now and I have begun the
endeavor of
trying to replicate some SAS code in R. I have scoured the forums but haven't been able to find an answer. I hope one of you could be so
kind as
to enlighten me.
I am attempting to replicate a repeated measures experiment using some
standard data. I have posted the SAS code and output directly from a
publication as well as my attempts in R to replicate it. My main issue
comes with the 'unstructured' component.
The 'dental' dataset from 'mixedQF' package,
equivalent to formixed data in SAS
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
5 21.5 8 M02 Male
6 22.5 10 M02 Male
7 23.0 12 M02 Male
8 26.5 14 M02 Male
9 23.0 8 M03 Male
10 22.5 10 M03 Male
11 24.0 12 M03 Male
12 27.5 14 M03 Male
13 25.5 8 M04 Male
14 27.5 10 M04 Male
15 26.5 12 M04 Male
16 27.0 14 M04 Male
17 20.0 8 M05 Male
18 23.5 10 M05 Male
19 22.5 12 M05 Male
20 26.0 14 M05 Male
21 24.5 8 M06 Male
22 25.5 10 M06 Male
23 27.0 12 M06 Male
24 28.5 14 M06 Male
25 22.0 8 M07 Male
26 22.0 10 M07 Male
27 24.5 12 M07 Male
28 26.5 14 M07 Male
29 24.0 8 M08 Male
30 21.5 10 M08 Male
31 24.5 12 M08 Male
32 25.5 14 M08 Male
33 23.0 8 M09 Male
34 20.5 10 M09 Male
35 31.0 12 M09 Male
36 26.0 14 M09 Male
37 27.5 8 M10 Male
38 28.0 10 M10 Male
39 31.0 12 M10 Male
40 31.5 14 M10 Male
41 23.0 8 M11 Male
42 23.0 10 M11 Male
43 23.5 12 M11 Male
44 25.0 14 M11 Male
45 21.5 8 M12 Male
46 23.5 10 M12 Male
47 24.0 12 M12 Male
48 28.0 14 M12 Male
49 17.0 8 M13 Male
50 24.5 10 M13 Male
51 26.0 12 M13 Male
52 29.5 14 M13 Male
53 22.5 8 M14 Male
54 25.5 10 M14 Male
55 25.5 12 M14 Male
56 26.0 14 M14 Male
57 23.0 8 M15 Male
58 24.5 10 M15 Male
59 26.0 12 M15 Male
60 30.0 14 M15 Male
61 22.0 8 M16 Male
62 21.5 10 M16 Male
63 23.5 12 M16 Male
64 25.0 14 M16 Male
65 21.0 8 F01 Female
66 20.0 10 F01 Female
67 21.5 12 F01 Female
68 23.0 14 F01 Female
69 21.0 8 F02 Female
70 21.5 10 F02 Female
71 24.0 12 F02 Female
72 25.5 14 F02 Female
73 20.5 8 F03 Female
74 24.0 10 F03 Female
75 24.5 12 F03 Female
76 26.0 14 F03 Female
77 23.5 8 F04 Female
78 24.5 10 F04 Female
79 25.0 12 F04 Female
80 26.5 14 F04 Female
81 21.5 8 F05 Female
82 23.0 10 F05 Female
83 22.5 12 F05 Female
84 23.5 14 F05 Female
85 20.0 8 F06 Female
86 21.0 10 F06 Female
87 21.0 12 F06 Female
88 22.5 14 F06 Female
89 21.5 8 F07 Female
90 22.5 10 F07 Female
91 23.0 12 F07 Female
92 25.0 14 F07 Female
93 23.0 8 F08 Female
94 23.0 10 F08 Female
95 23.5 12 F08 Female
96 24.0 14 F08 Female
97 20.0 8 F09 Female
98 21.0 10 F09 Female
99 22.0 12 F09 Female
100 21.5 14 F09 Female
101 16.5 8 F10 Female
102 19.0 10 F10 Female
103 19.0 12 F10 Female
104 19.5 14 F10 Female
105 24.5 8 F11 Female
106 25.0 10 F11 Female
107 28.0 12 F11 Female
108 28.0 14 F11 Female
*Mixed modeling and fixed effect test*
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=cs sub=person;
run;
output of interest to me
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 75 35.35 0.0001
GENDER*AGE 3 75 2.36 0.0781
R (nlme package)
y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
anova(y)
numDF denDF F-value p-value
(Intercept) 1 75 4123.156 <.0001
Sex 1 25 9.292 0.0054
age 3 75 40.032 <.0001
Sex:age 3 75 2.362 0.0781
Now this isn't exact but it is extremely close, however when I try to
replicate the unstructured,
SAS
proc mixed data=formixed;
class gender age person;
model y = gender|age;
repeated / type=un sub=person;
run;
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GENDER 1 25 9.29 0.0054
AGE 3 25 34.45 0.0001
GENDER*AGE 3 25 2.93 0.0532
R
either
y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
data=dental)
anova(y)
or
z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(),
data=dental)
anova(z)
gives the output
numDF denDF F-value p-value
(Intercept) 1 75 4052.028 <.0001
Sex 1 25 8.462 0.0075
age 3 75 39.022 <.0001
Sex:age 3 75 2.868 0.0421
What am I doing wrong to replicate the unstructured linear mixed model
from
SAS?
Regards,
Charles
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Kevin Wright
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
It seems to me that the SAS documentation ought to tell you what algorithm it uses to compute the degrees of freedom. Perhaps they do that by referencing a published paper. If so, just read the original source to get the relevant formulas. The only reliable way to get the same DF that SAS provides is to use the same formula for calculating it. There may or may not be existing R code that implements the required formulas, but you need to know what you're looking for in order to find it. Steven J. Pierce, Ph.D. Associate Director Center for Statistical Training & Consulting (CSTAT) Michigan State University E-mail: pierces1 at msu.edu Web: http://www.cstat.msu.edu -----Original Message----- From: Charles Determan Jr [mailto:deter088 at umn.edu] Sent: Friday, January 27, 2012 7:08 AM To: John Maindonald Cc: r-sig-mixed-models at r-project.org Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R Thank you John, I truly appreciate all the advice I have received. At no point to I feel entitled for someone to provide this for me. I do hope that others may find this thread useful as well. If I may just as one more question. Is there an accepted way to calculate the degrees of freedom in this case to come to the value of 25? This has been something I have been trying to determine. Again, thank you to all for providing me with all the information you have, Charles
My post may be redundant, but just in case I wanted to comment. I think Kevin's post addressed a few things that are helpful to start with. The original proc mixed syntax did not have a random statement. This is similar to what Pinheiro and Bates (Chapter 5) describe as an extended linear model. Therefore, it seems we should not be able to reproduce this with lme and nlme. Instead, we use gls. I think some people do not like to call this a mixed effect model because the intercept or slope are not random, but I can see why SAS would include it with proc mixed because of the similarities. The SAS reference for this is Littell (SAS for mixed models) Chapter 5. proc mixed data=formixed; class gender person; model y = gender|age/solution; repeated / type=un sub=person; run; gives me a similar result to gls( y ~ gender + age + gender*age , formixed, correlation = corSymm(form = ~ 1 | person)) SAS output: Intercept 15.84 gender 1.58 age 0.82 age*gender -.35 nlme output: Coefficients: (Intercept) 15.93 genderFemale 1.47 age 0.82 genderFemale:age -0.35
On Fri, Jan 27, 2012 at 12:03 AM, Kevin Wright <kw.stat at gmail.com> wrote:
Dear Charles, First, I hope you are not put off by the tone of some the responses in this email thread. ?Sometimes people have strong opinions and it might come across aggressively. Second, be sure to understand that reproducing a SAS analysis with lme in no way violates any legal agreements that SAS may have, if for no other reason than you never signed an agreement with SAS! ?That bit in the EULA about decompiling and reverse engineering means that people are prohibited from creating a new version of PROC MIXED that does the same thing. ?The nlme package uses different methods than SAS. E.g. different optimizers, even uses a log-parameterization deep in the code so that negative variance components cannot happen. Third, by now you've probably figured out that PROC MIXED and lme have very different ideas about degrees of freedom. ?Also, the loglikelihoods are on different scales. ?For that reason, when I try to reproduce an analysis, I find the best way to compare is to look at the variance components. Here is what PROC MIXED says: ? ?Cov Parm Estimate ? ? Std Error ? ? ? Z ?Pr > |Z| UN(1,1) ? ? 5.41545455 ? ?1.53172185 ? ?3.54 ? ?0.0004 UN(2,1) ? ? 2.71681818 ? ?1.09623989 ? ?2.48 ? ?0.0132 UN(2,2) ? ? 4.18477273 ? ?1.18363247 ? ?3.54 ? ?0.0004 UN(3,1) ? ? 3.91022727 ? ?1.41775367 ? ?2.76 ? ?0.0058 UN(3,2) ? ? 2.92715909 ? ?1.19304751 ? ?2.45 ? ?0.0141 UN(3,3) ? ? 6.45573864 ? ?1.82595863 ? ?3.54 ? ?0.0004 UN(4,1) ? ? 2.71022727 ? ?1.17209851 ? ?2.31 ? ?0.0208 UN(4,2) ? ? 3.31715909 ? ?1.12903016 ? ?2.94 ? ?0.0033 UN(4,3) ? ? 4.13073864 ? ?1.40356157 ? ?2.94 ? ?0.0033 UN(4,4) ? ? 4.98573864 ? ?1.41017984 ? ?3.54 ? ?0.0004 Residual ? ? ? ? 1.00000000 . ? ? ? . ? ? ? ? . That's our target, and here is the lme code to get us there. library(nlme) orth <- as.data.frame(Orthodont) m1 <- gls(distance ~ Sex*age, ? ? ? ? ?correlation=corSymm(form = ~ 1 | Subject), ? ? ? ? ?weights = varIdent(form = ~ 1 | age), ? ? ? ? ?data = orth) cors <- corMatrix(m1$modelStruct$corStruct)[[1]] vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2 covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y)) cors*covs Now, some explanations will surely help, so let's step through the code with comments. The Orthodont data is a groupedData object, which can be helpful, but sometimes confusing, so coercing to a data.frame removes the formula--you can see it with 'formula(Orthodont)'. orth <- as.data.frame(Orthodont) Since there is no "random" line in the PROC MIXED code, we don't want to use 'lme' (also why I removed the formula from the data), but instead use 'gls' for generalized least squares. ?The 'corSymm' part specifies a symmetric correlation matrix with 1 on the diagonal. ?Looking at the MIXED parameters, the results are given a covariances, not correlations. ?Note how the variances UN(1,1), UN(2,2,), etc are different, not constant along the diagonal. ?We use the 'weights' statement to specify different stratum variances. m1 <- gls(distance ~ Sex*age, ? ? ? ? ?correlation=corSymm(form = ~ 1 | Subject), ? ? ? ? ?weights = varIdent(form = ~ 1 | age), ? ? ? ? ?data = orth) Now we have to convert the correlations and standard deviations of lme into covariance parameters of MIXED. First, extract the correlation matrix from lme. cors <- corMatrix(m1$modelStruct$corStruct)[[1]] Now, hard-code the stratum std deviations and square them to get variances. ?Multiply by the square of the residual std dev. ?(There's a way to extract all these values from the fitted model instead of hand-typing them, but I can't find them at the moment.) vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2 Now create a matrix of variances and covariances. covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y)) Finally, multiply the correlations by the covariances. ?Let's compare results: PROC MIXED: Row Col1 Col2 Col3 Col4 1 5.415 2.716 3.910 2.710 2 2.716 4.184 2.927 3.317 3 3.910 2.927 6.455 4.130 4 2.710 3.317 4.130 4.985 R> round(cors*covs,3) ? ? ?[,1] ?[,2] ?[,3] ?[,4] [1,] 5.425 2.709 3.841 2.715 [2,] 2.709 4.191 2.975 3.314 [3,] 3.841 2.975 6.263 4.133 [4,] 2.715 3.314 4.133 4.986 In my experience, the small differences in the results are typical for unstructured models. Hope this helps. Kevin Wright On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu>wrote:
Greetings, I have been working on R for some time now and I have begun the endeavor of trying to replicate some SAS code in R. ?I have scoured the forums but haven't been able to find an answer. ?I hope one of you could be so kind as to enlighten me. I am attempting to replicate a repeated measures experiment using some standard data. ?I have posted the SAS code and output directly from a publication as well as my attempts in R to replicate it. ?My main issue comes with the 'unstructured' component. The 'dental' dataset from 'mixedQF' package, equivalent to formixed data in SAS ? ?distance age Subject ? ?Sex 1 ? ? ? 26.0 ? 8 ? ? M01 ? Male 2 ? ? ? 25.0 ?10 ? ? M01 ? Male 3 ? ? ? 29.0 ?12 ? ? M01 ? Male 4 ? ? ? 31.0 ?14 ? ? M01 ? Male 5 ? ? ? 21.5 ? 8 ? ? M02 ? Male 6 ? ? ? 22.5 ?10 ? ? M02 ? Male 7 ? ? ? 23.0 ?12 ? ? M02 ? Male 8 ? ? ? 26.5 ?14 ? ? M02 ? Male 9 ? ? ? 23.0 ? 8 ? ? M03 ? Male 10 ? ? ?22.5 ?10 ? ? M03 ? Male 11 ? ? ?24.0 ?12 ? ? M03 ? Male 12 ? ? ?27.5 ?14 ? ? M03 ? Male 13 ? ? ?25.5 ? 8 ? ? M04 ? Male 14 ? ? ?27.5 ?10 ? ? M04 ? Male 15 ? ? ?26.5 ?12 ? ? M04 ? Male 16 ? ? ?27.0 ?14 ? ? M04 ? Male 17 ? ? ?20.0 ? 8 ? ? M05 ? Male 18 ? ? ?23.5 ?10 ? ? M05 ? Male 19 ? ? ?22.5 ?12 ? ? M05 ? Male 20 ? ? ?26.0 ?14 ? ? M05 ? Male 21 ? ? ?24.5 ? 8 ? ? M06 ? Male 22 ? ? ?25.5 ?10 ? ? M06 ? Male 23 ? ? ?27.0 ?12 ? ? M06 ? Male 24 ? ? ?28.5 ?14 ? ? M06 ? Male 25 ? ? ?22.0 ? 8 ? ? M07 ? Male 26 ? ? ?22.0 ?10 ? ? M07 ? Male 27 ? ? ?24.5 ?12 ? ? M07 ? Male 28 ? ? ?26.5 ?14 ? ? M07 ? Male 29 ? ? ?24.0 ? 8 ? ? M08 ? Male 30 ? ? ?21.5 ?10 ? ? M08 ? Male 31 ? ? ?24.5 ?12 ? ? M08 ? Male 32 ? ? ?25.5 ?14 ? ? M08 ? Male 33 ? ? ?23.0 ? 8 ? ? M09 ? Male 34 ? ? ?20.5 ?10 ? ? M09 ? Male 35 ? ? ?31.0 ?12 ? ? M09 ? Male 36 ? ? ?26.0 ?14 ? ? M09 ? Male 37 ? ? ?27.5 ? 8 ? ? M10 ? Male 38 ? ? ?28.0 ?10 ? ? M10 ? Male 39 ? ? ?31.0 ?12 ? ? M10 ? Male 40 ? ? ?31.5 ?14 ? ? M10 ? Male 41 ? ? ?23.0 ? 8 ? ? M11 ? Male 42 ? ? ?23.0 ?10 ? ? M11 ? Male 43 ? ? ?23.5 ?12 ? ? M11 ? Male 44 ? ? ?25.0 ?14 ? ? M11 ? Male 45 ? ? ?21.5 ? 8 ? ? M12 ? Male 46 ? ? ?23.5 ?10 ? ? M12 ? Male 47 ? ? ?24.0 ?12 ? ? M12 ? Male 48 ? ? ?28.0 ?14 ? ? M12 ? Male 49 ? ? ?17.0 ? 8 ? ? M13 ? Male 50 ? ? ?24.5 ?10 ? ? M13 ? Male 51 ? ? ?26.0 ?12 ? ? M13 ? Male 52 ? ? ?29.5 ?14 ? ? M13 ? Male 53 ? ? ?22.5 ? 8 ? ? M14 ? Male 54 ? ? ?25.5 ?10 ? ? M14 ? Male 55 ? ? ?25.5 ?12 ? ? M14 ? Male 56 ? ? ?26.0 ?14 ? ? M14 ? Male 57 ? ? ?23.0 ? 8 ? ? M15 ? Male 58 ? ? ?24.5 ?10 ? ? M15 ? Male 59 ? ? ?26.0 ?12 ? ? M15 ? Male 60 ? ? ?30.0 ?14 ? ? M15 ? Male 61 ? ? ?22.0 ? 8 ? ? M16 ? Male 62 ? ? ?21.5 ?10 ? ? M16 ? Male 63 ? ? ?23.5 ?12 ? ? M16 ? Male 64 ? ? ?25.0 ?14 ? ? M16 ? Male 65 ? ? ?21.0 ? 8 ? ? F01 Female 66 ? ? ?20.0 ?10 ? ? F01 Female 67 ? ? ?21.5 ?12 ? ? F01 Female 68 ? ? ?23.0 ?14 ? ? F01 Female 69 ? ? ?21.0 ? 8 ? ? F02 Female 70 ? ? ?21.5 ?10 ? ? F02 Female 71 ? ? ?24.0 ?12 ? ? F02 Female 72 ? ? ?25.5 ?14 ? ? F02 Female 73 ? ? ?20.5 ? 8 ? ? F03 Female 74 ? ? ?24.0 ?10 ? ? F03 Female 75 ? ? ?24.5 ?12 ? ? F03 Female 76 ? ? ?26.0 ?14 ? ? F03 Female 77 ? ? ?23.5 ? 8 ? ? F04 Female 78 ? ? ?24.5 ?10 ? ? F04 Female 79 ? ? ?25.0 ?12 ? ? F04 Female 80 ? ? ?26.5 ?14 ? ? F04 Female 81 ? ? ?21.5 ? 8 ? ? F05 Female 82 ? ? ?23.0 ?10 ? ? F05 Female 83 ? ? ?22.5 ?12 ? ? F05 Female 84 ? ? ?23.5 ?14 ? ? F05 Female 85 ? ? ?20.0 ? 8 ? ? F06 Female 86 ? ? ?21.0 ?10 ? ? F06 Female 87 ? ? ?21.0 ?12 ? ? F06 Female 88 ? ? ?22.5 ?14 ? ? F06 Female 89 ? ? ?21.5 ? 8 ? ? F07 Female 90 ? ? ?22.5 ?10 ? ? F07 Female 91 ? ? ?23.0 ?12 ? ? F07 Female 92 ? ? ?25.0 ?14 ? ? F07 Female 93 ? ? ?23.0 ? 8 ? ? F08 Female 94 ? ? ?23.0 ?10 ? ? F08 Female 95 ? ? ?23.5 ?12 ? ? F08 Female 96 ? ? ?24.0 ?14 ? ? F08 Female 97 ? ? ?20.0 ? 8 ? ? F09 Female 98 ? ? ?21.0 ?10 ? ? F09 Female 99 ? ? ?22.0 ?12 ? ? F09 Female 100 ? ? 21.5 ?14 ? ? F09 Female 101 ? ? 16.5 ? 8 ? ? F10 Female 102 ? ? 19.0 ?10 ? ? F10 Female 103 ? ? 19.0 ?12 ? ? F10 Female 104 ? ? 19.5 ?14 ? ? F10 Female 105 ? ? 24.5 ? 8 ? ? F11 Female 106 ? ? 25.0 ?10 ? ? F11 Female 107 ? ? 28.0 ?12 ? ? F11 Female 108 ? ? 28.0 ?14 ? ? F11 Female *Mixed modeling and fixed effect test* SAS proc mixed data=formixed; class gender age person; model y = gender|age; repeated / type=cs sub=person; run; output of interest to me ? ? ? ? ?Tests of Fixed Effects Source ? ? ? ? ? ? NDF ? DDF ? ?Type III F ? ?Pr > F GENDER ? ? ? ? ? 1 ? ? ? ?25 ? ? ? ?9.29 ? ? ? ?0.0054 AGE ? ? ? ? ? ? ? ? ?3 ? ? ? ?75 ? ? ? 35.35 ? ? ? 0.0001 GENDER*AGE ? 3 ? ? ? ?75 ? ? ? ?2.36 ? ? ? ?0.0781 R (nlme package) y=lme(distance~Sex*age, random=(~1|Subject), data=dental) anova(y) ? ? ? ? ? ?numDF denDF ?F-value p-value (Intercept) ? ? 1 ? ?75 4123.156 ?<.0001 Sex ? ? ? ? ? ? ?1 ? ?25 ? ?9.292 ?0.0054 age ? ? ? ? ? ? ? 3 ? ?75 ? 40.032 ?<.0001 Sex:age ? ? ? ?3 ? ?75 ? ?2.362 ?0.0781 Now this isn't exact but it is extremely close, however when I try to replicate the unstructured, SAS proc mixed data=formixed; class gender age person; model y = gender|age; repeated / type=un sub=person; run; ? ? ? ? ? ? Tests of Fixed Effects Source ? ? ? ? ?NDF DDF Type III F Pr > F GENDER ? ? ? ? 1 ? ?25 ? ? 9.29 ? ?0.0054 AGE ? ? ? ? ? ? ? ?3 ? ?25 ? ?34.45 ? 0.0001 GENDER*AGE 3 ? ?25 ? ? 2.93 ? ?0.0532 R either y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject), data=dental) anova(y) or z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental) anova(z) gives the output ? ? ? ? ? ?numDF denDF ?F-value ? ?p-value (Intercept) ? ? 1 ? ?75 ? ? 4052.028 ?<.0001 Sex ? ? ? ? ? ? ?1 ? ?25 ? ? ? 8.462 ? ? ?0.0075 age ? ? ? ? ? ? ? 3 ? ?75 ? ? ?39.022 ? ?<.0001 Sex:age ? ? ? ?3 ? ?75 ? ? ? 2.868 ? ? ?0.0421 What am I doing wrong to replicate the unstructured linear mixed model from SAS? Regards, Charles ? ? ? ?[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Kevin Wright ? ? ? ?[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models