Since SAS does not publish its source code,
replicating SAS code is not always possible
(nor always desirable).
R code is completely open, so can be studied,
debated and replicated or modified - very useful when
people want to engage in scientific discussions
of statistical issues. Doing good science and
data analysis is challenging when you are working with
a black box of mysterious computer code. That's
why statisticians have worked so hard for years to
set up open source computational tools such as R.
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-
bounces at r-project.org] On Behalf Of Charles Determan Jr
Sent: January-26-12 4:07 PM
To: John Maindonald
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in
R
The only thing I am looking for is the appropriate R code to replicate the
SAS analysis shown in the previously mentioned paper. That is all I ask.
What should the code be in order to analyze this 'dental' data to replicate
the 'UN' or 'unstructured' analysis in the prior paper.
Regards,
Charles
On Thu, Jan 26, 2012 at 4:37 PM, John Maindonald
<john.maindonald at anu.edu.au
wrote:
It is not really a matter of computational accuracy. One can get highly
accurate values for an inappropriate statistic.
Or if there is insistence on using the word, accuracy, what is the
meaning?
i) the wrong formula is used? Then in what sense is it 'wrong'?
ii) there is a numerical inaccuracy in the calculation? This is almost
never an issue in a relatively simple calculation such as this, given
the care taken by the code writers in such matters.
iii) where an approximation is used, as in using an F-distribution
approximation, is the best choice of degrees of freedom made to
for use of this approximation? I judge that the degrees of freedom
for lme's F-statistic for the interaction are not well chosen. Users
really have to sort this out for themselves, rather than relying on
what may be a fairly wild approximation that appears in lm's
output. Using 75df rather than 25df does not however make the
difference that a choice between (e.g.) 5df and 25df would.
A further and more basic issue is whether the statistic that is
provided is appropriate to the intended generalisation. I'd take
this to be generalisation to another sample of youths from the
same population. In order to understand why R and SAS are
giving different F-statistics for the interaction, one needs to
understand just what variance-covariance structure is assumed
in each case. One might extract the two estimates of the
var-cov structure and compare them. Look for terms in one that
do not appear, or maybe that are zero, in the other.
Finally, it is not just that Venables does not like type III SS.
He is saying that they almost never correspond to a null
hypothesis that makes any sense. Those who disagree try to
write down the model to which the null hypothesis corresponds
in testing for the main effect of factor1 with a factor1:factor2
interaction.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 2:41 AM, Thompson,Paul wrote:
OK, I've looked at that reference.
There are 2 aspects of an estimate like a SS. The first is the
stability
of the estimate, and the second is the interpretation of the estimate.
The
issues with the interpretation of the different estimates go back to
1970,
and they are simply a matter of interpretation. The point of the Venables
discussion is that he does not like Type III SS, not that they are wrong.
He does not agree with the interpretation.
The issue here is the accuracy of the Type III or Type I or Type II or
whatever. Accuracy comes before interpretation. If the r module and SAS
do
not arrive at the same estimates, that is an important thing.
Once we agree upon computation, we can argue about interpretation.
Charles Determan is inquiring as to computational accuracy. The use and
interpretation of the various Type I, II, III, IV, LVX SS are secondary.
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:
r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger
Sent: Thursday, January 26, 2012 9:03 AM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed
I am unfamiliar with this critique of Type III SS. Can you point me to
a reference discussing the difficulties with Type III SS?
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:
r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed
Procedure in R
It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is
constraining
the main effect in a manner that depends on the parameterisation.
There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is
the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual. For lines that
are separate or all through the one point, one does not need
type III sums of squares.
Analyses often or frequently have enough genuine complications
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a
type III sum of squares.
I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation. They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem
more or less right, pretty much equivalent to calculating slopes
for females and slopes for males and using a t-test to compare
them. (Sure, in the analyses presented, age has been treated
as a categorical variable, but the comment still applies.)
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics& Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings,
I have been working on R for some time now and I have begun the
endeavor of
trying to replicate some SAS code in R. I have scoured the forums
but
This is also the Orthodont dataset, distributed with nlme.
As David Atkins pointed out, R defaults to Type I SS. so you would
need to use, for example, the Anova() command from the car package. The
other thing is that the SAS F statistics are only approximate, depending
on
which covariance structure is chosen (perhaps John Maindonald or someone
clever could comment), so SAS offers different possibilities for ddf eg
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
----------------------------------------------------------------------
-
Confidentiality Notice: This e-mail message, including any
attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information. Any unauthorized review,
use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Here's a typical agreement that users of SAS must agree to:
Subject to the provisions contained herein, EMPLOYEE may use the SAS copyrighted computer software products which LICENSEE has provided in accordance with its agreement with SAS.
EMPLOYEE acknowledges that these products are copyrighted and that SAS retains all title and ownership rights to the products. EMPLOYEE agrees not to copy or permit others to copy the products, in whole or in part.
EMPLOYEE agrees to use the products under this agreement only on a computer which is owned or leased by LICENSEE and controlled by LICENSEE. EMPLOYEE further agrees that the products must remain under EMPLOYEE's control, and that resale or other transfer is explicitly prohibited.
EMPLOYEE agrees to use the products only for EMPLOYEE's or LICENSEE's own data processing requirements, and not for commercial time-sharing, rental or service bureau use.
EMPLOYEE agrees not to create, or attempt to create, or permit or help others to create, the source code from the products furnished under this agreement. EMPLOYEE agrees that it will not reverse engineer or decompile the products.
(source: http://www.mcmaster.ca/uts/software_downloads/docs/SAS/saslicendform.doc )
Note that last paragraph. You can find it in other SAS end user license agreements.
So anyone who tries to "replicate PROC MIXED for repeated measures set as unstructured in R"
is then subject to legal action by the largest wealthiest statistical software company ever in
existence. I personally am not up for that challenge, especially when the code has
debateable merits.
I'd rather write code from scratch using sound statistical first principles,
which I can do thanks to the amazing amount of hard work by the R core group,
none of whom have ever asked me to sign any agreement (though they do insist
that I distribute source code and the GNU General Public License with any
copies I modify and/or distribute).
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
From: Charles Determan Jr [mailto:deter088 at umn.edu]
Sent: January-26-12 4:50 PM
To: Steven McKinney
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R
So am I to assume that this implies that there isn't any known way to replicate PROC MIXED for repeated measures set as unstructured in R?
Charles
On Thu, Jan 26, 2012 at 6:36 PM, Steven McKinney <smckinney at bccrc.ca> wrote:
Since SAS does not publish its source code,
replicating SAS code is not always possible
(nor always desirable).
R code is completely open, so can be studied,
debated and replicated or modified - very useful when
people want to engage in scientific discussions
of statistical issues. Doing good science and
data analysis is challenging when you are working with
a black box of mysterious computer code. That's
why statisticians have worked so hard for years to
set up open source computational tools such as R.
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-
bounces at r-project.org] On Behalf Of Charles Determan Jr
Sent: January-26-12 4:07 PM
To: John Maindonald
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in
R
The only thing I am looking for is the appropriate R code to replicate the
SAS analysis shown in the previously mentioned paper. That is all I ask.
What should the code be in order to analyze this 'dental' data to replicate
the 'UN' or 'unstructured' analysis in the prior paper.
Regards,
Charles
On Thu, Jan 26, 2012 at 4:37 PM, John Maindonald
<john.maindonald at anu.edu.au
wrote:
It is not really a matter of computational accuracy. One can get highly
accurate values for an inappropriate statistic.
Or if there is insistence on using the word, accuracy, what is the
meaning?
i) the wrong formula is used? Then in what sense is it 'wrong'?
ii) there is a numerical inaccuracy in the calculation? This is almost
never an issue in a relatively simple calculation such as this, given
the care taken by the code writers in such matters.
iii) where an approximation is used, as in using an F-distribution
approximation, is the best choice of degrees of freedom made to
for use of this approximation? I judge that the degrees of freedom
for lme's F-statistic for the interaction are not well chosen. Users
really have to sort this out for themselves, rather than relying on
what may be a fairly wild approximation that appears in lm's
output. Using 75df rather than 25df does not however make the
difference that a choice between (e.g.) 5df and 25df would.
A further and more basic issue is whether the statistic that is
provided is appropriate to the intended generalisation. I'd take
this to be generalisation to another sample of youths from the
same population. In order to understand why R and SAS are
giving different F-statistics for the interaction, one needs to
understand just what variance-covariance structure is assumed
in each case. One might extract the two estimates of the
var-cov structure and compare them. Look for terms in one that
do not appear, or maybe that are zero, in the other.
Finally, it is not just that Venables does not like type III SS.
He is saying that they almost never correspond to a null
hypothesis that makes any sense. Those who disagree try to
write down the model to which the null hypothesis corresponds
in testing for the main effect of factor1 with a factor1:factor2
interaction.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 2:41 AM, Thompson,Paul wrote:
OK, I've looked at that reference.
There are 2 aspects of an estimate like a SS. The first is the
stability
of the estimate, and the second is the interpretation of the estimate.
The
issues with the interpretation of the different estimates go back to
1970,
and they are simply a matter of interpretation. The point of the Venables
discussion is that he does not like Type III SS, not that they are wrong.
He does not agree with the interpretation.
The issue here is the accuracy of the Type III or Type I or Type II or
whatever. Accuracy comes before interpretation. If the r module and SAS
do
not arrive at the same estimates, that is an important thing.
Once we agree upon computation, we can argue about interpretation.
Charles Determan is inquiring as to computational accuracy. The use and
interpretation of the various Type I, II, III, IV, LVX SS are secondary.
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:
r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger
Sent: Thursday, January 26, 2012 9:03 AM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed
I am unfamiliar with this critique of Type III SS. Can you point me to
a reference discussing the difficulties with Type III SS?
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:
r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed
Procedure in R
It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is
constraining
the main effect in a manner that depends on the parameterisation.
There are situations where it makes sense to fit interactions without
main effects, and it is clear what constraint on the main effect is
the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual. For lines that
are separate or all through the one point, one does not need
type III sums of squares.
Analyses often or frequently have enough genuine complications
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a
type III sum of squares.
I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation. They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem
more or less right, pretty much equivalent to calculating slopes
for females and slopes for males and using a t-test to compare
them. (Sure, in the analyses presented, age has been treated
as a categorical variable, but the comment still applies.)
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics& Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings,
I have been working on R for some time now and I have begun the
endeavor of
trying to replicate some SAS code in R. I have scoured the forums
but
This is also the Orthodont dataset, distributed with nlme.
As David Atkins pointed out, R defaults to Type I SS. so you would
need to use, for example, the Anova() command from the car package. The
other thing is that the SAS F statistics are only approximate, depending
on
which covariance structure is chosen (perhaps John Maindonald or someone
clever could comment), so SAS offers different possibilities for ddf eg
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
----------------------------------------------------------------------
-
Confidentiality Notice: This e-mail message, including any
attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information. Any unauthorized review,
use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-----------------------------------------------------------------------
Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
privileged and confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply e-mail and destroy
all copies of the original message.
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
I see, thank you Steven for your response. Perhaps I should start a new
question on here for what people would recommend currently in R for
analyzing a repeated measures data set.
If you Google on "Orthodont" and R, you will find several analyses of this
dataset, including one in John Maindonald's book. In those analyses, age
has been treated as a continuous covariate, rather than the SAS example's
approach.
| David Duffy (MBBS PhD) ,-_|\
| email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / *
| Epidemiology Unit, Queensland Institute of Medical Research \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
Hi Charles,
Caveat emptor: I have not read John Maindonalds analysis of this data.
There may well be problems with this, but here are some of the things
I would try with the data.
By the way, if you want us to help you fit the same model as SAS, it
would help to know what SAS is fitting. If you could provide the
formula for the model and covariance structure, that would help. If
you do not know, perhaps first try to replicate in SAS using something
more explicit than the 'repeated' option.
Cheers,
Josh
dat <- structure(list(distance = c(26, 25, 29, 31, 21.5, 22.5, 23, 26.5,
23, 22.5, 24, 27.5, 25.5, 27.5, 26.5, 27, 20, 23.5, 22.5, 26,
24.5, 25.5, 27, 28.5, 22, 22, 24.5, 26.5, 24, 21.5, 24.5, 25.5,
23, 20.5, 31, 26, 27.5, 28, 31, 31.5, 23, 23, 23.5, 25, 21.5,
23.5, 24, 28, 17, 24.5, 26, 29.5, 22.5, 25.5, 25.5, 26, 23, 24.5,
26, 30, 22, 21.5, 23.5, 25, 21, 20, 21.5, 23, 21, 21.5, 24, 25.5,
20.5, 24, 24.5, 26, 23.5, 24.5, 25, 26.5, 21.5, 23, 22.5, 23.5,
20, 21, 21, 22.5, 21.5, 22.5, 23, 25, 23, 23, 23.5, 24, 20, 21,
22, 21.5, 16.5, 19, 19, 19.5, 24.5, 25, 28, 28), age = c(8L,
10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L,
12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L,
14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L,
8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L,
10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L,
12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L,
14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L,
8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L, 10L, 12L, 14L, 8L,
10L, 12L, 14L), Subject = structure(c(12L, 12L, 12L, 12L, 13L,
13L, 13L, 13L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L,
16L, 16L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 19L, 19L,
19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L,
23L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 26L,
26L, 26L, 26L, 27L, 27L, 27L, 27L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L,
10L, 10L, 11L, 11L, 11L, 11L), .Label = c("F01", "F02", "F03",
"F04", "F05", "F06", "F07", "F08", "F09", "F10", "F11", "M01",
"M02", "M03", "M04", "M05", "M06", "M07", "M08", "M09", "M10",
"M11", "M12", "M13", "M14", "M15", "M16"), class = "factor"),
Sex = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Female", "Male"
), class = "factor")), .Names = c("distance", "age", "Subject",
"Sex"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59",
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70",
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81",
"82", "83", "84", "85", "86", "87", "88", "89", "90", "91", "92",
"93", "94", "95", "96", "97", "98", "99", "100", "101", "102",
"103", "104", "105", "106", "107", "108"))
require(ggplot2)
require(mgcv)
## intercepts and slopes seem different between sexes
## but there is no evidence of a nonlinear relationship between
## age and distance
ggplot(dat, aes(x = age, y = distance, colour = Sex)) +
geom_point() +
stat_smooth(method = "gam", formula = y ~ s(x))
## reorder data by initial distance value
dat$Subject <- with(dat, reorder(Subject,
distance[ifelse(age == min(age), TRUE, NA)],
FUN = mean, na.rm = TRUE))
## slight evidence that lower intercepts may be
## associated with more positive slopes
ggplot(dat, aes(x = age, y = distance, colour = Sex)) +
geom_point() +
stat_smooth(method = "gam", formula = y ~ s(x)) +
facet_wrap(~ Subject)
## lme4 package for mixed effects models
require(lme4)
mnull <- lmer(distance ~ 1 + (1 | Subject), data = dat)
m1 <- update(mnull, . ~ . + age * Sex)
m2 <- update(m2, . ~ . + (0 + age | Subject))
m3 <- lmer(distance ~ age * Sex + (1 + age | Subject), data = dat)
## compare different models, m1 seems good
anova(mnull, m1, m2, m3)
plot(dat$distance, fitted(m1))
## examine residuals and random effects
qqnorm(resid(m1))
plot(dat$age, resid(m1))
qqnorm(ranef(m1)$Subject[,1])
## view the summary
summary(m1)
## Linear mixed model fit by REML
## Formula: distance ~ (1 | Subject) + age + Sex + age:Sex
## Data: dat
## AIC BIC logLik deviance REMLdev
## 445.8 461.9 -216.9 428.7 433.8
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 3.2986 1.8162
## Residual 1.9221 1.3864
## Number of obs: 108, groups: Subject, 27
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 17.37273 1.18349 14.679
## age 0.47955 0.09347 5.130
## SexMale -1.03210 1.53740 -0.671
## age:SexMale 0.30483 0.12142 2.511
## Correlation of Fixed Effects:
## (Intr) age SexMal
## age -0.869
## SexMale -0.770 0.669
## age:SexMale 0.669 -0.770 -0.869
On Thu, Jan 26, 2012 at 5:37 PM, Charles Determan Jr <deter088 at umn.edu> wrote:
I see, thank you Steven for your response. ?Perhaps I should start a new
question on here for what people would recommend currently in R for
analyzing a repeated measures data set. ?Would that be an appropriate
request without infringing upon any possible legal ramifications? ?Perhaps
there is a slightly different method that is built on 'sound statistical
first principles'. ?Or does anyone currently following this thread know an
appropriate repeated measures analysis of this 'dental' data that would be
similar to the SAS results?
Charles
On Thu, Jan 26, 2012 at 7:20 PM, Steven McKinney <smckinney at bccrc.ca> wrote:
Here's a typical agreement that users of SAS must agree to:
Subject to the provisions contained herein, EMPLOYEE may use the SAS
copyrighted computer software products which LICENSEE has provided in
accordance with its agreement with SAS.
EMPLOYEE acknowledges that these products are copyrighted and that SAS
retains all title and ownership rights to the products. ?EMPLOYEE agrees
not to copy or permit others to copy the products, in whole or in part.
EMPLOYEE agrees to use the products under this agreement only on a
computer which is owned or leased by LICENSEE and controlled by LICENSEE.
?EMPLOYEE further agrees that the products must remain under EMPLOYEE's
control, and that resale or other transfer is explicitly prohibited.
EMPLOYEE agrees to use the products only for EMPLOYEE's or LICENSEE's own
data processing requirements, and not for commercial time-sharing, rental
or service bureau use.
EMPLOYEE agrees not to create, or attempt to create, or permit or help
others to create, the source code from the products furnished under this
agreement. ?EMPLOYEE agrees that it will not reverse engineer or decompile
the products.
(source:
http://www.mcmaster.ca/uts/software_downloads/docs/SAS/saslicendform.doc )
Note that last paragraph. ?You can find it in other SAS end user license
agreements.
So anyone who tries to "replicate PROC MIXED for repeated measures set as
unstructured in R"
is then subject to legal action by the largest wealthiest statistical
software company ever in
existence. ?I personally am not up for that challenge, especially when the
code has
debateable merits.
I'd rather write code from scratch using sound statistical first
principles,
which I can do thanks to the amazing amount of hard work by the R core
group,
none of whom have ever asked me to sign any agreement (though they do
insist
that I distribute source code and the GNU General Public License with any
copies I modify and/or distribute).
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
From: Charles Determan Jr [mailto:deter088 at umn.edu]
Sent: January-26-12 4:50 PM
To: Steven McKinney
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure
in R
So am I to assume that this implies that there isn't any known way to
replicate PROC MIXED for repeated measures set as unstructured in R?
Charles
On Thu, Jan 26, 2012 at 6:36 PM, Steven McKinney <smckinney at bccrc.ca>
wrote:
Since SAS does not publish its source code,
replicating SAS code is not always possible
(nor always desirable).
R code is completely open, so can be studied,
debated and replicated or modified - very useful when
people want to engage in scientific discussions
of statistical issues. ?Doing good science and
data analysis is challenging when you are working with
a black box of mysterious computer code. ?That's
why statisticians have worked so hard for years to
set up open source computational tools such as R.
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:
r-sig-mixed-models-
bounces at r-project.org] On Behalf Of Charles Determan Jr
Sent: January-26-12 4:07 PM
To: John Maindonald
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed Procedure
in
R
The only thing I am looking for is the appropriate R code to replicate
the
SAS analysis shown in the previously mentioned paper. ?That is all I ask.
What should the code be in order to analyze this 'dental' data to
replicate
the 'UN' or 'unstructured' analysis in the prior paper.
Regards,
Charles
On Thu, Jan 26, 2012 at 4:37 PM, John Maindonald
<john.maindonald at anu.edu.au
wrote:
It is not really a matter of computational accuracy. ?One can get
highly
accurate values for an inappropriate statistic.
Or if there is insistence on using the word, accuracy, what is the
meaning?
i) the wrong formula is used? ?Then in what sense is it 'wrong'?
ii) there is a numerical inaccuracy in the calculation? ?This is almost
never an issue in a relatively simple calculation such as this, given
the care taken by the code writers in such matters.
iii) where an approximation is used, as in using an F-distribution
approximation, is the best choice of degrees of freedom ?made to
for use of this approximation? ?I judge that the degrees of freedom
for lme's F-statistic for the interaction are not well chosen. ?Users
really have to sort this out for themselves, rather than relying on
what may be a fairly wild approximation that appears in lm's
output. ?Using 75df rather than 25df does not however make the
difference that a choice between (e.g.) 5df and 25df would.
A further and more basic issue is whether the statistic that is
provided is appropriate to the intended generalisation. ?I'd take
this to be generalisation to another sample of youths from the
same population. ?In order to understand why R and SAS are
giving different F-statistics for the interaction, one needs to
understand just what variance-covariance structure is assumed
in each case. ?One might extract the two estimates of the
var-cov structure and compare them. Look for terms in one that
do not appear, or maybe that are zero, in the other.
Finally, it is not just that Venables does not like type III SS.
He is saying that they almost never correspond to a null
hypothesis that makes any sense. ?Those who disagree try to
write down the model to which the null hypothesis corresponds
in testing for the main effect of factor1 with a factor1:factor2
interaction.
John Maindonald ? ? ? ? ? ? email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 ? ?fax ?: +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 27/01/2012, at 2:41 AM, Thompson,Paul wrote:
OK, I've looked at that reference.
There are 2 aspects of an estimate like a SS. The first is the
stability
of the estimate, and the second is the interpretation of the estimate.
The
issues with the interpretation of the different estimates go back to
1970,
and they are simply a matter of interpretation. The point of the
Venables
discussion is that he does not like Type III SS, not that they are
wrong.
He does not agree with the interpretation.
The issue here is the accuracy of the Type III or Type I or Type II
or
whatever. Accuracy comes before interpretation. If the r module and SAS
do
not arrive at the same estimates, that is an important thing.
Once we agree upon computation, we can argue about interpretation.
Charles Determan is inquiring as to computational accuracy. The use and
interpretation of the various Type I, II, III, IV, LVX SS are
secondary.
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:
r-sig-mixed-models-bounces at r-project.org] On Behalf Of Luca Borger
Sent: Thursday, January 26, 2012 9:03 AM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed
I am unfamiliar with this critique of Type III SS. Can you point me
to
a reference discussing the difficulties with Type III SS?
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:
r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Maindonald
Sent: Wednesday, January 25, 2012 11:19 PM
To: David Duffy
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Trouble Replicating Unstructured Mixed
Procedure in R
It is well to note that type III sums of squares are problematic.
For testing the effects of a main effect, the null model is
constraining
the main effect in a manner that depends on the parameterisation.
There are situations where it makes sense to fit interactions
without
main effects, and it is clear what constraint on the main effect is
the
relevant null (with an interaction between a factor and a variable,
does one want all lines to go though the same point, or through
perhaps the origin?), but that situation is unusual. ?For lines that
are separate or all through the one point, one does not need
type III sums of squares.
Analyses often or frequently have enough genuine complications
worrying (unless it is blindingly obvious that one ought to worry
about it) without the rarely relevant complication of attending to a
type III sum of squares.
I'd guess that SAS and lme are, effectively, making different
assumptions about the intended generalisation. ?They are
clearly using different denominator degrees of freedom for F.
As one is looking for consistency across the 27 different youths,
SAS's denominator degrees of freedom for the interaction seem
more or less right, pretty much equivalent to calculating slopes
for females and slopes for males and using a t-test to compare
them. ?(Sure, in the analyses presented, age has been treated
as a categorical variable, but the comment still applies.)
John Maindonald ? ? ? ? ? ? email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 ? ?fax ?: +61 2(6125)5549
Centre for Mathematics& ?Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 26/01/2012, at 1:54 PM, David Duffy wrote:
On Tue, 24 Jan 2012, Charles Determan Jr wrote:
Greetings,
I have been working on R for some time now and I have begun the
endeavor of
trying to replicate some SAS code in R. ?I have scoured the forums
but
This is also the Orthodont dataset, distributed with nlme.
As David Atkins pointed out, R defaults to Type I SS. so you would
need to use, for example, the Anova() command from the car package.
?The
other thing is that the SAS F statistics are only approximate,
depending
on
which covariance structure is chosen (perhaps John Maindonald or
someone
clever could comment), so SAS offers different possibilities for ddf eg
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/