I am a SAS user, and I have just finished a near-final draft of an article about identifying and specifying fixed and random effects for mixed models in SAS. At the moment, all I say about R is this: "To all users of the R stats package, my apologies: a few years ago, I found the mixed model in R too limiting, and I struggled with the coding. I hope someone with R smarts will consider adapting my programs and publishing them here." I really would like to know if the limitations (or at least, what I consider to be limitations) have been addressed, so I can be more specific in my article. Specifically, can anyone answer these questions?

1. Can you specify negative variance for random effects in R? (That doesn't apply to the variances representing residuals, which are never negative.)

2. Can you get trustworthy estimates of standard errors for the random-effect and residual variances in R? (By trustworthy, I guess I mean the same as SAS's, but it could mean that someone has shown that the confidence intervals derived with the SEs have good coverage, assuming normality for the variances representing random effects and chi-squared for variances representing residuals.)

3. Does R have the equivalent of the repeated statement in SAS, whereby you can specify a repeated measure with an unstructured or other covariance matrix? (e.g., repeated Time/subject=SubjectID type=un. This statement works more reliably than the random statement in SAS with small sample sizes, but it doesn't produce residual variances, not directly anyway.)

4. Does R have the equivalent of the group= option in SAS, whereby you can specify separate estimates of random-effect variances and covariances, and/or separate estimates of residuals, for different groups? (e.g., random SubjectID/group=Sex; repeated/group=Sex;)

The draft article is available at https://sportsci.org/2023/EffectsModels.htm. It's not published yet (i.e., not linked to the Sportscience homepage yet).
I would be grateful for any feedback. I can incorporate more about R and would love someone to provide R code for the programs I have published. See below for the title and abstract.

Will

How to Identify and Specify Fixed and Random Effects for Linear Mixed Models in the Statistical Analysis System

Most analyses require linear mixed modeling to properly account not only for mean effects but also for sources of variability and error in the data. In linear mixed models, means and their differences are specified with fixed effects, while variabilities and errors are specified with random effects (including residuals) and are summarized as standard deviations. In this tutorial article, I explain how variables represent effects in linear mixed models and how identifying clusters of observations in a dataset is the key to identifying the fixed and random effects. I also provide programs written in the language of the Statistical Analysis System to simulate data and analyze them with the general linear mixed model, Proc Mixed, which is used when the dependent variable is continuous. The analyses include simple linear regression, reliability and time series, controlled trials, and combined within- and between-subject modeling. Finally, I explain how dependent variables representing counts or proportions require generalized linear mixed models, realized in SAS with Proc Glimmix, where the main difference is the specification of the residuals.
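[Editorial sketch, not part of the original thread: for questions 3 and 4, nlme's correlation= and weights= arguments offer rough equivalents of SAS's repeated statement and group= option. The variable names SubjectID, Time, and Sex mirror the questions, and the data are simulated for illustration.]

```r
## Rough nlme equivalents of SAS's repeated and group= (a sketch, assuming
## long-format data with one row per subject per time point).
library(nlme)

## Simulated data: 40 subjects x 3 times, residual SD differing by sex
set.seed(1)
d <- data.frame(
  SubjectID = factor(rep(1:40, each = 3)),
  Time      = factor(rep(1:3, 40)),
  Sex       = factor(rep(c("F", "M"), each = 60))
)
d$y <- rnorm(120, mean = as.numeric(d$Time), sd = ifelse(d$Sex == "F", 1, 2))

## ~ repeated Time/subject=SubjectID type=un:
## unstructured within-subject correlation plus a separate variance per time
fit.un <- gls(y ~ Time, data = d,
              correlation = corSymm(form = ~ 1 | SubjectID),
              weights     = varIdent(form = ~ 1 | Time))

## ~ repeated/group=Sex: separate residual variances by sex
fit.sex <- gls(y ~ Time, data = d,
               weights = varIdent(form = ~ 1 | Sex))
summary(fit.sex)
```

lme() accepts the same correlation= and weights= arguments alongside random=, so group-specific residual variances can be combined with random effects in one model.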
Random effects in R vs SAS
4 messages · Will Hopkins, John Maindonald, Mollie Brooks
Will, an immediate comment is that what may be accommodated are variance 'components'. They are not the variances of any identifiable quantity, so it is surely misleading to call them variances. For residuals, the component is at the same time a variance. With a suitably balanced experimental design, one can work with the function aov() to obtain a breakdown from which the negative component can be extracted. The following inflates within-block variances; it mirrors a context where blocks have been designed to be more heterogeneous than plots generally.
set.seed(29)
library(lme4)
df0 <- data.frame(block=rep(1:100, rep(10,100)), trt=rep(1:5,200),
y=rnorm(1000,4,1))
## Plot variances are all set to 1.0
df0$y <- df0$y + df0$trt*0.1 ## Differences between treatments
## Add systematic within block variation
## 10 different systematic effects for the 10 different plots in
## a block are randomly assigned
systeff <- 0.4*(1:10)
df <- df0
for (i in 1:100) df$y[df$block==i] <- df$y[df$block==i] + sample(systeff)
## sample() does the random assignment
df$block <- factor(df$block); df$trt <- factor(df$trt)
y.aov <- aov(y ~ trt + Error(block), data=df)
summary(y.aov)
Error: block
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 99 127.7 1.29
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
trt 4 23.5 5.865 2.302 0.0569
Residuals 896 2282.8 2.548
y.lmer <- lmer(y~trt + (1|block), data=df)
boundary (singular) fit: see help('isSingular')
summary(y.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ trt + (1 | block)
Data: df
REML criterion at convergence: 3776.3
Scaled residuals:
Min 1Q Median 3Q Max
-2.78060 -0.72441 0.04553 0.70139 2.79109
Random effects:
Groups Name Variance Std.Dev.
block (Intercept) 0.000 0.000
Residual 2.537 1.593
Number of obs: 1000, groups: block, 100
. . .
library(nlme)
df.lme <- lme(y~trt, random=~1|block, data=df)  ## No complaints! This is a worry.
summary(df.lme)
Linear mixed-effects model fit by REML
Data: df
AIC BIC logLik
3790.336 3824.655 -1888.168
Random effects:
Formula: ~1 | block
(Intercept) Residual
StdDev: 6.727467e-05 1.592659
Fixed effects: y ~ trt
Value Std.Error DF t-value p-value
(Intercept) 6.096161 0.1126180 896 54.13131 0.0000
trt2 0.305667 0.1592659 896 1.91922 0.0553
trt3 0.418814 0.1592659 896 2.62965 0.0087
trt4 0.515445 0.1592659 896 3.23638 0.0013
trt5 0.561525 0.1592659 896 3.52571 0.0004
. . .
In principle, one should be able to set up a within-block correlation
structure that accommodates the negative component of variance.
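[Editorial sketch, not part of the original thread: one way to realize this suggestion is gls() with a compound-symmetry correlation, whose correlation parameter, unlike a random-intercept variance, is allowed to go negative. The simulation is reproduced from the code above so the block stands alone.]

```r
## A compound-symmetry correlation fitted with gls() can go negative,
## which is equivalent to allowing a negative block variance component.
## Data simulated exactly as in the aov()/lmer() example above.
library(nlme)
set.seed(29)
df0 <- data.frame(block = rep(1:100, rep(10, 100)), trt = rep(1:5, 200),
                  y = rnorm(1000, 4, 1))
df0$y <- df0$y + df0$trt * 0.1          ## Differences between treatments
systeff <- 0.4 * (1:10)                 ## Systematic within-block variation
df <- df0
for (i in 1:100) df$y[df$block == i] <- df$y[df$block == i] + sample(systeff)
df$block <- factor(df$block); df$trt <- factor(df$trt)

y.gls <- gls(y ~ trt, data = df,
             correlation = corCompSymm(form = ~ 1 | block))
coef(y.gls$modelStruct$corStruct, unconstrained = FALSE)  ## Rho expected < 0
```

Here lmer() reports a singular fit with the block variance pinned at zero, whereas the gls() parameterization lets the within-block correlation take the small negative value the data call for.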
John Maindonald
Statistics Research Associates
Wellington NZ.
On 18/08/2023, at 17:06, Will Hopkins <willthekiwi at gmail.com> wrote: [original message quoted in full; see the start of this thread]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
6 days later
On 18 Aug 2023, at 07.06, Will Hopkins <willthekiwi at gmail.com> wrote: [...] 3. Does R have the equivalent of the repeated statement in SAS, whereby you can specify a repeated measure with an unstructured or other covariance matrix? (e.g., repeated Time/subject=SubjectID type=un.)
I am not a SAS user, but I guess that their unstructured covariance matrix might be similar to the one described here https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html
4. Does R have the equivalent of the group= option in SAS, whereby you can specify separate estimates of random-effect variances and covariances, and/or separate estimates of residuals, for different groups? (e.g., random SubjectID/group=Sex; repeated/group=Sex;)
This section of Ben Bolker's GLMM FAQ explains how a variety of random effects can be specified in many R packages: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-specification

Mollie
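[Editorial sketch, not part of the original thread: following the glmmTMB covariance-structures vignette linked above, a minimal illustration of both ideas. glmmTMB is a CRAN package, and the names SubjectID, Time, and Sex mirror the questions rather than the vignette.]

```r
## us() gives an unstructured covariance across time points within subject
## (cf. repeated Time/subject=SubjectID type=un), and dispformula= allows
## group-specific residual variances (cf. repeated/group=Sex).
if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  set.seed(2)
  d <- data.frame(
    SubjectID = factor(rep(1:30, each = 4)),
    Time      = factor(rep(1:4, 30)),
    Sex       = factor(rep(c("F", "M"), each = 60))
  )
  d$y <- rnorm(120, mean = as.numeric(d$Time))

  ## Unstructured covariance; dispformula = ~0 fixes the residual variance
  ## to (near) zero so us() absorbs the full within-subject covariance
  fit.un <- glmmTMB(y ~ Time + us(Time + 0 | SubjectID),
                    dispformula = ~ 0, data = d)

  ## Residual variance differing by sex
  fit.sex <- glmmTMB(y ~ Time, dispformula = ~ Sex, data = d)
}
```

The dispformula models the dispersion on a log link, so for a Gaussian family it yields separate residual variances per level of Sex, much like the group= option.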