Greetings, I'm trying to fit a linear mixed effect model using nlme::lme() that includes heteroskedastic variances by group, where the variance for one group is fixed to a specific value. I *think* I can do this using varIdent() and setting lmeControl(sigma = x) for a specified value x. However, I've run into what is evidently an old problem: that there doesn't appear to be any way to control the reference level of varIdent(). As was noted in a very old post to R-SIG-mixed-models ( https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020582.html), the reference level of varIdent() within an lme() call is set to the most common level of the grouping factor. For reasons that are too involved to get into, I need to set the reference level to something else (in fact, the least frequently occurring level), so it seems I'm stuck. Curiously, the behavior of varIdent() within a gls() call is a bit different. As documented in this StackOverflow ( https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order), the reference level used within gls() depends on the sort order of the data. I've included below a minimal working example that demonstrates the difference in behavior. Does anyone know of any work-around for setting the reference level of varIdent() within an lme() call? I'd be grateful for any leads/suggestions on what I could do (up to and including to just use a different package already). Thanks, James ---------------------------------------------------------------------------- library(nlme) # Make reversed factor table(Orthodont$Sex) Orthodont$Sex_rev <- Orthodont$Sex levels(Orthodont$Sex_rev) <- rev(levels(Orthodont$Sex)) table(Orthodont$Sex_rev) Ortho_male_first <- Orthodont[order(Orthodont$Sex),] Ortho_female_first <- Orthodont[order(Orthodont$Sex, decreasing = TRUE),] # Fit gls models with heteroskedastic variance gls1 <- gls( distance ~ age + Sex, weights = varIdent(form = ~ 1 | Sex), data = Ortho_male_first ) gls2 <- update(gls1, data = Ortho_female_first) gls3 <- update(gls1, weights = varIdent(form = ~ 1 |Sex_rev)) gls4 <- update(gls3, data = Ortho_female_first) # Parameterization depends on sort order coef(gls1$modelStruct$varStruct) coef(gls2$modelStruct$varStruct) # But not on factor level order coef(gls3$modelStruct$varStruct) coef(gls4$modelStruct$varStruct) # Fit lme models with heteroskedastic level-1 variance lme1 <- lme( distance ~ age + Sex, random = ~ 1, weights = varIdent(form = ~ 1 | Sex), data = Ortho_male_first ) lme2 <- update(lme1, data = Ortho_female_first) lme3 <- update(lme1, weights = varIdent(form = ~ 1 |Sex_rev)) lme4 <- update(lme3, data = Ortho_female_first) # Parameterization depends neither on sort order nor on factor level order coef(lme1$modelStruct$varStruct) coef(lme2$modelStruct$varStruct) coef(lme3$modelStruct$varStruct) coef(lme4$modelStruct$varStruct)
reference category in varIdent function (once again)
5 messages · Sebastian Meyer, James Pustejovsky
2 days later
I think it is unfortunate that varIdent() selects the reference category based on how the data is sorted (using the first level that occurs). There is also a corresponding bug report <https://bugs.r-project.org/show_bug.cgi?id=18505>. Your example illustrates an additional issue with this behaviour, because lme() internally reorders the data by the grouping factor(s) before initialization, so the order of that factor ('Subject' in your example, with first level "M16", so male) will indirectly determine the reference level of varIdent(), regardless of how your data or strata levels are ordered originally. It seems that only if there are more than two strata, the reference level for varIdent() can be chosen via a named initial parameter 'value', leaving out the desired reference level, but I haven't tested if this works as intended with both gls() and lme(). It would then make sense to support such a choice also in the case of only two strata, so for a single parameter. Best wishes, Sebastian Meyer Am 20.12.24 um 20:39 schrieb James Pustejovsky:
Greetings, I'm trying to fit a linear mixed effect model using nlme::lme() that includes heteroskedastic variances by group, where the variance for one group is fixed to a specific value. I *think* I can do this using varIdent() and setting lmeControl(sigma = x) for a specified value x. However, I've run into what is evidently an old problem: that there doesn't appear to be any way to control the reference level of varIdent(). As was noted in a very old post to R-SIG-mixed-models ( https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020582.html), the reference level of varIdent() within an lme() call is set to the most common level of the grouping factor. For reasons that are too involved to get into, I need to set the reference level to something else (in fact, the least frequently occurring level), so it seems I'm stuck. Curiously, the behavior of varIdent() within a gls() call is a bit different. As documented in this StackOverflow ( https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order), the reference level used within gls() depends on the sort order of the data. I've included below a minimal working example that demonstrates the difference in behavior. Does anyone know of any work-around for setting the reference level of varIdent() within an lme() call? I'd be grateful for any leads/suggestions on what I could do (up to and including to just use a different package already). Thanks, James ---------------------------------------------------------------------------- library(nlme) # Make reversed factor table(Orthodont$Sex) Orthodont$Sex_rev <- Orthodont$Sex levels(Orthodont$Sex_rev) <- rev(levels(Orthodont$Sex)) table(Orthodont$Sex_rev) Ortho_male_first <- Orthodont[order(Orthodont$Sex),] Ortho_female_first <- Orthodont[order(Orthodont$Sex, decreasing = TRUE),] # Fit gls models with heteroskedastic variance gls1 <- gls( distance ~ age + Sex, weights = varIdent(form = ~ 1 | Sex), data = Ortho_male_first ) gls2 <- update(gls1, data = Ortho_female_first) gls3 <- update(gls1, weights = varIdent(form = ~ 1 |Sex_rev)) gls4 <- update(gls3, data = Ortho_female_first) # Parameterization depends on sort order coef(gls1$modelStruct$varStruct) coef(gls2$modelStruct$varStruct) # But not on factor level order coef(gls3$modelStruct$varStruct) coef(gls4$modelStruct$varStruct) # Fit lme models with heteroskedastic level-1 variance lme1 <- lme( distance ~ age + Sex, random = ~ 1, weights = varIdent(form = ~ 1 | Sex), data = Ortho_male_first ) lme2 <- update(lme1, data = Ortho_female_first) lme3 <- update(lme1, weights = varIdent(form = ~ 1 |Sex_rev)) lme4 <- update(lme3, data = Ortho_female_first) # Parameterization depends neither on sort order nor on factor level order coef(lme1$modelStruct$varStruct) coef(lme2$modelStruct$varStruct) coef(lme3$modelStruct$varStruct) coef(lme4$modelStruct$varStruct) [[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
6 days later
Hi Sebastian, Thanks for your reply. This helped me figure out how to control the reference level for the particular model I'm working on. I wrote up what I've learned (in an excessive degree of detail) here: https://jepusto.com/posts/varIdent-function-in-nlme/ James
On Mon, Dec 23, 2024 at 10:40?AM Sebastian Meyer <seb.meyer at fau.de> wrote:
I think it is unfortunate that varIdent() selects the reference category based on how the data is sorted (using the first level that occurs). There is also a corresponding bug report <https://bugs.r-project.org/show_bug.cgi?id=18505>. Your example illustrates an additional issue with this behaviour, because lme() internally reorders the data by the grouping factor(s) before initialization, so the order of that factor ('Subject' in your example, with first level "M16", so male) will indirectly determine the reference level of varIdent(), regardless of how your data or strata levels are ordered originally. It seems that only if there are more than two strata, the reference level for varIdent() can be chosen via a named initial parameter 'value', leaving out the desired reference level, but I haven't tested if this works as intended with both gls() and lme(). It would then make sense to support such a choice also in the case of only two strata, so for a single parameter. Best wishes, Sebastian Meyer Am 20.12.24 um 20:39 schrieb James Pustejovsky:
Greetings, I'm trying to fit a linear mixed effect model using nlme::lme() that includes heteroskedastic variances by group, where the variance for one group is fixed to a specific value. I *think* I can do this using varIdent() and setting lmeControl(sigma = x) for a specified value x. However, I've run into what is evidently an old problem: that there
doesn't
appear to be any way to control the reference level of varIdent(). As was noted in a very old post to R-SIG-mixed-models ( https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020582.html),
the
reference level of varIdent() within an lme() call is set to the most common level of the grouping factor. For reasons that are too involved to get into, I need to set the reference level to something else (in fact,
the
least frequently occurring level), so it seems I'm stuck. Curiously, the behavior of varIdent() within a gls() call is a bit different. As documented in this StackOverflow (
the reference level used within gls() depends on the sort order of the data. I've included below a minimal working example that demonstrates the difference in behavior. Does anyone know of any work-around for setting the reference level of varIdent() within an lme() call? I'd be grateful for any
leads/suggestions
on what I could do (up to and including to just use a different package already). Thanks, James
----------------------------------------------------------------------------
library(nlme) # Make reversed factor table(Orthodont$Sex) Orthodont$Sex_rev <- Orthodont$Sex levels(Orthodont$Sex_rev) <- rev(levels(Orthodont$Sex)) table(Orthodont$Sex_rev) Ortho_male_first <- Orthodont[order(Orthodont$Sex),] Ortho_female_first <- Orthodont[order(Orthodont$Sex, decreasing = TRUE),] # Fit gls models with heteroskedastic variance gls1 <- gls( distance ~ age + Sex, weights = varIdent(form = ~ 1 | Sex), data = Ortho_male_first ) gls2 <- update(gls1, data = Ortho_female_first) gls3 <- update(gls1, weights = varIdent(form = ~ 1 |Sex_rev)) gls4 <- update(gls3, data = Ortho_female_first) # Parameterization depends on sort order coef(gls1$modelStruct$varStruct) coef(gls2$modelStruct$varStruct) # But not on factor level order coef(gls3$modelStruct$varStruct) coef(gls4$modelStruct$varStruct) # Fit lme models with heteroskedastic level-1 variance lme1 <- lme( distance ~ age + Sex, random = ~ 1, weights = varIdent(form = ~ 1 | Sex), data = Ortho_male_first ) lme2 <- update(lme1, data = Ortho_female_first) lme3 <- update(lme1, weights = varIdent(form = ~ 1 |Sex_rev)) lme4 <- update(lme3, data = Ortho_female_first) # Parameterization depends neither on sort order nor on factor level
order
coef(lme1$modelStruct$varStruct)
coef(lme2$modelStruct$varStruct)
coef(lme3$modelStruct$varStruct)
coef(lme4$modelStruct$varStruct)
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Glad to hear that helped. Your exemplification of current varIdent behaviour is also helpful, thanks. I'll revisit the bug with a closer look next year. In principle, it would be better for the reference level to be based on the factor levels, but as you say, nlme's code base is very old, which means many publications (implicitly) rely on the current behaviour, so semantics or defaults won't be changed easily. Still, bugs should be fixed -- and it should be possible to choose the reference level. Best regards, Sebastian Meyer PS: The getVarCov() bug that is linked in your report (to another post) was fixed a while ago in 2020 (nlme version 3.1-150), i.e., the `all.equal(V_raw, V_sorted)` tests now give TRUE. It might be worth noting this somewhere. Am 29.12.24 um 22:40 schrieb James Pustejovsky:
Hi Sebastian, Thanks for your reply. This helped me figure out how to control the reference level for the particular model I'm working on. I wrote up what I've learned (in an excessive degree of detail) here: https://jepusto.com/posts/varIdent-function-in-nlme/ <https://jepusto.com/posts/varIdent-function-in-nlme/> James On Mon, Dec 23, 2024 at 10:40?AM Sebastian Meyer <seb.meyer at fau.de <mailto:seb.meyer at fau.de>> wrote: I think it is unfortunate that varIdent() selects the reference category based on how the data is sorted (using the first level that occurs). There is also a corresponding bug report <https://bugs.r-project.org/show_bug.cgi?id=18505 <https://bugs.r-project.org/show_bug.cgi?id=18505>>. Your example illustrates an additional issue with this behaviour, because lme() internally reorders the data by the grouping factor(s) before initialization, so the order of that factor ('Subject' in your example, with first level "M16", so male) will indirectly determine the reference level of varIdent(), regardless of how your data or strata levels are ordered originally. It seems that only if there are more than two strata, the reference level for varIdent() can be chosen via a named initial parameter 'value', leaving out the desired reference level, but I haven't tested if this works as intended with both gls() and lme(). It would then make sense to support such a choice also in the case of only two strata, so for a single parameter. Best wishes, ? ? ? ? Sebastian Meyer Am 20.12.24 um 20:39 schrieb James Pustejovsky:
> Greetings,
>
> I'm trying to fit a linear mixed effect model using nlme::lme() that
> includes heteroskedastic variances by group, where the variance
for one
> group is fixed to a specific value. I *think* I can do this using
> varIdent() and setting lmeControl(sigma = x) for a specified value x.
> However, I've run into what is evidently an old problem: that
there doesn't
> appear to be any way to control the reference level of varIdent().
>
> As was noted in a very old post to R-SIG-mixed-models (
>
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020582.html <https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020582.html>), the
> reference level of varIdent() within an lme() call is set to the most
> common level of the grouping factor. For reasons that are too
involved to
> get into, I need to set the reference level to something else (in
fact, the
> least frequently occurring level), so it seems I'm stuck.
>
> Curiously, the behavior of varIdent() within a gls() call is a bit
> different. As documented in this StackOverflow (
>
https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order <https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order>),
> the reference level used within gls() depends on the sort order
of the
> data. I've included below a minimal working example that
demonstrates the
> difference in behavior.
>
> Does anyone know of any work-around for setting the reference
level of
> varIdent() within an lme() call? I'd be grateful for any
leads/suggestions
> on what I could do (up to and including to just use a different
package
> already).
>
> Thanks,
> James
>
>
>
----------------------------------------------------------------------------
> library(nlme)
>
> # Make reversed factor
> table(Orthodont$Sex)
> Orthodont$Sex_rev <- Orthodont$Sex
> levels(Orthodont$Sex_rev) <- rev(levels(Orthodont$Sex))
> table(Orthodont$Sex_rev)
>
> Ortho_male_first <- Orthodont[order(Orthodont$Sex),]
> Ortho_female_first <- Orthodont[order(Orthodont$Sex, decreasing =
TRUE),]
>
> # Fit gls models with heteroskedastic variance
> gls1 <- gls(
>? ? distance ~ age + Sex,
>? ? weights = varIdent(form = ~ 1 | Sex),
>? ? data = Ortho_male_first
> )
> gls2 <- update(gls1, data = Ortho_female_first)
> gls3 <- update(gls1, weights = varIdent(form = ~ 1 |Sex_rev))
> gls4 <- update(gls3, data = Ortho_female_first)
>
> # Parameterization depends on sort order
> coef(gls1$modelStruct$varStruct)
> coef(gls2$modelStruct$varStruct)
> # But not on factor level order
> coef(gls3$modelStruct$varStruct)
> coef(gls4$modelStruct$varStruct)
>
> # Fit lme models with heteroskedastic level-1 variance
>
> lme1 <- lme(
>? ? distance ~ age + Sex,
>? ? random = ~ 1,
>? ? weights = varIdent(form = ~ 1 | Sex),
>? ? data = Ortho_male_first
> )
> lme2 <- update(lme1, data = Ortho_female_first)
> lme3 <- update(lme1, weights = varIdent(form = ~ 1 |Sex_rev))
> lme4 <- update(lme3, data = Ortho_female_first)
>
> # Parameterization depends neither on sort order nor on factor
level order
> coef(lme1$modelStruct$varStruct)
> coef(lme2$modelStruct$varStruct)
> coef(lme3$modelStruct$varStruct)
> coef(lme4$modelStruct$varStruct)
>
>? ? ? ?[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list
Regarding your P.S., I've added a callout to the post noting that the bug has been squashed. https://jepusto.com/posts/bug-in-nlme-getVarCov/ Kind Regards, James
On Sun, Dec 29, 2024 at 5:23?PM Sebastian Meyer <seb.meyer at fau.de> wrote:
Glad to hear that helped. Your exemplification of current varIdent
behaviour is also helpful, thanks. I'll revisit the bug with a closer
look next year.
In principle, it would be better for the reference level to be based on
the factor levels, but as you say, nlme's code base is very old, which
means many publications (implicitly) rely on the current behaviour, so
semantics or defaults won't be changed easily. Still, bugs should be
fixed -- and it should be possible to choose the reference level.
Best regards,
Sebastian Meyer
PS: The getVarCov() bug that is linked in your report (to another post)
was fixed a while ago in 2020 (nlme version 3.1-150), i.e., the
`all.equal(V_raw, V_sorted)` tests now give TRUE. It might be worth
noting this somewhere.
Am 29.12.24 um 22:40 schrieb James Pustejovsky:
Hi Sebastian, Thanks for your reply. This helped me figure out how to control the reference level for the particular model I'm working on. I wrote up what I've learned (in an excessive degree of detail) here: https://jepusto.com/posts/varIdent-function-in-nlme/ <https://jepusto.com/posts/varIdent-function-in-nlme/> James On Mon, Dec 23, 2024 at 10:40?AM Sebastian Meyer <seb.meyer at fau.de <mailto:seb.meyer at fau.de>> wrote: I think it is unfortunate that varIdent() selects the reference category based on how the data is sorted (using the first level that occurs). There is also a corresponding bug report <https://bugs.r-project.org/show_bug.cgi?id=18505 <https://bugs.r-project.org/show_bug.cgi?id=18505>>. Your example illustrates an additional issue with this behaviour, because lme() internally reorders the data by the grouping factor(s) before initialization, so the order of that factor ('Subject' in your example, with first level "M16", so male) will indirectly determine
the
reference level of varIdent(), regardless of how your data or strata
levels are ordered originally.
It seems that only if there are more than two strata, the reference
level for varIdent() can be chosen via a named initial parameter
'value', leaving out the desired reference level, but I haven't
tested
if this works as intended with both gls() and lme(). It would then
make
sense to support such a choice also in the case of only two strata,
so
for a single parameter.
Best wishes,
Sebastian Meyer
Am 20.12.24 um 20:39 schrieb James Pustejovsky:
> Greetings,
>
> I'm trying to fit a linear mixed effect model using nlme::lme()
that
> includes heteroskedastic variances by group, where the variance
for one
> group is fixed to a specific value. I *think* I can do this using
> varIdent() and setting lmeControl(sigma = x) for a specified
value x.
> However, I've run into what is evidently an old problem: that
there doesn't
> appear to be any way to control the reference level of varIdent().
>
> As was noted in a very old post to R-SIG-mixed-models (
>
> reference level of varIdent() within an lme() call is set to the
most
> common level of the grouping factor. For reasons that are too
involved to
> get into, I need to set the reference level to something else (in
fact, the
> least frequently occurring level), so it seems I'm stuck.
>
> Curiously, the behavior of varIdent() within a gls() call is a bit
> different. As documented in this StackOverflow (
>
https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order < https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order
),
> the reference level used within gls() depends on the sort order
of the
> data. I've included below a minimal working example that
demonstrates the
> difference in behavior.
>
> Does anyone know of any work-around for setting the reference
level of
> varIdent() within an lme() call? I'd be grateful for any
leads/suggestions
> on what I could do (up to and including to just use a different
package
> already).
>
> Thanks,
> James
>
>
>
----------------------------------------------------------------------------
> library(nlme)
>
> # Make reversed factor
> table(Orthodont$Sex)
> Orthodont$Sex_rev <- Orthodont$Sex
> levels(Orthodont$Sex_rev) <- rev(levels(Orthodont$Sex))
> table(Orthodont$Sex_rev)
>
> Ortho_male_first <- Orthodont[order(Orthodont$Sex),]
> Ortho_female_first <- Orthodont[order(Orthodont$Sex, decreasing =
TRUE),]
>
> # Fit gls models with heteroskedastic variance
> gls1 <- gls(
> distance ~ age + Sex,
> weights = varIdent(form = ~ 1 | Sex),
> data = Ortho_male_first
> )
> gls2 <- update(gls1, data = Ortho_female_first)
> gls3 <- update(gls1, weights = varIdent(form = ~ 1 |Sex_rev))
> gls4 <- update(gls3, data = Ortho_female_first)
>
> # Parameterization depends on sort order
> coef(gls1$modelStruct$varStruct)
> coef(gls2$modelStruct$varStruct)
> # But not on factor level order
> coef(gls3$modelStruct$varStruct)
> coef(gls4$modelStruct$varStruct)
>
> # Fit lme models with heteroskedastic level-1 variance
>
> lme1 <- lme(
> distance ~ age + Sex,
> random = ~ 1,
> weights = varIdent(form = ~ 1 | Sex),
> data = Ortho_male_first
> )
> lme2 <- update(lme1, data = Ortho_female_first)
> lme3 <- update(lme1, weights = varIdent(form = ~ 1 |Sex_rev))
> lme4 <- update(lme3, data = Ortho_female_first)
>
> # Parameterization depends neither on sort order nor on factor
level order
> coef(lme1$modelStruct$varStruct)
> coef(lme2$modelStruct$varStruct)
> coef(lme3$modelStruct$varStruct)
> coef(lme4$modelStruct$varStruct)
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org
<mailto:R-sig-mixed-models at r-project.org> mailing list