Continuous variable as random slope and the minimum number of levels for a categorical variable to be treated as random - R-SIG-mixed-models

Fri, Apr 14, 2017 3:05 AM

Dear all,

I recently read the following text on this page
(https://dynamicecology.wordpress.com/2015/11/04/is-it-a-fixed-or-random-effect/):
"First you CANNOT treat a continuous variable as a random effect. So
if you are putting area or temperature or body size is in they may be a
nuisance/control variable but they are a fixed effect. Of course you are
only estimating one parameter (the slope) so there is no degree of freedom
cost to treating it as random. And it makes no sense to ask what is the
variance across a continuous variable."
However, I don't understand why it makes no sense to ask what the
variance across a continuous variable is.
I've seen the classic example on the sleepstudy data, which treats a continuous
variable as a random slope:
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)  # intercept and Days slope vary by Subject
Here sleepstudy$Days is a continuous variable, and lmer estimates the
variance of the Days slope.
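Written out, the model behind this formula is usually stated as below. This is a sketch in standard notation; the symbols are chosen here for illustration and are not names that lme4 prints. Subject i contributes observations j, and the "variance of the Days slope" is sigma_1^2, the spread of the subject-specific slope deviations b_{1i} across the levels of Subject, while Days itself only enters as a covariate.

```latex
\begin{aligned}
\text{Reaction}_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,\text{Days}_{ij} + \varepsilon_{ij},\\
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} &\sim \mathcal{N}\!\left(\mathbf{0},\
\begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix}\right),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```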

So... is it OK to use a continuous variable as random slope or not?

Furthermore, the post says: "[...] you should not treat a categorical
variable with only two levels (e.g. two sites), also known as a binary
variable, as a random effect. You wouldn't take two measures and then try
to estimate variance, but that is what you're asking R to do if you treat
it as random. Beyond that there is a lot of debate. But many people think
<http://stats.stackexchange.com/questions/37647/minimum-number-of-levels-for-a-random-effects-factor>
you should have at least 5 levels (e.g. 5 sites) before you treat something as
random"

In fact, I've seen many GLMMs fitted with random factors that have only 2
levels. Is that acceptable or not?

Thanks in advance,

Michele

Research Associate @ NPSY-Lab.VR - University of Verona
Research Associate @ AgliotiLab - University of Rome "La Sapienza"
Registered in Section A of the Roll of the Order of Psychologists of Veneto, no. 7733

office tel. 0039 045 802 8401

http://agliotilab.org/lab-staff/phd-students/3rd-year/michele-scandola#anchor
http://profs.formazione.univr.it/npsy-labvr/michele-scandola/
http://scholar.google.it/citations?user=mRc0hxsAAAAJ
http://it.linkedin.com/pub/michele-scandola/24/967/313



*The information, data and news contained in this communication and any
attachments are private in nature and, as such, may be confidential; in any
case they are intended exclusively for the recipients named above. Any
dissemination, distribution and/or copying of the transmitted document by
anyone other than the intended recipient is prohibited, under both art. 616
of the Italian Criminal Code and Legislative Decree no. 196/2003. If you have
received this message in error, please destroy it and notify us immediately,
including by replying to the sender's e-mail address.*
*This e-mail (including attachments) is intended only for the recipient(s)
named above. It may contain confidential or privileged information and
should not be read, copied or otherwise used by any other person. If you
are not the named recipient, please contact npsylab.vr at gmail.com
and delete the e-mail from your system. Ref. D.L.
196/2003.*


Conor Michael Goold

Fri, Apr 14, 2017 3:50 AM

Hi,

The post you link to treats a "random effect" solely as the blocking factor, or hierarchical grouping factor, in the model, i.e. the case where one wants to estimate a different intercept for each level of the grouping factor. For instance, when observations are nested within individuals, as in the sleep study, individuals are the grouping factor (the "random effect") and each has its own intercept. In fact, in one of the comments (the second one), the author admits he left out random slopes for brevity. But even with random-slope terms, the slope varies with respect to the same blocking factor as the intercept.
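To make concrete which variance a term like (Days | Subject) targets, here is a minimal sketch in Python (the thread itself uses R; Python is used purely for illustration, and all parameter values are hypothetical, loosely shaped like sleepstudy). It simulates subjects whose intercepts and Days slopes both vary, fits a separate least-squares line per subject, and takes the variance of those slopes across subjects. This two-stage shortcut is not what lmer does (lmer estimates the variance components jointly by (RE)ML), and it overstates the slope variance because each per-subject slope also carries estimation noise; it only shows that the variance is taken across levels of Subject, not across values of Days.

```python
import random
import statistics

def simulate_sleep_like(n_subjects=18, days=range(10), beta0=250.0, beta1=10.0,
                        sd_intercept=25.0, sd_slope=6.0, sd_resid=25.0, seed=1):
    """Simulate data where the intercept and the Days slope both vary by subject.

    All parameter values are hypothetical. Returns a dict mapping
    subject id -> list of (day, reaction) pairs.
    """
    rng = random.Random(seed)
    data = {}
    for s in range(n_subjects):
        b0 = rng.gauss(0.0, sd_intercept)  # this subject's intercept deviation
        b1 = rng.gauss(0.0, sd_slope)      # this subject's slope deviation
        data[s] = [(d, beta0 + b0 + (beta1 + b1) * d + rng.gauss(0.0, sd_resid))
                   for d in days]
    return data

def ols_slope(points):
    """Ordinary least-squares slope of y on x for one subject's points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

data = simulate_sleep_like()
slopes = [ols_slope(pts) for pts in data.values()]  # one slope per subject
print(f"mean slope: {statistics.fmean(slopes):.2f}")
print(f"variance of slopes across subjects: {statistics.variance(slopes):.2f}")
```

Setting sd_slope=0.0 in the call gives a useful contrast: the remaining spread of the per-subject slopes then comes only from residual noise.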

However, ordered continuous variables (e.g. different ages) can also be treated as random effects, or grouping variables, through Gaussian process models.
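As a rough illustration of how a Gaussian process handles an ordered continuous variable, here is a minimal Python sketch: instead of one independent effect per level, the effects at different ages share a covariance that decays with the distance between ages, so nearby ages get similar effects. The squared-exponential kernel and its hyperparameter values (eta_sq, length_scale) are assumptions chosen for illustration; other kernels are also common.

```python
import math

def sq_exp_cov(ages, eta_sq=1.0, length_scale=5.0):
    """Squared-exponential covariance matrix over a list of ages.

    Cov(f(a), f(a')) = eta_sq * exp(-(a - a')^2 / (2 * length_scale^2)).
    eta_sq and length_scale are hypothetical hyperparameters.
    """
    return [[eta_sq * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))
             for b in ages] for a in ages]

ages = [20, 21, 25, 40]
K = sq_exp_cov(ages)
for a, row in zip(ages, K):
    # Each row shows how strongly the effect at age `a` co-varies with the others.
    print(a, [round(v, 3) for v in row])
```

Ages 20 and 21 end up almost perfectly correlated, while 20 and 40 are nearly independent; a plain grouping factor would treat all four ages as unrelated levels.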

When you say you have seen GLMMs with only 2 levels, do you mean random slopes or random intercepts? I'm guessing the former based on your first question.

The minimum number of levels for a discrete grouping factor depends on the exact context (e.g. how many parameters are being estimated), but many recommend 5 as a minimum (although this would hold only for the simplest models), and more is always better. For instance, Stegmueller 2013 (http://onlinelibrary.wiley.com/doi/10.1111/ajps.12001/abstract) says that at least 15-20 levels of the grouping factor are best for ML estimation, whereas Bayesian methods are more robust with fewer levels per grouping factor.
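To get a feel for why very few levels are a problem, here is a minimal Python sketch: it repeatedly draws group effects with a known variance (1.0) and estimates that variance from 2, 5 or 20 levels. It deliberately ignores within-group noise and the model fitting itself, so it is only a best-case illustration of how variable a variance estimated from k levels is, not a reproduction of Stegmueller's simulations or of lme4's behaviour.

```python
import random
import statistics

def variance_estimates(n_levels, true_sd=1.0, n_reps=2000, seed=42):
    """Sample variance of n_levels group effects, repeated n_reps times.

    Simplifying assumption: the group effects are observed directly,
    without within-group noise, which is the best case for estimation.
    """
    rng = random.Random(seed)
    return [statistics.variance([rng.gauss(0.0, true_sd) for _ in range(n_levels)])
            for _ in range(n_reps)]

for k in (2, 5, 20):
    est = variance_estimates(k)
    # Share of estimates that are off from the true variance (1.0) by more than 2x.
    share_off = sum(e < 0.5 or e > 2.0 for e in est) / len(est)
    print(f"{k:>2} levels: sd of estimates = {statistics.stdev(est):.2f}, "
          f"share off by more than 2x = {share_off:.2f}")
```

With only 2 levels the estimate rests on a single degree of freedom, so most replicates miss the true variance by a wide margin; the spread shrinks steadily as levels are added.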

Also, as another commenter discussed, the terms "random effect" and "fixed effect" can be confusing, and perhaps a better way to think about these sorts of models is simply whether or not parameters vary by some grouping factor. Thus, you could have intercepts or slopes varying with respect to a grouping factor. I prefer to write "Intercepts and the slope of predictor X varied by each individual" rather than "Random intercepts and slopes were included" because I think it is ultimately clearer about what is being done and what readers can expect from the analysis.

Best regards
Conor Goold
PhD Student
Phone: +47 67 23 27 24

Norwegian University of Life Sciences
Campus Ås. www.nmbu.no

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Michele Scandola

Fri, Apr 14, 2017 10:33 AM

Dear Conor,

Thanks a lot for your answers. For some reason I misunderstood the
article: I thought it was talking about random slopes. Now it makes
sense.
However, I didn't know that even continuous, ordered variables can be used
as grouping factors. Do you have any references on that? The page I
shared clearly states that it is not possible.
Also, in your example you mentioned age. Might it be a good idea to
use it as a grouping factor nested within the participant grouping factor? I
mean something like (1|subject:age).
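To see what a term like (1|subject:age) would group by, here is a minimal Python sketch with hypothetical data. It assumes, as for any grouping term, that each distinct age value is treated as a separate, unordered level, so subject:age yields one intercept per observed subject-age combination. Unlike a Gaussian process, this ignores that ages 30 and 31 are close to each other.

```python
from collections import Counter

# Hypothetical repeated-measures data: one (subject, age at testing) pair per observation.
observations = [
    ("s1", 30), ("s1", 30), ("s1", 31),
    ("s2", 45), ("s2", 46), ("s2", 47),
]

# Each distinct subject-age combination becomes one level of the grouping factor;
# the counts show how many observations inform each level's intercept.
cell_counts = Counter(observations)
print(len(cell_counts), "levels:", dict(cell_counts))
```

In this hypothetical data most subject-age cells hold a single observation, which leaves very little information per level.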

Best regards,

Michele



Conor Michael Goold

Fri, Apr 14, 2017 12:37 PM

Dear Michele,


I don't have a good reference to hand, but if you look up Gaussian process models you will find plenty of information. The grouping factor could still be discrete, but Gaussian processes allow you to model the covariances between the "random effects", for instance spatial autocorrelation within hierarchical levels.


Actually, Richard McElreath has an example of this in his book Statistical Rethinking: he models the number of tools in different cultures with a Poisson regression and uses a Gaussian process to represent the correlations among the cultures (since tool counts are spatially correlated across cultures).
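The idea of modelling covariances among discrete groups can be sketched in a few lines of Python. The kernel below has the squared-exponential form K_ij = eta_sq * exp(-rho_sq * D_ij^2) on a distance matrix D, the kind of kernel used in that tools example; the distances, hyperparameters and jitter are hypothetical, and the code only draws correlated group intercepts from the prior rather than fitting anything.

```python
import math
import random

def gp_cov(dist, eta_sq=1.0, rho_sq=0.5, jitter=1e-9):
    """K_ij = eta_sq * exp(-rho_sq * D_ij^2), plus a tiny jitter on the diagonal."""
    n = len(dist)
    return [[eta_sq * math.exp(-rho_sq * dist[i][j] ** 2) + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def draw_group_effects(dist, seed=0, **kernel_args):
    """One draw of correlated group-level intercepts: f = L z with z ~ N(0, I)."""
    L = cholesky(gp_cov(dist, **kernel_args))
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(dist))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(dist))]

# Hypothetical distances (arbitrary units) among four groups, e.g. cultures.
D = [[0.0, 0.5, 3.0, 4.0],
     [0.5, 0.0, 2.5, 3.5],
     [3.0, 2.5, 0.0, 1.0],
     [4.0, 3.5, 1.0, 0.0]]
print([round(v, 2) for v in draw_group_effects(D)])
```

Groups that are close in D (the first two, and the last two) tend to receive similar intercepts in each draw, which is exactly the structure an ordinary independent grouping factor cannot express.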


Best

Conor
