Grouping variables technically suitable for modeling - R-SIG-mixed-models

Sun, Nov 7, 2021 2:20 PM #

Dear Experts,

Apologies if this question has come up before. I'm looking for
published references that provide guidance on when grouping variables
that theoretically should be random factors can also "technically" be
modeled as random factors.

For example, I have heard that for a grouping variable to be technically
taken as a random factor, it needs to have at least 10 or so unique
categories. (Any reference to confirm or disconfirm this?)

As a second example, I have heard that for two grouping variables to be
technically taken as random factors, they each need to have a sufficiently
different number of unique categories relative to the other one. Otherwise,
their variance components can't be distinguished from one another, and thus
only one of them can be taken as random, not both. (Any reference to
confirm or disconfirm this?)

Thanks,
Tim M

Ben Bolker

Mon, Nov 8, 2021 5:34 PM #

This is a bit of a "how long is a piece of string" question ...


   The "5-6 levels of a grouping variable" rule of thumb is quoted in
various places: a variety of those references (Gelman and Hill 2006,
Kéry and Royle 2015, Harrison et al. 2018, Arnqvist 2020) are collected
by Gomes
(https://www.biorxiv.org/content/10.1101/2021.04.11.439357v2.full).
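
One way to see why such rules of thumb exist is a small simulation. This is only a sketch under simplifying assumptions that are not from the thread: the group effects are treated as if they were observed directly (no residual noise), they are drawn from a normal distribution, and the function name and settings are made up for illustration. It shows how much the estimated among-group variance fluctuates when only a few levels are available.

```python
import random
import statistics


def relative_spread(n_levels, n_reps=2000, true_sd=1.0, seed=1):
    """Relative spread (SD / mean) of the estimated among-group variance."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Idealised case: group effects are observed without residual noise.
        effects = [rng.gauss(0.0, true_sd) for _ in range(n_levels)]
        estimates.append(statistics.variance(effects))
    return statistics.stdev(estimates) / statistics.mean(estimates)


# Hypothetical numbers of levels, chosen only for illustration.
spread_by_levels = {k: relative_spread(k) for k in (3, 5, 10, 50)}
for k, spread in spread_by_levels.items():
    print(f"{k:>3} levels: relative spread of variance estimate = {spread:.2f}")
```

For normally distributed effects the relative spread is sqrt(2 / (k - 1)) in theory, so it shrinks only slowly as the number of levels k grows. A real mixed model also has to separate group effects from residual noise, which this sketch ignores.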

   I sort of see what you mean by your second paragraph, but can you 
give an example?

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Timothy MacKenzie

Tue, Nov 9, 2021 6:18 AM #

Dear Ben,

Thank you for sharing the references regarding my first question.

Regarding my second question, I simply mean that if we have, say, ID1 and
ID2, then for ID2 to be distinguishably nested in ID1, it needs to have a
sufficiently different number of unique categories relative to ID1.

For example, if ID1 has 120 unique categories and ID2 has 130
unique categories nested in ID1, then the variance components for ID1 and
ID2 are not distinguishable from each other. As a result, only one of them
can be added as a random effect: either (1 | ID1) or (1 | ID2), but not
(1 | ID1/ID2).
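
The nesting in this example can be sketched with a quick count. The thread does not say how the 130 ID2 categories are spread over the 120 ID1 categories, so the allocation below is an assumption made only for illustration: every ID1 level gets one ID2 level and the 10 left over go to the first 10 ID1 levels.

```python
from collections import defaultdict

N_ID1 = 120  # unique ID1 categories, from the example
N_ID2 = 130  # unique ID2 categories, from the example

# Hypothetical allocation: one ID2 level per ID1 level, with the 10
# remaining ID2 levels assigned to the first 10 ID1 levels.
parent_of = {}
for id2 in range(N_ID2):
    parent_of[id2] = id2 if id2 < N_ID1 else id2 - N_ID1

children = defaultdict(list)
for id2, id1 in parent_of.items():
    children[id1].append(id2)

# ID1 levels with a single ID2 level confound the two effects completely;
# only levels with two or more ID2 levels help tell them apart.
informative = [id1 for id1, kids in children.items() if len(kids) >= 2]
singletons = [id1 for id1, kids in children.items() if len(kids) == 1]

print(f"ID1 levels with >= 2 ID2 levels: {len(informative)}")
print(f"ID1 levels with exactly 1 ID2 level: {len(singletons)}")
print(f"Within-ID1 contrasts among ID2 levels: {N_ID2 - N_ID1}")
```

Whatever the allocation, the number of within-ID1 contrasts among ID2 levels is the difference between the two counts, here 130 - 120 = 10, which is very little information for separating the two variance components.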

Is this correct and is there a published reference confirming or
disconfirming this?

Thanks,
Tim M

Thierry Onkelinx

Tue, Nov 9, 2021 7:02 AM #

Dear Timothy,

I would expect in your example that the combined effect of ID1 and ID2 will
be more or less equally split over ID1 and ID2, as this would yield a lower
penalty than attributing the effect fully to either ID1 or ID2. Hence the
random effect variances of 1|ID1/ID2 will be a lot smaller than those of
1|ID1 or 1|ID2.

As ID2 defines almost the same grouping as ID1, it doesn't make sense to
include both of them in the model.
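
The penalty argument can be illustrated with a simplified calculation. The assumptions here are not from the thread: an ID1 level is taken to contain a single ID2 level, so the data constrain only the sum of the two random effects, and both effects are given the same variance, so the penalty is proportional to the sum of their squares. The function name and the value of the combined effect are hypothetical.

```python
def penalty(total_effect, share_to_id1):
    """Sum of squared random effects when the total is split between ID1 and ID2."""
    b1 = share_to_id1 * total_effect
    b2 = (1.0 - share_to_id1) * total_effect
    return b1 ** 2 + b2 ** 2


total_effect = 2.0  # hypothetical combined deviation of one group
shares = [i / 10 for i in range(11)]  # fraction of the effect given to ID1
penalties = {s: penalty(total_effect, s) for s in shares}
best_share = min(penalties, key=penalties.get)

for s in shares:
    print(f"share to ID1 = {s:.1f}: penalty = {penalties[s]:.2f}")
print(f"smallest penalty at share = {best_share}")
```

Under these assumptions the equal split halves the penalty compared with assigning the whole effect to one term, which fits the expectation that each variance in 1|ID1/ID2 comes out smaller than the single variance in 1|ID1 or 1|ID2.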

I have no reference at hand for this. Just common sense.

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

Timothy MacKenzie

Tue, Nov 9, 2021 9:30 AM #

Dear Thierry,

That "As ID2 defines almost the same grouping as ID1, it doesn't make
sense to include both of them in the model." makes good sense.

Thanks!
Tim M
