
Singularity and the 3 Level Hierarchy

Hi Kim,

Re the two different random effects specifications - Model 1) Cell:Subject vs Model 2) 1|Cell

Seems to me that what this is telling us is that there is no evidence of differences between subjects, but there is evidence of differences between cells. To include that info in the model there are at least two ways one might go about it, depending on how the cell info is coded up in the data:
1) If the data is coded so that each cell has a globally unique indicator value, e.g. subject 1 has cells 1, 2 and 3, while subject 2 has cells 4 and 5, etc., then you can assign a different random intercept to each cell by including "1|Cell" in the model call.
2) However, if the data is coded as the cell # within each subject, e.g. subject 1 has cells 1, 2 and 3, while subject 2 has cells 1 and 2, etc., then you need to specify Cell:Subject to get a different random intercept for each cell. If you included the base "Cell" variable on its own, it would fit a single random intercept representing everyone's cell 1, another covering everyone's cell 2, and so on, which wrongly treats cell 1 of subject 1 and cell 1 of subject 2 as the same group.

So, if I have got that right, I think both of the above methods give the same model, i.e. a different random intercept for each cell (which is valid), but how you specify it depends on how cell is coded up in the data.
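A minimal sketch of that equivalence on simulated data (assuming the lme4 package; the data and variable names here are hypothetical, not the poster's): recoding per-subject cell numbers into globally unique IDs and fitting a plain random intercept gives the same structure as the interaction term.

```r
library(lme4)

## Balanced toy data: 6 subjects, cells numbered 1-3 *within* each subject
d <- expand.grid(Subject = factor(1:6), Cell = factor(1:3), rep = 1:4)
## Coding 1): a globally unique ID per cell, built from the interaction
d$CellUnique <- interaction(d$Subject, d$Cell, drop = TRUE)
set.seed(1)
d$Y <- rnorm(nlevels(d$CellUnique))[d$CellUnique] + rnorm(nrow(d))

## Coding 2): per-subject cell numbers, so the interaction term is needed
m_a <- lmer(Y ~ 1 + (1 | Subject:Cell), data = d)
## Coding 1): unique cell IDs, so a plain random intercept suffices
m_b <- lmer(Y ~ 1 + (1 | CellUnique), data = d)

## Both fits estimate the same single cell-level variance component
VarCorr(m_a)
VarCorr(m_b)
```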

As to whether your Model2<-lmer(Y~Groupf+(1|Cell),REML=FALSE,data=file1) is valid: well, that depends on how the cell variable is coded. I suspect it is likely coded as in point 2) above, meaning it isn't.

Chris Howden | Statistical Lead
Statistical Consulting Unit | Sydney Informatics Hub | Core Research Facilities
+61 410 689 945 | christopher.howden at sydney.edu.au
Statistical Resources and Workflows/Workshops

Acknowledgement of SIH workflows and statistical consulting ensures the continuation of these free services. Please consider the suggested wording on our Acknowledgements page to acknowledge our contributions in any arising publications.

CRICOS 00026A
This email plus any attachments to it are confidential. Any unauthorised use is strictly prohibited. If you receive this email in error, please delete it and any attachments Please think of our environment and only print this email if necessary

-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> On Behalf Of Kim Pearce via R-sig-mixed-models
Sent: Tuesday, 25 February 2025 5:15 AM
To: kalman.toth <kalman.toth at protonmail.com>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Singularity and the 3 Level Hierarchy

Dear Kalman,

Thank you so much for your reply to the question I sent to the list last week.

Just a follow-up to some of your points:


~I don't see any reason to use REML = FALSE in your case. Given what you described, I'd definitely stick with REML = TRUE; it's the better choice when you want to properly account for the random effect structure.
~anova() or AIC() can be used to compare models with different random effect structures.



I have read that REML is said to provide more accurate estimates of random variances whereas ML is said to produce more accurate estimates of fixed regression parameters.



Additionally, I have seen the following stipulated:



1. REML to compare models with nested random effects and the same fixed effect structure

2. ML to compare models with nested fixed effects and the same random effect structure

3. ML to compare models with and without random effects

For each of 1, 2 and 3 above I assume that "model comparison" can be done via the comparison of, for example, AIC values where "smaller is better". However, Andy Field in his text "Discovering Statistics Using IBM SPSS Statistics" (2013, Edition 4: page 826 & page 835) only uses ML when comparing two linear mixed models via the likelihood ratio test (he states that the likelihood ratio test "works only if full maximum likelihood estimation is used (and not REML)").

In fact, in the past, when I have used the anova() function in R to compare linear mixed models which have been fitted using REML=TRUE, R stipulates in the output that the models are "refitted with ML (instead of REML)" before comparison takes place.
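A hedged sketch of that behaviour, reusing the poster's hypothetical objects (Y, Groupf, Subject, Cell, file1): lme4's anova() method refits REML models with ML by default before the likelihood ratio test, and the refit argument controls this.

```r
library(lme4)

m1 <- lmer(Y ~ Groupf + (1 | Subject/Cell), REML = TRUE, data = file1)
m2 <- lmer(Y ~ Groupf + (1 | Cell),         REML = TRUE, data = file1)

## By default anova.merMod refits the REML models with ML before the
## likelihood ratio test, and says so in its output
anova(m1, m2)

## refit = FALSE keeps the REML fits; some authors consider this
## appropriate when only the random effect structures differ
anova(m1, m2, refit = FALSE)
```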



Additionally, Field stipulates that the likelihood ratio test requires that the "new model must contain all of the effects of the older model", i.e. that the models are nested.



~A worked example would help.

~That singular fit warning probably means that one of your parameters is estimated at (or very close to) zero. In most cases, it's fine to just drop that parameter, unless you have a strong scientific reason to keep it.



My example was hypothetical but, say, we had j subjects and k cells in total for these j subjects and we ran the model:



Model1<-lmer(Y~Groupf+(1|Subject/Cell),REML=FALSE,data=file1)



which gave the following:



Random effects



Groups         Name                    Variance

Cell:Subject  (Intercept)              2.049e-01

Subject       (Intercept)              2.978e-09



We can definitely see singularity here: the estimated random intercept variance for the j intercepts at the Subject level is virtually zero (2.978e-09), while the variance for the k intercepts at the Cell-within-Subject level (2.049e-01) is well away from zero.



In the above, it looks as if we could try dropping the random intercepts at the Subject level in order to get a non-singular fit, i.e. fit:


Model2<-lmer(Y~Groupf+(1|Cell),REML=FALSE,data=file1)

I was just concerned that this 2-level model implies that the "top level" of the structure is Cell when, in truth, the "top level" is Subject (as cells are nested within subjects in my hypothetical study). Your message implies that Model2 above would, in fact, still be valid.
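One way to drop the near-zero Subject term while keeping the nesting intact is to retain the interaction grouping rather than the bare Cell variable; a sketch assuming lme4 and the hypothetical objects above (Y, Groupf, file1 are the poster's names):

```r
library(lme4)

m_full <- lmer(Y ~ Groupf + (1 | Subject/Cell), REML = FALSE, data = file1)
isSingular(m_full)  # TRUE when a variance component is estimated at/near zero

## (1|Subject/Cell) expands to (1|Subject) + (1|Subject:Cell).  Dropping the
## near-zero Subject term but keeping the interaction retains a distinct
## intercept for each cell *within* subject, so Subject stays the top level:
m_red <- lmer(Y ~ Groupf + (1 | Subject:Cell), REML = FALSE, data = file1)

## (1|Cell) alone is only equivalent if cell IDs are globally unique
## across subjects, which is exactly the coding question here
```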

I would be interested to hear if you (or anyone else) have any further views.

Kindest regards,
Kim







-----Original Message-----
From: kalman.toth <kalman.toth at protonmail.com>
Sent: 22 February 2025 09:19
To: Kim Pearce <kim.pearce at newcastle.ac.uk>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Singularity and the 3 Level Hierarchy






Hi Kim,



I am just a scientist who often works with repeated measures data and not a statistician but I would like to add a few comments and others might amend those or add more.



1) I don't see any reason to use REML = FALSE in your case. Given what you described, I'd definitely stick with REML = TRUE; it's the better choice when you want to properly account for the random effect structure.

2) A worked example would help.

3) What model you use should be primarily based on your experimental design and your scientific knowledge of the field.

4) That singular fit warning probably means that one of your parameters is estimated at (or very close to) zero. In most cases, it's fine to just drop that parameter, unless you have a strong scientific reason to keep it. Also, if your experimental design is fully nested, you might want to try adding the interaction term ('Subject:Cell') instead of just 'Cell'.

5) anova() or AIC() can be used to compare models with different random effect structures.



Best Regards,

Kalman Toth
On Thursday, February 20th, 2025 at 1:14 PM, Kim Pearce via R-sig-mixed-models <r-sig-mixed-models at r-project.org> wrote:

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models