Hello!
I'm confused about the error the lmer function sometimes gives, "Error: number of observations (=n) <= number of random effects (=n) for term (x| id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable"
Regarding this error, I'm confused about how the "number of random effects" is defined. My naïve understanding was that if you have, say, 10 clusters, then a random intercept model estimates 10 random effects (i.e., a random intercept for each cluster). If you have one random intercept and one random slope, the model would estimate 10+10+1=21 random effects (a random intercept and slope for each participant plus the correlation between them). With 2 random intercepts and 2 slopes it would be 10+10+10+10+2 (or 3, depending on the exact random effects syntax)=42 (or 43).
However, experimenting a little showed me that the correlations between random effects do not seem to be included in the number of random effects, so I'll forget them for now.
My main puzzlement comes from this: I generated a simple dataset of 10 clusters ("id") and 3 observations per cluster ("time"), as well as a level 1 predictor ("x") and ran the following models:
mod1<-lmer(y ~ x + time + (time|id) + (x|id), data=d)
This model converges (though the correlations between random effects are 1 and -1, which is probably just due to my sloppy data generation process) with no errors or warnings.
Then I ran this model:
mod2<-lmer(y ~ x + time + (time + x |id), data=d)
This model won't converge and I get the error:
Error: number of observations (=30) <= number of random effects (=30) for term (time + x | id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable
********************
I thought that the first model would estimate 40 random effects (two intercepts and two slopes for each of the 10 clusters), and the second model would estimate 30 (1 intercept and 2 slopes for each cluster). This seems to be correct regarding the second model, but why is the first model seemingly estimating fewer random effects (fewer than 40, and apparently also fewer than 30)?
I do apologize if this is very basic; I don't have math or proper stats background, just applied stats. I did read the manual, and several online discussions regarding this error before posting. I didn't find the answer in the manual (this may well be due to my own incompetence), and the online discussions (e.g. https://stats.stackexchange.com/questions/193678/number-of-random-effects-is-not-correct-in-lmer-model) seem to support my initial intuition, which however is clearly wrong. What am I missing?
Thank you in advance if someone can help!
-Sointu
*********************
My code for the above:
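library(lme4)

set.seed(12345)
id1 <- 1:10
id <- rep(id1, each = 3)       # 10 clusters, 3 observations each
t <- 1:3
time <- rep(t, times = 10)     # measurement occasion within each cluster
x <- rnorm(30, 3, 1)           # level 1 predictor
err <- rnorm(10, 0, 1)         # one random intercept per cluster
err2 <- rep(err, each = 3)
y <- 3 + 0.2 * x + 0.3 * time + rnorm(30) + err2
d <- data.frame(id, time, x, y)

# Two separate random-effects terms for the same grouping factor
mod1 <- lmer(y ~ x + time + (time | id) + (x | id), data = d)
summary(mod1)

# One combined term; this call stops with the error quoted above
mod2 <- lmer(y ~ x + time + (time + x | id), data = d)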
Number of random effects estimated with different lmer specifications
3 messages · Thierry Onkelinx, Leikas, Sointu S
Dear Sointu,

You only need to count the number of parameters contributing to the linear predictor, hence not the parameters of the variance-covariance matrix.

For mod2: 1 random intercept + 2 random slopes = 3 parameters per ID. Times 10 IDs, this yields 30 parameters.

Your mod1 actually requires 2 times (1 random intercept + 1 random slope) times 10 IDs = 40 random effect parameters. IMHO lme4 should warn for this too.

ranef(mod1)$id

Best regards,

ir. Thierry Onkelinx
Statistician
Government of Flanders, Research Institute for Nature and Forest (INBO)
Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be

(In reply to the message sent Wed 8 May 2024 at 10:51 by Leikas, Sointu S <sointu.leikas at helsinki.fi>.)
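The counting rule in this reply can be checked without fitting any model. The following is a minimal base-R sketch, not lme4's own code: `count_ranef` is a hypothetical helper that multiplies the number of columns in a term's design matrix by the number of levels of the grouping factor. It assumes, as the wording "for term (time + x | id)" in the error message suggests, that the comparison with the number of observations is made for each random-effects term separately; under that assumption each term of mod1 stays below 30 even though the two terms together need 40 random effects.

```r
# Rebuild the predictors from the original post (y is not needed for counting)
set.seed(12345)
d <- data.frame(id   = rep(1:10, each = 3),
                time = rep(1:3, times = 10),
                x    = rnorm(30, 3, 1))

# Hypothetical helper (not part of lme4): random effects implied by one term
# such as (time + x | id), i.e. coefficients per group times number of groups
count_ranef <- function(rhs, group, data) {
  n_coef   <- ncol(model.matrix(rhs, data))   # intercept + slopes
  n_levels <- length(unique(data[[group]]))   # number of clusters
  n_coef * n_levels
}

n_obs      <- nrow(d)                             # 30 observations
term_time  <- count_ranef(~ time, "id", d)        # (time | id):     2 * 10 = 20
term_x     <- count_ranef(~ x, "id", d)           # (x | id):        2 * 10 = 20
term_joint <- count_ranef(~ time + x, "id", d)    # (time + x | id): 3 * 10 = 30

# mod1: each term is checked on its own, so 20 < 30 twice; the total is 40
c(term_time = term_time, term_x = term_x, total_mod1 = term_time + term_x)

# mod2: the single term needs 30 random effects, which is not below n_obs
c(term_joint = term_joint, n_obs = n_obs)
```

For a fitted model, the columns of `ranef(mod1)$id` show the same thing from the other side: one column per random coefficient, one row per cluster.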
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Dear Thierry,

thank you very much for the quick and helpful reply! I thought there had to be 40, but I trusted the error message (or the lack of one) too much :) This really clarified things for me!

All the best,
Sointu

(In reply to the message sent Wednesday 8 May 2024 at 12:02 by Thierry Onkelinx <thierry.onkelinx at inbo.be>, subject "Re: [R-sig-ME] Number of random effects estimated with different lmer specifications".)