Nested model variance/parameter value - R-SIG-mixed-models

Fri, Dec 10, 2021 3:29 AM

I am a novice in mixed models, and I am trying to fit a model to 
survey data with an interval-scale dependent variable (hon), four 
fixed-effect variables (sex, age, schooling, and questions) and two 
random effects. The random effects are interviewer (intv) and 
interviewee (ID), and as such, they are in a nested relationship. Sex, 
age and questions are found to interact.

A major question I am asking here is whether the interviewer effect is 
significant or not, so I tried the following intercept-only models, 
with model 1 using the nested model, model 2 only the interviewer 
effect, and model 3 only the interviewee effect:

model1 <- lmer(hon ~ sex * age * Question + schooling + (1|intv/ID))
model2 <- lmer(hon ~ sex * age * Question + schooling + (1|intv))
model3 <- lmer(hon ~ sex * age * Question + schooling + (1|ID))

The output from each model says the following:

model 1:
Random effects:
  Groups   Name        Variance Std.Dev.
  ID:intv  (Intercept) 0.03988  0.1997
  intv     (Intercept) 0.00000  0.0000
  Residual             0.16847  0.4105
Number of obs: 3283, groups:  ID:intv, 305; intv, 28

model 2:
Random effects:
  Groups   Name        Variance Std.Dev.
  intv     (Intercept) 0.002348 0.04846
  Residual             0.205998 0.45387
Number of obs: 3283, groups:  intv, 28

model 3:
Random effects:
  Groups   Name        Variance Std.Dev.
  ID       (Intercept) 0.04107  0.2027
  Residual             0.16894  0.4110
Number of obs: 3294, groups:  ID, 306

The respective Log likelihood and AIC values are:

model1	AIC = 4249.232  LL = -2076.616 (df=48)
model2	AIC = 4539.69   LL = -2222.845 (df=47)
model3	AIC = 4274.99   LL = -2090.495 (df=47)
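
As a quick check on the numbers above: AIC is -2 times the log-likelihood plus 2 times the number of estimated parameters (df). A minimal R sketch, using only the values reported above (the data frame name `fits` is just for illustration):

```r
# Log-likelihoods and df copied from the output reported above
fits <- data.frame(
  model  = c("model1", "model2", "model3"),
  logLik = c(-2076.616, -2222.845, -2090.495),
  df     = c(48, 47, 47)
)

# AIC = -2 * logLik + 2 * df
fits$AIC <- -2 * fits$logLik + 2 * fits$df
print(fits)
```

This reproduces the three AIC values listed above.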

Since I got an error message saying "models were not all fitted to the 
same size of dataset" while running anova(), I compared the AICs and 
concluded that model2 is the best model of the three.

Here I have three questions:

1. Why is the variance for the interviewer effect (intv) zero? Is it 
necessarily so because of the nested model, or is it simply because 
there is no interviewer effect?

2. If intv is really zero, why does model 3 not give a better AIC?

3. Am I allowed to compare the three models with AIC as I did above? 
Or should I use LL?

Thanks in advance,

Kenjiro Matsuda

John Maindonald

Fri, Dec 10, 2021 5:03 PM

My guess is that you should not be treating answers from different
questions as independent.  They are nested within individuals, and
a main effect is not sufficient to account for systematic differences.
There are shades of the story I heard of an experimenter whose blocks
were made up of plots that moved successively away from the river.
What do you get if you analyse a summary measure for the questionnaire
or individual questions?


John Maindonald             email: john.maindonald at anu.edu.au

N o s t a l g i a

Fri, Dec 10, 2021 9:29 PM

Hi John,

Treating answers (Q) as a random effect nested within an individual 
sounds like an interesting idea. As Qs are not part of my main 
interest, that would pose no problem to me. I guess it would be like:

model4 <- lmer(hon ~ sex * age * Question + schooling + 
(1|intv/ID/Question))

Or should I drop it from the interaction of the fixed effect?

- Ken

Karl Ove Hufthammer

Sat, Dec 11, 2021 2:55 AM

N o s t a l g i a wrote on 10.12.2021 12:29:

No, model 2 has the *highest* AIC, and based on AIC, it would be the 
*worst* model. The best model would be the one with the lowest AIC. 
(Also, it doesn't seem realistic to assume no random effect for the 
interviewees, so I would also dismiss model 2 on *theoretical* 
grounds.)

But in this case, comparing the AICs (or log-likelihoods) is actually 
*not* valid, as the models were not fitted to the same dataset (something 
which anova() warns you about). In model 3, you have 3294 observations, 
but in models 1 and 2, you only have 3283 observations. The only 
difference between the models is that model 3 doesn't include the 'intv' 
variable. In other words, for 11 responses, you don't know who the 
interviewer was.

So you have to refit the models to the *same* dataset, e.g., by removing 
the observations where 'is.na(intv)' before fitting the models.
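
One way to do this, sketched on simulated data since the real survey data are not shown here: build a single complete-case data frame and fit every model to it. The data frame names `d` and `d_cc` and all simulated values are assumptions for illustration only; the variable names and formulas are taken from the original post.

```r
library(lme4)

# Simulated stand-in for the survey data (hypothetical values throughout)
set.seed(1)
n_id <- 60
id_info <- data.frame(
  ID        = factor(seq_len(n_id)),
  intv      = factor(rep(1:6, length.out = n_id)),  # one interviewer per interviewee
  sex       = factor(sample(c("F", "M"), n_id, replace = TRUE)),
  age       = sample(20:70, n_id, replace = TRUE),
  schooling = sample(6:16, n_id, replace = TRUE),
  id_eff    = rnorm(n_id, sd = 0.2)
)
d <- merge(expand.grid(ID = id_info$ID, Question = factor(1:4)), id_info, by = "ID")
d$hon <- 3 + d$id_eff + rnorm(nrow(d), sd = 0.4)
d$intv[c(5, 17, 42)] <- NA   # a few responses with unknown interviewer

# na.action only drops rows with NAs in the variables a given formula uses,
# so filter once on every variable used by ANY of the models.
vars <- c("hon", "sex", "age", "Question", "schooling", "intv", "ID")
d_cc <- d[complete.cases(d[, vars]), ]

model1 <- lmer(hon ~ sex * age * Question + schooling + (1 | intv/ID), data = d_cc)
model3 <- lmer(hon ~ sex * age * Question + schooling + (1 | ID),      data = d_cc)

c(nobs(model1), nobs(model3))   # same number of rows, so the fits are comparable
anova(model3, model1)           # refits both with ML, then compares
```

Because the simulated data contain no interviewer effect, lmer() may report a boundary (singular) fit for model1 here; that is expected in this sketch.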

Karl Ove Hufthammer

John Maindonald

Sat, Dec 11, 2021 10:58 AM

Possibly, you need to allow for a within-individual correlation structure.
What I should have said was that the correlation structure within
individuals persists even when allowance is made for the other fixed
effects.  But why not start by looking at total scores?  Looking at
the principal component scores from a principal components analysis
might be another possibility.  Have you been able to find analyses,
published or on the web, that have broken results down by individual
question?

John Maindonald             email: john.maindonald at anu.edu.au

N o s t a l g i a

Sun, Dec 12, 2021 10:38 PM

Karl,

Thanks for pointing out my mistakes. Yes, I should have chosen model1, 
which has the lowest AIC among the three, and I should not have compared 
three models fitted to different datasets to start with.

I went back to the original dataset and manually deleted all the cases 
that include NAs (somehow "na.action = na.exclude, data = third2" 
did not work). Now anova() works fine, and the best model turned out 
to be (anova-wise as well as AIC-wise) the one with only ID as the 
random effect. Everything seems fine -- except that the variance for 
intv remained zero in the model that incorporates both intv and ID as 
random effects.  This I probably need to accept as it is: there is 
absolutely no interviewer effect.
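
A variance estimate of exactly zero is what lme4 calls a boundary (singular) fit: the data cannot distinguish the interviewer variance from zero once interviewees are accounted for, which is weaker than showing that no interviewer effect exists. A rough sketch of how this might be checked, on simulated data and with intercept-only fixed effects for brevity (the names `d`, `m_id`, `m_nest` and `lrt` are hypothetical):

```r
library(lme4)

# Simulated data with NO true interviewer effect (hypothetical values)
set.seed(2)
n_id <- 80
d <- data.frame(ID = factor(rep(seq_len(n_id), each = 5)))   # 5 answers per interviewee
d$intv <- factor(rep(rep(1:8, each = 10), each = 5))         # 10 interviewees per interviewer
d$hon  <- 3 + rnorm(n_id, sd = 0.2)[as.integer(d$ID)] + rnorm(nrow(d), sd = 0.4)

m_id   <- lmer(hon ~ 1 + (1 | ID),      data = d)   # interviewee only
m_nest <- lmer(hon ~ 1 + (1 | intv/ID), data = d)   # interviewee nested in interviewer

isSingular(m_nest)   # TRUE when a variance component is estimated at the boundary
VarCorr(m_nest)      # estimated standard deviations of the random effects

# Likelihood-ratio test of the interviewer variance. Both models share the same
# fixed effects, so the REML fits can be compared directly (refit = FALSE).
# The null value (variance = 0) lies on the boundary, so the p-value is conservative.
lrt <- anova(m_id, m_nest, refit = FALSE)
print(lrt)
```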

Thanks again,

- Ken
