Hello all,

I am facing a dilemma about whether or not I should include by-item random intercepts in my model. Here are the details of my problem.

I have a dataset of repeated measures in which participants solved single-digit arithmetic problems (e.g., 4x5, 2+7) and their response latencies were recorded. The dependent variable is response latency. The independent variables include characteristics of the stimuli (i.e., level 1) and of the participants (i.e., level 2).

I set up the structure of random effects following recommendations from Barr et al. (2013). For simplicity, let's say the model contains one IV:

DV_ti = gamma00 + gamma10*IV_ti + u0i + u1i*IV_ti + I0i + r_ti

gamma00 and gamma10 are fixed effects
u0i is the by-subject random intercept
u1i is the by-subject random slope
I0i is the by-item random intercept
r_ti is the residual

I used lme4 to fit the model:

lmer(DV ~ IV + (1 + IV|sub) + (1|item), data = DT)

As I mentioned, the stimuli in my experiment are single-digit arithmetic problems. Unlike stimuli such as English words, there are only 100 single-digit arithmetic problems for each operation, and all of them were included in my experiment. So here is my dilemma:

On one hand, a by-item random intercept would allow me to account for the fact that there are repeated observations on each item and they are not independent of each other. On the other hand, a by-item random intercept implies that there exist more items which were not included in my experiment. However, this is not the case: I have included all single-digit arithmetic problems in my experiment. I could adopt a fixed-effect approach and use 100 dummy variables to account for the item-based clustering, but this would be practically impossible.

To reiterate my question: should I include a by-item random intercept given this special feature of my dataset?

A few follow-up questions: what is the consequence of including/excluding this random effect? How are Type I error and power affected? Should I use a nested structure instead of the crossed one I have described above? For example, if each participant contributed multiple observations on each item, should I nest the by-item random intercept under subject?

Thank you very much!
Chunyun
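For readers who want to see the data-generating process the equation above describes, here is a minimal simulation sketch. The thread's own code is R/lme4; this uses numpy instead, and all variance components and sample sizes are made-up illustrative values, not estimates from the actual experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sub, n_item = 30, 100          # hypothetical design: 30 subjects, 100 items

# Hypothetical fixed effects and variance components (illustrative only)
gamma00, gamma10 = 1.0, 0.3
u0 = rng.normal(0, 0.5, n_sub)   # u0i: by-subject random intercepts
u1 = rng.normal(0, 0.2, n_sub)   # u1i: by-subject random slopes
I0 = rng.normal(0, 0.4, n_item)  # I0i: by-item random intercepts

# Fully crossed design: every subject responds to every item once
sub = np.repeat(np.arange(n_sub), n_item)
itm = np.tile(np.arange(n_item), n_sub)
iv = rng.normal(size=n_sub * n_item)          # a stand-in item-level IV

# DV_ti = gamma00 + gamma10*IV_ti + u0i + u1i*IV_ti + I0i + r_ti
dv = (gamma00 + (gamma10 + u1[sub]) * iv
      + u0[sub] + I0[itm]
      + rng.normal(0, 0.3, size=n_sub * n_item))
```

Note that subjects and items here are crossed, not nested: each of the 3000 rows is indexed by an independent (subject, item) pair, which is exactly the structure `(1 + IV|sub) + (1|item)` expresses in lme4.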
by-item random intercepts
3 messages · Chunyun Ma, Jake Westfall
Hi Chunyun,

> As I mentioned, the stimuli in my experiments are single-digit arithmetic
> problems. Unlike stimuli such as English words, there are only 100
> single-digit arithmetic problems for each operation and all of them were
> included in my experiment.

If you've really exhaustively sampled all possible stimuli that could have appeared in your study, then I would argue that it doesn't make conceptual sense to analyze the stimuli as random effects.

> I could adopt a fixed-effect approach and use 100 dummy variables to
> account for the item-based clustering but this would be practically
> impossible.

Is it? Have you tried it? Adding fixed effects usually increases the computational burden *far* less than adding random effects. So while this analysis might be a bit unwieldy, is it actually infeasible?

If the answer is yes, then a reasonable alternative is to simply ignore the stimulus effects altogether. Practically speaking, the result is usually much the same as explicitly adding stimulus fixed effects to the model. The reason is that ignoring the stimulus effects (vs. adding them as fixed) mainly just serves to throw the stimulus variance into the residual variance, but unless your experiment is quite tiny, the residual variance probably already contributes *very* little to the standard errors of the fixed effect parameter estimates of interest. (Getting more into the mathematical weeds, the residual variance enters the standard error *roughly* as var(resid)/sqrt(n), where n is the number of rows -- this term is probably already tiny unless your experiment is tiny, and it should remain tiny even if you increase var(resid) by a lot.)

Note, however, that the above assumes the stimulus effects are at best weakly correlated with the other regressors. That assumption is likely true in an experimental context, but to the extent that it is false, omitting the stimulus effects could also alter the other fixed effect parameter estimates.

> Should I use a nested structure instead of the crossed one I have mentioned
> above? For example, if each participant contributed multiple observations
> on each item, should I nest the by-item random intercept under subject?

I don't see why you would do that.

Jake
Er, small correction: I meant that the residual variance enters the standard error as roughly sqrt(var(resid)/n), not var(resid)/sqrt(n) :p

Jake
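The corrected back-of-envelope claim — that the residual term enters the standard error as roughly sqrt(var(resid)/n) and therefore barely moves it when n is large — can be illustrated numerically. This sketch uses a simplified two-term SE approximation (by-subject slope variance plus residual variance) with entirely made-up variance components and sample sizes; it is not the exact SE formula for any particular design.

```python
import math

# Hypothetical variance components and sample sizes (illustrative only)
var_slope, n_subjects = 0.05, 50   # by-subject random slope variance
n_rows = 10_000                    # total observations

def approx_se(var_resid):
    # Rough SE of a within-subject fixed effect: the subject-level term
    # shrinks with the number of subjects, the residual term with n_rows
    return math.sqrt(var_slope / n_subjects + var_resid / n_rows)

se_small = approx_se(0.2)   # modest residual variance
se_big = approx_se(1.0)     # residual variance inflated 5x
ratio = se_big / se_small   # only a few percent larger
```

Even quintupling the residual variance (as absorbing the item variance into the residual might) inflates the SE by under 5% here, because the subject-level term dominates — which is why ignoring exhaustively sampled item effects is often practically harmless.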