
Specifying the correct LMM for 'unusual' data

3 messages · Tom Fritzsche, Maarten Jung

#
Dear list,

a colleague of mine asked me to help her plan a linear mixed model
analysis. Handling her data and the corresponding research questions
with lmer seems rather tricky to me, so I hope one of you can help me along.

+++++++++++++++++++++++++++++++++++++
The experiment is as follows:

Participants (46 younger and 45 older children) looked at a series of
pictures (one picture per trial) and had to solve two tasks consecutively:

- Task block 1: Prospective memory (PM) task: while doing other tasks,
participants had to remember to press a specified button when they saw a
certain object
- Task block 2: Visual search: participants had only this one task,
pressing a button as soon as possible when seeing a certain object

Each child saw the same pictures in the same task block: pictures 1-6 in
task block 1 and pictures 7-18 in task block 2. Each picture was presented
only once, so there were different pictures in the task blocks.

Trials with the target object in task block 1 are classified according to
the participant's response as PM hits (the participant pressed the button)
or PM misses (the participant did not press the button). (Therefore, a given
picture can be a PM hit trial for one child and a PM miss trial for
another.) As there were six trials (= pictures) that contained the target
object, each participant can have between zero and six PM hits, with the
complementary number of PM misses.
Here is the number of PM hits per age group:

Younger children:
- 2 children: 0 hits
- 9 children: 1 hit
- 8 children: 2 hits
- 12 children: 3 hits
- 4 children: 4 hits
- 4 children: 5 hits
- 7 children: 6 hits

Older children:
- 2 children: 0 hits
- 3 children: 1 hit
- 4 children: 2 hits
- 6 children: 3 hits
- 7 children: 4 hits
- 11 children: 5 hits
- 12 children: 6 hits

(In the visual search task, almost all children pressed the button
correctly in all 12 visual search target trials.)

She is interested in how long participants looked at the PM target and the
visual search target, depending on whether the trial was a PM hit, a PM miss
or a visual search hit, and in how this is influenced by age group. Therefore,
she has only one data point per trial. And if a participant has no PM
misses, there is no data point at all in this condition for this participant.
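
To make the size of the empty cells concrete, the hit distribution listed above can be tallied in a few lines of base R. This is only a sketch: the counts are copied from the two lists, and the helper name summarise_group is made up for illustration.

```r
# Number of children per PM hit count (0..6), copied from the lists above.
hits    <- 0:6
younger <- c(2, 9, 8, 12, 4, 4, 7)
older   <- c(2, 3, 4, 6, 7, 11, 12)

# Hypothetical helper: totals per age group, assuming 6 PM trials per child.
summarise_group <- function(n_children, hits, n_trials = 6) {
  c(
    children     = sum(n_children),
    pm_hits      = sum(n_children * hits),
    pm_misses    = sum(n_children * (n_trials - hits)),
    no_miss_data = n_children[hits == n_trials],  # children with 6 hits
    no_hit_data  = n_children[hits == 0]          # children with 0 hits
  )
}

counts <- rbind(younger = summarise_group(younger, hits),
                older   = summarise_group(older, hits))
counts
```

With these numbers, 7 younger and 12 older children contribute no PM-miss data at all, and 2 children per group contribute no PM-hit data.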

The variables are defined as follows:
- age_group: categorical predictor with 2 levels (younger and older
children)
- condition: categorical predictor with 3 levels (PM hit, PM miss, visual
search hit)
+++++++++++++++++++++++++++++++++++++

My suggestion for the maximal linear mixed model would be:

lmer(dwell_time ~ age_group*condition + (1 + condition|participant) +
(1|picture), data)

I intentionally didn't use (1 + condition|picture) here because there are
different pictures in the task blocks (see above). I hope this makes sense.

I have two questions:
1. Am I correct with the maximal linear mixed model specifications?
2. I think that the data points in the PM-miss-condition (or
PM-hit-condition) are not missing at random because they are missing if
(and only if) there are 6 data points for the same participant in the
PM-hit-condition (and vice versa). Do you think one has to worry about this,
and are there any suggestions for how to deal with it?

Best,
Maarten
#
Hi Maarten,

I would not collapse the task and the kind of response (hit/miss) into
one condition predictor. They are conceptually independent, as task is
a manipulated factor and response is a measured value (a covariate in this
model). Also, one of them can vary within pictures while the other cannot
(see model specification below).

So my suggestion would be to have those two predictors:

task: 2-level factor: PM, VS
response: 2-level predictor: hit, miss
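
A minimal base-R sketch of this recoding, assuming the collapsed predictor uses the three labels given in the first message (the toy vector below is made up, not the real data):

```r
# Hypothetical collapsed predictor with the three levels from the first message.
condition <- factor(c("PM hit", "PM miss", "visual search hit", "PM hit"))

# Split it into a task factor and a response factor.
lab      <- as.character(condition)
task     <- factor(ifelse(grepl("^PM", lab), "PM", "VS"),
                   levels = c("PM", "VS"))
response <- factor(ifelse(grepl("miss$", lab), "miss", "hit"),
                   levels = c("hit", "miss"))

# Cross-tabulate to see which task x response cells actually occur.
cells <- table(task, response)
cells
```

In this toy example the VS/miss cell is empty, which is also what the description of the visual search task suggests for the real data.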

Beware of how you specify the contrasts for (all of) the categorical
predictors. With the default treatment contrasts, the model estimates are
most likely not the most straightforward to interpret.
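
As an illustration of what changing the contrasts looks like, here is a small base-R sketch that switches a two-level factor from treatment coding to sum coding. Sum coding is just one common alternative, chosen here as an assumption; which coding fits depends on the hypotheses.

```r
# Toy two-level factor; the level order is chosen explicitly.
age_group <- factor(c("younger", "older"), levels = c("younger", "older"))

# Default treatment coding: younger = 0 (reference level), older = 1.
default_coding <- contrasts(age_group)

# Sum coding: younger = +1, older = -1, so that other effects are
# estimated relative to the mean of the two groups rather than to
# the reference level.
contrasts(age_group) <- contr.sum(2)
sum_coding <- contrasts(age_group)

default_coding
sum_coding
```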

Regarding your questions:

> 1. Am I correct with the maximal linear mixed model specifications?

With the changed predictors I think that this would be the maximal
model. Response can also vary within pictures, as each picture can be a
hit or a miss.

lmer(dwell_time ~ age_group * task * response + (1 + task * response |
participant) + (1 + response | picture), data)


> 2. I think that the data points in the PM-miss-condition (or
> PM-hit-condition) are not missing at random because they are missing if
> (and only if) there are 6 data points for the same participant in the
> PM-hit-condition (and vice versa). Do you think one has to worry about this
> and are there any suggestions how to deal with it?

Imbalanced data sets and even missing design cells are not a problem
for mixed models, as they take the number of observations into
account (shrinkage).

Best,
Tom

---

Tom Fritzsche
University of Potsdam
Department of Linguistics
Karl-Liebknecht-Str. 24-25
14476 Potsdam
Germany

office: 14.140
phone: +49 331 977 2296
fax: +49 331 977 2095
e-mail: tom.fritzsche at uni-potsdam.de
web:    www.ling.uni-potsdam.de/~fritzsche



On 25 January 2018 at 15:35, Maarten Jung
<Maarten.Jung at mailbox.tu-dresden.de> wrote:
1 day later
#
Hi Tom,

your suggestions for the categorical predictors make sense and are
conceptually a much better solution than collapsing everything into a
single predictor - many thanks for that!

I am aware of the partial pooling/shrinkage in the estimation process,
although with your suggestion there would literally be no data for the
VS-miss condition. And I think that, in this case, the estimation would be
based on the younger children, given that there are clearly more missing
data points for older children.
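
The consequence of a completely empty VS-miss cell for the fixed effects can be checked without fitting anything. The base-R sketch below builds the design matrix for age_group * task * response on a toy grid of the cells that do occur (an assumption, not the real data) and compares the number of requested coefficients with the rank of the matrix.

```r
# Toy grid with one row per age_group x task x response cell.
cells <- expand.grid(
  age_group = c("younger", "older"),
  task      = c("PM", "VS"),
  response  = c("hit", "miss"),
  stringsAsFactors = TRUE
)

# Drop the two VS-miss cells, which contain no observations.
observed <- subset(cells, !(task == "VS" & response == "miss"))

# Fixed-effects design matrix for the full three-way model.
X <- model.matrix(~ age_group * task * response, data = observed)

n_coef <- ncol(X)      # coefficients requested by the formula
n_est  <- qr(X)$rank   # coefficients that can actually be estimated
c(requested = n_coef, estimable = n_est)
```

The formula asks for 8 coefficients, but only 6 cell means are observed, so two terms involving the task:response interaction cannot be estimated. As far as I know, lmer reports such a rank-deficient fixed-effects matrix and drops the affected columns rather than filling the gap through shrinkage.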

With my second question I was referring to the MAR (missing at random)
assumption of mixed models: "missing data on a given variable
may depend on other observed information, but does not depend on the data
that would have been observed but were in fact missing" (West, Welch &
Galecki, 2015).
I have read that covariates which 'predict' the nonavailability
of data points should be included (but, to be honest, I have no idea how
this helps with the missing data), and I wonder whether including, say, the
number of hits (if this is a better predictor than age_group) would improve
the model.
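
For reference, such a per-participant number-of-hits covariate could be derived from trial-level data roughly as follows. This is a base-R sketch on made-up data; the data frame pm and the column name n_hits are hypothetical, and whether adding the covariate actually addresses the MAR concern is a separate question.

```r
# Hypothetical trial-level PM data: one row per participant x picture.
pm <- data.frame(
  participant = rep(c("p1", "p2", "p3"), each = 6),
  picture     = rep(1:6, times = 3),
  response    = c(rep("hit", 6),
                  "hit", "miss", "hit", "miss", "miss", "hit",
                  rep("miss", 6))
)

# Number of PM hits per participant, repeated on each of that
# participant's rows so it can be used as a covariate.
pm$n_hits <- ave(as.integer(pm$response == "hit"), pm$participant, FUN = sum)

unique(pm[, c("participant", "n_hits")])
```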

Best,
Maarten

On Thu, Jan 25, 2018 at 4:08 PM, Tom Fritzsche <tom.fritzsche at uni-potsdam.de