In terms of contrast coding, two more helpful resources are:
http://talklab.psy.gla.ac.uk/tvw/catpred/
http://palday.bitbucket.org/stats/coding.html
Channel makes sense as a random effect / grouping term for your particular
design, *not* nested within participant. The implicit crossing given by
(1|Participant) + (1|Channel) models [omitting any slope terms to focus on
the grouping variables] (1) interindividual differences in the EEG and (2)
differences between electrodes because closely located electrodes can be
thought of as samples from a population consisting of a given Region of
Interest (ROI), especially if the electrode placement is somewhat
symmetric. The differences resulting from variance in electrode placement
between participants will be covered by the implicit crossing of these two
random effects.
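As a minimal sketch of that crossed structure (the data frame `eeg` and its column names are hypothetical, chosen to match the variables discussed in this thread):

```r
# Crossed random intercepts for participant and channel, as described
# above. Slope terms are omitted, as in the text, to focus on the
# grouping variables.
library(lme4)

m_crossed <- lmer(voltage ~ group * item +
                    (1 | Participant) + (1 | Channel),
                  data = eeg)
```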
Note that using channel as a random effect is somewhat more difficult if
you're doing a whole scalp analysis as sampling across the whole scalp can
be viewed as sampling from multiple ROIs, i.e. multiple populations. Two
possible solutions are (1) to include ROI in the fixed effects and keep
channel in the random effects and (2) to model channel as two or three
continuous spatial variables (e.g. displacement from midline or
displacement from center based on 10-20 coordinates, or spatial coordinates
of the sort used in source localisation) in the fixed effects. In the case
of (1), the channel random effect would then be modelling the typical
variance within ROIs (because that's hopefully the major source of variance
structured by channel left over after modelling ROI and your experimental
manipulation). If this within-ROI variance differs greatly between
ROIs, then this may be a sub-optimal modelling choice. In the case of (2),
it might still make sense to additionally model channel as a random effect
(i.e. the RE with the factor consisting of channel names, the FE with the
continuous coordinates), see Thierry Onkelinx's posts on the subject and
http://rpubs.com/INBOstats/both_fixed_random, but I haven't thought
about this enough nor examined the resulting model fits.
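Rough sketches of those two whole-scalp options (the variables `roi`, `lat`, and `ant` are hypothetical placeholders for ROI labels and 10-20-derived coordinates, and the data frame `eeg` is assumed as before):

```r
library(lme4)

# (1) ROI as a fixed effect, channel kept as a random effect
m_roi <- lmer(voltage ~ group * item + roi +
                (1 | Participant) + (1 | Channel),
              data = eeg)

# (2) continuous spatial coordinates as fixed effects, with channel
#     optionally retained as a random effect as well
m_coord <- lmer(voltage ~ group * item + lat + ant +
                  (1 | Participant) + (1 | Channel),
                data = eeg)
```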
Best,
Phillip
-----Original Message-----
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org]
On Behalf Of paul
Sent: Tuesday, 7 June 2016 5:27 AM
To: Houslay, Tom <T.Houslay at exeter.ac.uk>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Related fixed and random factors and planned
comparisons in a 2x2 design
Dear Tom,
Thank you so much for these detailed replies and I appreciate your help!
Sincerely,
Paul
2016-06-06 21:51 GMT+02:00 Houslay, Tom <T.Houslay at exeter.ac.uk>:
Hi Paul,
I think you're right here in that actually you don't want to nest
channel inside participant (which led to that error message - sorry,
should have seen that coming!).
It's hard to know without seeing data plotted, but my guess from your
email is that you probably see some clustering both at individual
level and at channel level? Perhaps separate random effects, ie
(1|Participant) + (1|Channel), is the way to go (and then you
shouldn't have the problem as regards number of observations - instead
you'll have an intercept deviation for each of your N individuals, and
also intercept deviations for each of your 9 channels). You certainly
want to keep the participant intercept in though, as each individual
gets both items (right?), so you need to model that association. You
can use your variance components output from lmer to determine what
proportion of the phenotypic variance (conditional on your fixed
effects) is explained by each of these components, eg
V(individual)/(V(individual) + V(channel) + V(residual)) would give you
the proportion explained by differences among individuals in their
voltage. It would be cool to know if differences among individuals, or
among channels, are driving the variation that you find. I think using
the sjplot function for lmer would be useful to look at the levels of
your random
effects:
http://strengejacke.de/sjPlot/sjp.lmer/
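One way to compute those variance proportions from a fitted model (a sketch, assuming `m` is an lmer fit with random intercepts for Participant and Channel as discussed above):

```r
# Extract the variance components (conditional on the fixed effects)
# and compute the proportion attributable to each grouping factor.
vc <- as.data.frame(VarCorr(m))
v  <- setNames(vc$vcov, vc$grp)   # named variances, incl. "Residual"

prop_participant <- v["Participant"] / sum(v)
prop_channel     <- v["Channel"]     / sum(v)
```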
As for 'contrasts', again I haven't used that particular package, but
from a brief glance it looks like you're on the right track - binary
coding is the 'simple coding' as set out here:
http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
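For a two-level factor, that simple (centred binary) coding can be set with contrasts(); a sketch, assuming `group` and `item` are two-level factors in a data frame `dat`:

```r
# Simple coding: compare the two levels directly while keeping the
# intercept at the grand mean (values -0.5 / +0.5 rather than 0 / 1).
contrasts(dat$group) <- c(-0.5, 0.5)
contrasts(dat$item)  <- c(-0.5, 0.5)
```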
Good luck!
Tom
------------------------------
*From:* paul <graftedlife at gmail.com>
*Sent:* 06 June 2016 20:06:02
*To:* Houslay, Tom
*Cc:* r-sig-mixed-models at r-project.org
*Subject:* Re: Related fixed and random factors and planned
comparisons in a 2x2 design
Dear Tom,
Many thanks for these very helpful comments and suggestions! Would you
just allow me to ask some further questions:
1. I've been considering whether to cross or to nest the random
effects for quite a while. Data from the same channel across
participants do show corresponding trends (thus a bit different from
the case when, e.g., sampling nine neurons from the same individual).
Would nesting channel within participant deal with that relationship?
2. I actually also tried nesting channel within participant. However,
when I proceeded to run planned comparisons (I guess I'd better have
them done because of their theoretical significance) based on this
mixed-effect modeling approach (as illustrated in the earlier mail but
with the random factor as (1|participant/channel), to maintain
consistency of analytical methods), R gave me an error message:
Error: number of levels of each grouping factor must be < number of
observations
I think this is because in my data each participant contributes only
one data point per channel, so that grouping factor has as many levels
as there are observations. I
guess that probably means I can't go on in this direction to run the
planned comparisons... (?) I'm not quite sure how contrasts based on
binary dummy variables can be set up and will try to explore that
further. But before establishing the mixed model I had already set up
orthogonal contrasts for group and item in the dataset using the
function contrasts(). Does this have anything to do with what you meant?
3. I worried about pseudoreplication when participant ID is not
included. Concerning this point, it later came to me that
pseudoreplication usually occurs when multiple responses from the
same individual are grouped in the same cell, rendering the data
within that cell non-independent (similar to the case of
repeated-measures ANOVA? sorry if I have misunderstood...). But since,
as mentioned earlier, each participant in my data contributes only one
data point per channel, then when channel alone is already modeled as
a random factor, would that mean all data points within a cell come
from different participants, and thus the independence assumption is
dealt with in this case? (Again I'm sorry if my concept is wrong and
would appreciate instructions on this point...)
Many, many thanks!
Paul
2016-06-06 19:10 GMT+02:00 Houslay, Tom <T.Houslay at exeter.ac.uk>:
Hi Paul,
I don't think anyone's responded to this yet, but my main point would
be that you should check out Schielzeth & Nakagawa's 2012 paper
'Nested by design' (
http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00251.x/abstract
) for a nice rundown on structuring your model for this type of data.
It may also be worth thinking about how random intercepts work in a
visual sense; there are a variety of tools that help you do this from
a model (packages sjplot, visreg, broom), or you can just plot
different levels yourself (eg consider plotting the means for AP, AQ,
BP, BQ; the same with mean values from each individual overplotted
around these group means; and even the group means with all points
shown, perhaps coloured by individual - ggplot is really useful for
getting this type of figure together quickly).
As to some of your other questions:
1) You need to keep participant ID in. I'm not 100% on your data
structure from the question, but you certainly seem to have repeated
measures for individuals (I'm assuming that groups A and B each
contain multiple individuals, none of whom were in both groups, and
each of whom was shown both objects P and Q, in a random order).
It's not surprising that the effects of group are weakened if you
remove participant ID, because you're then effectively entering
pseudoreplication into your model (ie, telling your model that all
the data points within a group are independent, when that isn't the
case).
2) I think channel should be nested within individual, with a model
something like model <- lmer(voltage ~ group * item +
(1|participant/channel), data = ...)
3) This really depends on what your interest is. If you simply want
to show that there is an overall interaction effect, then your
p-value from a likelihood ratio test of the model with/without the
interaction term gives significance of this interaction, and then a
plot of predicted values for the fixed effects (w/ data overplotted if
possible) should show the trends.
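A sketch of that likelihood-ratio test (model and variable names follow the earlier emails; note both models should be fitted with ML rather than REML when comparing fixed effects):

```r
library(lme4)

m_full <- lmer(voltage ~ group * item + (1 | participant) + (1 | channel),
               data = dat, REML = FALSE)
m_red  <- lmer(voltage ~ group + item + (1 | participant) + (1 | channel),
               data = dat, REML = FALSE)

anova(m_red, m_full)  # chi-square test for the interaction term
```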
You could also use binary dummy variables to make more explicit
contrasts, but it's worth reading up on these a bit more. I don't
really use this type of comparison very much, so I can't comment
further.
4) Your item is like treatment in this case - you appear to be more
interested in the effect of different items (rather than how much
variation 'item' explains), so keep this as a fixed effect and not as
a random effect.
Hope some of this is useful,
Tom