contrast matrix for aov - R-help

Wed, Mar 9, 2005 4:03 PM #

How do we specify a contrast interaction matrix for an ANOVA model?

We have a two-factor, repeated measures design, with

Cue Direction (2) x  Brain Hemisphere(2)

Each of these has 2 levels, 'left' and 'right', so it's a simple 2x2 design 
matrix.  We have 8 subjects in each cell (a balanced design) and we want to 
specify the interaction contrast so that:

CueLeft>CueRght for the Right Hemisphere
CueRght>CueLeft for the Left Hemisphere.

Here is a copy of the relevant commands for R:

########################################
lh_cueL <- rowMeans( LHroi.cueL[,t1:t2] )
lh_cueR <- rowMeans( LHroi.cueR[,t1:t2] )
rh_cueL <- rowMeans( RHroi.cueL[,t1:t2] )
rh_cueR <- rowMeans( RHroi.cueR[,t1:t2] )
roiValues <- c( lh_cueL, lh_cueR, rh_cueL, rh_cueR )

cuelabels <- c("CueLeft", "CueRight")
hemlabels <- c("LH", "RH")

roiDataframe <- data.frame( roi=roiValues, Subject=gl(8,1,32,subjectlabels), 
Hemisphere=gl(2,16,32,hemlabels), Cue=gl(2,8,32,cuelabels) )

roi.aov <- aov(roi ~ (Cue*Hemisphere) + Error(Subject/(Cue*Hemisphere)), 
data=roiDataframe)
print(summary(roi.aov))
########################################


I've tried to create a contrast matrix like this:

cm <- contrasts(roiDataframe$Cue)

which gives the main effect contrasts for the Cue factor.  I really want to 
specify the interaction contrasts, so I tried this:

########################################
# c( lh_cueL, lh_cueR, rh_cueL, rh_cueR )
# CueRight>CueLeft for the Left Hemisphere.
# CueLeft>CueRight for the Right Hemisphere

cm <- c(-1, 1, 1, -1)
dim(cm) <- c(2,2)

roi.aov <- aov( roi ~ (Cue*Hemisphere) + Error(Subject/(Cue*Hemisphere)),
contrasts=cm, data=roiDataframe)
print(summary(roi.aov))
########################################

but the results of these two aov commands are identical.  Is it the case that 
the 2x2 design matrix is always going to give the same F values for the 
interaction regardless of the contrast direction?  OR, is there some way to get 
a summary output for the contrasts that is not available from the print method?

Brian Ripley

Thu, Mar 10, 2005 12:24 AM #

On Wed, 9 Mar 2005, Darren Weber wrote:

Where does `repeated measures' come into this?  You appear to have 
repeated a 2x2 experiment in each of 8 blocks (subjects).  Such a design 
is usually analysed with fixed effects.  (Perhaps you averaged over 
repeats in the first few lines of your code?)

I think the error model should be Error(Subject).  In what sense are `Cue' 
and `Cue:Hemisphere' random effects nested inside `Subject'?

Let me fake some `data':

set.seed(1); roiValues <- rnorm(32)
subjectlabels <- paste("V"1:8, sep = "")
options(contrasts = c("contr.helmert", "contr.poly"))
roi.aov <- aov(roi ~ Cue*Hemisphere + Error(Subject), data=roiDataframe)

Call:
aov(formula = roi ~ Cue * Hemisphere + Error(Subject), data = roiDataframe)

Grand Mean: 0.1165512

Stratum 1: Subject

Terms:
                 Residuals
Sum of Squares   4.200946
Deg. of Freedom         7

Residual standard error: 0.7746839

Stratum 2: Within

Terms:
                       Cue Hemisphere Cue:Hemisphere Residuals
Sum of Squares   0.216453   0.019712       0.057860 21.896872
Deg. of Freedom         1          1              1        21

Residual standard error: 1.021131
Estimated effects are balanced

Note that all the action is in one stratum, and the SSQs are the same 
as

aov(roi ~ Subject + Cue * Hemisphere, data = roiDataframe)

(and also the same as for your fit).

It auto-prints, so you don't need print().

(That is up to sign what Helmert contrasts give you.)

Yes, as however you code the design (via `contrasts') you are fitting the 
same subspaces.  Not sure what you mean by `contrast direction', though.

However, you have not specified `contrasts' correctly:

contrasts: A list of contrasts to be used for some of the factors in
           the formula.

and cm is not a list, and an interaction is not a factor.

For more than two levels, yes: see `split' under ?summary.aov.
Also, see se.contrasts which allows you to find the standard error for any 
contrast.

For the fixed-effects model you can use summary.lm:

Df  Sum Sq Mean Sq F value Pr(>F)
Subject         7  4.2009  0.6001  0.5756 0.7677
Cue             1  0.2165  0.2165  0.2076 0.6533
Hemisphere      1  0.0197  0.0197  0.0189 0.8920
Cue:Hemisphere  1  0.0579  0.0579  0.0555 0.8161
Residuals      21 21.8969  1.0427

Call:
aov(formula = roi ~ Subject + Cue * Hemisphere, data = roiDataframe)

Residuals:
     Min      1Q  Median      3Q     Max
-1.7893 -0.4197  0.1723  0.5868  1.3033

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
[...]
Cue1             -0.08224    0.18051  -0.456    0.653
Hemisphere1       0.02482    0.18051   0.137    0.892
Cue1:Hemisphere1 -0.04252    0.18051  -0.236    0.816

where the F values are the squares of the t values.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Peter Dalgaard

Thu, Mar 10, 2005 1:33 AM #

Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

Actually, that's not "usual" in SAS (and not SPSS either, I believe)
in things like

proc glm;
        model y1-y4= ;
        repeated row 2 col 2;

[Not that SAS/SPSS is the Gospel, but they do tend to set the
terminology in these matters.]

There you'd get the analysis split up as analyses of three contrasts
corresponding to the main effects and interaction, c(-1,-1,1,1),
c(-1,1,-1,1), and c(-1,1,1,-1) in the 2x2 case (up to scale and sign).
In the 2x2 case, this corresponds exactly to the 4-stratum model
row*col + Error(block/(row*col)).

(It is interesting to note that it is still not the optimal analysis
for arbitrary covariance patterns because dependence between contrasts
is not utilized - it is basically assumed to be absent.)

With more than two levels, you get the additional complications of
sphericity testing and GG/HF epsilons and all that.

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Christophe Pallier

Thu, Mar 10, 2005 1:43 AM #

Prof Brian Ripley wrote:

I do not understand this, and I think I am probably not the only one. 
That is why I would be grateful if you could give a bit more information.

My understanding is that the fixed factors Cue and Hemisphere are 
crossed with the random factor Subject (in other words, Cue and 
Hemisphere are within-subjects factors, and this is probably why Darren 
called it a "repeated measure" design).

In this case, it seems to me from the various textbooks I read on Anova, 
that the appropriate MS to  test the interaction Cue:Hemisphere is 
Subject:Cue:Hemisphere (with 7 degress of freedom, as there are 8 
independent subjects). 

If you input Error(Subject/(Cue*Hemisphere)) in the aov formula, then 
the test for the interaction indeed uses the Subject:Cue:Hemisphere 
source of variation in demoninator. This fits with the ouput of other 
softwares.

If you include only 'Subjet', then the test for the interaction has 21 
degrees of Freedom, and I do not understand what this tests.

I apologize in if my terminology is not accurate.  But I hope you can 
clarify what is wrong with the Error(Subject/(Cue*Hemisphere)) term,
or maybe just point us to the relevant textbooks.

Thanks in advance,

Christophe Pallier

Let me fake some `data':

set.seed(1); roiValues <- rnorm(32)
subjectlabels <- paste("V"1:8, sep = "")
options(contrasts = c("contr.helmert", "contr.poly"))
roi.aov <- aov(roi ~ Cue*Hemisphere + Error(Subject), data=roiDataframe)

roi.aov


Call:
aov(formula = roi ~ Cue * Hemisphere + Error(Subject), data = 
roiDataframe)

Grand Mean: 0.1165512

Stratum 1: Subject

Terms:
                Residuals
Sum of Squares   4.200946
Deg. of Freedom         7

Residual standard error: 0.7746839

Stratum 2: Within

Terms:
                      Cue Hemisphere Cue:Hemisphere Residuals
Sum of Squares   0.216453   0.019712       0.057860 21.896872
Deg. of Freedom         1          1              1        21

Residual standard error: 1.021131
Estimated effects are balanced

Note that all the action is in one stratum, and the SSQs are the same as

aov(roi ~ Subject + Cue * Hemisphere, data = roiDataframe)

(and also the same as for your fit).

print(summary(roi.aov))


It auto-prints, so you don't need print().

########################################


I've tried to create a contrast matrix like this:

cm <- contrasts(roiDataframe$Cue)

which gives the main effect contrasts for the Cue factor.  I really 
want to specify the interaction contrasts, so I tried this:

########################################
# c( lh_cueL, lh_cueR, rh_cueL, rh_cueR )
# CueRight>CueLeft for the Left Hemisphere.
# CueLeft>CueRight for the Right Hemisphere

cm <- c(-1, 1, 1, -1)
dim(cm) <- c(2,2)


(That is up to sign what Helmert contrasts give you.)

roi.aov <- aov( roi ~ (Cue*Hemisphere) + 
Error(Subject/(Cue*Hemisphere)),
contrasts=cm, data=roiDataframe)
print(summary(roi.aov))
########################################

but the results of these two aov commands are identical.  Is it the 
case that the 2x2 design matrix is always going to give the same F 
values for the interaction regardless of the contrast direction?


Yes, as however you code the design (via `contrasts') you are fitting 
the same subspaces.  Not sure what you mean by `contrast direction', 
though.

However, you have not specified `contrasts' correctly:

contrasts: A list of contrasts to be used for some of the factors in
          the formula.

and cm is not a list, and an interaction is not a factor.

OR, is there some way to get a summary output for the contrasts that 
is not available from the print method?


For more than two levels, yes: see `split' under ?summary.aov.
Also, see se.contrasts which allows you to find the standard error for 
any contrast.

For the fixed-effects model you can use summary.lm:

fit <- aov(roi ~ Subject + Cue * Hemisphere, data = roiDataframe)
summary(fit)

               Df  Sum Sq Mean Sq F value Pr(>F)
Subject         7  4.2009  0.6001  0.5756 0.7677
Cue             1  0.2165  0.2165  0.2076 0.6533
Hemisphere      1  0.0197  0.0197  0.0189 0.8920
Cue:Hemisphere  1  0.0579  0.0579  0.0555 0.8161
Residuals      21 21.8969  1.0427

summary.lm(fit)


Call:
aov(formula = roi ~ Subject + Cue * Hemisphere, data = roiDataframe)

Residuals:
    Min      1Q  Median      3Q     Max
-1.7893 -0.4197  0.1723  0.5868  1.3033

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
[...]
Cue1             -0.08224    0.18051  -0.456    0.653
Hemisphere1       0.02482    0.18051   0.137    0.892
Cue1:Hemisphere1 -0.04252    0.18051  -0.236    0.816

where the F values are the squares of the t values.

Brian Ripley

Thu, Mar 10, 2005 2:07 AM #

On Thu, 10 Mar 2005, Christophe Pallier wrote:

The issue is whether the variance of the error really depends on the 
treatment combination, which is what the Error(Subject/(Cue*Hemisphere)) 
assumes.  With that model

Error: Subject:Cue
           Df Sum Sq Mean Sq F value Pr(>F)
Cue        1 0.2165  0.2165  0.1967 0.6708
Residuals  7 7.7041  1.1006

Error: Subject:Hemisphere
            Df Sum Sq Mean Sq F value Pr(>F)
Hemisphere  1 0.0197  0.0197  0.0154 0.9047
Residuals   7 8.9561  1.2794

Error: Subject:Cue:Hemisphere
                Df Sum Sq Mean Sq F value Pr(>F)
Cue:Hemisphere  1 0.0579  0.0579  0.0773  0.789
Residuals       7 5.2366  0.7481

you are assuming different variances for three contrasts.

It uses a common variance for all treatment combinations.

Nothing is `wrong' with it, it just seems discordant with the description
of the experiment.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Brian Ripley

Thu, Mar 10, 2005 2:38 AM #

On Thu, 10 Mar 2005, Peter Dalgaard wrote:

That seems to be appropriate only if the four treatments are done in a 
particular order (`repeated') and one expects correlations in the 
responses.  However, here the measurements seem to have been averages of 
replications.

It may be "usual" to (mis?)specify experiments in SAS that way: I don't 
know what end users do, but it is not the only way possible in SAS.

It also assumes that there is a difference between variances of the 
contrasts, that is there is either correlation between results or a 
difference in variances under different treatments.  Nothing in the 
description led me to expect either of those, but I was asking why it was 
specified that way.  (If the variance does differ with the mean then there 
are probably more appropriate analyses.)

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Darren Weber

Thu, Mar 10, 2005 10:54 AM #

Prof Brian Ripley wrote:

Each subject experiences multiple instances of a visual stimulus that 
cues them to orient their attention to the left or right visual field.  
For each instance, we acquire brain activity measures from the left and 
right hemisphere (to put it simply).  For each condition in this 2x2 
design, we actually have about 400 instances for each cell for each 
subject.  These are averaged in several ways, so we obtain a single 
scalar value for each subject in the final analysis.  So, at this point, 
it does not appear to be a repeated measures analysis.  Perhaps another 
way to put it - each subject experiences every cell of the design, so we 
have "within-subjects" factors rather than "between-subjects" factors.  
That is, each subject experiences all experimental conditions, we do not 
separate and randomly allocate subjects to only one cell or another of 
the experiment.

I hope this clarifies the first question.  I will try to digest further 
points in this thread, if they are not too confounded by this first 
point of contention already.

Darren Weber

Thu, Mar 10, 2005 11:07 AM #

As an R newbie (formerly SPSS), I was pleased to find some helpful notes 
on ANOVA here:

http://personality-project.org/r/r.anova.html

In my case, I believe the relevant section is:

Example 4. Two-Way Within-Subjects ANOVA

This is where I noted and copied the error notation.

Sorry for any confusion about terms - I did mean "within-subjects" 
factors, rather than repeated measures (although, as noted earlier, we 
do have both in this experiment).

Prof Brian Ripley wrote:

On Thu, 10 Mar 2005, Christophe Pallier wrote:

Prof Brian Ripley wrote:

On Wed, 9 Mar 2005, Darren Weber wrote:

We have a two-factor, repeated measures design, with



Where does `repeated measures' come into this?  You appear to have 
repeated a 2x2 experiment in each of 8 blocks (subjects).  Such a 
design is usually analysed with fixed effects.  (Perhaps you 
averaged over repeats in the first few lines of your code?)

roi.aov <- aov(roi ~ (Cue*Hemisphere) + 
Error(Subject/(Cue*Hemisphere)), data=roiDataframe)



I think the error model should be Error(Subject).  In what sense are 
`Cue' and `Cue:Hemisphere' random effects nested inside `Subject'?

I do not understand this, and I think I am probably not the only one. 
That is why I would be grateful if you could give a bit more 
information.

My understanding is that the fixed factors Cue and Hemisphere are 
crossed with the random factor Subject (in other words, Cue and 
Hemisphere are within-subjects factors, and this is probably why 
Darren called it a "repeated measure" design).


The issue is whether the variance of the error really depends on the 
treatment combination, which is what the 
Error(Subject/(Cue*Hemisphere)) assumes.  With that model

Error: Subject:Cue
          Df Sum Sq Mean Sq F value Pr(>F)
Cue        1 0.2165  0.2165  0.1967 0.6708
Residuals  7 7.7041  1.1006

Error: Subject:Hemisphere
           Df Sum Sq Mean Sq F value Pr(>F)
Hemisphere  1 0.0197  0.0197  0.0154 0.9047
Residuals   7 8.9561  1.2794

Error: Subject:Cue:Hemisphere
               Df Sum Sq Mean Sq F value Pr(>F)
Cue:Hemisphere  1 0.0579  0.0579  0.0773  0.789
Residuals       7 5.2366  0.7481

you are assuming different variances for three contrasts.

In this case, it seems to me from the various textbooks I read on 
Anova, that the appropriate MS to  test the interaction 
Cue:Hemisphere is Subject:Cue:Hemisphere (with 7 degress of freedom, 
as there are 8 independent subjects). If you input 
Error(Subject/(Cue*Hemisphere)) in the aov formula, then the test for 
the interaction indeed uses the Subject:Cue:Hemisphere source of 
variation in demoninator. This fits with the ouput of other softwares.

If you include only 'Subjet', then the test for the interaction has 
21 degrees of Freedom, and I do not understand what this tests.


It uses a common variance for all treatment combinations.

I apologize in if my terminology is not accurate.  But I hope you can 
clarify what is wrong with the Error(Subject/(Cue*Hemisphere)) term,
or maybe just point us to the relevant textbooks.


Nothing is `wrong' with it, it just seems discordant with the description
of the experiment.