H-F corr.: covariance matrix for interaction effect - R-devel

Peter Dalgaard · 2005-02-25T01:54:12Z

Peter Dalgaard writes:
> Bela Bauer writes:
> > Hi,
> >
> > I'm still not quite there with my H-F (G-G) correction code. I have it
> > working for the main effects, but I just can't figure out how to do it
> > for the effect interactions. The thing I really don't know (and can't
> > find anything about) is how to calculate the covariance matrix for the
> > interaction between the two (or even n) main factors.
> > I've looked through some books her[...]

Peter Dalgaard

Thu, Feb 24, 2005 5:54 PM

Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:

[moved to r-devel since this is getting technical]

Now I am getting confused: I can reproduce the G-G epsilon in all the
cases I have tried but the H-F epsilon eludes me. Consider this SAS
code:

proc glm;
        model bmc1-bmc7=  / nouni;
        repeated visit 7/printe;

This ends up with

                      Greenhouse-Geisser Epsilon    0.6047
                      Huynh-Feldt Epsilon           0.7466

This makes OK sense, since there are 22 observations: plugging n = 22,
k = 7 and the G-G estimate 0.6047 into the usual H-F formula gives

[1] 0.7466162
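The expression that produced the [1] line is not shown here. As a minimal Python sketch, assuming the standard simple-layout Huynh-Feldt formula (the one written out in the follow-up message) and the G-G epsilon as printed by SAS to 4 digits, the value can be reproduced. The function name hf_epsilon_simple is hypothetical.

```python
def hf_epsilon_simple(gg_eps: float, n: int, k: int) -> float:
    """Huynh-Feldt epsilon for the simple layout: n subjects, k repeated
    measures, no between-subject terms, so the residual SSD has n - 1 DF.
    (Hypothetical helper; formula as given later in this thread.)"""
    p = k - 1  # number of within-subject contrasts
    return (n * p * gg_eps - 2) / (p * ((n - 1) - p * gg_eps))


# First SAS run: 22 subjects, 7 visits, G-G epsilon printed as 0.6047
print(round(hf_epsilon_simple(0.6047, n=22, k=7), 7))  # 0.7466162
```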

However, consider the following small change:

proc glm;
        class grp;
        model bmc1-bmc7= grp / nouni;
        repeated visit 7/printe;

Now I get 

                      Greenhouse-Geisser Epsilon    0.6677
                      Huynh-Feldt Epsilon           0.8976

Since the covariance matrix now has one DF less, I would expect the
H-F epsilon to be

[1] 0.876696

The discrepancy gets worse as more covariates are added. If bmc1 is
moved to the rhs, I get

                      Greenhouse-Geisser Epsilon    0.6917
                      Huynh-Feldt Epsilon           0.9533

I would have expected

[1] 0.8643953

Does anyone have a clue as to what is going on here? Is mighty SAS
simply doing the wrong thing? The G-G epsilon depends only on the
eigenvalues of the observed covariance matrix, so surely the H-F
correction should depend only on the dimension and the DF for the
empirical covariance matrix?

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907

Peter Dalgaard

Mon, Feb 28, 2005 3:52 PM

Peter Dalgaard <p.dalgaard@biostat.ku.dk> writes:

Just in case anyone was wondering, I think I now know what SAS is
doing, and yes, it is a bug. 

The HF correction is

HFeps = (n * (k-1) * GGeps - 2) / ((k-1) * ((n-1) - (k-1) * GGeps))

for the simple two-way layout, where the residual SSD matrix has (n-1)
degrees of freedom. For the case with covariates, it looks like (to 4
significant digits) SAS is generalizing the above to

HFeps = (n * (k-1) * GGeps - 2) / ((k-1) * (f - (k-1) * GGeps))

where f is the number of degrees of freedom for the SSD. However, the n
in the numerator also needs adjusting; the correctly generalized formula
should read

HFeps = ((f+1) * (k-1) * GGeps - 2) / ((k-1) * (f - (k-1) * GGeps))
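A quick numerical check of this diagnosis, as a Python sketch with hypothetical function names. It plugs in the G-G values SAS printed (rounded to 4 digits) and assumes grp has two levels, so the SSD has f = 20 DF in the second run and f = 19 once bmc1 becomes a covariate, leaving k = 6 visits.

```python
def hf_epsilon_sas(gg_eps: float, n: int, k: int, f: int) -> float:
    """H-F epsilon as SAS appears to compute it with covariates present:
    the denominator uses the SSD degrees of freedom f, but the numerator
    still uses the number of observations n. (Hypothetical helper.)"""
    p = k - 1
    return (n * p * gg_eps - 2) / (p * (f - p * gg_eps))


def hf_epsilon(gg_eps: float, k: int, f: int) -> float:
    """Corrected generalization: n is replaced by f + 1 in the numerator
    too, so the result depends only on k and f. (Hypothetical helper.)"""
    p = k - 1
    return ((f + 1) * p * gg_eps - 2) / (p * (f - p * gg_eps))


# Run 2: grp added (assumed 2 levels), 22 subjects, 7 visits, f = 20
print(round(hf_epsilon_sas(0.6677, n=22, k=7, f=20), 4))  # 0.8976
# Run 3: grp plus bmc1 as covariate, 6 visits left, f = 19
print(round(hf_epsilon_sas(0.6917, n=22, k=6, f=19), 4))  # 0.9534 (SAS: 0.9533)
print(round(hf_epsilon(0.6917, k=6, f=19), 7))            # 0.8643953
# With f = n - 1 the corrected form reduces to the simple-layout formula
print(round(hf_epsilon(0.6047, k=7, f=21), 7))            # 0.7466162
```

The SAS-style formula reproduces 0.8976, and gives 0.9534 for the covariate run against SAS's 0.9533, a gap plausibly explained by the 4-digit rounding of the printed G-G epsilon. The corrected formula matches the 0.8643953 expected in the first message.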

(The G-G epsilon is essentially the squared mean of the eigenvalues of
a suitably transformed SSD divided by the mean of the squares of the
eigenvalues. This is less than one unless all eigenvalues are
identical. H-F replaces numerator and denominator with bias-corrected
variants. However, since everything is a function of the SSD matrix,
the formula can only depend on n via the degrees of freedom.)
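A Python sketch of the G-G epsilon as described in the parenthetical, assuming "suitably transformed" means projecting onto k - 1 orthonormal contrasts orthogonal to the constant vector. Double-centring the SSD gives a matrix with the same nonzero eigenvalues, so the sum of the eigenvalues and the sum of their squares are just traces and no eigendecomposition is needed. gg_epsilon is a hypothetical helper, not an R or SAS function.

```python
def gg_epsilon(ssd):
    """Greenhouse-Geisser epsilon from a k x k SSD (or covariance) matrix.

    Double-centring, M = P S P with P = I - J/k, has the same nonzero
    eigenvalues as the contrast-transformed SSD. Hence
    sum(lambda) = tr(M) and sum(lambda^2) = tr(M M), which for a
    symmetric M is the sum of its squared entries.
    """
    k = len(ssd)
    row = [sum(r) / k for r in ssd]
    col = [sum(ssd[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row) / k
    m = [[ssd[i][j] - row[i] - col[j] + grand for j in range(k)]
         for i in range(k)]
    tr = sum(m[i][i] for i in range(k))
    tr_sq = sum(x * x for r in m for x in r)
    p = k - 1
    # (mean eigenvalue)^2 / mean(eigenvalue^2) over the p eigenvalues
    return (tr / p) ** 2 / (tr_sq / p)


print(round(gg_epsilon([[1, 0, 0], [0, 1, 0], [0, 0, 1]]), 10))  # 1.0
print(round(gg_epsilon([[1, 0, 0], [0, 0, 0], [0, 0, 0]]), 10))  # 0.5, i.e. 1/(k-1)
```

Because the ratio is unchanged by rescaling, it makes no difference whether the SSD or the covariance matrix (SSD / f) is passed in; the degrees of freedom only enter through the H-F bias correction.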
