
Comparing each level of a factor to the global mean

On Jun 27, 2013, at 3:47 PM, Shaun Jackman wrote:

            
I believe you are asking for "contr.sum", although I think there may be some differences between how it operates and what you expect.
Call:
lm(formula = weight ~ Diet, data = ChickWeight)

Residuals:
    Min      1Q  Median      3Q     Max 
-103.95  -53.65  -13.64   40.38  230.05 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  125.869      2.986  42.150  < 2e-16 ***
Diet1        -23.223      4.454  -5.214 2.59e-07 ***
Diet2         -3.252      5.380  -0.604  0.54576    
Diet3         17.081      5.380   3.175  0.00158 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 69.33 on 574 degrees of freedom
Multiple R-squared:  0.05348,	Adjusted R-squared:  0.04853 
F-statistic: 10.81 on 3 and 574 DF,  p-value: 6.433e-07
[1] 121.8183
1   2   3   4 
220 120 120 118 

So with unbalanced data, the contr.sum intercept (125.869) is the unweighted average of the four diet means, which only approximates the grand mean of all observations (121.8183).
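A minimal sketch of the contr.sum fit, assuming the contrasts are requested per factor through lm()'s contrasts argument (the command actually used above is not shown; setting options(contrasts = ...) would be an alternative). It checks that the intercept equals the unweighted average of the diet means rather than the grand mean. Variable names are illustrative.

```r
# Unequal group sizes (220, 120, 120, 118) are what make the two "means" differ
table(ChickWeight$Diet)

# One-way model with sum-to-zero contrasts for Diet
fit_sum <- lm(weight ~ Diet, data = ChickWeight,
              contrasts = list(Diet = "contr.sum"))

group_means <- tapply(ChickWeight$weight, ChickWeight$Diet, mean)
grand_mean  <- mean(ChickWeight$weight)   # observation-weighted mean

# Under contr.sum the intercept is the plain average of the group means
intercept <- unname(coef(fit_sum)[1])
all.equal(intercept, mean(group_means))   # TRUE

# Side by side: unweighted mean of group means vs. grand mean
c(intercept = intercept, grand_mean = grand_mean)
```

The two printed values should agree with the 125.869 and 121.8183 shown in the output above.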

To see what you are requesting in the summary, you can supply the grand mean as an offset and use the intercept-suppression syntax (+ 0):
Call:
lm(formula = weight ~ Diet + 0 + offset(rep(mean(ChickWeight$weight), 
    nrow(ChickWeight))), data = ChickWeight)

Residuals:
    Min      1Q  Median      3Q     Max 
-103.95  -53.65  -13.64   40.38  230.05 

Coefficients:
      Estimate Std. Error t value Pr(>|t|)    
Diet1 -19.1729     4.6740  -4.102 4.69e-05 ***
Diet2   0.7983     6.3286   0.126 0.899660    
Diet3  21.1317     6.3286   3.339 0.000895 ***
Diet4  13.4444     6.3820   2.107 0.035584 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 69.33 on 574 degrees of freedom
Multiple R-squared:  0.7599,	Adjusted R-squared:  0.7583 
F-statistic: 454.3 on 4 and 574 DF,  p-value: < 2.2e-16

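Notice that this does estimate what you requested, but that is due to the offset rather than to the choice of contrasts: with the intercept suppressed, R codes a single factor with one indicator column per level, so the contrasts setting has no effect. Each coefficient is just the diet mean minus the grand mean: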
1           2           3           4 
-19.1728846   0.7983276  21.1316609  13.4443728 
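A sketch reproducing the offset fit with the same formula as in the Call above, and checking that each coefficient equals the diet mean minus the grand mean. The second fit, which switches Diet to sum-to-zero contrasts, is an added check (not part of the original reply) that the contrasts no longer matter once the intercept is removed. Variable names are illustrative.

```r
# Same formula as the Call shown above: no intercept, grand mean as an offset
f_off <- weight ~ Diet + 0 +
  offset(rep(mean(ChickWeight$weight), nrow(ChickWeight)))
fit_off <- lm(f_off, data = ChickWeight)

# Diet means minus the grand mean, computed directly
deviations <- tapply(ChickWeight$weight, ChickWeight$Diet, mean) -
  mean(ChickWeight$weight)
all.equal(unname(coef(fit_off)), as.vector(deviations))   # TRUE

# Refit with contr.sum: a lone factor in a no-intercept model gets full
# indicator coding either way, so the coefficients are unchanged
fit_off_sum <- lm(f_off, data = ChickWeight,
                  contrasts = list(Diet = "contr.sum"))
all.equal(coef(fit_off), coef(fit_off_sum))                # TRUE
```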


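I'm very worried this might be inferentially suspect, since the degrees of freedom and the ANOVA F statistic differ from the usual ones: the overall F test has 4 and 574 DF (F = 454.3) rather than 3 and 574 DF (F = 10.81), and R-squared jumps from 0.053 to 0.760, even though the residual standard error is the same.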
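One way to put the conventional test next to the offset model is sketched below; this is an added illustration, not something from the original reply. The usual one-way ANOVA F compares the Diet model with an intercept-only model, and both fits share the same fitted values, which is why the residual standard error (69.33 on 574 degrees of freedom) is identical in the two summaries. Variable names are illustrative.

```r
# Usual one-way ANOVA: Diet model versus intercept-only model
fit_null <- lm(weight ~ 1, data = ChickWeight)
fit_diet <- lm(weight ~ Diet, data = ChickWeight)
aov_tab  <- anova(fit_null, fit_diet)
aov_tab   # F on 3 and 574 df, matching the first summary

# The offset model from the reply
fit_off <- lm(weight ~ Diet + 0 +
                offset(rep(mean(ChickWeight$weight), nrow(ChickWeight))),
              data = ChickWeight)

# Same fitted values, hence the same residuals and residual standard error
all.equal(residuals(fit_off), residuals(fit_diet))   # TRUE

# Without an intercept, summary() does not centre on the mean when it
# computes R-squared and the overall F, which has 4 numerator df, not 3
summary(fit_off)$fstatistic[["numdf"]]
```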