Skip to content

Surprising results from summary(lm()) on data with NO variation

2 messages · Warnes, Gregory R, Brian Ripley

#
I have some data (from Affymetrix experiments) where I fit an aov() model to
a large number of outcome variables.  A reasonable fraction of the outcome
variables have 0 variability because values below a cutoff have been
replaced with the cutoff value (often 20) .  

In this case, the overall p-value from summary(lm(..))  is misleadingly
small:

For example (equivalent results in R 1.6.1 on both Solaris and for Windows
2000):


	> x <- rep(20, 4+5+5)  # no variation
	> Treatment <- factor(rep( c("Control","Low","High"), c(4,5,5) ))  #
3 treatment levels
	> summary( lm( x ~ Treatment)  )
	
	Call:
	lm(formula = x ~ Treatment)
	
	Residuals:
	       Min         1Q     Median         3Q        Max 
	-1.994e-14  7.889e-31  7.889e-31  7.889e-31  6.647e-15 
	
	Coefficients:
	               Estimate Std. Error   t value Pr(>|t|)    
	(Intercept)   2.000e+01  3.471e-15 5.762e+15   <2e-16 ***
	TreatmentHigh 6.647e-15  4.657e-15 1.427e+00    0.181    
	TreatmentLow  6.647e-15  4.657e-15 1.427e+00    0.181    
	---
	Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
	
	Residual standard error: 6.942e-15 on 11 degrees of freedom
	Multiple R-Squared: 0.5333,     Adjusted R-squared: 0.4485 
	F-statistic: 6.286 on 2 and 11 DF,  p-value: 0.01512
                                          ^^^^^^^^^^^^^^^^

Note F statistic and the overall p-value.     

For some values (eg 0 and 7),  fitting this model will generate NA's for the
estimates.  For most values, we get similar results.  I presume that this is
due to numerical problems in the fitting algorithm.  Still the result is
quite surprising.

It appears that summary.aov doesn't have the same problem:

	> x <- rep(20, 4+5+5)  # no variation
	> Treatment <- factor(rep( c("Control","Low","High"), c(4,5,5) ))  #
3 treatment levels
	> summary( aov( x ~ Treatment)  )
	            Df     Sum Sq    Mean Sq F value Pr(>F)
	Treatment    2 1.2622e-28 6.3110e-29  1.3095 0.3089
	Residuals   11 5.3011e-28 4.8190e-29               

Perhaps the mechanism for computing the F statistic that summary.aov uses is
less susceptible to the 'no variability' problem than the computation in
summary.lm.

I'm posting this here  to raise a warning flag.  For my particular
application, I'll just check for the no variability case and handle it
separately.  I'll leave it to the R core folks to decide if this is a big
enough problem to do something about in the code.

-Greg


LEGAL NOTICE\ Unless expressly stated otherwise, this message is ... [[dropped]]
#
It's just rounding errors.  We can't tell if you are genuinely interested
in studying rounding error or not.  In other words, whether something is
really constant or has a very small range of variation is not something
you can get from the numbers alone.

lda tries hard to warn here, but doing that well is expensive.
On Fri, 13 Dec 2002, Warnes, Gregory R wrote: