anova.lm and F-test

Hello,

Why does anova.lm sometimes return a p-value and at other times not? Is
it because it distinguishes nested models from non-nested ones?

x<-seq(1,100,1)
y<-3*x+rnorm(100)
anova(lm(y~x),lm(y~x+I(x^2)),test="F")
Analysis of Variance Table

Model 1: y ~ x
Model 2: y ~ x + I(x^2)
   Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     98 90.449
2     97 90.288  1   0.16117 0.1732 0.6782
anova(lm(y~x),lm(y~I(x^2)+I(x^3)),test="F")
Analysis of Variance Table

Model 1: y ~ x
Model 2: y ~ I(x^2) + I(x^3)
   Res.Df    RSS Df Sum of Sq F Pr(>F)
1     98   90.4
2     97 7345.7  1   -7255.3

Thanks, Suresh

Df and Sum of Sq have opposite signs here: the second model has more parameters but a worse fit (a higher RSS). The models are not nested, so the F test makes no sense.

I'd say that the real question is why anova.lm doesn't protest loudly when it detects this. One possible answer is that it also misses other non-nested cases where the signs do not clash, and warning in only some of the incorrect cases could lead the naive user to believe that the others are OK. E.g., this F test is equally meaningless:

anova(lm(y~I(x^4)),lm(y~I(x^2)+I(x^3)),test="F")
Analysis of Variance Table

Model 1: y ~ I(x^4)
Model 2: y ~ I(x^2) + I(x^3)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     98 186639                                  
2     97   7101  1    179538 2452.4 < 2.2e-16 ***

(Non-nestedness could in principle be determined by checking whether cbind(model.matrix(m1), model.matrix(m2)) has higher rank than either of its constituents, but numerical rank determination is a bit error-prone and slow, so this was not implemented.)
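
For illustration, the rank check described above might be sketched roughly as follows. This is only a sketch: is_nested() is a hypothetical helper, not part of base R or of anova.lm, and set.seed() and the tolerance default are assumptions added for the example. As noted, numerical rank can be unreliable for badly scaled columns, so the result should be treated with care.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

# Hypothetical helper (not in base R): TRUE if the column space of one
# model matrix contains that of the other, judged by numerical rank.
is_nested <- function(m1, m2, tol = 1e-7) {
  X1 <- model.matrix(m1)
  X2 <- model.matrix(m2)
  r1 <- qr(X1, tol = tol)$rank
  r2 <- qr(X2, tol = tol)$rank
  # Rank of the combined design matrix
  r12 <- qr(cbind(X1, X2), tol = tol)$rank
  # Non-nested models give a combined rank above both constituents
  r12 == max(r1, r2)
}

is_nested(lm(y ~ x), lm(y ~ x + I(x^2)))             # nested pair
is_nested(lm(y ~ x), lm(y ~ I(x^2) + I(x^3)))        # signs clash in anova()
is_nested(lm(y ~ I(x^4)), lm(y ~ I(x^2) + I(x^3)))   # signs do not clash
```

A check like this would flag the third comparison as well, which is the case anova() cannot catch from the signs alone.
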
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
Dear Peter,

Thank you very much for that excellent answer to a rather stupid question :)
I did not notice that the RSS actually increased for the model with more
parameters and so in this case the F-statistic is negative and therefore a
p-value from the F-distribution is meaningless. But I guess your answer also
clarifies that as long as the F-statistic is in the valid range (>=0),
anova() will calculate it and return a p-value (whether or not the models
are nested).
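
To make this concrete, here is a minimal sketch of the F statistic computed by hand from the residual sums of squares and residual degrees of freedom. f_by_hand() is a hypothetical helper written for this example, and set.seed() is an assumption added for reproducibility. The sketch assumes the second model is the one with fewer residual degrees of freedom, and it returns NA for the p-value when F is negative, which mirrors what the anova() output above seems to do.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

# Hypothetical helper: F statistic and p-value for two lm fits, by hand
f_by_hand <- function(small, big) {
  rss1 <- sum(resid(small)^2); df1 <- df.residual(small)
  rss2 <- sum(resid(big)^2);   df2 <- df.residual(big)
  # Change in RSS per extra parameter, scaled by the bigger model's
  # residual mean square
  f <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
  # An F distribution has no mass below zero, so no p-value is reported
  p <- if (f >= 0) pf(f, df1 - df2, df2, lower.tail = FALSE) else NA_real_
  c(F = f, p = p)
}

f_by_hand(lm(y ~ x), lm(y ~ x + I(x^2)))       # F >= 0, p-value returned
f_by_hand(lm(y ~ x), lm(y ~ I(x^2) + I(x^3)))  # F < 0, p-value is NA
```

For the first table above, the same arithmetic gives (0.16117 / 1) / (90.288 / 97), which is about 0.173 and agrees with the reported F.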

Best, Suresh

Peter Dalgaard-2 wrote
On Jul 9, 2012, at 15:40 , Suresh Krishna wrote:

[...]
--
View this message in context: http://r.789695.n4.nabble.com/anova-lm-and-F-test-tp4635845p4635867.html
Sent from the R help mailing list archive at Nabble.com.