-----Original Message-----
From: Denis Chabot [mailto:chabotd at globetrotter.net]
Sent: Wednesday, October 05, 2005 3:33 PM
To: John Fox
Cc: R list
Subject: Re: [R] testing non-linear component in mgcv:gam
Thank you everyone for your help, but my introduction to GAM
is turning my brain to mush. I thought the one part of the
output I understood the best was r-sq (adj), but now even
this is becoming foggy.
In my original message I mentioned a gam fit that turns out
to be a linear fit. By curiosity I analysed it with a linear
predictor only with mgcv package, and then as a linear model.
The output was identical in both, but the r-sq (adj) was 0.55
in mgcv and 0.26 in lm. In lm I hope that my interpretation
that 26% of the variance in y is explained by the linear
relationship with x is valid. Then what does r2 mean in mgcv?
Denis
Family: gaussian
Link function: identity
Formula:
wm.sed ~ Temp
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
Temp -0.023792 0.006369 -3.736 0.000666 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
R-sq.(adj) = 0.554 Deviance explained = 28.5%
GCV score = 0.09904 Scale est. = 0.093686 n = 37
Call:
lm(formula = wm.sed ~ Temp, weights = N.sets)
Residuals:
Min 1Q Median 3Q Max
-0.6138 -0.1312 -0.0325 0.1089 1.1449
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
Temp -0.023792 0.006369 -3.736 0.000666 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Residual standard error: 0.3061 on 35 degrees of freedom
Multiple R-Squared: 0.285, Adjusted R-squared: 0.2646
F-statistic: 13.95 on 1 and 35 DF, p-value: 0.000666
Le 05-10-05 ?? 09:45, John Fox a ??crit :
Dear Denis,
Take a closer look at the anova table: The models provide identical
fits to the data. The differences in degrees of freedom and
between the two models are essentially zero, 5.5554e-10 and
respectively.
I hope this helps,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Denis Chabot
Sent: Wednesday, October 05, 2005 8:22 AM
To: r-help at stat.math.ethz.ch
Subject: [R] testing non-linear component in mgcv:gam
Hi,
I need further help with my GAMs. Most models I test are very
obviously non-linear. Yet, to be on the safe side, I report the
significance of the smooth (default output of mgcv's
summary.gam) and confirm it deviates significantly from linearity.
I do the latter by fitting a second model where the same
entered without the s(), and then use anova.gam to compare
thought this was the equivalent of the default output of anova.gam
using package gam instead of mgcv.
I wonder if this procedure is correct because one of my models
appears to be linear. In fact mgcv estimates df to be
I could have stopped there. However I inadvertently repeated the
procedure outlined above. I would have thought in this case the
anova.gam comparing the smooth and the linear fit would
been not significant.
To my surprise, P was 6.18e-09!
Am I doing something wrong when I attempt to confirm the non-
parametric part a smoother is significant? Here is my example case
where the relationship does appear to be linear:
library(mgcv)
Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
0.88, 1.12,
1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
3.62, 3.88,
4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
6.38, 6.62, 6.88,
7.12, 8.38, 13.62)
N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27,
22, 26, 24, 23,
15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5,
1, 1, 1) wm.sed <- c(0.000000000, 0.016129032, 0.000000000,
0.062046512, 0.396459596, 0.189082949,
0.054757925, 0.142810440, 0.168005168, 0.180804428,
0.111439628, 0.128799505,
0.193707937, 0.105921610, 0.103497845, 0.028591837,
0.217894389, 0.020535469,
0.080389068, 0.105234450, 0.070213450, 0.050771363,
0.042074434, 0.102348837,
0.049748344, 0.019100478, 0.005203125, 0.101711864,
0.000000000, 0.000000000,
0.014808824, 0.000000000, 0.222000000, 0.167000000,
0.000000000, 0.000000000,
0.000000000)
sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
summary.gam(sed.gam)
Family: gaussian
Link function: identity
Formula:
wm.sed ~ s(Temp)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.08403 0.01347 6.241 3.73e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Est.rank F p-value
s(Temp) 1 1 13.95 0.000666 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.554 Deviance explained = 28.5%
GCV score = 0.09904 Scale est. = 0.093686 n = 37
# testing non-linear contribution
sed.lin <- gam(wm.sed~Temp,weight=N.sets)
summary.gam(sed.lin)
Family: gaussian
Link function: identity
Formula:
wm.sed ~ Temp
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
Temp -0.023792 0.006369 -3.736 0.000666 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.554 Deviance explained = 28.5%
GCV score = 0.09904 Scale est. = 0.093686 n = 37
anova.gam(sed.lin, sed.gam, test="F")
Analysis of Deviance Table
Model 1: wm.sed ~ Temp
Model 2: wm.sed ~ s(Temp)
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 3.5000e+01 3.279
2 3.5000e+01 3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
Thanks in advance,
Denis Chabot