testing non-linear component in mgcv:gam
Fair enough, Andy. I thought I was getting both predictive ability and confirmation that the phenomenon I was studying was not linear. I have two projects, in one prediction is the goal and I don't really need to test linearity. In the second I needed to confirm a cycle was taking place and I thought my procedure did this. There was no theoretical reason to anticipate the shape of the cycle, so GAM was an appealing methodology. Denis Le 05-10-05 ?? 09:38, Liaw, Andy a ??crit :
I think you probably should state more clearly the goal of your analysis. In such situation, estimation and hypothesis testing are quite different. The procedure that gives you the `best' estimate is not necessarily the `best' for testing linearity of components. If your goal is estimation/prediction, why test linearity of components? If the goal is testing linearity, then you probably need to find the procedure that gives you a good test, rather than relying on what gam() gives you. Just my $0.02... Andy
From: Denis Chabot Hi, I need further help with my GAMs. Most models I test are very obviously non-linear. Yet, to be on the safe side, I report the significance of the smooth (default output of mgcv's summary.gam) and confirm it deviates significantly from linearity. I do the latter by fitting a second model where the same predictor is entered without the s(), and then use anova.gam to compare the two. I thought this was the equivalent of the default output of anova.gam using package gam instead of mgcv. I wonder if this procedure is correct because one of my models appears to be linear. In fact mgcv estimates df to be exactly 1.0 so I could have stopped there. However I inadvertently repeated the procedure outlined above. I would have thought in this case the anova.gam comparing the smooth and the linear fit would for sure have been not significant. To my surprise, P was 6.18e-09! Am I doing something wrong when I attempt to confirm the non- parametric part a smoother is significant? Here is my example case where the relationship does appear to be linear: library(mgcv)
This is mgcv 1.3-7
Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12,
0.38, 0.62,
0.88, 1.12,
1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
3.62, 3.88,
4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
6.38, 6.62, 6.88,
7.12, 8.38, 13.62)
N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,
22, 26, 24, 23,
15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,
1, 1, 1)
wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512,
0.396459596, 0.189082949,
0.054757925, 0.142810440, 0.168005168, 0.180804428,
0.111439628, 0.128799505,
0.193707937, 0.105921610, 0.103497845, 0.028591837,
0.217894389, 0.020535469,
0.080389068, 0.105234450, 0.070213450, 0.050771363,
0.042074434, 0.102348837,
0.049748344, 0.019100478, 0.005203125, 0.101711864,
0.000000000, 0.000000000,
0.014808824, 0.000000000, 0.222000000, 0.167000000,
0.000000000, 0.000000000,
0.000000000)
sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
summary.gam(sed.gam)
Family: gaussian
Link function: identity
Formula:
wm.sed ~ s(Temp)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.08403 0.01347 6.241 3.73e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Est.rank F p-value
s(Temp) 1 1 13.95 0.000666 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.554 Deviance explained = 28.5%
GCV score = 0.09904 Scale est. = 0.093686 n = 37
# testing non-linear contribution sed.lin <- gam(wm.sed~Temp,weight=N.sets) summary.gam(sed.lin)
Family: gaussian
Link function: identity
Formula:
wm.sed ~ Temp
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.162879 0.019847 8.207 1.14e-09 ***
Temp -0.023792 0.006369 -3.736 0.000666 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.554 Deviance explained = 28.5%
GCV score = 0.09904 Scale est. = 0.093686 n = 37
anova.gam(sed.lin, sed.gam, test="F")
Analysis of Deviance Table Model 1: wm.sed ~ Temp Model 2: wm.sed ~ s(Temp) Resid. Df Resid. Dev Df Deviance F Pr(>F) 1 3.5000e+01 3.279 2 3.5000e+01 3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
Thanks in advance, Denis Chabot
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
---------------------------------------------------------------------- -------- Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. ---------------------------------------------------------------------- --------