one year ago, here's the response :
"
- package gam is based very closely on the GAM approach presented in
Hastie and Tibshirani's "Generalized Additive Models" book. Estimation is
by back-fitting and model selection is based on step-wise regression
methods based on approximate distributional results. A particular strength
of this approach is that local regression smoothers (`lo()' terms) can be
included in GAM models.
- gam in package mgcv represents GAMs using penalized regression splines.
Estimation is by direct penalized likelihood maximization with
integrated smoothness estimation via GCV or related criteria (there is
also an alternative `gamm' function based on a mixed model approach).
Strengths of this approach are that s() terms can be functions of more
than one variable and that tensor product smooths are available via te()
terms - these are useful when different degrees of smoothness are
appropriate relative to different arguments of a smooth.
(...)
Basically, if you want integrated smoothness selection, an underlying
parametric representation, or want smooth interactions in your models
then mgcv is probably worth a try (but I would say that). If you want to
use local regression smoothers and/or prefer the stepwise selection
approach then package gam is for you.
"
I think the difference in p-values between gam (package gam) and gam
(package mgcv) arises because the number of iteration steps is not the
same: in mgcv, gam chooses the number of steps itself, whereas in package
gam you have to choose it.
I hope this helps, and that someone can give us more details...
Yves
On Wed 28/09/2005 at 15:30, Denis Chabot wrote:
I only got one reply to my message:
No, this won't work. The problem is the usual one with model
selection: the p-value is calculated as if the df had been fixed,
when really it was estimated.
It is likely to be quite hard to get an honest p-value out of
something that does adaptive smoothing.
-thomas
I do not understand this: it seems that a lot of people chose df=4
for no particular reason, but p-levels are correct. If instead I
choose df=8 because a previous model has estimated this to be an
optimal df, P-levels are no good because df are estimated?
Furthermore, shouldn't packages gam and mgcv give similar results
when the same data and df are used? I tried this:
library(gam)
data(kyphosis)
kyp1 <- gam(Kyphosis ~ s(Age, 4), family=binomial, data=kyphosis)
kyp2 <- gam(Kyphosis ~ s(Number, 4), family=binomial, data=kyphosis)
kyp3 <- gam(Kyphosis ~ s(Start, 4), family=binomial, data=kyphosis)
anova.gam(kyp1)
anova.gam(kyp2)
anova.gam(kyp3)
detach(package:gam)
library(mgcv)
kyp4 <- gam(Kyphosis ~ s(Age, k=4, fx=TRUE), family=binomial,
            data=kyphosis)
kyp5 <- gam(Kyphosis ~ s(Number, k=4, fx=TRUE), family=binomial,
            data=kyphosis)
kyp6 <- gam(Kyphosis ~ s(Start, k=4, fx=TRUE), family=binomial,
            data=kyphosis)
anova.gam(kyp4)
anova.gam(kyp5)
anova.gam(kyp6)
P levels for these models, by pair
kyp1 vs kyp4: p= 0.083 and 0.068 respectively (not too bad)
kyp2 vs kyp5: p= 0.445 and 0.03 (wow!)
kyp3 vs kyp6: p= 0.053 and 0.008 (wow again)
Also, if you plot all of these, you find that the mgcv plots are smoother
than the gam plots, even though the same df are used throughout.
I am really confused now!
Denis
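A note on the comparison above, offered as a sketch rather than a full
explanation: in mgcv the k argument of s() is the basis dimension, and one
degree of freedom is absorbed by the identifiability (centering)
constraint, so s(Age, k=4, fx=TRUE) is a 3 df smooth rather than a 4 df
one. The snippet below checks this. It assumes the kyphosis data shipped
with package rpart (taken here to match the copy in package gam), and the
object names are illustrative only.

```r
library(mgcv)

# Assumption: rpart's copy of kyphosis stands in for the one in package gam.
data(kyphosis, package = "rpart")

# Unpenalized (fx = TRUE) smooths with basis dimension 4 and 5.
fit_k4 <- gam(Kyphosis ~ s(Age, k = 4, fx = TRUE),
              family = binomial, data = kyphosis)
fit_k5 <- gam(Kyphosis ~ s(Age, k = 5, fx = TRUE),
              family = binomial, data = kyphosis)

# Degrees of freedom of the smooth term itself (intercept excluded).
df_k4 <- summary(fit_k4)$edf
df_k5 <- summary(fit_k5)$edf
print(c(k4 = df_k4, k5 = df_k5))
```

On this reading, the closer mgcv counterpart of a 4 df term would use
k = 5. Even then the two packages use different smoothers and different
approximate tests, so identical p-values should probably not be expected.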
Begin forwarded message:
From: Denis Chabot <chabotd at globetrotter.net>
Date: 26 September 2005 12:25:04 EDT
To: r-help at stat.math.ethz.ch
Subject: p-level in packages mgcv and gam
Hi,
I am fairly new to GAM and started using package mgcv. I like the
fact that optimal smoothing is automatically used (i.e. df are not
determined a priori but calculated by the gam procedure).
But the mgcv manual warns that p-level for the smooth can be
underestimated when df are estimated by the model. Most of the
time my p-levels are so small that even doubling them would not
result in a value close to the P=0.05 threshold, but I have one
case with P=0.033.
I thought, probably naively, of running a second model with
fixed df, using the value of df found in the first model. I could
not achieve this with mgcv: its gam function does not seem to
accept fractional values of df (in my case 8.377).
So I used the gam package and fixed df to 8.377. The P-value I
obtained was slightly larger than with mgcv (0.03655 instead of
0.03328), but it is still < 0.05.
Was this a correct way to get around the "underestimated P-level"?
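For what it is worth, mgcv can hold the amount of smoothing fixed without
a fractional df argument: the sp argument of gam() accepts smoothing
parameters, so the values estimated by a first fit can be passed to a
second one. A minimal sketch follows, using simulated data (the data and
object names are made up for illustration). It only reproduces the
fractional df of the first fit; it does not by itself address the concern
that the smoothness was chosen from the same data.

```r
library(mgcv)

# Hypothetical data: a smooth signal plus Gaussian noise.
set.seed(1)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# First fit: smoothing parameter estimated automatically.
fit_auto <- gam(y ~ s(x))

# Second fit: smoothing parameter held fixed at the estimated value.
fit_fixed <- gam(y ~ s(x), sp = as.numeric(fit_auto$sp))

# Total effective degrees of freedom of each fit (intercept included).
edf_auto <- sum(fit_auto$edf)
edf_fixed <- sum(fit_fixed$edf)
print(c(auto = edf_auto, fixed = edf_fixed))
```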
Furthermore, although the gam.check function of the mgcv package
suggests to me that the gaussian family (and identity link) are
adequate for my data, I must say the instructions in R help for
"family" and in Hastie, T. and Tibshirani, R. (1990) Generalized
Additive Models are too technical for me. If someone knows a
reference that explains how to choose model and link, i.e. what
tests to run on your data before choosing, I would really
appreciate it.
Thanks in advance,
Denis Chabot