unexpected GAM result - at least for me! - R-help

Mon, Mar 31, 2008 5:34 AM #

Hi


I am afraid I am not understanding something very fundamental, and no matter how much I look into the book "Generalized Additive Models" by S. Wood, I still don't understand my result.

I am trying to model presence / absence (presence = 1, absence = 0) of a species using some lidar metrics (I have 4 of these). I am trying different models, and when I used gam I got this very weird (for me) result, which I thought was not possible, or at least I have no idea how to interpret it.

> can3.gam <- gam(can > 0 ~ s(be) + s(crr) + s(ch) + s(home), family = 'binomial')
> summary(can3.gam)

Family: binomial
Link function: logit

Formula:
can > 0 ~ s(be) + s(crr) + s(ch) + s(home)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    85.39     162.88   0.524      0.6

Approximate significance of smooth terms:
          edf Est.rank Chi.sq p-value
s(be)   1.000        1  0.100   0.751
s(crr)  3.929        8  0.380   1.000
s(ch)   6.820        9  0.396   1.000
s(home) 1.000        1  0.314   0.575

R-sq.(adj) =      1   Deviance explained =  100%
UBRE score = -0.81413  Scale est. = 1         n = 148

Is this a perfect fit with no statistical significance, an over-estimation, or what? It seems that the significance of the smooth terms is "null". Of course, with such a model I predict presence / absence of the species perfectly.

Again, I hope you don't mind my asking you this. Any explanation will be very much appreciated.

Thanks,

Monica

PS. I've contacted the author of the book, who is also the package maintainer, but so far I haven't received a reply.


Duncan Murdoch

Mon, Mar 31, 2008 5:47 AM #

On 3/31/2008 8:34 AM, Monica Pisica wrote:

Look at the data.  You can get a perfect fit to a logistic regression 
model fairly easily, and it looks as though you've got one.  (In fact, 
the huge intercept suggests that all predictions will be 1.  Do you 
actually have any variation in the data?)

Duncan Murdoch
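
To illustrate the point about perfect fits, here is a minimal sketch using simulated data (not the poster's data) and a plain glm() rather than gam(). The variables x and y are made up for the example. When a predictor separates the two classes completely, the fit can reproduce the response exactly while the Wald tests in the summary look entirely non-significant, which is the same pattern as in the summary above.

```r
# Simulated example (an assumption, not the original data):
# y is completely determined by the sign of x, so the classes are separated.
x <- c(seq(-3, -0.5, length.out = 20), seq(0.5, 3, length.out = 20))
y <- as.integer(x > 0)

# First check: is there any variation in the response at all?
print(table(y))

# glm() normally warns here that fitted probabilities of 0 or 1 occurred;
# the warnings are suppressed only to keep the output short.
fit <- suppressWarnings(glm(y ~ x, family = binomial))

est <- coef(summary(fit))
print(est)                  # slope is large, typically with a much larger std. error
print(range(fitted(fit)))   # fitted probabilities pushed towards 0 and 1
print(deviance(fit))        # residual deviance is essentially zero
```

With separated data the likelihood keeps improving as the coefficients grow, so the estimates and their standard errors blow up and the reported p-values stop being informative.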

Monica Pisica

Mon, Mar 31, 2008 6:01 AM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080331/a477b1d2/attachment.pl

Duncan Murdoch

Mon, Mar 31, 2008 6:30 AM #

On 3/31/2008 9:01 AM, Monica Pisica wrote:

I repeat:  look at the data. Compare the observed and predicted. That's 
the only way to know whether this is reasonable or not.

If you're getting reasonable predictions, then it's a valid fit.  (The 
tests and approximations used in the reported p-values may not be at all 
valid.  I don't know what the requirements are for those in a GAM, but 
if you're getting a perfect fit, then they probably aren't being met.)

Duncan Murdoch
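
A minimal sketch of the observed-versus-predicted comparison suggested above, again on simulated data with made-up names (d, present, x) and a plain glm(). For the model in this thread the equivalent check would presumably be something like table(can > 0, fitted(can3.gam) > 0.5), assuming can and the fitted object are still in the workspace.

```r
# Simulated presence/absence data (an assumption, not the original data)
set.seed(42)
n <- 148
d <- data.frame(x = rnorm(n))
d$present <- rbinom(n, 1, plogis(1.5 * d$x))

fit <- glm(present ~ x, family = binomial, data = d)
p_hat <- fitted(fit)   # predicted probabilities on the response scale

# Cross-tabulate observed classes against predictions thresholded at 0.5
confusion <- table(observed = d$present, predicted = as.integer(p_hat > 0.5))
print(confusion)

# Share of correctly classified observations
print(mean((p_hat > 0.5) == (d$present == 1)))

# Spread of the fitted probabilities; values piled up at 0 and 1
# would point to a (near) perfect fit
print(summary(p_hat))
```

If the table has no off-diagonal entries and the fitted probabilities all sit at 0 or 1, the fit is perfect and the approximate p-values in the summary should not be trusted.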

 
Thanks again,
 
Monica

 > On 3/31/2008 8:34 AM, Monica Pisica wrote:
 > > > can3.gam <- gam(can > 0 ~ s(be) + s(crr) + s(ch) + s(home), family = 'binomial')
 > > > summary(can3.gam)

