
using GAM to assess the linearity in logistic regression

4 messages · ronggui, Wensui Liu, John Fox

#
As Agresti (2002) points out, we had better screen the data to see whether logit(pi) and the predictor have a linear relationship in logistic regression. I found some material in MASS and in the S-PLUS reference, but it is rather brief, and I cannot quite work out how to assess linearity in logistic regression. Can anyone suggest some materials?

I am not familiar with GAM, but I think there may be some materials that would let me use GAM to assess linearity in logistic regression without mastering GAM models. Is that right?

Thank you!
#
I am a little confused about what you asked. 

If you want to assess the linearity in logistic regression, why do you
want to use GAM instead of GLM?

As far as I understand, GAM is used to capture nonlinearity rather than linearity.

Am I right here?
On Apr 1, 2005 10:19 PM, ronggui <0034058 at fudan.edu.cn> wrote:

  
    
#
Maybe the idea is simple, but the details are beyond me. You are right that GAM can capture non-linearity. But if the results from a GAM show little evidence of non-linearity, then we can assume the relationship is linear. Am I right?

From Agresti (2002):
...
Before fitting the model and making such interpretations,
look at the data to check that the logistic regression model is appropriate.
Since Y takes only values 0 and 1, it is difficult to check this by plotting Y
against x.
It can be helpful to plot sample proportions or logits against x. ... When X is continuous and all n_i = 1, or when it is essentially continuous
and all ni are small, this is unsatisfactory. One could group the data with
nearby x values into categories before calculating sample proportions and
sample logits. A better approach that does not require choosing arbitrary
categories uses a smoothing mechanism to reveal trends. One such smoothing
approach fits a generalized additive model (Section 4.8), which replaces the
linear predictor of a GLM by a smooth function. Inspect a plot of the fit
to see if severe discrepancies occur from the S-shaped trend predicted
by logistic regression.
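Agresti's grouping idea (bin nearby x values, then compute sample logits) can be sketched in a few lines. This is a minimal Python/NumPy sketch, not code from any of the sources quoted here: the function name `empirical_logits`, the equal-count bins, and the adjusted logit log((s + 0.5)/(n - s + 0.5)), a common device to keep bins of all 0s or all 1s finite, are assumptions made for illustration. If the logistic model is appropriate, plotting the returned logits against the bin means of x should give a roughly straight line.

```python
import numpy as np


def empirical_logits(x, y, n_bins=10):
    """Group x into equal-count bins of nearby values and return, per bin,
    the mean of x, the number of cases, and the adjusted sample logit
    log((s + 0.5) / (n - s + 0.5)), which stays finite for all-0 or all-1 bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)                        # sort so each bin holds nearby x values
    groups = np.array_split(order, n_bins)
    xbar = np.array([x[g].mean() for g in groups])
    n = np.array([len(g) for g in groups])
    s = np.array([y[g].sum() for g in groups])   # number of 1s in each bin
    logit = np.log((s + 0.5) / (n - s + 0.5))
    return xbar, n, logit


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(-2, 2, size=2000)
    eta = -0.5 + 1.5 * x                         # simulated data with a truly linear logit
    y = (rng.random(2000) < 1 / (1 + np.exp(-eta))).astype(float)
    xbar, n, logit = empirical_logits(x, y)
    for xb, lg in zip(xbar, logit):
        print(f"{xb:6.2f}  {lg:6.2f}")           # plot logit against xbar; expect a straight line
```

The number of bins trades bias against noise: fewer, larger bins give steadier logits but can hide curvature, which is the arbitrariness the smoothing approach is meant to avoid.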

From "S-PLUS (and R) Manual to Accompany
Agresti's Categorical Data Analysis (2002)" (2nd edition, Laura A. Thompson, 2005):

Prior to fitting a logistic regression model to data, one should check the assumption of a logistic relationship between the response and explanatory variables. A simple way
to do this is to use the linear relationship between the logit and the explanatory variable. The values of the explanatory variable can be plotted against the sample logits (p. 168, Agresti) at those values. The plot should look roughly linear for a logistic model to be appropriate. If there are not enough response data at each unique x value (and categorizing x values is undesirable), then the technique of the last section in Chapter 4 can be used (i.e., GAM). There, we saw that a sigmoidal (or S-shaped) trend
appeared in the plot of the response by predictor (Figure 4.7, Agresti).

From MASS:
....
    Residuals are not always very informative with binary responses but at least
none are particularly large here.
    An alternative approach is to predict the actual live birth weight and later
threshold at 2.5 kilograms. This is left as an exercise for the reader; surprisingly
it produces somewhat worse predictions with around 52 errors.
      We can examine the linearity in age and mother's weight more flexibly using
generalized additive models. These stand in the same relationship to additive
models (Section 8.8) as generalized linear models do to regression models; replace
the linear predictor in a GLM by an additive model, the sum of linear and
smooth terms in the explanatory variables. We use function gam from S-PLUS.
(R has a somewhat different function gam in package mgcv by Simon Wood.)
    ...  ht + ui + ftv + s(age1) + s(age2) + smoke:ui, binomial,
         bwt, bf.maxit=25)
    Residual Deviance: 170.35 on 165.18 degrees of freedom
    DF for Terms and Chi-squares for Nonparametric Effects
             Df Npar Df Npar Chisq  P(Chi)
    s(age)    1     3.0     3.1089 0.37230
    s(lwt)    1     2.9     2.3392 0.48532
    s(age1)   1     3.0     3.2504 0.34655
    s(age2)   1     3.0     3.1472 0.36829
      FALSE TRUE
    0   115   15
    1    28   31
Creating the variables age1 and age2 allows us to fit smooth terms for the difference
in having one or more visits in the first trimester. Both the summary and
the plots show no evidence of non-linearity. The convergence of the fitting algorithm
is slow in this example, so we increased the control parameter bf.maxit
from 10 to 25. The parameter ask = T allows us to choose plots from a menu.
Our choice of plots is shown in Figure 7.2.
See Chambers and Hastie (1992) for more details on gam .




On Fri, 01 Apr 2005 23:37:13 -0500
Wensui Liu <liuwensui at gmail.com> wrote:

            
#
Dear ronggui,

There are several approaches you can take, one of which is to fit a GAM and
simply look to see whether the relationships appear linear on the logit
scale. As well, you could compare the fit of the GAM with semiparametric
models in which each smooth term in turn is replaced by a linear term; see
?anova.gam in the mgcv or gam package and the on-line appendix on
nonparametric regression to my R and S-PLUS Companion to Applied Regression
(at
http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix-nonparametric-regression.pdf,
and slightly out of date).
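The comparison described here, a smooth term versus the same term entered linearly, can be sketched without a GAM library. This Python/NumPy sketch is a stand-in, not `anova.gam`: it assumes an unpenalized cubic regression spline (truncated-power basis with knots at the quartiles of x) in place of gam's penalized smoother, so the linear model is exactly nested and the likelihood-ratio statistic has integer degrees of freedom. The helper names `fit_logit_deviance`, `chi2_sf`, `spline_basis`, and `linearity_lrt` are hypothetical.

```python
import math

import numpy as np


def fit_logit_deviance(X, y, max_iter=100, tol=1e-10):
    """Deviance (-2 log-likelihood) of a binary logistic fit by IRLS.
    X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)                          # IRLS weights (binomial variance)
        z = X @ beta + (y - mu) / w                  # working response
        XtW = X.T * w
        new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    mu = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    return -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))


def chi2_sf(stat, df):
    """Upper-tail chi-square probability for integer df, using closed forms."""
    h = max(stat, 0.0) / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h**i / math.factorial(i) for i in range(df // 2))
    tail = sum(h**(j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def spline_basis(x, n_knots=3):
    """Cubic regression-spline basis: u, u^2, u^3 and (u - k)^3_+ at interior
    quantile knots, where u is x standardized to keep X'WX well conditioned."""
    u = (x - x.mean()) / x.std()
    knots = np.quantile(u, np.linspace(0, 1, n_knots + 2)[1:-1])
    return np.column_stack([u, u**2, u**3] + [np.clip(u - k, 0, None) ** 3 for k in knots])


def linearity_lrt(x, y, n_knots=3):
    """Likelihood-ratio test of a linear logit in x against a spline logit in x."""
    ones = np.ones((len(x), 1))
    B = spline_basis(x, n_knots)
    X_lin = np.hstack([ones, B[:, :1]])              # intercept + standardized x only
    X_spl = np.hstack([ones, B])                     # the linear model is nested in this one
    stat = fit_logit_deviance(X_lin, y) - fit_logit_deviance(X_spl, y)
    df = X_spl.shape[1] - X_lin.shape[1]
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = rng.uniform(-2, 2, size=3000)
    eta = 0.5 - 1.5 * x**2                           # simulated, clearly non-linear logit
    y = (rng.random(3000) < 1 / (1 + np.exp(-eta))).astype(float)
    print(linearity_lrt(x, y))                       # large statistic, tiny p-value
```

A large p-value is consistent with linearity but does not prove it; with few events the test has little power, so the plots are still worth inspecting.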

Another approach is to fit the linear logit model with glm() and examine
component+residual (partial-residual) plots via the cr.plots() function or
the ceres.plots() function, both in the car package. 
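A component+residual plot for a logistic model can be approximated as follows. This Python/NumPy sketch assumes the usual GLM definition of partial residuals, the working residual (y - mu)/(mu(1 - mu)) plus the fitted component b_j(x_j - mean(x_j)), and it substitutes binned means for the smoother that cr.plots() draws; it does not reproduce the car package. `fit_logit`, `cr_residuals`, and `binned_means` are hypothetical names.

```python
import numpy as np


def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Logistic regression coefficients by IRLS; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        z = X @ beta + (y - mu) / w                  # working response
        XtW = X.T * w
        new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta


def cr_residuals(Z, y, j):
    """Component+residual values for column j of the predictor matrix Z (no intercept)."""
    X = np.column_stack([np.ones(len(y)), Z])
    beta = fit_logit(X, y)
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    working = (y - mu) / (mu * (1.0 - mu))           # working residuals
    xj = Z[:, j]
    return beta[j + 1] * (xj - xj.mean()) + working  # fitted component + residual


def binned_means(x, r, n_bins=10):
    """Crude smoother: mean of x and of r within equal-count bins of x."""
    groups = np.array_split(np.argsort(x), n_bins)
    return (np.array([x[g].mean() for g in groups]),
            np.array([r[g].mean() for g in groups]))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    Z = rng.uniform(-2, 2, size=(3000, 2))
    eta = 0.5 - 1.5 * Z[:, 0] ** 2 + 0.8 * Z[:, 1]   # x1 enters non-linearly, x2 linearly
    y = (rng.random(3000) < 1 / (1 + np.exp(-eta))).astype(float)
    xs, rs = binned_means(Z[:, 0], cr_residuals(Z, y, 0))
    for a, b in zip(xs, rs):
        print(f"{a:6.2f}  {b:6.2f}")                 # an arch here signals non-linearity in x1
```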

If nonlinearity in, say, x is correctable by a power transformation, you can
get an approximate score test for the need to transform x by adding the
"constructed variable" I(x*log(x)) to the model and examining its Wald
statistic; an added-variable plot (av.plots in car) for the constructed
variable shows leverage and influence on the decision to transform x. You
can also compute a suggested power transformation as p = 1 + g/b, where b is
the coefficient of x in the *original* model and g that of the constructed
variable. Details are in the R and S-PLUS Companion. Some further examples
are in lecture notes at
http://socserv.socsci.mcmaster.ca/jfox/Courses/soc740/lecture-11.pdf.
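The constructed-variable test can be sketched as below, assuming x > 0. This Python/NumPy illustration is not the car implementation: it fits the linear-logit model, adds x*log(x), reports the Wald z for the added term, and computes the one-step Box-Tidwell power estimate 1 + g/b. That estimate follows from the expansion x^p ≈ x + (p - 1) x log(x), so g ≈ b(p - 1), with b the coefficient of x in the original model and g that of the constructed variable. `fit_logit` and `constructed_variable_test` are hypothetical names.

```python
import math

import numpy as np


def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Logistic regression by IRLS. X must include an intercept column.
    Returns (coefficients, covariance matrix = inverse Fisher information)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        z = X @ beta + (y - mu) / w                  # working response
        XtW = X.T * w
        new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return beta, np.linalg.inv((X.T * (mu * (1.0 - mu))) @ X)


def constructed_variable_test(x, y):
    """Wald test for the constructed variable x*log(x) and a suggested power for x."""
    if np.any(x <= 0):
        raise ValueError("x must be strictly positive to form x*log(x)")
    X0 = np.column_stack([np.ones(len(x)), x])
    b = fit_logit(X0, y)[0][1]                       # coefficient of x in the original model
    X1 = np.column_stack([X0, x * np.log(x)])        # add the constructed variable
    coef, cov = fit_logit(X1, y)
    g = coef[2]                                      # coefficient of the constructed variable
    z = g / math.sqrt(cov[2, 2])                     # Wald statistic
    p_value = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal p-value
    power = 1.0 + g / b                              # one-step Box-Tidwell estimate
    return z, p_value, power


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    x = rng.uniform(0.2, 10, size=4000)
    eta = -2 + 2 * np.log(x)                         # simulated: the true power is 0 (log)
    y = (rng.random(4000) < 1 / (1 + np.exp(-eta))).astype(float)
    print(constructed_variable_test(x, y))           # expect z < 0 and a power below 1
```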

If x is quantitative but discrete, refitting the logit model replacing x
with as.factor(x) and comparing via anova() to the original model gives a
test of nonlinearity.
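For a discrete x, the factor-versus-linear comparison is a likelihood-ratio test on two nested fits, the deviance difference that anova() reports. This Python/NumPy sketch builds the dummy variables by hand and uses hypothetical helper names (`fit_logit_deviance`, `chi2_sf`, `factor_vs_linear`). Because the linear model is nested in the factor model, the statistic has (number of levels - 2) degrees of freedom. It assumes every level of x has both 0s and 1s; otherwise the factor model's estimates diverge.

```python
import math

import numpy as np


def fit_logit_deviance(X, y, max_iter=100, tol=1e-10):
    """Deviance (-2 log-likelihood) of a binary logistic fit by IRLS; X includes an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        z = X @ beta + (y - mu) / w                  # working response
        XtW = X.T * w
        new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    mu = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    return -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))


def chi2_sf(stat, df):
    """Upper-tail chi-square probability for integer df, using closed forms."""
    h = max(stat, 0.0) / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h**i / math.factorial(i) for i in range(df // 2))
    tail = sum(h**(j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def factor_vs_linear(x, y):
    """LR test of a linear logit in a discrete x against one parameter per level of x."""
    ones = np.ones(len(x))
    levels = np.unique(x)
    X_lin = np.column_stack([ones, x])
    dummies = [(x == lev).astype(float) for lev in levels[1:]]   # first level is the baseline
    X_fac = np.column_stack([ones] + dummies)
    stat = fit_logit_deviance(X_lin, y) - fit_logit_deviance(X_fac, y)
    df = X_fac.shape[1] - X_lin.shape[1]             # number of levels minus 2
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    x = np.repeat(np.arange(5), 500).astype(float)
    eta = np.array([-2.0, 1.0, 1.5, 1.0, -2.0])[x.astype(int)]  # inverted-U pattern
    y = (rng.random(len(x)) < 1 / (1 + np.exp(-eta))).astype(float)
    print(factor_vs_linear(x, y))
```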

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------