
A question about using “by” when fitting GAM interactions between smooth terms and a factor

3 messages · willow1980, Simon Wood

#
I am a little confused by the following help text, from
http://rss.acs.unt.edu/Rdoc/library/mgcv/html/gam.models.html, on how to fit a
GAM with an interaction between a factor and smooth terms:
"Sometimes models of the form:
E(y)=b0+f(x)z
need to be estimated (where f is a smooth function, as usual.) The
appropriate formula is:
y~z+s(x,by=z)
- the by argument ensures that the smooth function gets multiplied by
covariate z, but GAM smooths are centred (average value zero), so the z+
term is needed as well (f is being represented by a constant plus a centred
smooth). If we'd wanted:
E(y)=f(x)z
then the appropriate formula would be: y~z+s(x,by=z)-1."
When I tried both versions, I found that they gave the same results. That is,
the formulas "y~z+s(x,by=z)" and "y~z+s(x,by=z)-1" produced the same fit.
Here is my output:
###########################################################################
anova(model1,model2,test="Chisq")
Analysis of Deviance Table

Model 1: FLBS ~ SES + s(FAFR, by = SES) + s(byear, by = SES) + s(FAFR,
    byear, by = SES)
Model 2: FLBS ~ SES + s(FAFR, by = SES) + s(byear, by = SES) + s(FAFR,
    byear, by = SES) - 1
   Resid. Df Resid. Dev         Df  Deviance P(>|Chi|)
1 1.2076e+03     1458.4                               
2 1.2076e+03     1458.4 1.9099e-11 5.030e-10 2.074e-10
###########################################################################
Does this conflict with the statement above that "If we'd wanted: E(y)=f(x)z
then the appropriate formula would be: y~z+s(x,by=z)-1."? Also, if you are
familiar with GAM modelling, please have a look at my modelling process.
I want to study how one factor, together with two smooth terms, influences
the response. In model 2, I also fitted the interaction between the two
smooth terms, as well as the interaction of that smooth interaction with the
factor. Is model 2 reasonable? I find the plot of model 2 rather complicated
to interpret.
Thank you very much for your help!
#
The problem here is that the help page you are looking at appears to be from 
an earlier version of `mgcv' than you are using (it's from a version that did 
not support factor `by' variables). Take a look at ?gam.models for the 
version that you are actually using. 

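Your models give the same fit because, when `z' is a factor, ~z and ~z-1
differ only in the identifiability constraints used (this is true for all
linear-type models). With an intercept, one level of `z' is absorbed into the
intercept; without it, each level gets its own coefficient. Both
parametrizations span exactly the same model space.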
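To see the equivalence directly, here is a minimal sketch using simulated data. It assumes a current version of mgcv (one that supports factor `by' variables); the variables x, z, y and the data frame dat are hypothetical stand-ins, not the data from the original post.

```r
# Minimal sketch with simulated data; x, z, y and dat are hypothetical.
library(mgcv)

set.seed(1)
n <- 300
x <- runif(n)
z <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

# A different smooth curve for each level of z, plus Gaussian noise
f <- ifelse(z == "a", sin(2 * pi * x),
            ifelse(z == "b", x^2, -x))
dat <- data.frame(x = x, z = z, y = f + rnorm(n, sd = 0.3))

# With an intercept: "(Intercept)" plus treatment contrasts "zb", "zc"
m1 <- gam(y ~ z + s(x, by = z), data = dat)
# Without an intercept: one coefficient per level, "za", "zb", "zc"
m2 <- gam(y ~ z + s(x, by = z) - 1, data = dat)

# The parametric columns span the same space in both models and the
# centred smooths s(x):za, s(x):zb, s(x):zc are the same, so the fits agree
print(names(coef(m1))[1:3])
print(names(coef(m2))[1:3])
print(c(deviance(m1), deviance(m2)))
print(max(abs(fitted(m1) - fitted(m2))))
```

The coefficient names differ between the two fits, but the deviances and fitted values should match up to numerical tolerance. That is what the near-zero Df and Deviance differences in the anova table above reflect.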

As far as model reasonableness is concerned, it's a bit difficult to say
without knowing the context. The only thing that stands out is that you are
using an isotropic `s' term for the interaction. This is fine if `byear'
and `FAFR' are naturally on the same scale, but if not, tensor product
smooths (`te') may be preferable, as they are independent of the relative
scaling of the variables. To make the plots easier to interpret, I'd drop the
`main effect' smooths and just leave in the interaction.
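A minimal sketch of that suggestion, assuming a current mgcv: keep the factor as a parametric term, drop the separate main-effect smooths, and fit one te() interaction per factor level. The variables x1, x2, g and the simulated data are hypothetical stand-ins for FAFR, byear and SES. The sketch also refits after rescaling one covariate to illustrate the point about scale invariance.

```r
# Minimal sketch; x1, x2, g and dat are hypothetical stand-ins for the
# variables in the post. x1 and x2 are deliberately on different scales.
library(mgcv)

set.seed(2)
n <- 400
x1 <- runif(n, 15, 45)       # hypothetical covariate, range of tens
x2 <- runif(n, 1900, 2000)   # hypothetical covariate, range of ~100
g <- factor(sample(c("low", "high"), n, replace = TRUE))
eta <- sin(x1 / 5) * (x2 - 1950) / 50 + 0.5 * (g == "high")
dat <- data.frame(x1 = x1, x2 = x2, g = g, y = eta + rnorm(n, sd = 0.3))

# Interaction-only model: factor main effect plus one tensor product
# smooth of (x1, x2) per level of g. te() has a separate smoothing
# parameter for each margin, so linear rescaling of x1 or x2 should
# leave the fit unchanged up to numerical tolerance.
m_te <- gam(y ~ g + te(x1, x2, by = g), data = dat)

# Same model with x2 divided by 100
dat2 <- transform(dat, x2 = x2 / 100)
m_te2 <- gam(y ~ g + te(x1, x2, by = g), data = dat2)
print(max(abs(fitted(m_te) - fitted(m_te2))))

# The isotropic alternative uses a single smoothing parameter per level
# and is not invariant to the same rescaling
m_iso  <- gam(y ~ g + s(x1, x2, by = g), data = dat)
m_iso2 <- gam(y ~ g + s(x1, x2, by = g), data = dat2)
print(max(abs(fitted(m_iso) - fitted(m_iso2))))
```

After fitting, plot(m_te, pages = 1) should draw one contour plot per level of g. This is usually easier to read than separate main-effect and interaction panels.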

best,
Simon
On Tuesday 05 May 2009 16:53, willow1980 wrote:

  
    
#
Dear Simon,
Thank you so much!
It seems that Crawley's R book took its information from that earlier
version and discusses "by" in the context of that version.
I will try your suggestion.
Thanks again!
Jianghua
Simon Wood-4 wrote: