
mgcv: increasing basis dimension

5 messages · Simon Wood, Greg Dropkin

#
hi

Using a ts or tprs basis, I expected GCV to decrease when increasing the
basis dimension, as I thought this would minimise GCV over a larger
subspace. But GCV increased. Here's an example. Thanks for any comments.

greg

#simulate some data
set.seed(0)
x1<-runif(500)
x2<-rnorm(500)
x3<-rpois(500,3)
d<-runif(500)
linp<--1+x1+0.5*x2+0.3*exp(-2*d)*sin(10*d)*x3
y<-rpois(500,exp(linp))
sum(y)

library(mgcv)
#basis dimension k=5
m1<-gam(y~x1+x2+te(d,bs="ts")+te(x3,bs="ts")+te(d,x3,bs="ts"),family="poisson")

#basis dimension k=10
m2<-gam(y~x1+x2+te(d,bs="ts",k=10)+te(x3,bs="ts",k=10)+te(d,x3,bs="ts",k=10),family="poisson")

#gcv increased
m1$gcv.ubre
m2$gcv.ubre

summary(m1)
summary(m2)

gam.check(m1)
gam.check(m2)


#is this due to bs="ts"?

#basis dimension k=5
m1tp<-gam(y~x1+x2+te(d,bs="tp")+te(x3,bs="tp")+te(d,x3,bs="tp"),family="poisson")

#basis dimension k=10
m2tp<-gam(y~x1+x2+te(d,bs="tp",k=10)+te(x3,bs="tp",k=10)+te(d,x3,bs="tp",k=10),family="poisson")

m1tp$gcv.ubre
m2tp$gcv.ubre

#no

summary(m1tp)
summary(m2tp)

gam.check(m1tp)
gam.check(m2tp)
#
That's interesting. Playing with the example, it doesn't seem to be a
local minimum. I think this happens because, although the higher rank
basis contains the lower rank basis, the penalty cannot simply suppress
all the extra components of the higher rank basis and recover exactly
what the lower rank basis gave: the fit is forced to include some of the
extra components, even if heavily penalized, and this is what is
degrading the higher rank fit in this case.

t2 tensor product smooths seem to be less susceptible to this effect,
and, for reasons I don't understand, so does REML-based smoothness
selection (gam(..., method = "REML")).
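For instance, the k = 5 / k = 10 comparison above can be re-run with REML-based smoothness selection. The following sketch (not from the original exchange) reuses the simulated data from the first message; with method = "REML" the score is stored in the `gcv.ubre` component:

```r
library(mgcv)

# same simulated data as in the first message
set.seed(0)
x1 <- runif(500); x2 <- rnorm(500); x3 <- rpois(500, 3); d <- runif(500)
linp <- -1 + x1 + 0.5 * x2 + 0.3 * exp(-2 * d) * sin(10 * d) * x3
y <- rpois(500, exp(linp))

# refit with REML-based smoothness selection instead of the default GCV
m1r <- gam(y ~ x1 + x2 + te(d, bs = "ts") + te(x3, bs = "ts") +
             te(d, x3, bs = "ts"), family = poisson, method = "REML")
m2r <- gam(y ~ x1 + x2 + te(d, bs = "ts", k = 10) + te(x3, bs = "ts", k = 10) +
             te(d, x3, bs = "ts", k = 10), family = poisson, method = "REML")

m1r$gcv.ubre  # REML score, k = 5
m2r$gcv.ubre  # REML score, k = 10
```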

best,
Simon
On 13/02/12 23:24, Greg Dropkin wrote:
#
thanks Simon

I'll upgrade R to try t2. The data I'm actually analysing requires a
scaled Poisson model, so I don't think REML is an option.

thanks

Greg
On 14/02/12 11:22 Simon Wood wrote:
#
Hi Greg,

Recent mgcv versions use extended quasi-likelihood in place of the 
likelihood for (Laplace approx) REML with quasi families (e.g. McCullagh 
and Nelder, GLM book 2nd ed section 9.6): this fixes the problems with 
trying to use the quasi-likelihood directly with REML.
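As a minimal sketch of what this means in practice (hypothetical toy data, not Greg's), a quasipoisson fit can be combined directly with method = "REML", and mgcv estimates the scale parameter alongside the EQL-based REML score:

```r
library(mgcv)

# hypothetical overdispersed count data
set.seed(1)
x <- runif(400)
mu <- exp(-1 + sin(2 * pi * x))
y <- rpois(400, mu) + rpois(400, 0.2 * mu)  # crude extra-Poisson variation

# with a quasi family, recent mgcv substitutes extended quasi-likelihood
# for the likelihood in the (Laplace approximate) REML criterion
m <- gam(y ~ s(x), family = quasipoisson, method = "REML")
m$scale     # estimated scale parameter
m$gcv.ubre  # REML (EQL-based) score
```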

best,
Simon
On 14/02/12 10:42, Greg Dropkin wrote:
10 days later
#
hi Simon

this follows on from the example where GCV increased unexpectedly with
increasing basis dimension. I'm now looking at t2 tensor product smooths
with REML, and find that the REML score can increase when adding a new
predictor. Again, this seems odd.

thanks

Greg

library(mgcv)
set.seed(0)
#simulate some data
x1<-runif(2000)
x2<-rnorm(2000)
x3<-rpois(2000,3)
d<-runif(2000)
linp<--5+x1+0.3*x2+4*cos(x1*x2)
y<-rpois(2000,exp(linp))
#y does not depend on d
table(y)
sum(y)

m0<-gam(y~t2(x1)+t2(x2)+t2(x1,x2),family="quasipoisson",method="REML",scale=-1)
sc0<-m0$scale
sc0
m0<-gam(as.formula(m0),family="quasipoisson",method="REML",scale=sc0)
m1<-gam(update.formula(m0,~.+t2(d)),family="quasipoisson",method="REML",scale=sc0)

#dev has decreased slightly, though not significantly
m0$dev
m1$dev

#REML has increased, unexpected (to me)
m0$gcv.ubre
m1$gcv.ubre

#does this go away if using full=TRUE?

m0t<-gam(y~t2(x1,full=TRUE)+t2(x2,full=TRUE)+t2(x1,x2,full=TRUE),family="quasipoisson",method="REML",scale=-1)
sc0t<-m0t$scale
sc0t
m0t<-gam(as.formula(m0t),family="quasipoisson",method="REML",scale=sc0t)
m1t<-gam(update.formula(m0t,~.+t2(d,full=TRUE)),family="quasipoisson",method="REML",scale=sc0t)

#dev has still decreased slightly
m0t$dev
m1t$dev

#REML has still increased
m0t$gcv.ubre
m1t$gcv.ubre

#how about te rather than t2?

m0te<-gam(y~te(x1)+te(x2)+te(x1,x2),family="quasipoisson",method="REML",scale=-1)
sc0te<-m0te$scale
sc0te
m0te<-gam(as.formula(m0te),family="quasipoisson",method="REML",scale=sc0te)
m1te<-gam(update.formula(m0te,~.+te(d)),family="quasipoisson",method="REML",scale=sc0te)

#dev has still decreased slightly
m0te$dev
m1te$dev

#now REML decreases
m0te$gcv.ubre
m1te$gcv.ubre