Skip to content

Automatic Knot selection in Piecewise linear splines

5 messages · Martin Maechler, Anupam Tyagi, vito muggeo

#
How can I do automatic knot selection while fitting piecewise linear
splines to two variables x and y? Which package to use to do it simply? I
also want to visualize the splines (and the scatter plot) with a graph.

Anupam
6 days later
#
> How can I do automatic knot selection while fitting piecewise linear
    > splines to two variables x and y? Which package to use to do it simply? I
    > also want to visualize the splines (and the scatter plot) with a graph.

    > Anupam

NB: linear splines, i.e. piecewise linear continuous functions.
Given the knots, use  approx() or approxfun() however, the
automatic knots selection does not happen in the base R packages.

I'm sure there are several R packages doing this.
The best such package in my opinion is "earth" which does a
re-implementation (and extensive  *generalization*) of the
famous  MARS algorithm of Friedman.
==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines

Note that their strengths and power is that  they do their work
for multivariate x (MARS := Multivariate Adaptive Regression
Splines), but indeed do work for the simple 1D case.

In the following example, we always get 11 final knots,
but I'm sure one can tweak the many tuning paramters of earth()
to get more:

## Can we do  knot-selection  for simple (x,y) splines?  === Yes, via  earth() {using MARS}!

x <- (0:800)/8

f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
curve(f(x), 0, 100, n = 1000, col=2, lwd=2)

set.seed(11)
y <- f(x) + 10*rnorm(x)

m.sspl <- smooth.spline(x,y) # base line "standard smoother"

require(earth)
fm1 <- earth(x, y) # default settings
summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice") below
## Call: earth(x=x, y=y)

## y =
##   175.9612
##   -   10.6744 * pmax(0,      x -  4.625)
##   +  9.928496 * pmax(0,      x - 10.875)
##   -  5.940857 * pmax(0,      x -  20.25)
##   +  3.438948 * pmax(0,      x - 27.125)
##   -  3.828159 * pmax(0,     44 -      x)
##   +  4.207046 * pmax(0,      x -     44)
##   +  2.573822 * pmax(0,      x -   76.5)
##   -  10.99073 * pmax(0,      x - 87.125)
##   +  10.97592 * pmax(0,      x - 90.875)
##   +  9.331949 * pmax(0,      x -     94)
##   -   8.48575 * pmax(0,      x -   96.5)

## Selected 12 of 12 terms, and 1 of 1 predictors
## Termination condition: Reached nk 21
## Importance: x
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 108.6592    RSS 82109.44    GRSq 0.861423    RSq 0.86894


fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
summary(fm2)
all.equal(fm1, fm2)# they are identical (apart from 'call'):
fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning
## still no change: fm3 "==" fm1
all.equal(predict(fm1, xx), predict(fm3, xx))

## BTW: The chosen knots and coefficients are
mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))

## Plots : fine grid for visualization: instead of   xx <- seq(x[1], x[length(x)], length.out = 1024)
rnx <- extendrange(x) ## to extrapolate a bit
xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))

cbind(f = f(xx),
      sspl = predict(m.sspl, xx)$y,
      mars = predict(fm1, xx)) -> fits

plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
cols <- c(adjustcolor(2, 1/3),
          adjustcolor(4, 2/3),
          adjustcolor("orange4", 2/3))
lwds <- c(3, 2, 2)
matlines(xx, fits, col = cols, lwd = lwds, lty=1)
legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
       col=cols, lwd=lwds, bty = "n")
title(paste("earth() linear spline vs. smooth.spline();  n =", length(x)))
mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
#
Thanks, Martin. This is very helpful.



On Tue, 16 Jul 2024 at 14:52, Martin Maechler <maechler at stat.math.ethz.ch>
wrote:

  
    
10 days later
#
dear all,
I apologize for my delay in replying you. Here my contribution, maybe 
just for completeness:

Similar to "earth", "segmented" also fits piecewise linear relationships 
with the number of breakpoints being selected by the AIC or BIC 
(recommended).

#code (example and code from Martin Maechler previous email)

library(segmented)
o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE)
plot(o, add=TRUE)
lines(o, col=2) #the approx CI for the breakpoints

confint(o) #the estimated breakpoints (with CI's)
slope(o) #the estimated slopes (with CI's)


However segmented appears to be less efficient than earth (although with 
reasonable running times), it does NOT work with multivariate responses 
neither products between piecewise linear terms.

kind regards,
Vito



Il 16/07/2024 11:22, Martin Maechler ha scritto:

  
    
#
Thanks!

For some reason I am getting an error when I run your code with my
variables. It works fine with Martin's x and y variables.
So far as I know variable lengths are equal.
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) :
        variable lengths differ (found for 'x')
[1] 2726
[1] 2726
On Fri, 26 Jul 2024 at 20:16, Vito Muggeo <vito.muggeo at unipa.it> wrote: