How can I do automatic knot selection while fitting piecewise linear splines to two variables x and y? Which package to use to do it simply? I also want to visualize the splines (and the scatter plot) with a graph. Anupam
Automatic Knot selection in Piecewise linear splines
5 messages · Martin Maechler, Anupam Tyagi, vito muggeo
6 days later
Anupam Tyagi
on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> How can I do automatic knot selection while fitting piecewise linear
> splines to two variables x and y? Which package to use to do it simply? I
> also want to visualize the splines (and the scatter plot) with a graph.
> Anupam
NB: linear splines, i.e. piecewise linear continuous functions.
Given the knots, use approx() or approxfun() however, the
automatic knots selection does not happen in the base R packages.
I'm sure there are several R packages doing this.
The best such package in my opinion is "earth" which does a
re-implementation (and extensive *generalization*) of the
famous MARS algorithm of Friedman.
==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
Note that their strengths and power is that they do their work
for multivariate x (MARS := Multivariate Adaptive Regression
Splines), but indeed do work for the simple 1D case.
In the following example, we always get 11 final knots,
but I'm sure one can tweak the many tuning paramters of earth()
to get more:
## Can we do knot-selection for simple (x,y) splines? === Yes, via earth() {using MARS}!
x <- (0:800)/8
f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
curve(f(x), 0, 100, n = 1000, col=2, lwd=2)
set.seed(11)
y <- f(x) + 10*rnorm(x)
m.sspl <- smooth.spline(x,y) # base line "standard smoother"
require(earth)
fm1 <- earth(x, y) # default settings
summary(fm1, style = "pmax") #-- got 10 knots (x = 44 "used twice") below
## Call: earth(x=x, y=y)
## y =
## 175.9612
## - 10.6744 * pmax(0, x - 4.625)
## + 9.928496 * pmax(0, x - 10.875)
## - 5.940857 * pmax(0, x - 20.25)
## + 3.438948 * pmax(0, x - 27.125)
## - 3.828159 * pmax(0, 44 - x)
## + 4.207046 * pmax(0, x - 44)
## + 2.573822 * pmax(0, x - 76.5)
## - 10.99073 * pmax(0, x - 87.125)
## + 10.97592 * pmax(0, x - 90.875)
## + 9.331949 * pmax(0, x - 94)
## - 8.48575 * pmax(0, x - 96.5)
## Selected 12 of 12 terms, and 1 of 1 predictors
## Termination condition: Reached nk 21
## Importance: x
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 108.6592 RSS 82109.44 GRSq 0.861423 RSq 0.86894
fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
summary(fm2)
all.equal(fm1, fm2)# they are identical (apart from 'call'):
fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning
## still no change: fm3 "==" fm1
all.equal(predict(fm1, xx), predict(fm3, xx))
## BTW: The chosen knots and coefficients are
mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))
## Plots : fine grid for visualization: instead of xx <- seq(x[1], x[length(x)], length.out = 1024)
rnx <- extendrange(x) ## to extrapolate a bit
xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
cbind(f = f(xx),
sspl = predict(m.sspl, xx)$y,
mars = predict(fm1, xx)) -> fits
plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
cols <- c(adjustcolor(2, 1/3),
adjustcolor(4, 2/3),
adjustcolor("orange4", 2/3))
lwds <- c(3, 2, 2)
matlines(xx, fits, col = cols, lwd = lwds, lty=1)
legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
col=cols, lwd=lwds, bty = "n")
title(paste("earth() linear spline vs. smooth.spline(); n =", length(x)))
mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
Thanks, Martin. This is very helpful. On Tue, 16 Jul 2024 at 14:52, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Anupam Tyagi
on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> How can I do automatic knot selection while fitting piecewise linear
> splines to two variables x and y? Which package to use to do it
simply? I
> also want to visualize the splines (and the scatter plot) with a
graph.
> Anupam
NB: linear splines, i.e. piecewise linear continuous functions. Given the knots, use approx() or approxfun() however, the automatic knots selection does not happen in the base R packages. I'm sure there are several R packages doing this. The best such package in my opinion is "earth" which does a re-implementation (and extensive *generalization*) of the famous MARS algorithm of Friedman. ==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines Note that their strengths and power is that they do their work for multivariate x (MARS := Multivariate Adaptive Regression Splines), but indeed do work for the simple 1D case. In the following example, we always get 11 final knots, but I'm sure one can tweak the many tuning paramters of earth() to get more: ## Can we do knot-selection for simple (x,y) splines? === Yes, via earth() {using MARS}! x <- (0:800)/8 f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64 curve(f(x), 0, 100, n = 1000, col=2, lwd=2) set.seed(11) y <- f(x) + 10*rnorm(x) m.sspl <- smooth.spline(x,y) # base line "standard smoother" require(earth) fm1 <- earth(x, y) # default settings summary(fm1, style = "pmax") #-- got 10 knots (x = 44 "used twice") below ## Call: earth(x=x, y=y) ## y = ## 175.9612 ## - 10.6744 * pmax(0, x - 4.625) ## + 9.928496 * pmax(0, x - 10.875) ## - 5.940857 * pmax(0, x - 20.25) ## + 3.438948 * pmax(0, x - 27.125) ## - 3.828159 * pmax(0, 44 - x) ## + 4.207046 * pmax(0, x - 44) ## + 2.573822 * pmax(0, x - 76.5) ## - 10.99073 * pmax(0, x - 87.125) ## + 10.97592 * pmax(0, x - 90.875) ## + 9.331949 * pmax(0, x - 94) ## - 8.48575 * pmax(0, x - 96.5) ## Selected 12 of 12 terms, and 1 of 1 predictors ## Termination condition: Reached nk 21 ## Importance: x ## Number of terms at each degree of interaction: 1 11 (additive model) ## GCV 108.6592 RSS 82109.44 GRSq 0.861423 RSq 0.86894 fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass) summary(fm2) all.equal(fm1, fm2)# they are identical (apart from 'call'): fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning ## still no change: fm3 "==" fm1 all.equal(predict(fm1, xx), predict(fm3, xx)) ## BTW: The chosen knots and coefficients are mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients))) ## Plots : fine grid for visualization: instead of xx <- seq(x[1], x[length(x)], length.out = 1024) rnx <- extendrange(x) ## to extrapolate a bit xx <- do.call(seq.int, c(rnx, list(length.out = 1200))) cbind(f = f(xx), sspl = predict(m.sspl, xx)$y, mars = predict(fm1, xx)) -> fits plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2)) cols <- c(adjustcolor(2, 1/3), adjustcolor(4, 2/3), adjustcolor("orange4", 2/3)) lwds <- c(3, 2, 2) matlines(xx, fits, col = cols, lwd = lwds, lty=1) legend("topleft", c("true f(x)", "smooth.spline()", "earth()"), col=cols, lwd=lwds, bty = "n") title(paste("earth() linear spline vs. smooth.spline(); n =", length(x))) mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
Anupam. [[alternative HTML version deleted]]
10 days later
dear all, I apologize for my delay in replying you. Here my contribution, maybe just for completeness: Similar to "earth", "segmented" also fits piecewise linear relationships with the number of breakpoints being selected by the AIC or BIC (recommended). #code (example and code from Martin Maechler previous email) library(segmented) o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE) plot(o, add=TRUE) lines(o, col=2) #the approx CI for the breakpoints confint(o) #the estimated breakpoints (with CI's) slope(o) #the estimated slopes (with CI's) However segmented appears to be less efficient than earth (although with reasonable running times), it does NOT work with multivariate responses neither products between piecewise linear terms. kind regards, Vito Il 16/07/2024 11:22, Martin Maechler ha scritto:
Anupam Tyagi
on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> How can I do automatic knot selection while fitting piecewise linear
> splines to two variables x and y? Which package to use to do it simply? I
> also want to visualize the splines (and the scatter plot) with a graph.
> Anupam
NB: linear splines, i.e. piecewise linear continuous functions. Given the knots, use approx() or approxfun() however, the automatic knots selection does not happen in the base R packages. I'm sure there are several R packages doing this. The best such package in my opinion is "earth" which does a re-implementation (and extensive *generalization*) of the famous MARS algorithm of Friedman. ==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines Note that their strengths and power is that they do their work for multivariate x (MARS := Multivariate Adaptive Regression Splines), but indeed do work for the simple 1D case. In the following example, we always get 11 final knots, but I'm sure one can tweak the many tuning paramters of earth() to get more: ## Can we do knot-selection for simple (x,y) splines? === Yes, via earth() {using MARS}! x <- (0:800)/8 f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64 curve(f(x), 0, 100, n = 1000, col=2, lwd=2) set.seed(11) y <- f(x) + 10*rnorm(x) m.sspl <- smooth.spline(x,y) # base line "standard smoother" require(earth) fm1 <- earth(x, y) # default settings summary(fm1, style = "pmax") #-- got 10 knots (x = 44 "used twice") below ## Call: earth(x=x, y=y) ## y = ## 175.9612 ## - 10.6744 * pmax(0, x - 4.625) ## + 9.928496 * pmax(0, x - 10.875) ## - 5.940857 * pmax(0, x - 20.25) ## + 3.438948 * pmax(0, x - 27.125) ## - 3.828159 * pmax(0, 44 - x) ## + 4.207046 * pmax(0, x - 44) ## + 2.573822 * pmax(0, x - 76.5) ## - 10.99073 * pmax(0, x - 87.125) ## + 10.97592 * pmax(0, x - 90.875) ## + 9.331949 * pmax(0, x - 94) ## - 8.48575 * pmax(0, x - 96.5) ## Selected 12 of 12 terms, and 1 of 1 predictors ## Termination condition: Reached nk 21 ## Importance: x ## Number of terms at each degree of interaction: 1 11 (additive model) ## GCV 108.6592 RSS 82109.44 GRSq 0.861423 RSq 0.86894 fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass) summary(fm2) all.equal(fm1, fm2)# they are identical (apart from 'call'): fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning ## still no change: fm3 "==" fm1 all.equal(predict(fm1, xx), predict(fm3, xx)) ## BTW: The chosen knots and coefficients are mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients))) ## Plots : fine grid for visualization: instead of xx <- seq(x[1], x[length(x)], length.out = 1024) rnx <- extendrange(x) ## to extrapolate a bit xx <- do.call(seq.int, c(rnx, list(length.out = 1200))) cbind(f = f(xx), sspl = predict(m.sspl, xx)$y, mars = predict(fm1, xx)) -> fits plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2)) cols <- c(adjustcolor(2, 1/3), adjustcolor(4, 2/3), adjustcolor("orange4", 2/3)) lwds <- c(3, 2, 2) matlines(xx, fits, col = cols, lwd = lwds, lty=1) legend("topleft", c("true f(x)", "smooth.spline()", "earth()"), col=cols, lwd=lwds, bty = "n") title(paste("earth() linear spline vs. smooth.spline(); n =", length(x))) mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
================================================= Vito M.R. Muggeo, PhD Professor of Statistics Dip.to Sc Econom, Az e Statistiche Universit? di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 23895240; fax: 091 485726 http://www.unipa.it/persone/docenti/m/vito.muggeo Associate Editor: Statistical Modelling past chair, Statistical Modelling Society coordinator, PhD Program in Econ, Businss, Statist
Thanks! For some reason I am getting an error when I run your code with my variables. It works fine with Martin's x and y variables. So far as I know variable lengths are equal.
o <-selgmented(lnCpc, ~lnGdpc, Kmax=20, type="bic", msg=TRUE)
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) :
variable lengths differ (found for 'x')
length(lnCpc)
[1] 2726
length(lnGdpc)
[1] 2726
On Fri, 26 Jul 2024 at 20:16, Vito Muggeo <vito.muggeo at unipa.it> wrote:
dear all, I apologize for my delay in replying you. Here my contribution, maybe just for completeness: Similar to "earth", "segmented" also fits piecewise linear relationships with the number of breakpoints being selected by the AIC or BIC (recommended). #code (example and code from Martin Maechler previous email) library(segmented) o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE) plot(o, add=TRUE) lines(o, col=2) #the approx CI for the breakpoints confint(o) #the estimated breakpoints (with CI's) slope(o) #the estimated slopes (with CI's) However segmented appears to be less efficient than earth (although with reasonable running times), it does NOT work with multivariate responses neither products between piecewise linear terms. kind regards, Vito Il 16/07/2024 11:22, Martin Maechler ha scritto:
Anupam Tyagi
on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> How can I do automatic knot selection while fitting piecewise
linear
> splines to two variables x and y? Which package to use to do it
simply? I
> also want to visualize the splines (and the scatter plot) with a
graph.
> Anupam
NB: linear splines, i.e. piecewise linear continuous functions. Given the knots, use approx() or approxfun() however, the automatic knots selection does not happen in the base R packages. I'm sure there are several R packages doing this. The best such package in my opinion is "earth" which does a re-implementation (and extensive *generalization*) of the famous MARS algorithm of Friedman. ==>
Note that their strengths and power is that they do their work for multivariate x (MARS := Multivariate Adaptive Regression Splines), but indeed do work for the simple 1D case. In the following example, we always get 11 final knots, but I'm sure one can tweak the many tuning paramters of earth() to get more: ## Can we do knot-selection for simple (x,y) splines? === Yes, via
earth() {using MARS}!
x <- (0:800)/8 f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64 curve(f(x), 0, 100, n = 1000, col=2, lwd=2) set.seed(11) y <- f(x) + 10*rnorm(x) m.sspl <- smooth.spline(x,y) # base line "standard smoother" require(earth) fm1 <- earth(x, y) # default settings summary(fm1, style = "pmax") #-- got 10 knots (x = 44 "used twice")
below
## Call: earth(x=x, y=y) ## y = ## 175.9612 ## - 10.6744 * pmax(0, x - 4.625) ## + 9.928496 * pmax(0, x - 10.875) ## - 5.940857 * pmax(0, x - 20.25) ## + 3.438948 * pmax(0, x - 27.125) ## - 3.828159 * pmax(0, 44 - x) ## + 4.207046 * pmax(0, x - 44) ## + 2.573822 * pmax(0, x - 76.5) ## - 10.99073 * pmax(0, x - 87.125) ## + 10.97592 * pmax(0, x - 90.875) ## + 9.331949 * pmax(0, x - 94) ## - 8.48575 * pmax(0, x - 96.5) ## Selected 12 of 12 terms, and 1 of 1 predictors ## Termination condition: Reached nk 21 ## Importance: x ## Number of terms at each degree of interaction: 1 11 (additive model) ## GCV 108.6592 RSS 82109.44 GRSq 0.861423 RSq 0.86894 fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass) summary(fm2) all.equal(fm1, fm2)# they are identical (apart from 'call'): fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive
forward pass; *no* pruning
## still no change: fm3 "==" fm1 all.equal(predict(fm1, xx), predict(fm3, xx)) ## BTW: The chosen knots and coefficients are mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients))) ## Plots : fine grid for visualization: instead of xx <- seq(x[1],
x[length(x)], length.out = 1024)
rnx <- extendrange(x) ## to extrapolate a bit
xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
cbind(f = f(xx),
sspl = predict(m.sspl, xx)$y,
mars = predict(fm1, xx)) -> fits
plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
cols <- c(adjustcolor(2, 1/3),
adjustcolor(4, 2/3),
adjustcolor("orange4", 2/3))
lwds <- c(3, 2, 2)
matlines(xx, fits, col = cols, lwd = lwds, lty=1)
legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
col=cols, lwd=lwds, bty = "n")
title(paste("earth() linear spline vs. smooth.spline(); n =",
length(x)))
mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- ================================================= Vito M.R. Muggeo, PhD Professor of Statistics Dip.to Sc Econom, Az e Statistiche Universit? di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 23895240; fax: 091 485726 http://www.unipa.it/persone/docenti/m/vito.muggeo Associate Editor: Statistical Modelling past chair, Statistical Modelling Society coordinator, PhD Program in Econ, Businss, Statist ==================================================
Anupam. [[alternative HTML version deleted]]