natural splines
On Thu, 8 May 2003 15:07:56 +0100 (BST)
iwhite at staffmail.ed.ac.uk wrote:
Apologies if this is this too obscure for R-help. In package splines, ns(x,,knots,intercept=TRUE) produces an n by K+2 matrix N, the values of K+2 basis functions for the natural splines with K (internal) knots, evaluated at x. It does this by first generating an n by K+4 matrix B of unconstrained splines, then postmultiplying B by H, a K+4 by K+2 representation of the nullspace of C (2 by K+4), which contains the 2nd derivatives of the unconstrained splines evaluated at the boundary knots. E.g. see Hastie and Tibshirani, Generalized Additive Models, exercise 2.5, p36. The QR decomposition is used to get H. This can produce basis functions which, while technically correct (they span the K+2 dim space of natural splines), can be counterintuitive. E.g. equally spaced knots symmetrically placed between the data extremes can produce very asymmetric arrangements, with N(K+2) not the mirror image of N(1), for example, and considerable loss of sparseness. This approach works for any basis B, but for B-splines, the second derivatives are zero for all the unconstrained basis functions, apart from 3 at each end. All that is required is to combine these 3 so that the contributions to the 2nd derivatives cancel. In other words, we only need to find the null space of two 2 by 3 matrices, rather than a 2 by K+4. If the left-most internal knots are t(1) and t(2), and the left-hand boundary knot is t(0), we can replace B(1...3) with B(1)+B(2)+B(3) and [t(2)-t(0)]*B(2) + [t(1)+t(2)-2*t(0)]*B(1), (for example), and similarly at the right-hand end. This seems simpler and more elegant than brute force QR on the full matrix of derivatives. But I may have missed some reason why it can't be used. Perhaps it doesn't work when intercept=FALSE? ====================================== I.White ICAPB, University of Edinburgh Ashworth Laboratories, West Mains Road Edinburgh EH9 3JT Fax: 0131 650 6564 Tel: 0131 650 5490 E-mail: iwhite at staffmail.ed.ac.uk
______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
This is also a vote for using the truncated power basis which is extremely simple and exceptionally fast for large datasets (see the rcspline.eval function in the Hmisc package). With modern matrix arithmetic (as in S), the collinearity of the bases produced by these simple regression splines is a moot point. --- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat