Specify a correct formula in R for Piecewise Linear Functions?
On Thu, 3 Jan 2008, zhijie zhang wrote:
Some progress, but also some confusion. I tried both the spline method and the dummy-variable approach, but their results are very different. See the following.
[volumes of output and gratuitous SAS code deleted]
Q1: Why do these two methods give such different results, e.g. the coefficients?
For the same reason that Thomas replied to my email suggesting a different approach than the one I showed you, viz. the spline basis differs from the basis vectors he constructed.
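A minimal sketch of the point about bases, using simulated data (the data, seed, and object names are illustrative assumptions, not taken from the thread). With one knot at 20, the degree-1 B-spline basis and the hand-built pmin/pmax basis describe the same set of continuous piecewise linear functions, so the coefficients differ while the fitted probabilities should agree:

```r
library(splines)

# Simulated data (hypothetical): log-odds piecewise linear in x, knot at 20
set.seed(1)
n <- 500
x <- runif(n, 0, 40)
eta <- -2 + 0.15 * pmin(x, 20) - 0.10 * (pmax(x, 20) - 20)
y <- rbinom(n, 1, plogis(eta))

# Same model space, two different bases
fit_bs   <- glm(y ~ bs(x, degree = 1, knots = 20), family = binomial)
fit_hand <- glm(y ~ pmin(x, 20) + pmax(x, 20), family = binomial)

# The coefficients differ because the basis columns differ ...
coef(fit_bs)
coef(fit_hand)

# ... but the fitted probabilities agree up to numerical tolerance
same_fit <- isTRUE(all.equal(fitted(fit_bs), fitted(fit_hand), tolerance = 1e-6))
same_fit
```

Comparing fitted values (or deviances) rather than coefficients is the usual way to check whether two parameterizations are the same model.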
Q2: The spline method is useful for piecewise linear functions, e.g. bs(distance_trans,degree=1,knots=c(13,25)), but what should I do if I want to fit a linear function when distance_trans<13 and a quadratic curve when distance_trans>=13? "bs(distance_trans,degree=c(1,2),knots=13)" does not work. And what about three or more parts, e.g. <13, 13~25, >25?
Whew! My response would be "don't go there". Fit a richer basis than you need and use penalization to damp out unneeded variation in the fit. Or use GAMs. But if you feel you must, you can construct things like bs( pmax( 13, pmin( 25, x ) ) ), which restricts a spline term to the interval between 13 and 25.
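For readers who do want a mixed-degree fit, here is one possible hand-built basis, offered as a sketch under stated assumptions rather than as the construction meant above: the data are simulated, the curve is assumed continuous at 13, and the names lin, over and fit_mixed are hypothetical. The term pmin(distance_trans, 13) carries the slope below 13; the two pmax terms are zero below 13 and give a linear and a quadratic component above it.

```r
# Simulated data (hypothetical): linear log-odds below 13, quadratic above
set.seed(2)
n <- 1000
distance_trans <- runif(n, 0, 40)
lin  <- pmin(distance_trans, 13)       # varies only below 13
over <- pmax(distance_trans - 13, 0)   # 0 below 13, distance past 13 above
eta <- -1 + 0.10 * lin + 0.20 * over - 0.01 * over^2
y <- rbinom(n, 1, plogis(eta))

# Linear below the cutpoint, quadratic above it, continuous at 13
fit_mixed <- glm(y ~ pmin(distance_trans, 13) +
                     pmax(distance_trans - 13, 0) +
                     I(pmax(distance_trans - 13, 0)^2),
                 family = binomial)
coef(fit_mixed)

# Linear predictor on an evenly spaced grid below 13:
# second differences should be (numerically) zero
lp_low <- predict(fit_mixed,
                  newdata = data.frame(distance_trans = c(0, 5, 10)),
                  type = "link")
diff(diff(lp_low))
```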
Q3: "fit <- glm( y ~ pmax(x,20)+pmin(x,20), family=binomial)" is good. But if I divide x into three or more parts, how should I specify it in this way?
As above.
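A sketch of how the pmin/pmax idea extends to three linear segments with cutpoints at 13 and 25 (simulated data; the object names are assumptions for illustration). Each term varies only on its own segment and is constant elsewhere, so each coefficient is that segment's slope on the logit scale, and the fitted curve stays continuous:

```r
# Simulated data (hypothetical): three linear segments in the log-odds
set.seed(3)
n <- 1000
x <- runif(n, 0, 40)
eta <- -3.5 + 0.20 * pmin(x, 13) - 0.05 * pmax(pmin(x, 25), 13) + 0.10 * pmax(x, 25)
y <- rbinom(n, 1, plogis(eta))

# One clamped copy of x per segment: <13, 13 to 25, >25
fit3 <- glm(y ~ pmin(x, 13) + pmax(pmin(x, 25), 13) + pmax(x, 25),
            family = binomial)
seg_slopes <- coef(fit3)[-1]
seg_slopes

# Check: the change in the linear predictor per unit of x inside each
# segment should equal the corresponding coefficient
nd <- data.frame(x = c(5, 6, 18, 19, 30, 31))
lp <- predict(fit3, newdata = nd, type = "link")
unit_change <- c(lp[2] - lp[1], lp[4] - lp[3], lp[6] - lp[5])
unit_change
```

The same model space is spanned by bs(x, degree = 1, knots = c(13, 25)) from the splines package; only the meaning of the coefficients differs.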
Hope someone can help. Thanks a lot.
You can help yourself a lot by taking a few minutes to learn to do in R what you did in SAS. Reading the help pages AND running the examples is often illuminating. For example, example( pmin ) should give you some helpful hints. HTH, Chuck
On Jan 2, 2008 11:58 PM, Thomas Lumley <tlumley at u.washington.edu> wrote:
On Tue, 1 Jan 2008, Charles C. Berry wrote:
On Tue, 1 Jan 2008, zhijie zhang wrote:
Dear all, I have two variables, y and x. It seems that the relationship between them is piecewise linear. The cutpoint is 20. That is, when x<20, there is a linear relationship between y and x; when x>=20, there is a different linear relationship between them. How can I specify their relationship in R correctly? # glm(y~I(x<20)+I(x>=20),family = binomial, data = point) something like this?
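For comparison, a small sketch of what the indicator formula in the question actually fits (simulated data; the data frame name point follows the question, everything else is an assumption). The two indicators contain no term that varies with x inside a segment, so the model has one constant probability per side of the cutpoint, and the two indicators together duplicate the intercept:

```r
# Simulated data (hypothetical), with a slope change at x = 20
set.seed(4)
n <- 500
point <- data.frame(x = runif(n, 0, 40))
point$y <- rbinom(n, 1,
                  plogis(-2 + 0.15 * pmin(point$x, 20) -
                              0.10 * (pmax(point$x, 20) - 20)))

# Formula from the question: two indicators of which side of 20 x falls on
fit_step <- glm(y ~ I(x < 20) + I(x >= 20), family = binomial, data = point)

# One coefficient is NA: the two indicators sum to the intercept column
coef(fit_step)

# Number of distinct fitted probabilities (a step function, not two slopes)
n_levels <- length(unique(round(fitted(fit_step), 10)))
n_levels
```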
Try this:
library(splines)
fit <- glm( y ~ bs( x, degree=1, knots=20 ), family=binomial)
In the linear case I would actually argue that there is a benefit from
constructing the spline basis by hand, so that you know what the
coefficients mean. (For quadratic and higher order splines I agree that
pre-existing code for the B-spline basis makes a lot more sense).
For example, in
fit <- glm( y ~ pmax(x,20)+pmin(x,20), family=binomial)
the coefficients are slopes on the logit scale: the coefficient of pmin(x,20) is the slope when x < 20 and the coefficient of pmax(x,20) is the slope when x > 20.
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
-- With Kind Regards,
Zhi Jie, Zhang, PhD
Tel: +86-21-54237149
Dept. of Epidemiology, School of Public Health, Fudan University
Address: No. 138 Yi Xue Yuan Road, Shanghai, China
Postcode: 200032
Email: epistat at gmail.com
Website: www.statABC.com
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901