Question about mars() -function
On Dec 26, 2010, at 17:54, Tiina Hakanen wrote:
Hi! I have some questions about the MARS model's coefficient of determination. I use the MARS method in my master's thesis and I have noticed a problem with the MARS model's R^2. In the following example, the R^2 is very large when I build the MARS model with the mars() function, but much smaller when I fit the same MARS model as a linear regression. Can you please tell me why the MARS model's R^2 is so big? How can I get the MARS model's correct R^2 in R in some other way than in the following example or by calculating it myself using the R^2 formula?
This isn't really to do with MARS as such. You have two equivalent linear models, one with and one without an intercept (i.e., the first column m$x1 is the constant 1). R computes the R^2 so that it is consistent with the overall F test, which you can see has three numerator DF in the marsmodel, but only two in the corresponding linear model. Put differently, the null model is zero in one case and a constant in the other. This sometimes catches people out, but without such a convention, no-intercept models could get negative R^2. Pragmatically, if you are sure that the marsmodel will always contain the intercept-only model, does lm(data[,1]~m$x) not provide the desired R^2, with a warning that one parameter is aliased?
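A minimal sketch of the point above, using simulated data rather than the ozone data from the question (the names x, y, X and the fit_* objects are made up for illustration). It assumes only base R and checks that summary.lm() measures R^2 against a zero null model when the formula has no intercept, and against a constant null model otherwise, even though the fits and residuals are identical:

```r
# Simulated data with a large mean, so the two R^2 conventions differ clearly
set.seed(1)
x <- runif(100)
y <- 50 + 5 * x + rnorm(100)

X <- cbind(1, x)              # design matrix with an explicit constant column

fit_int   <- lm(y ~ x)        # intercept supplied by the formula
fit_noint <- lm(y ~ X - 1)    # same fit, constant supplied as a column of X
fit_alias <- lm(y ~ X)        # both: the constant column of X is aliased (NA)

rss <- sum(residuals(fit_int)^2)   # residual sum of squares, same for all fits

# R^2 relative to the constant (mean-only) null model
r2_centered <- 1 - rss / sum((y - mean(y))^2)
# R^2 relative to the zero null model (formula without intercept)
r2_uncentered <- 1 - rss / sum(y^2)

all.equal(summary(fit_int)$r.squared,   r2_centered)
all.equal(summary(fit_noint)$r.squared, r2_uncentered)
all.equal(summary(fit_alias)$r.squared, r2_centered)
```

In the last fit the redundant constant column gets an NA coefficient, but the reported R^2 is the usual one measured against the mean.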
I hope you can reply soon.
Best regards,
Tiina Hakanen
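library(ElemStatLearn)  # provides the ozone data
library(mda)            # provides mars()
data <- ozone
# Fit MARS with at most nk = 4 model terms; column 1 (ozone) is the response
m <- mars(data[, -1], data[, 1], nk = 4)
m$factor[m$s, ]  # variables entering each selected term (m$s partially matches m$selected.terms)
m$cuts[m$s, ]    # knot locations of the selected terms
m$coef           # coefficients of the selected terms
# Refit on the MARS basis matrix m$x; "- 1" drops lm's own intercept,
# since the first column of m$x is already the constant 1
marsmodel <- lm(data[, 1] ~ m$x - 1)
summary(marsmodel)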

Call:
lm(formula = data[, 1] ~ m$x - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-36.264 -15.993  -2.351   9.993 122.793

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
m$x1  52.9783     3.8894  13.621  < 2e-16 ***
m$x2   4.7383     0.9599   4.936 2.92e-06 ***
m$x3  -1.9428     0.3084  -6.300 6.61e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.38 on 108 degrees of freedom
Multiple R-squared: 0.8147, Adjusted R-squared: 0.8095
F-statistic: 158.2 on 3 and 108 DF, p-value: < 2.2e-16

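# Hinge basis functions with a knot at k
knot1 <- function(x, k) ifelse(x > k, x - k, 0)  # max(x - k, 0)
knot2 <- function(x, k) ifelse(x < k, k - x, 0)  # max(k - x, 0)
# Same model as above, but with the intercept supplied by the formula
reg <- lm(ozone ~ knot1(temperature, 85) + knot2(temperature, 85), data = data)
summary(reg)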

Call:
lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature,
    85), data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-36.264 -15.993  -2.351   9.993 122.793

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             52.9783     3.8894  13.621  < 2e-16 ***
knot1(temperature, 85)   4.7383     0.9599   4.936 2.92e-06 ***
knot2(temperature, 85)  -1.9428     0.3084  -6.300 6.61e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.38 on 108 degrees of freedom
Multiple R-squared: 0.5153, Adjusted R-squared: 0.5064
F-statistic: 57.42 on 2 and 108 DF, p-value: < 2.2e-16

Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com