Dear community,

I've checked this while working, but I'd like to reassure myself. Let's say we have two models:

model1 <- lm(vdep ~ log(v1) + v2 + v3 + I(v4^2), data = mydata)
model2 <- lm(vdep ~ log(v1) + v2 + v3 + v4^2, data = mydata)

So in model1 you really square v4, and in model2 the v4^2 doesn't do anything, does it? Model2 could be rewritten as

model2b <- lm(vdep ~ log(v1) + v2 + v3 + v4, data = mydata)

and nothing changes, does it?

This "I" is essential with powers or when including transformations such as I(1/(v2+v3)), but not with a log transformation, is it? Is there any other transformation where I must also use this "I" ("as is")?

Thanks in advance, user at host.com

-- View this message in context: http://r.789695.n4.nabble.com/when-to-use-I-as-is-caret-tp4643113.html Sent from the R help mailing list archive at Nabble.com.
when to use "I", "as is" caret
3 messages · agent dunham, Uwe Ligges, David Winsemius
On 14.09.2012 09:41, agent dunham wrote:
Dear community, I've checked this while working, but I'd like to reassure myself. Let's say we have two models:

model1 <- lm(vdep ~ log(v1) + v2 + v3 + I(v4^2), data = mydata)
model2 <- lm(vdep ~ log(v1) + v2 + v3 + v4^2, data = mydata)

So in model1 you really square v4, and in model2 the v4^2 doesn't do anything, does it? Model2 could be rewritten as

model2b <- lm(vdep ~ log(v1) + v2 + v3 + v4, data = mydata)

and nothing changes, does it? This "I" is essential with powers or when including transformations such as I(1/(v2+v3)), but not with a log transformation, is it? Is there any other transformation where I must also use this "I" ("as is")?
You need it whenever you use operators that have a special meaning in formulas, such as "+", "-", "*", "/", "|", "^", ":", etc. In a formula, v4^2 means: take the variable v4 and all of its two-way interactions. Since v4 is a single variable, there are no two-way interactions available, so the term is left unchanged.

Best, Uwe Ligges
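The expansion described above can be checked without fitting anything, by asking R which terms it derives from a formula. This is only a minimal sketch: the variable names vdep and v1 to v4 are taken from the question and are never evaluated, and the final comparison uses the built-in cars dataset rather than the poster's data.

```r
# Term labels show how R expands each formula before any data are touched.
labels_caret <- attr(terms(vdep ~ v4^2), "term.labels")
labels_asis  <- attr(terms(vdep ~ I(v4^2)), "term.labels")
labels_cross <- attr(terms(vdep ~ (v2 + v3)^2), "term.labels")
labels_log   <- attr(terms(vdep ~ log(v1)), "term.labels")

print(labels_caret)  # "v4"                 (^ is formula crossing, not a power)
print(labels_asis)   # "I(v4^2)"            (arithmetic square, protected by I())
print(labels_cross)  # "v2" "v3" "v2:v3"    (main effects plus two-way interaction)
print(labels_log)    # "log(v1)"            (function calls are taken as written)

# The same point on real data: speed^2 without I() fits the same model as speed.
fit_caret <- lm(dist ~ speed^2, data = cars)
fit_plain <- lm(dist ~ speed, data = cars)
same_fit  <- isTRUE(all.equal(coef(fit_caret), coef(fit_plain)))
print(same_fit)      # TRUE
```

Inside a function call such as log() or I(), the operators keep their ordinary arithmetic meaning, which is why log(v1) needs no extra protection.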
On Sep 14, 2012, at 12:41 AM, agent dunham wrote:
Dear community, I've checked this while working, but I'd like to reassure myself. Let's say we have two models:

model1 <- lm(vdep ~ log(v1) + v2 + v3 + I(v4^2), data = mydata)
If you want to create a second-degree polynomial for "proper" statistical inference via a formula, the way forward is:

?poly
model1 <- lm(vdep ~ log(v1) + v2 + v3 + poly(v4, 2), data = mydata)

You will get orthogonal polynomials, which differ from most people's naive expectations, but they do allow you to fairly assess departures from linearity. It's interesting to compare the two methods with the cars dataset.

Proper use of poly():
fm <- lm(dist ~ poly(speed, 2), data = cars)
summary(fm)
Call:
lm(formula = dist ~ poly(speed, 2), data = cars)
Residuals:
    Min      1Q  Median      3Q     Max
-28.720  -9.184  -3.188   4.628  45.152

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       42.980      2.146  20.026  < 2e-16 ***
poly(speed, 2)1  145.552     15.176   9.591 1.21e-12 ***
poly(speed, 2)2   22.996     15.176   1.515    0.136
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673, Adjusted R-squared: 0.6532
F-statistic: 47.14 on 2 and 47 DF, p-value: 5.852e-12
Improper use of linear and "I-quadratic":
fm2 <- lm(dist ~ speed + I(speed^2), data = cars)
summary(fm2)
Call:
lm(formula = dist ~ speed + I(speed^2), data = cars)
Residuals:
    Min      1Q  Median      3Q     Max
-28.720  -9.184  -3.188   4.628  45.152

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.47014   14.81716   0.167    0.868
speed        0.91329    2.03422   0.449    0.656
I(speed^2)   0.09996    0.06597   1.515    0.136
Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673, Adjusted R-squared: 0.6532
F-statistic: 47.14 on 2 and 47 DF, p-value: 5.852e-12
#---------
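The two summaries above report identical residuals, R-squared and quadratic-term t value, and differ only in the lower-order coefficients. A small sketch on the same cars data makes that comparison explicit; the object names (fm_orth, fm_raw, P and so on) are introduced here for illustration and are not from the thread.

```r
# Refit both parameterisations of the quadratic model on the cars data.
fm_orth <- lm(dist ~ poly(speed, 2), data = cars)
fm_raw  <- lm(dist ~ speed + I(speed^2), data = cars)

# Both formulas span the same column space, so the fitted values agree.
same_fitted <- isTRUE(all.equal(fitted(fm_orth), fitted(fm_raw)))

# The test of the highest-order (quadratic) term is the same in both fits.
t_orth <- summary(fm_orth)$coefficients[3, "t value"]
t_raw  <- summary(fm_raw)$coefficients[3, "t value"]
same_t <- isTRUE(all.equal(t_orth, t_raw))

# Columns of the orthogonal basis are orthonormal; raw powers are highly correlated.
P <- poly(cars$speed, 2)
orthonormal <- isTRUE(all.equal(crossprod(P), diag(2), check.attributes = FALSE))
raw_cor <- cor(cars$speed, cars$speed^2)

print(c(same_fitted = same_fitted, same_t = same_t, orthonormal = orthonormal))
print(raw_cor)
```

What changes between the two is the meaning of the linear coefficient: with raw powers it is estimated jointly with a strongly correlated squared term, which is why its standard error is so large in the second summary.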
If you wanted the same results as you would get from I(v4^2) while using poly(), it would look like this:
(z <- poly(1:10, 2, raw=TRUE)[,2])
[1] 1 4 9 16 25 36 49 64 81 100
I didn't know offhand whether one could use the raw-poly column within a formula for lm, but it seems to work as I expected:
fm <- lm(dist ~ I(speed^2), data = cars)
fm
Call:
lm(formula = dist ~ I(speed^2), data = cars)
Coefficients:
(Intercept) I(speed^2)
8.860 0.129
fm <- lm(dist ~ poly(speed, 2, raw=TRUE)[,2], data = cars)
fm
Call:
lm(formula = dist ~ poly(speed, 2, raw = TRUE)[, 2], data = cars)
Coefficients:
(Intercept) poly(speed, 2, raw = TRUE)[, 2]
8.860 0.129
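As a related sketch (not part of the original replies), the full raw polynomial basis can also be used without subsetting a column, in which case it should reproduce the speed + I(speed^2) fit. The object names fm_rawpoly and fm_asis are hypothetical.

```r
# poly(..., raw = TRUE) supplies the plain powers speed and speed^2 as columns.
fm_rawpoly <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
fm_asis    <- lm(dist ~ speed + I(speed^2), data = cars)

# Coefficient names differ, so compare the values only.
same_coef <- isTRUE(all.equal(unname(coef(fm_rawpoly)), unname(coef(fm_asis))))
print(same_coef)  # TRUE
```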
(And Uwe's answer covers the rest.)
David Winsemius, MD Alameda, CA, USA