when to use "I", "as is" caret

3 messages · agent dunham, Uwe Ligges, David Winsemius

#
Dear community, 

I've checked this while working, but just to reassure myself: let's say we have
two models:

model1 <- lm(vdep ~ log(v1) + v2 + v3 + I(v4^2), data = mydata)
model2 <- lm(vdep ~ log(v1) + v2 + v3 + v4^2, data = mydata)

So in model1 you really square v4, and in model2 the ^2 doesn't do
anything, does it? Model2 could be rewritten as
model2b <- lm(vdep ~ log(v1) + v2 + v3 + v4, data = mydata)
and nothing changes, right?

This "I" ("as is") is essential when raising to a power or when including
transformations such as I(1/(v2+v3)), but not with a log transformation, is that
right? Is there any other transformation where I must also use this "I"?

Thanks in advance, 
user at host.com



--
View this message in context: http://r.789695.n4.nabble.com/when-to-use-I-as-is-caret-tp4643113.html
Sent from the R help mailing list archive at Nabble.com.
#
On 14.09.2012 09:41, agent dunham wrote:
You need it whenever you are using operators with a special meaning in 
formulas such as "+", "-", "*", "/", "|", "^", ":" etc.

v4^2 means: take the variable v4 and all of its two-way interactions.
Since v4 is a single variable, there are no two-way interactions available, so
the term is left unchanged (it is the same as v4).

Best,
Uwe Ligges
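
A minimal sketch of the point above, assuming the built-in cars data set as a stand-in for the poster's mydata (the names y, a and b are placeholders, not variables from the thread). It asks terms() how the formula parser expands ^, then checks that speed^2 fits the same model as speed, while I(speed^2) does not:

```r
# Illustration only: uses the built-in cars data, not the poster's mydata.
# Inside a formula, ^ is a formula operator, so (a + b)^2 expands to the
# main effects plus their two-way interaction. y, a, b are placeholder names.
labels_two <- attr(terms(y ~ (a + b)^2), "term.labels")
print(labels_two)

# With a single variable there is nothing to cross, so speed^2 is just speed.
labels_one <- attr(terms(dist ~ speed^2), "term.labels")
print(labels_one)

fit_plain <- lm(dist ~ speed,      data = cars)
fit_caret <- lm(dist ~ speed^2,    data = cars)  # same model as fit_plain
fit_asis  <- lm(dist ~ I(speed^2), data = cars)  # regressor is speed squared

# TRUE if the caret version reproduces the plain linear fit.
same_as_plain <- isTRUE(all.equal(coef(fit_plain), coef(fit_caret)))
print(same_as_plain)

# The I() version is a different model with a different slope.
print(coef(fit_asis))
```

Checking the term labels first is a cheap way to see what a formula really means before fitting anything.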
#
On Sep 14, 2012, at 12:41 AM, agent dunham wrote:

            
If you want to create a second-degree polynomial for "proper" statistical inference via a formula, the way forward is:

?poly
model1 <-  lm(vdep ~ log(v1) + v2 + v3 + poly(v4,2) , data = mydata)

You will get orthogonal polynomials, which are different from most people's naive expectations, but they do allow you to fairly assess departures from linearity.

It's interesting to compare two methods with the cars dataset:

Proper use of poly():
Call:
lm(formula = dist ~ poly(speed, 2), data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.720  -9.184  -3.188   4.628  45.152 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       42.980      2.146  20.026  < 2e-16 ***
poly(speed, 2)1  145.552     15.176   9.591 1.21e-12 ***
poly(speed, 2)2   22.996     15.176   1.515    0.136    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673,	Adjusted R-squared: 0.6532 
F-statistic: 47.14 on 2 and 47 DF,  p-value: 5.852e-12 

Improper use of linear and "I-quadratic":
Call:
lm(formula = dist ~ speed + I(speed^2), data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.720  -9.184  -3.188   4.628  45.152 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.47014   14.81716   0.167    0.868
speed        0.91329    2.03422   0.449    0.656
I(speed^2)   0.09996    0.06597   1.515    0.136

Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673,	Adjusted R-squared: 0.6532 
F-statistic: 47.14 on 2 and 47 DF,  p-value: 5.852e-12 

#---------
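
The two summaries above can be reproduced and compared with a short sketch along these lines (the object names are mine, not from the thread). It checks that both parameterisations give the same fitted values and the same t statistic for the quadratic term, and that the poly() columns are orthogonal to each other, which is why the lower-order coefficient is not distorted by collinearity between speed and speed^2:

```r
# Orthogonal-polynomial fit versus the speed + I(speed^2) fit on cars.
fit_orth <- lm(dist ~ poly(speed, 2),     data = cars)
fit_raw  <- lm(dist ~ speed + I(speed^2), data = cars)

# Same column space, hence the same fitted values and residuals.
same_fit <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw)))
print(same_fit)

# The highest-order term gets the same t statistic either way.
t_orth <- coef(summary(fit_orth))[3, "t value"]
t_raw  <- coef(summary(fit_raw))[3, "t value"]
print(c(orthogonal = t_orth, raw = t_raw))

# The columns returned by poly() are orthonormal: their cross-product is
# (numerically) the 2 x 2 identity matrix.
Z <- poly(cars$speed, 2)
gram <- crossprod(Z)
print(round(gram, 10))

# By contrast, speed and speed^2 are very highly correlated.
raw_cor <- cor(cars$speed, cars$speed^2)
print(raw_cor)
```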

If you wanted the same results as you would get from I(v4^2) and you were using poly(), it would look like:

(z <- poly(1:10, 2, raw=TRUE)[,2])
 [1]   1   4   9  16  25  36  49  64  81 100

I didn't know offhand whether one could use the raw-poly column within a formula for lm, but it seems to work as I expected:
Call:
lm(formula = dist ~ I(speed^2), data = cars)

Coefficients:
(Intercept)   I(speed^2)  
      8.860        0.129

Call:
lm(formula = dist ~ poly(speed, 2, raw = TRUE)[, 2], data = cars)

Coefficients:
                    (Intercept)  poly(speed, 2, raw = TRUE)[, 2]  
                          8.860                            0.129  
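
As a further check, here is a sketch (object names are mine) showing that using both columns of poly(speed, 2, raw = TRUE) should reproduce the speed + I(speed^2) coefficients from the earlier comparison, since the raw columns are simply speed and speed^2:

```r
# Raw (non-orthogonal) polynomial columns are the plain powers of speed.
raw_cols <- poly(cars$speed, 2, raw = TRUE)
cols_match <- isTRUE(all.equal(as.vector(raw_cols[, 2]), cars$speed^2))
print(cols_match)

fit_asis <- lm(dist ~ speed + I(speed^2),         data = cars)
fit_rawp <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)

# Coefficient names differ, so compare the values only.
same_coefs <- isTRUE(all.equal(unname(coef(fit_asis)),
                               unname(coef(fit_rawp))))
print(same_coefs)
```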


(And Uwe's answer covers the rest.)
David Winsemius, MD
Alameda, CA, USA