An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111202/14da5882/attachment.pl>
simple lm question
7 messages · Worik R, R. Michael Weylandt, David Winsemius
On Dec 1, 2011, at 10:50 PM, Worik R wrote:
I would really like to be able to read about this in a document, but I
cannot find my way around the documentation properly.
Given the code...
M <- matrix(runif(5*20), nrow=20)
colnames(M) <- c('a', 'b', 'c', 'd', 'e')
ind <- c(1,2,3,4)
dep <- 5
I can then do...
l2 <- lm(M[,dep]~M[,ind]) ## Clearly not useful!
summary(l2)
I am not sure what my regression formula is.
The results are (edited for brevity)
summary(l2)$coefficients
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.63842    0.16036   3.981  0.00120 **
M[, ind]a   -0.02912    0.17566  -0.166  0.87054
M[, ind]b   -0.21172    0.19665  -1.077  0.29865
M[, ind]c    0.00752    0.18551   0.041  0.96820
M[, ind]d    0.06357    0.18337   0.347  0.73366
Is there some way I can do this so the coefficients have better
names ('a' through 'd')?
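For readers following along: the call lm(M[,dep]~M[,ind]) is already the regression of column 5 on columns 1 to 4. lm() treats the matrix M[, ind] as a single term and names each coefficient by pasting the term label onto the column name, which is where "M[, ind]a" comes from. The sketch below checks that the named-column formula gives the same estimates; set.seed(1) and the name l2_named are additions for illustration only.

```r
# Minimal sketch; set.seed(1) is added only so the random data are reproducible.
set.seed(1)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
ind <- c(1, 2, 3, 4)
dep <- 5

# Matrix term: one coefficient per column of M[, ind], named by
# pasting the term label "M[, ind]" onto each column name.
l2 <- lm(M[, dep] ~ M[, ind])
print(names(coef(l2)))

# The same model written with named columns in a data frame
# (l2_named is a hypothetical name used only here).
l2_named <- lm(e ~ a + b + c + d, data = as.data.frame(M))
print(names(coef(l2_named)))

# The estimates agree once the names are stripped.
print(all.equal(unname(coef(l2)), unname(coef(l2_named))))
```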
Use `lm` the way it is designed to be used, with a data argument:
> l2 <- lm(e~. , data=as.data.frame(M))
> summary(l2)
Call:
lm(formula = e ~ ., data = as.data.frame(M))
Residuals:
    Min      1Q  Median      3Q     Max
-0.5558 -0.2396  0.1257  0.2213  0.4586
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.41110    0.25484   1.613    0.128
a           -0.03258    0.28375  -0.115    0.910
b            0.09088    0.25971   0.350    0.731
c            0.09382    0.29555   0.317    0.755
d            0.14725    0.33956   0.434    0.671
Residual standard error: 0.3317 on 15 degrees of freedom
Multiple R-squared: 0.04667, Adjusted R-squared: -0.2076
F-statistic: 0.1836 on 4 and 15 DF, p-value: 0.9433
David.

> As for what I am doing it is not...
>
> l3 <- lm(M[,1]+M[,2]+M[,3]+M[,4]~M[,5])
>
> or
>
> l4 <- lm(M[,1]*M[,2]*M[,3]*M[,4]~M[,5])
>
> cheers
> W
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
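Because the original code picks columns by position (ind and dep), one way to keep that habit and still get coefficients named 'a' through 'd' is to build the formula from the column names. A minimal sketch, assuming base R's reformulate() from the stats package; the data frame name dat is hypothetical, and set.seed(1) is added only for reproducibility.

```r
set.seed(1)  # added only so the example is reproducible
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
ind <- c(1, 2, 3, 4)
dep <- 5

dat <- as.data.frame(M)  # hypothetical name; avoids masking stats::df

# Build the formula e ~ a + b + c + d from the column positions.
fml <- reformulate(termlabels = colnames(M)[ind], response = colnames(M)[dep])
print(fml)

fit <- lm(fml, data = dat)
print(names(coef(fit)))  # "(Intercept)" "a" "b" "c" "d"
```

summary(fit) then reports the coefficients as a, b, c and d; note that its Call line shows lm(formula = fml, data = dat) rather than the expanded formula.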
In your code, by supplying a vector M[,"e"] as the response, you are regressing "e" against all the variables provided in the data argument, including "e" itself; this gives the very strange regression coefficients you observe. R has no way to know that the vector is somehow related to the "e" it sees in the data argument.

In the suggested way,

lm(formula = e ~ ., data = as.data.frame(M))

e is regressed against everything that is not e, and sensible results are given.

Michael
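Michael's point can be checked directly. A minimal sketch that mirrors the df[,'e'] call quoted below, with the data frame renamed to the hypothetical dat and set.seed(1) added for reproducibility: when the response is written as a subsetting expression, `.` expands to every column of the data, e included, so e's coefficient comes out as 1.

```r
set.seed(1)  # added only so the example is reproducible
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
dat <- as.data.frame(M)  # hypothetical name, built from the current M

# Response given as an expression: '.' still expands to all columns, e included.
bad <- lm(dat[, "e"] ~ ., data = dat)
print(names(coef(bad)))   # "(Intercept)" "a" "b" "c" "d" "e"
print(coef(bad)[["e"]])   # 1, up to rounding error: e explains itself

# Response given as a column name: '.' means every column except e.
good <- lm(e ~ ., data = dat)
print(names(coef(good)))  # "(Intercept)" "a" "b" "c" "d"
```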
On Fri, Dec 2, 2011 at 11:03 PM, Worik R <worikr at gmail.com> wrote:
Use `lm` the way it is designed to be used, with a data argument:
l2 <- lm(e~. , data=as.data.frame(M))
summary(l2)
Call:
lm(formula = e ~ ., data = as.data.frame(M))
And what is the regression being done in this case? How are the independent variables used? It looks like M[,5]~M[,1]+M[,2]+M[,3]+M[,4] as those are the coefficients. But the results are different when I do that explicitly:
M <- matrix(runif(5*20), nrow=20)
colnames(M) <- c('a', 'b', 'c', 'd', 'e')
l1 <- lm(df[,'e']~., data=df)
summary(l1)
Call:
lm(formula = df[, "e"] ~ ., data = df)

Residuals:
       Min         1Q     Median         3Q        Max
-9.580e-17 -3.360e-17 -8.596e-18  9.114e-18  2.032e-16

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -7.505e-17  7.158e-17 -1.048e+00    0.312
a           -1.653e-17  7.117e-17 -2.320e-01    0.820
b           -5.042e-17  5.480e-17 -9.200e-01    0.373
c            4.236e-17  5.774e-17  7.340e-01    0.475
d           -3.878e-17  4.946e-17 -7.840e-01    0.446
e            1.000e+00  6.083e-17  1.644e+16   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.763e-17 on 14 degrees of freedom
Multiple R-squared:     1,    Adjusted R-squared:     1
F-statistic: 6.435e+31 on 5 and 14 DF,  p-value: < 2.2e-16
l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
summary(l3)

Call:
lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

Residuals:
     Min       1Q   Median       3Q      Max
-0.49398 -0.14203  0.01588  0.14157  0.31335

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6681     0.1859   3.594  0.00266 **
M[, 1]       -0.1767     0.2419  -0.730  0.47644
M[, 2]       -0.3874     0.2135  -1.814  0.08970 .
M[, 3]        0.3695     0.2180   1.695  0.11078
M[, 4]        0.1361     0.2366   0.575  0.57360
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2449 on 15 degrees of freedom
Multiple R-squared: 0.2988,    Adjusted R-squared: 0.1119
F-statistic: 1.598 on 4 and 15 DF,  p-value: 0.2261

cheers
Worik
On Dec 2, 2011, at 11:20 PM, Worik R wrote:
Duh! Silly me! But my confusion persists: what is the regression being done? See below...
<Sigh> Please note that your "df" and "M" are undoubtedly different
objects by now:
> M <- matrix(runif(5*20), nrow=20)
> colnames(M) <- c('a', 'b', 'c', 'd', 'e')
> l1 <- lm(e~., data=as.data.frame(M))
> l1
Call:
lm(formula = e ~ ., data = as.data.frame(M))
Coefficients:
(Intercept)         a         b         c         d
    0.40139  -0.15032  -0.06242   0.13139   0.23905
> l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
> l3
Call:
lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
Coefficients:
(Intercept)    M[, 1]    M[, 2]    M[, 3]    M[, 4]
    0.40139  -0.15032  -0.06242   0.13139   0.23905
As expected.
David.
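David's point that "df" and "M" had drifted apart can be reproduced. A minimal sketch, with set.seed(1) and the name dat as additions: a data frame built from M is a copy, so regenerating M afterwards leaves it holding the old values, while both formulas fitted to the same, current M give the same coefficients.

```r
set.seed(1)  # added only so the example is reproducible
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
dat <- as.data.frame(M)  # a copy of the current M, not a live view

# Regenerating M does not touch dat: the two now hold different data.
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
print(identical(unname(as.matrix(dat)), unname(M)))  # FALSE

# Both formulas fitted to the same M agree (all.equal allows a
# small floating-point tolerance).
l1 <- lm(e ~ ., data = as.data.frame(M))
l3 <- lm(M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
print(all.equal(unname(coef(l1)), unname(coef(l3))))  # TRUE
```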
>
> On Sat, Dec 3, 2011 at 5:10 PM, R. Michael Weylandt <
> michael.weylandt at gmail.com> wrote:
>
>> In your code by supplying a vector M[,"e"] you are regressing "e"
>> against all the variables provided in the data argument, including
>> "e"
>> itself -- this gives the very strange regression coefficients you
>> observe. R has no way to know that that's somehow related to the "e"
>> it sees in the data argument.
>>
>
>> In the suggested way,
>>
>> lm(formula = e ~ ., data = as.data.frame(M))
>>
>> e is regressed against everything that is not e and sensible
>> results are given.
>>
>
> But still 'l1 <- lm(e~., data=df)' is not the same as
> 'l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])'
>
>> M <- matrix(runif(5*20), nrow=20)
>> colnames(M) <- c('a', 'b', 'c', 'd', 'e')
>> l1 <- lm(e~., data=df)
>> summary(l1)
>
> Call:
> lm(formula = e ~ ., data = df)
>
> Residuals:
> Min 1Q Median 3Q Max
> -0.38343 -0.21367 0.03067 0.13757 0.49080
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.28521 0.29477 0.968 0.349
> a 0.09283 0.30112 0.308 0.762
> b 0.23921 0.22425 1.067 0.303
> c -0.16027 0.24154 -0.664 0.517
> d 0.24025 0.20054 1.198 0.250
>
> Residual standard error: 0.2871 on 15 degrees of freedom
> Multiple R-squared: 0.1602, Adjusted R-squared: -0.06375
> F-statistic: 0.7153 on 4 and 15 DF, p-value: 0.5943
>
>> l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
>> summary(l3)
>
> Call:
> lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.36355 -0.22679 -0.01202  0.18462  0.37377
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.76972    0.24501   3.142  0.00672 **
> M[, 1]      -0.23830    0.24123  -0.988  0.33890
> M[, 2]      -0.02046    0.21958  -0.093  0.92699
> M[, 3]      -0.29518    0.22559  -1.308  0.21040
> M[, 4]      -0.31545    0.24570  -1.284  0.21866
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.2668 on 15 degrees of freedom
> Multiple R-squared: 0.2762, Adjusted R-squared: 0.08317
> F-statistic: 1.431 on 4 and 15 DF, p-value: 0.272
>
David Winsemius, MD
West Hartford, CT