naive "collinear" weighted linear regression
On Nov 11, 2009, at 7:45 PM, Mauricio Calvao wrote:
Hi there! Sorry for what may be a naive or dumb question. I have the following data:
x <- c(1,2,3,4)             # predictor vector
y <- c(2,4,6,8)             # response vector; notice it is an exact, perfect straight line through the origin with slope equal to 2
error <- c(0.3,0.3,0.3,0.3) # I have (equal) "errors", for instance, in the measured responses
Which means those x, y, and "error" figures did not come from an experiment, but rather from theory???
Of course the best-fit coefficients should be 0 for the intercept and 2 for the slope. Furthermore, it seems completely plausible (or not?) that, since the y_i have associated non-vanishing "errors" (dispersions), there should be corresponding non-vanishing "errors" associated with the best-fit coefficients, right? When I try:
fit_mod <- lm(y~x, weigths=1/error^2)
I get

Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  extra arguments weigths are just disregarded.
(Actually the weights are for adjusting for sampling, and I do not see any sampling in your "design".)
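A minimal check, using the same toy data as above: the warning arises because the argument was typed "weigths", which `lm()` does not recognize and therefore drops. With the argument spelled `weights`, `lm()` accepts it silently; on exact data, though, the weighted fit is identical to the unweighted one, with intercept 0 and slope 2.

```r
# Same toy data as above; note the correct spelling "weights".
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)
error <- c(0.3, 0.3, 0.3, 0.3)
fit_w <- lm(y ~ x, weights = 1 / error^2)  # accepted without a warning
coef(fit_w)  # intercept ~ 0, slope ~ 2
```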
Carrying on despite the warning message, which I did not quite understand, when I type:
summary(fit_mod)
I get
Call:
lm(formula = y ~ x, weigths = 1/error^2)
Residuals:
1 2 3 4
-5.067e-17 8.445e-17 -1.689e-17 -1.689e-17
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000e+00 8.776e-17 0.000e+00 1
x 2.000e+00 3.205e-17 6.241e+16 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.166e-17 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.895e+33 on 1 and 2 DF, p-value: < 2.2e-16
Naively, should not the Std. Error column be different from zero?
What I have in mind (and surely it is not what Std. Error means) is that if I carried out a large simulation, assuming each response y_i is a Gaussian random variable with mean y_i and standard deviation 2*error = 0.6, and then made an ordinary least-squares fit of the slope and intercept, I would end up with a mean for these simulated coefficients which should be 2 and 0, respectively,
Well, not precisely 2 and 0, but rather something very close, i.e. within "experimental error". Please note that numbers in the range of 1e-17 are effectively zero from a numerical-analysis perspective. See:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

> .Machine$double.eps ^ 0.5
[1] 1.490116e-08
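To illustrate the point: exact comparison of such tiny residuals against zero fails, while `all.equal()`, with its default tolerance of about 1.5e-8, treats them as equal to zero. (The value below is the first residual from the summary above.)

```r
r1 <- -5.067e-17               # first residual from the summary above
r1 == 0                        # FALSE: exact equality is too strict
isTRUE(all.equal(r1, 0))       # TRUE: zero within numerical tolerance
.Machine$double.eps ^ 0.5      # the default tolerance, ~1.49e-08
```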
and, that's the point, a non-vanishing standard deviation for these fitted coefficients, right? This, somehow, is what I expected should be an estimate, or at least a good indicator, of the degree of uncertainty I should assign to the fitted coefficients; it seems to me these deviations, calculated as a result of the simulation, will certainly not be zero (or 3e-17, for that matter). So this Std. Error does not provide what I, naively, think should be given as a measure of the uncertainty in the fitted coefficients...
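The simulation described above can be sketched in a few lines, assuming Gaussian noise with sd = 0.6 (the "2*error" of the text) added to the exact responses. With noisy data the coefficient estimates scatter around 0 and 2 with a clearly non-zero spread, which is exactly the scale of uncertainty that lm's Std. Error column reports for each individual fit.

```r
# Monte Carlo sketch: refit the line to 1000 noisy replicates.
set.seed(42)
x <- c(1, 2, 3, 4)
sims <- replicate(1000, {
  y_sim <- 2 * x + rnorm(length(x), mean = 0, sd = 0.6)
  coef(lm(y_sim ~ x))
})
rowMeans(sims)        # means close to c(0, 2)
apply(sims, 1, sd)    # non-vanishing standard deviations
```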
You are trying to impose an error structure on a data situation that you constructed artificially to be perfect.
What am I not getting right??
That if you input "perfection" into R's linear regression program, you get appropriate warnings?
Thanks and sorry for the naive and non-expert question!
You are a Professor of physics, right? You do experiments, right? You replicate them. So perhaps I'm the one who should be puzzled.
--
#######################################
Prof. Mauricio Ortiz Calvao
Federal University of Rio de Janeiro
Institute of Physics, P O Box 68528
CEP 21941-972 Rio de Janeiro, RJ
Brazil
David Winsemius, MD
Heritage Laboratories
West Hartford, CT