Errors-In-Variables in R
7 messages · Rui Barradas, Cedric Sodhi, R. Michael Weylandt +2 more
In reference to [1], how would you solve the following regression problem: given observations (X_i, Y_i) with known respective error distributions (e_X_i, e_Y_i) (say, zero-mean Gaussian with known standard deviation), find the parameters a and b that maximize the likelihood of Y = a*X + b.
Taking the example further, how many of the very simplified assumptions in the above example can be lifted or eased while R still offers a method for finding an errors-in-variables fit?
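To make the question concrete, here is one standard formalization. It assumes the functional errors-in-variables model, where the unknown true values xi_i are treated as nuisance parameters and all errors are independent and Gaussian with known standard deviations. The notation is ours, not the poster's. Profiling out each xi_i (a one-dimensional quadratic minimization) leaves a criterion in a and b alone:

```latex
X_i = \xi_i + e_{X,i}, \qquad Y_i = a\,\xi_i + b + e_{Y,i}, \qquad
e_{X,i} \sim N(0, \sigma_{X,i}^2), \quad e_{Y,i} \sim N(0, \sigma_{Y,i}^2)

-2 \log L(a, b, \xi) = \text{const} + \sum_i \left[ \frac{(X_i - \xi_i)^2}{\sigma_{X,i}^2}
                       + \frac{(Y_i - a\,\xi_i - b)^2}{\sigma_{Y,i}^2} \right]

\min_{\xi_i} \left[ \frac{(X_i - \xi_i)^2}{\sigma_{X,i}^2}
  + \frac{(Y_i - a\,\xi_i - b)^2}{\sigma_{Y,i}^2} \right]
  = \frac{(Y_i - a X_i - b)^2}{\sigma_{Y,i}^2 + a^2 \sigma_{X,i}^2}

(\hat a, \hat b) = \arg\min_{a,b} \sum_i
  \frac{(Y_i - a X_i - b)^2}{\sigma_{Y,i}^2 + a^2 \sigma_{X,i}^2}
```

With constant sigma_X and sigma_Y this reduces to Deming regression with lambda = sigma_Y^2 / sigma_X^2. When sigma_X = sigma_Y, it further reduces to orthogonal (total least squares) regression.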
There's a no homework policy in R-help.
Rui Barradas
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Perhaps it would have been clearer that this is not homework if I hadn't forgotten to say what [1] is. Sorry for that.
[1] https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15225
(This is not homework; it genuinely addresses the problem that R, to my knowledge, does not have errors-in-variables models.)
Based on your comments in the (not-a-)bug report, I *think* this might help: quanttrader.info/public/betterHedgeRatios.pdf, or, more generally, the idea of total least squares regression.
Cheers,
MW
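Total least squares (orthogonal regression) minimizes perpendicular rather than vertical distances to the line. Below is a minimal base-R sketch. It assumes the measurement-error variances in x and y are equal; if they are not, rescale first or use Deming regression. `tls_fit` is a hypothetical helper name, and the simulated data are illustrative only:

```r
# Orthogonal regression / total least squares for one predictor: the fitted
# line passes through the centroid and points along the first principal axis.
tls_fit <- function(x, y) {
  centred <- cbind(x - mean(x), y - mean(y))
  v <- svd(centred)$v[, 1]              # direction of largest spread
  slope <- v[2] / v[1]                  # ratio is unaffected by the sign of v
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

# Illustrative simulation with equal error SDs in x and y (values hypothetical)
set.seed(1)
xi <- rnorm(500, sd = 2)                # true, unobserved values
x  <- xi + rnorm(500, sd = 0.5)         # x observed with error
y  <- 1 + 2 * xi + rnorm(500, sd = 0.5) # y observed with error

tls_fit(x, y)
coef(lm(y ~ x))                         # OLS slope is pulled towards zero
```

The first right singular vector is defined only up to sign, so the slope is taken as a ratio of its two components. That ratio is sign-invariant.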
Hello,
As you say, apparently R doesn't have errors-in-variables models.
But R packages might.
library(sos)
findFn('errors-in-variables')
Some look promising. Hope you find something.
Rui Barradas
In addition to searching for "errors in variables", you should also be searching for "Deming regression", "orthogonal regression", "total least squares regression", and "measurement error models". Here are a few links to get you started:
http://markmail.org/message/4mo62jqfyudrchzi?q=list:org%2Er-project%2Er-help+deming+orthogonal
http://markmail.org/message/htlptlcccunsd5mm?q=list:org%2Er-project%2Er-help+deming+orthogonal
http://markmail.org/message/zhogz6337m3ofl7d?q=list:org%2Er-project%2Er-help+deming+orthogonal
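Deming regression generalizes orthogonal regression to a known ratio lambda of the y-error variance to the x-error variance. Setting lambda = 1 gives orthogonal regression, and letting lambda grow without bound recovers ordinary least squares. Below is a minimal closed-form sketch in base R. `deming_fit` and the simulated data are hypothetical; CRAN packages such as deming and mcr provide fuller implementations:

```r
# Deming regression with a known error-variance ratio
# lambda = var(error in y) / var(error in x); requires cov(x, y) != 0.
deming_fit <- function(x, y, lambda = 1) {
  sxx <- var(x)
  syy <- var(y)
  sxy <- cov(x, y)
  d <- syy - lambda * sxx
  slope <- (d + sqrt(d^2 + 4 * lambda * sxy^2)) / (2 * sxy)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

# Illustrative simulation (values hypothetical)
set.seed(2)
n  <- 1000
xi <- runif(n, 0, 10)                   # true, unobserved values
x  <- xi + rnorm(n, sd = 1)             # error SD in x: 1
y  <- 3 + 0.5 * xi + rnorm(n, sd = 0.5) # error SD in y: 0.5

deming_fit(x, y, lambda = 0.5^2 / 1^2)
coef(lm(y ~ x))                         # attenuated OLS fit for comparison
```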
David.
David Winsemius
Alameda, CA, USA
Dear Cedric,
If I understand correctly what you want to do, and if you're willing to assume that the variables are normally distributed, then you should be able to use any of the latent-variable structural-equation-modeling packages in R, such as sem, OpenMx, or lavaan. Here's an artificial example using the sem package:
------------ snip ----------
set.seed(12345)
zeta <- rnorm(1000)
y <- 1 + 2*zeta + rnorm(1000, 0, 1)
x <- zeta + rnorm(1000)
plot(x, y)
Data <- data.frame(x, y)
summary(lm(y ~ x)) # biased
Call:
lm(formula = y ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-6.6339 -1.1406  0.0299  1.1573  6.5652
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.04007    0.05514   18.86   <2e-16 ***
x            1.06089    0.04012   26.44   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.743 on 998 degrees of freedom
Multiple R-squared: 0.4119, Adjusted R-squared: 0.4113
F-statistic: 699.1 on 1 and 998 DF, p-value: < 2.2e-16
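The "# biased" label on the lm() fit above reflects attenuation. When x = zeta + e with error variance sigma2_e, the OLS slope estimates beta * var(zeta) / var(x) rather than beta. Below is a minimal sketch of a method-of-moments correction, assuming the error variance of x is known (it is 1 in this simulation). It divides the OLS slope by the estimated reliability (var(x) - sigma2_e) / var(x). This is a swapped-in technique, not the sem approach used in this message, and `sigma2_e`, `b_corrected` and the other names are hypothetical:

```r
# Method-of-moments correction for attenuation, assuming the measurement-error
# variance of x is known (sigma2_e = 1 in this simulation).
set.seed(12345)
zeta <- rnorm(1000)
y <- 1 + 2*zeta + rnorm(1000, 0, 1)
x <- zeta + rnorm(1000)

sigma2_e    <- 1                                # known error variance of x
reliability <- (var(x) - sigma2_e) / var(x)     # estimated var(zeta) / var(x)
b_ols       <- coef(lm(y ~ x))[["x"]]           # attenuated OLS slope
b_corrected <- b_ols / reliability              # = cov(x, y) / (var(x) - sigma2_e)
a_corrected <- mean(y) - b_corrected * mean(x)
c(intercept = a_corrected, slope = b_corrected)
```

The corrected slope should land near the sem estimate of beta rather than near the attenuated OLS value. lm()'s standard errors do not apply to it, so a bootstrap or the sem fit is needed for inference.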
plot(x, y) # not shown
library(sem)
eqns <- specifyEquations()
1: y = alpha*Intercept + beta*zeta
2: x = 1*zeta
3: V(y) = sigma
4: V(x) = 1
5: V(zeta) = phi
6:
Read 5 items
model <- sem(eqns, data=Data, raw=TRUE, fixed.x="Intercept")
summary(model)
Model fit to raw moment matrix.
Model Chisquare = 0.2264654 Df = 1 Pr(>Chisq) = 0.6341572
AIC = 8.226465
BIC = -6.68129
Normalized Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.1635  0.1711  0.2189  0.2564  0.4759
Parameter Estimates
       Estimate  Std Error   z value     Pr(>|z|)
alpha 1.0400668 0.05507397 18.884905 1.518098e-79 y <--- Intercept
beta  2.2553406 0.14197058 15.885971 7.926103e-57 y <--- zeta
sigma 0.6404697 0.25612060  2.500657 1.239632e-02 y <--> y
phi   0.8881856 0.08444223 10.518263 7.117323e-26 zeta <--> zeta
Iterations = 15
library(car)
linearHypothesis(model, c("alpha = 1", "beta = 2", "sigma = 1", "phi = 1")) # true parameter values
Linear hypothesis test
Hypothesis:
alpha = 1
beta = 2
sigma = 1
phi = 1
Model 1: restricted model
Model 2: model
  Res.Df Df  Chisq Pr(>Chisq)
1      5
2      1  4 3.8285     0.4297
------------ snip ----------
For other distributional assumptions, you'd have to write your own objective function, but you may still be able to use sem or one of the other SEM packages.
I hope this helps,
John
-----------------------------------------------
John Fox
Senator McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
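John's remark that other assumptions require a hand-written objective function can be illustrated outside sem. Below is a minimal sketch of the original question's setting: Gaussian errors with known, possibly different, standard deviations on every X_i and Y_i, fitted with base R's optim(). It assumes the functional model and minimizes the profiled criterion sum((y - a*x - b)^2 / (sy^2 + a^2*sx^2)). This is a swapped-in approach, not John's method, and `eiv_ml_fit` plus all simulated values are hypothetical:

```r
# Hand-written ML objective for a straight line Y = a*X + b when every X_i and
# Y_i carries an independent zero-mean Gaussian error with a known SD
# (sx[i], sy[i]). Profiling out the unknown true X values leaves, up to an
# additive constant, -2 * log-likelihood = sum((y - a*x - b)^2 / (sy^2 + a^2*sx^2)).
eiv_ml_fit <- function(x, y, sx, sy) {
  neg2loglik <- function(par) {
    a <- par[1]
    b <- par[2]
    sum((y - a * x - b)^2 / (sy^2 + a^2 * sx^2))
  }
  # Start from a y-error-weighted OLS fit, reordered to (slope, intercept)
  start <- unname(rev(coef(lm(y ~ x, weights = 1 / sy^2))))
  fit <- optim(start, neg2loglik, method = "BFGS")
  if (fit$convergence != 0) warning("optim() did not report convergence")
  c(a = fit$par[1], b = fit$par[2])
}

# Illustrative simulation (all values hypothetical)
set.seed(3)
n  <- 400
xi <- runif(n, 0, 10)                  # true, unobserved X values
sx <- runif(n, 0.2, 1.0)               # known per-point SD of the X errors
sy <- runif(n, 0.2, 1.0)               # known per-point SD of the Y errors
x  <- xi + rnorm(n, sd = sx)
y  <- 2 + 1.5 * xi + rnorm(n, sd = sy)

eiv_ml_fit(x, y, sx, sy)               # compare with coef(lm(y ~ x))
```

Correlated X/Y errors or non-Gaussian errors would change the objective. Only the function passed to optim() would need rewriting.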