Errors-In-Variables in R

7 messages · Rui Barradas, Cedric Sodhi, R. Michael Weylandt +2 more

#
In reference to [1], how would you solve the following regression
problem:

Given observations (X_i, Y_i) with known respective error distributions
(e_X_i, e_Y_i) (say, zero-mean Gaussian with known standard deviation),
find the parameters a and b which maximize the likelihood of

Y = a*X + b

Taking the example further, how many of the simplifying assumptions in
the above example can be relaxed or dropped while R still offers a
method for finding an errors-in-variables fit?
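
For concreteness, here is a minimal base-R sketch of the problem as stated, assuming independent Gaussian errors with a known standard deviation per point. Under that assumption, maximizing the likelihood over the unknown true X values leaves a weighted least-squares objective in which each residual is divided by sy_i^2 + a^2 * sx_i^2. The data, the seed, and the names `sx`, `sy` and `nll` are made up for illustration.

```r
# All data below are simulated for illustration; nothing here comes from the thread.
set.seed(1)
n  <- 200
xi <- runif(n, 0, 10)                  # latent true X values (unknown in practice)
sx <- rep(0.5, n)                      # known SD of the error in each X_i
sy <- rep(1.0, n)                      # known SD of the error in each Y_i
X  <- xi + rnorm(n, sd = sx)
Y  <- 2 * xi + 1 + rnorm(n, sd = sy)   # true a = 2, b = 1

# Negative log-likelihood, up to a constant, after maximizing over the
# unknown true X values: each squared residual is scaled by its
# "effective variance" sy^2 + a^2 * sx^2.
nll <- function(par) {
  a <- par[1]
  b <- par[2]
  0.5 * sum((Y - a * X - b)^2 / (sy^2 + a^2 * sx^2))
}

start <- unname(coef(lm(Y ~ X))[2:1])  # OLS slope and intercept as starting values
fit   <- optim(start, nll, method = "BFGS")
a_hat <- fit$par[1]
b_hat <- fit$par[2]
c(a = a_hat, b = b_hat)
```

Because `sx` and `sy` are vectors, the same objective also covers errors that differ from point to point. Correlated or non-Gaussian errors would need a different objective.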
#
There's a no homework policy in R-help.

Rui Barradas

On 02-03-2013 18:28, Cedric Sodhi wrote:
#
Perhaps it would have been clearer that this is no homework if I
hadn't forgotten to say what [1] is. Sorry for that.

[1] https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15225

(This is not homework; it genuinely addresses the problem that R, to my
knowledge, does not have models for errors in variables.)
On Sat, Mar 02, 2013 at 09:34:21PM +0000, Rui Barradas wrote:
#
Based on your comments in the (not-a-)bug report, I *think* this might help:

quanttrader.info/public/betterHedgeRatios.pdf

or more generally, the idea of total least squares regression.
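
As a rough illustration of total least squares for a straight line, the sketch below takes the slope from the first principal component of the centered data, using only base R (`prcomp`). It assumes equal error variances in x and y. The simulated data and all variable names are invented for the example.

```r
# Simulated data (illustrative only): equal error variance in x and y.
set.seed(2)
xi <- rnorm(500, sd = 3)          # latent true values
x  <- xi + rnorm(500)             # x observed with error
y  <- 1 + 2 * xi + rnorm(500)     # true slope 2, true intercept 1

# Total least squares: the fitted line runs along the first principal
# component of the centered (x, y) cloud. prcomp() centers by default
# and does not rescale.
pc <- prcomp(cbind(x, y))
v  <- pc$rotation[, 1]
slope_tls     <- unname(v["y"] / v["x"])
intercept_tls <- mean(y) - slope_tls * mean(x)

slope_ols <- unname(coef(lm(y ~ x))[2])   # attenuated toward zero
c(ols = slope_ols, tls = slope_tls, tls_intercept = intercept_tls)
```

With these settings the OLS slope should come out noticeably below 2, while the TLS slope should land close to it.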

Cheers,
MW
On Sat, Mar 2, 2013 at 9:55 PM, Cedric Sodhi <manday at gmx.net> wrote:
#
Hello,

Like you say, apparently R doesn't have models for error in variables.
But R packages might have.

library(sos)
findFn('errors-in-variables')

Some look promising. Hope you find something.

Rui Barradas

On 02-03-2013 21:55, Cedric Sodhi wrote:
#
On Mar 2, 2013, at 1:55 PM, Cedric Sodhi wrote:

            
In addition to searching for "errors in variables" you should also be searching for "Deming regression", "orthogonal regression", "total least squares regression", and "measurement error models".

Here are a few links to get you started:


http://markmail.org/message/4mo62jqfyudrchzi?q=list:org%2Er-project%2Er-help+deming+orthogonal
http://markmail.org/message/htlptlcccunsd5mm?q=list:org%2Er-project%2Er-help+deming+orthogonal
http://markmail.org/message/zhogz6337m3ofl7d?q=list:org%2Er-project%2Er-help+deming+orthogonal
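
To give an idea of what those search terms lead to, here is a small base-R sketch of Deming regression in its closed form. It assumes the ratio `delta` of the error variance in y to the error variance in x is known. `deming_fit` is a hypothetical helper written for this example, not a function from any package.

```r
# Closed-form Deming regression; delta = var(error in y) / var(error in x).
# delta = 1 gives orthogonal regression.
deming_fit <- function(x, y, delta = 1) {
  sxx <- var(x)
  syy <- var(y)
  sxy <- cov(x, y)
  d   <- syy - delta * sxx
  slope <- (d + sqrt(d^2 + 4 * delta * sxy^2)) / (2 * sxy)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

# Simulated data (illustrative only): error SD 1 in x and 2 in y, so delta = 4.
set.seed(3)
xi <- rnorm(500, sd = 3)
x  <- xi + rnorm(500, sd = 1)
y  <- 1 + 2 * xi + rnorm(500, sd = 2)

est <- deming_fit(x, y, delta = 4)
est
```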
#
Dear Cedric,

If I understand correctly what you want to do, and if you're willing to
assume that the variables are normally distributed, then you should be able
to use any of the latent-variable structural-equation-modeling packages in
R, such as sem, OpenMx, or lavaan.

Here's an artificial example using the sem package:

------------ snip ----------
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.6339 -1.1406  0.0299  1.1573  6.5652 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.04007    0.05514   18.86   <2e-16 ***
x            1.06089    0.04012   26.44   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.743 on 998 degrees of freedom
Multiple R-squared:  0.4119,	Adjusted R-squared:  0.4113 
F-statistic: 699.1 on 1 and 998 DF,  p-value: < 2.2e-16
1: y = alpha*Intercept + beta*zeta
2: x = 1*zeta
3: V(y) = sigma
4: V(x) = 1
5: V(zeta) = phi
6: 
Read 5 items
Model fit to raw moment matrix.

 Model Chisquare =  0.2264654   Df =  1 Pr(>Chisq) = 0.6341572
 AIC =  8.226465
 BIC =  -6.68129

 Normalized Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1635  0.1711  0.2189  0.2564  0.4759 

 Parameter Estimates
      Estimate  Std Error  z value   Pr(>|z|)                     
alpha 1.0400668 0.05507397 18.884905 1.518098e-79 y <--- Intercept
beta  2.2553406 0.14197058 15.885971 7.926103e-57 y <--- zeta     
sigma 0.6404697 0.25612060  2.500657 1.239632e-02 y <--> y        
phi   0.8881856 0.08444223 10.518263 7.117323e-26 zeta <--> zeta  

 Iterations =  15
1")) # true parameter values
Linear hypothesis test

Hypothesis:
alpha = 1
beta = 2
sigma = 1
phi = 1

Model 1: restricted model
Model 2: model

  Res.Df Df  Chisq Pr(>Chisq)
1      5                     
2      1  4 3.8285     0.4297

------------ snip ----------
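
The listing above shows output only, so the following is not the code that produced it. It is a base-R method-of-moments sketch of the same measurement model (x = zeta + error with error variance fixed at 1, y = alpha + beta*zeta + error), with the true values alpha = 1, beta = 2, sigma = 1, phi = 1 taken from the hypothesis listing. The seed and all variable names are assumptions, so the numbers will not match the output above.

```r
# Simulated data under the model in the listing; the seed is arbitrary.
set.seed(4)
n    <- 1000
zeta <- rnorm(n)                  # latent variable, V(zeta) = phi = 1
x    <- zeta + rnorm(n)           # measurement error variance V(x) = 1 (known)
y    <- 1 + 2 * zeta + rnorm(n)   # alpha = 1, beta = 2, V(y) = sigma = 1

ols_slope <- unname(coef(lm(y ~ x))[2])   # biased toward zero (about 1 here)

# Method-of-moments correction using the known error variance of x.
err_var_x <- 1
phi_hat   <- var(x) - err_var_x           # estimate of V(zeta)
beta_hat  <- cov(x, y) / phi_hat          # disattenuated slope
alpha_hat <- mean(y) - beta_hat * mean(x)
sigma_hat <- var(y) - beta_hat^2 * phi_hat

c(ols = ols_slope, alpha = alpha_hat, beta = beta_hat,
  sigma = sigma_hat, phi = phi_hat)
```

Unlike the sem fit, this does not impose E(zeta) = 0 and gives no standard errors or fit statistics; it only illustrates why the corrected slope is roughly twice the OLS slope.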

For other distributional assumptions, you'd have to write your own objective
function but you may still be able to use sem or one of the other SEM
packages.

I hope this helps,
 John

-----------------------------------------------
John Fox
Senator McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada