Regression Testing

9 messages · David Winsemius, Mojo, Andrew Miles +1 more

#
I'm new to R and somewhat new to the world of stats.  I got frustrated 
with Excel and found R.  Enough of that already.

I'm trying to test and correct for heteroskedasticity.

I have data in a CSV file that I load and store in a data frame.

 > ds <- read.csv("book2.csv")
 > df <- data.frame(ds)   # redundant: read.csv() already returns a data frame

I then perform an OLS regression:

 > lmfit <- lm(df$y~df$x)

To test for heteroskedasticity, I run bptest() from the lmtest package:

 > bptest(lmfit)

         studentized Breusch-Pagan test

data:  lmfit
BP = 11.6768, df = 1, p-value = 0.0006329

 From the above, if I'm interpreting this correctly, heteroskedasticity 
is present.  To correct for this, I need to calculate robust standard 
errors.  From my reading on this list, it seems I need to use vcovHC().

 > vcovHC(lmfit)
               (Intercept)         df$x
(Intercept)  1.057460e-03 -4.961118e-05
df$x       -4.961118e-05  2.378465e-06

I'm having a little bit of a hard time following the help pages.  So is 
the first column the intercepts and the second column new standard errors?

Thanks,
mojo
#
On Jan 20, 2011, at 2:08 PM, Mojo wrote:

No, it's a variance-covariance matrix, so all of the elements are  
variance estimates. To get what you are expecting, the SEs of the  
coefficients (whose variances are the diagonal elements of a var-covar  
matrix), you would wrap sqrt(diag(.)) around that object.
David Winsemius, MD
West Hartford, CT
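To make the sqrt(diag(.)) step concrete, here is a minimal base-R sketch with simulated data standing in for book2.csv; it hand-rolls the simplest (HC0) sandwich covariance so it runs without the sandwich package. Note that vcovHC() itself defaults to a small-sample-adjusted variant (HC3), so its numbers will differ slightly.

```r
# Simulated heteroskedastic data (stand-in for book2.csv)
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 0.5 * x + rnorm(100, sd = 0.3 * x)  # error variance grows with x
fit <- lm(y ~ x)

# Hand-rolled HC0 sandwich estimator:
# (X'X)^{-1}  X' diag(e^2) X  (X'X)^{-1}
X     <- model.matrix(fit)
e     <- residuals(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)        # equals t(X) %*% diag(e^2) %*% X
vc    <- bread %*% meat %*% bread

# The diagonal holds variances; robust SEs are their square roots
robust_se <- sqrt(diag(vc))
cbind(estimate = coef(fit),
      ols_se    = sqrt(diag(vcov(fit))),
      robust_se = robust_se)
```

With vcovHC() available, the last step is simply sqrt(diag(vcovHC(fit))).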
#
On 1/20/2011 3:37 PM, David Winsemius wrote:
Perfect.  Thank you very much!

Mojo
#
On Thu, 20 Jan 2011, Mojo wrote:

Just btw: lm(y ~ x, data = df) is somewhat easier to read and also easier 
to write when the formula involves more regressors.
That is one option. Another would be to use WLS instead of OLS, or 
maybe FGLS. As the model has just one regressor, this might be feasible 
and result in a more efficient estimate than OLS.
That's another option, yes.
Yes, the manual page is somewhat technical, but the first thing the 
"Details" section does is point you to some references that should be 
easier to read. I recommend starting with

      Zeileis A (2004). "Econometric Computing with HC and HAC Covariance
      Matrix Estimators." Journal of Statistical Software, 11(10), 1-17.
      URL http://www.jstatsoft.org/v11/i10/.

That also has some worked examples.
As David pointed out, it's the full covariance matrix estimate.

hth,
Z
#
Perhaps the easiest way to incorporate the heteroskedasticity-consistent  
SEs and output them in a familiar, easy-to-interpret format is to use  
coeftest() in the lmtest package.

coeftest(myModel, vcov = vcovHC(myModel))

Andrew Miles
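Expanded into a runnable sketch (with simulated data as a stand-in for the original model, and assuming the lmtest and sandwich packages are installed):

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# Simulated stand-in for the original data
set.seed(42)
x <- runif(80, 1, 10)
y <- 1 + 0.4 * x + rnorm(80, sd = 0.25 * x)
myModel <- lm(y ~ x)

# Usual (non-robust) coefficient table
coeftest(myModel)

# Same table with heteroskedasticity-consistent SEs; the estimates
# are unchanged, only the SEs, t statistics and p values differ
coeftest(myModel, vcov = vcovHC(myModel))
```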
On Jan 20, 2011, at 4:42 PM, Achim Zeileis wrote:
#
On 1/20/2011 4:42 PM, Achim Zeileis wrote:
I thought that WLS (which I'm guessing is weighted regression) is really 
only useful when you know, or at least have an idea of, what is causing 
the heteroskedasticity?  I'm not familiar with FGLS.  I plan on adding 
additional independent variables as I get more comfortable with everything.
I will look into that.

Thanks,
Mojo
#
On Fri, 21 Jan 2011, Mojo wrote:

Yes, but with only a single variable that shouldn't be too hard to do. 
Also, in the Breusch-Pagan test you can specify a hypothesized functional 
form for the variance.
There is a worked example in

   demo("Ch-LinearRegression", package = "AER")

The corresponding book has some more details.

hth,
Z
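A short sketch of that last point, using simulated data and assuming lmtest is installed: by default bptest() hypothesizes that the variance depends on the regressors, and the varformula argument lets you specify a different functional form.

```r
library(lmtest)  # bptest()

# Simulated data with variance increasing in x
set.seed(7)
x  <- runif(100, 1, 10)
y  <- 2 + 0.5 * x + rnorm(100, sd = 0.3 * x)
df <- data.frame(x = x, y = y)

# Default: variance hypothesized to depend on the regressors
bptest(y ~ x, data = df)

# Explicit functional form for the variance via varformula,
# e.g. variance depending on x squared
bptest(y ~ x, varformula = ~ I(x^2), data = df)
```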
#
On 1/21/2011 9:13 AM, Achim Zeileis wrote:
If I were to use vcovHAC instead of vcovHC, does that correct for serial 
correlation as well as heteroskedasticity?

Thanks,
Mojo
#
On Fri, 21 Jan 2011, Mojo wrote:

Yes, as the name (HAC = Heteroskedasticity and Autocorrelation Consistent) 
conveys. But for details please read the papers that accompany the 
software package and the original references cited therein.
Z
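A minimal sketch of the HC vs. HAC distinction, with simulated AR(1) errors and assuming the sandwich and lmtest packages are installed:

```r
library(sandwich)  # vcovHC(), vcovHAC()
library(lmtest)    # coeftest()

# Simulated regression with AR(1) errors, so serial correlation
# is present in addition to any heteroskedasticity
set.seed(123)
n <- 120
x <- runif(n, 1, 10)
u <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 1 + 0.5 * x + u
fit <- lm(y ~ x)

# HC: robust to heteroskedasticity only
coeftest(fit, vcov = vcovHC(fit))

# HAC: robust to heteroskedasticity AND autocorrelation
coeftest(fit, vcov = vcovHAC(fit))
```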