I'm new to R and somewhat new to the world of stats. I got frustrated
with Excel and found R. Enough of that already.
I'm trying to test and correct for heteroskedasticity.
I have data in a CSV file that I load and store in a data frame.
> ds <- read.csv("book2.csv")
> df <- data.frame(ds)
I then perform an OLS regression:
> lmfit <- lm(df$y~df$x)
To test for heteroskedasticity, I run bptest:
> bptest(lmfit)
studentized Breusch-Pagan test
data: lmfit
BP = 11.6768, df = 1, p-value = 0.0006329
From the above, if I'm interpreting this correctly, there is
heteroskedasticity present. To correct for this, I need to calculate
robust standard errors. From my reading on this list, it seems like I need
to use vcovHC.
> vcovHC(lmfit)
              (Intercept)          df$x
(Intercept)  1.057460e-03 -4.961118e-05
df$x        -4.961118e-05  2.378465e-06
I'm having a little bit of a hard time following the help pages. So is
the first column the intercepts and the second column new standard errors?
Thanks,
mojo
Regression Testing
9 messages · David Winsemius, Mojo, Andrew Miles +1 more
On Jan 20, 2011, at 2:08 PM, Mojo wrote:
vcovHC(lmfit)
              (Intercept)          df$x
(Intercept)  1.057460e-03 -4.961118e-05
df$x        -4.961118e-05  2.378465e-06
I'm having a little bit of a hard time following the help pages. So is the first column the intercepts and the second column new standard errors?
No. It's a variance-covariance matrix, so the diagonal elements are variance estimates and the off-diagonal elements are covariance estimates. To get what you are expecting, the SEs of the coefficients (which are the square roots of the diagonal elements of a var-covar matrix), you would wrap sqrt(diag(.)) around that object.
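As a minimal sketch of the sqrt(diag(.)) step, the matrix printed in the original post can be retyped by hand (the object names vc and robust_se are only illustrative; in practice one would call sqrt(diag(vcovHC(lmfit))) directly):

```r
# Robust covariance matrix as printed by vcovHC(lmfit) in the original post,
# retyped by hand so the example does not need the data or any package
vc <- matrix(c( 1.057460e-03, -4.961118e-05,
               -4.961118e-05,  2.378465e-06),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("(Intercept)", "df$x"),
                             c("(Intercept)", "df$x")))

# Diagonal = variances of the coefficient estimates;
# their square roots are the robust standard errors
robust_se <- sqrt(diag(vc))
robust_se
```

The result is a named vector with one standard error per coefficient; the off-diagonal covariance is not used for the standard errors themselves.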
Thanks, mojo
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD West Hartford, CT
On 1/20/2011 3:37 PM, David Winsemius wrote:
Perfect. Thank you very much! Mojo
On Thu, 20 Jan 2011, Mojo wrote:
lmfit <- lm(df$y~df$x)
Just btw: lm(y ~ x, data = df) is somewhat easier to read and also easier to write when the formula involves more regressors.
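A small sketch of the difference between the two calling styles, using simulated data in place of book2.csv (the data and the names fit_dollar and fit_formula are assumptions for illustration only):

```r
# Simulated stand-in for the poster's data frame
set.seed(1)
df <- data.frame(x = runif(50, 1, 10))
df$y <- 2 + 0.5 * df$x + rnorm(50)

fit_dollar  <- lm(df$y ~ df$x)       # style used in the original post
fit_formula <- lm(y ~ x, data = df)  # style suggested in the reply

# Same estimates, but the coefficient is named "x" rather than "df$x"
coef(fit_dollar)
coef(fit_formula)

# The data-argument form also lets predict() work on new data
new_pred <- predict(fit_formula, newdata = data.frame(x = c(2, 5)))
new_pred
```

With more regressors the formula simply grows, e.g. lm(y ~ x1 + x2, data = df), without repeating the data frame name.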
From the above, if I'm interpreting this correctly, there is
heteroskedasticity present. To correct for this, I need to calculate robust
standard errors.
That is one option. Another one would be using WLS instead of OLS - or maybe FGLS. As the model just has one regressor, this might be possible and result in a more efficient estimate than OLS.
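A rough sketch of the WLS alternative, assuming (purely for illustration) that the error standard deviation is proportional to x, so the weights are 1 / x^2. The data are simulated; with real data the variance form has to be justified or estimated:

```r
# Simulated data whose error standard deviation grows in proportion to x
# (an assumption made only for this example)
set.seed(42)
n  <- 200
df <- data.frame(x = runif(n, 1, 10))
df$y <- 1 + 2 * df$x + rnorm(n, sd = 0.5 * df$x)

ols_fit <- lm(y ~ x, data = df)

# WLS: weights proportional to 1 / Var(error), here 1 / x^2
wls_fit <- lm(y ~ x, data = df, weights = 1 / x^2)

# Compare the reported slope standard errors of the two fits
summary(ols_fit)$coefficients["x", "Std. Error"]
summary(wls_fit)$coefficients["x", "Std. Error"]
```

If the assumed variance form is wrong, the WLS standard errors are not reliable either, which is one reason robust standard errors remain a common default.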
From my reading on this list, it seems like I need to use vcovHC.
That's another option, yes.
vcovHC(lmfit)
              (Intercept)          df$x
(Intercept)  1.057460e-03 -4.961118e-05
df$x        -4.961118e-05  2.378465e-06
I'm having a little bit of a hard time following the help pages.
Yes, the manual page is somewhat technical but the first thing the
"Details" section does is: It points you to some references that should be
easier to read. I recommend starting with
Zeileis A (2004), Econometric Computing with HC and HAC Covariance
Matrix Estimators. _Journal of Statistical Software_, *11*(10),
1-17. URL <URL: http://www.jstatsoft.org/v11/i10/>.
That also has some worked examples.
So is the first column the intercepts and the second column new standard errors?
As David pointed out, it's the full covariance matrix estimate. hth, Z
Thanks, mojo
Perhaps the easiest way to incorporate the heteroskedasticity-consistent SEs and output them in a familiar and easy-to-interpret format is to use coeftest() in the lmtest package:
coeftest(myModel, vcov = vcovHC(myModel))
Andrew Miles
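A self-contained sketch of the coeftest() suggestion, assuming the lmtest and sandwich packages are installed. The data are simulated stand-ins for book2.csv, and the object name robust is illustrative:

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# Simulated heteroskedastic data (illustrative only)
set.seed(123)
n  <- 200
df <- data.frame(x = runif(n, 1, 10))
df$y <- 1 + 2 * df$x + rnorm(n, sd = 0.5 * df$x)

lmfit <- lm(y ~ x, data = df)

# Coefficient table with the usual OLS standard errors
coeftest(lmfit)

# Same coefficients, but standard errors, t values and p-values
# based on the heteroskedasticity-consistent covariance matrix
robust <- coeftest(lmfit, vcov = vcovHC(lmfit))
robust
```

The "Std. Error" column of the second table matches sqrt(diag(vcovHC(lmfit))), so this is the same correction presented in the familiar summary layout.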
On 1/20/2011 4:42 PM, Achim Zeileis wrote:
That is one option. Another one would be using WLS instead of OLS - or maybe FGLS. As the model just has one regressor, this might be possible and result in a more efficient estimate than OLS.
I thought that WLS (which I'm guessing is a weighted regression) is really only useful when you know, or at least have an idea of, what is causing the heteroskedasticity? I'm not familiar with FGLS. I plan on adding additional independent variables as I get more comfortable with everything.
Yes, the manual page is somewhat technical but the first thing the
"Details" section does is: It points you to some references that
should be easier to read. I recommend starting with
Zeileis A (2004), Econometric Computing with HC and HAC Covariance
Matrix Estimators. _Journal of Statistical Software_, *11*(10),
1-17. URL <URL: http://www.jstatsoft.org/v11/i10/>.
I will look into that. Thanks, Mojo
On Fri, 21 Jan 2011, Mojo wrote:
I thought that WLS (which I'm guessing is a weighted regression) is really only useful when you know, or at least have an idea of, what is causing the heteroskedasticity?
Yes. But with only a single variable that shouldn't be too hard to do. Also in the Breusch-Pagan test you specify a hypothesized functional form for the variance.
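To illustrate the point about the hypothesized variance form, here is a sketch using the varformula argument of bptest() from lmtest. The simulated data and the quadratic variance form are assumptions chosen only for the example:

```r
library(lmtest)  # bptest()

# Simulated heteroskedastic data (illustrative only)
set.seed(7)
n  <- 200
df <- data.frame(x = runif(n, 1, 10))
df$y <- 1 + 2 * df$x + rnorm(n, sd = 0.5 * df$x)

# Default: the variance is tested against the regressors of the mean model
bp_default <- bptest(y ~ x, data = df)

# Explicit variance form: variance may depend on x and x^2
bp_quad <- bptest(y ~ x, varformula = ~ x + I(x^2), data = df)

bp_default
bp_quad
```

The degrees of freedom of the test equal the number of variables in the variance formula, so the second test has df = 2 where the default has df = 1.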
I'm not familiar with FGLS.
There is a worked example in
demo("Ch-LinearRegression", package = "AER")
The corresponding book has some more details.
hth,
Z
On 1/21/2011 9:13 AM, Achim Zeileis wrote:
If I were to use vcovHAC instead of vcovHC, does that correct for serial correlation as well as Heteroskedasticity? Thanks, Mojo
On Fri, 21 Jan 2011, Mojo wrote:
If I were to use vcovHAC instead of vcovHC, does that correct for serial correlation as well as Heteroskedasticity?
Yes, as the name (HAC = Heteroskedasticity and Autocorrelation Consistent) conveys. But for details please read the papers that accompany the software package and the original references cited therein. Z
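A brief sketch contrasting vcovHC() and vcovHAC() from the sandwich package, assuming lmtest and sandwich are installed. The AR(1) errors are simulated for illustration, and the rows are assumed to be in time order, which HAC estimators rely on:

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC(), vcovHAC()

# Simulated regression with serially correlated AR(1) errors (illustrative only)
set.seed(2011)
n <- 200
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
df <- data.frame(x = x, y = 1 + 2 * x + e)

lmfit <- lm(y ~ x, data = df)

hc_vcov  <- vcovHC(lmfit)   # consistent under heteroskedasticity only
hac_vcov <- vcovHAC(lmfit)  # consistent under heteroskedasticity and autocorrelation

# Coefficient table using the HAC covariance matrix
coeftest(lmfit, vcov = hac_vcov)
```

Both functions return a covariance matrix of the same shape, so either can be passed to the vcov argument of coeftest().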
Thanks, Mojo