R: [Fwd: R-SIG-Finance Digest, Vol 60, Issue 18]
I just realized I used Robust in my Stata 9.2 analysis. When I remove this, the Chi-sq values are much closer to the values I get in R (but negative, as the consistent model must be listed first in a chi-sq calculation). However, with my own data I do get this positive definite error in Stata. Is this a result of unbalanced data? R doesn't give an error, so I am inclined to ignore it in Stata. I am posting my own results from R and Stata, and attaching the data as a csv. Thanks, hope I am not wasting too much of your time here. -Steve ###R-Output###
library("plm")
fdi <- read.csv("C:/data/mydata.csv", na.strings=".")
fdiplm<-plm.data(fdi, index = c("id_code_id", "year"))
series are constants and have been removed
fdi_test<-(lfdi_2000~ lagdlfdi+ laglnstock2000+ lagtradegdp +lagdlgdp) fdi_test_fe <- plm(fdi_test, data=fdiplm, model="within") fdi_test_re <- plm(fdi_test, data=fdiplm, model="random") summary (fdi_test_fe)
Oneway (individual) effect Within Model
Call:
plm(formula = fdi_test, data = fdiplm, model = "within")
Unbalanced Panel: n=149, T=3-27, N=2697
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-8.2100 -0.4760 0.0452 0.5670 4.8700
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
lagdlfdi 0.1564759 0.0180645 8.6621 < 2.2e-16 ***
laglnstock2000 0.7621350 0.0246798 30.8809 < 2.2e-16 ***
lagtradegdp 0.0178568 0.0025859 6.9055 5.003e-12 ***
lagdlgdp 0.2601477 0.0427744 6.0818 1.188e-09 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Total Sum of Squares: 4606.7
Residual Sum of Squares: 2938
F-statistic: 361.237 on 4 and 2544 DF, p-value: < 2.22e-16
summary (fdi_test_re)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = fdi_test, data = fdiplm, model = "random")
Unbalanced Panel: n=149, T=3-27, N=2697
Effects:
var std.dev share
idiosyncratic 1.15487 1.07465 0.6617
individual 0.59044 0.76840 0.3383
theta :
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3718 0.6700 0.7081 0.6955 0.7355 0.7401
Residuals :
Min. 1st Qu. Median Mean 3rd Qu. Max.
-9.15000 -0.47900 0.07270 -0.00713 0.59800 3.95000
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 16.7744214 0.1552868 108.0222 < 2.2e-16 ***
lagdlfdi 0.1632388 0.0181005 9.0185 < 2.2e-16 ***
laglnstock2000 0.8314432 0.0196444 42.3247 < 2.2e-16 ***
lagtradegdp 0.0119453 0.0020737 5.7605 8.386e-09 ***
lagdlgdp 0.2558009 0.0424599 6.0245 1.696e-09 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Total Sum of Squares: 9522.3
Residual Sum of Squares: 3140.8
F-statistic: 1367.42 on 4 and 2692 DF, p-value: < 2.22e-16
phtest(fdi_test_re, fdi_test_fe)
Hausman Test
data: fdi_test
chisq = 23.7021, df = 4, p-value = 9.164e-05
alternative hypothesis: one model is inconsistent
###end R output###
###Stata 9.2 Output--canned###
xtreg lfdi_2000 lagdlfdi laglnstock2000 lagtradegdp lagdlgdp, fe;
Fixed-effects (within) regression Number of obs =
2697
Group variable (i): id_code_id Number of groups =
149
R-sq: within = 0.3622 Obs per group: min =
3
between = 0.8234 avg =
18.1
overall = 0.6998 max =
27
F(4,2544) =
361.24
corr(u_i, Xb) = 0.3536 Prob > F =
0.0000
------------------------------------------------------------------------------
lfdi_2000 | Coef. Std. Err. t P>|t| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
lagdlfdi | .1564758 .0180645 8.66 0.000 .1210532
.1918985
laglnst~2000 | .762135 .0246798 30.88 0.000 .7137404
.8105295
lagtradegdp | .0178568 .0025859 6.91 0.000 .0127861
.0229274
lagdlgdp | .2601478 .0427744 6.08 0.000 .1762716
.3440241
_cons | 17.01131 .1701713 99.97 0.000 16.67762
17.345
-------------+----------------------------------------------------------------
sigma_u | .93048942
sigma_e | 1.0746505
rho | .42847396 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(148, 2544) = 10.73 Prob > F =
0.0000
. estimates store FIX, title(The FE) ;
. xtreg lfdi_2000 lagdlfdi laglnstock2000 lagtradegdp lagdlgdp, re;
Random-effects GLS regression Number of obs =
2697
Group variable (i): id_code_id Number of groups =
149
R-sq: within = 0.3606 Obs per group: min =
3
between = 0.8402 avg =
18.1
overall = 0.7128 max =
27
Random effects u_i ~ Gaussian Wald chi2(4) =
2225.46
corr(u_i, X) = 0 (assumed) Prob > chi2 =
0.0000
------------------------------------------------------------------------------
lfdi_2000 | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
lagdlfdi | .1631662 .0180937 9.02 0.000 .1277032
.1986291
laglnst~2000 | .830845 .0196843 42.21 0.000 .7922645
.8694255
lagtradegdp | .011992 .0020779 5.77 0.000 .0079195
.0160645
lagdlgdp | .2558113 .0424486 6.03 0.000 .1726136
.3390091
_cons | 16.77702 .1556693 107.77 0.000 16.47191
17.08212
-------------+----------------------------------------------------------------
sigma_u | .77431228
sigma_e | 1.0746505
rho | .34173973 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. estimates store RAND, title(The RE) ;
. hausman FIX RAND;
---- Coefficients ----
| (b) (B) (b-B)
sqrt(diag(V_b-V_B))
| FIX RAND Difference S.E.
-------------+----------------------------------------------------------------
lagdlfdi | .1564758 .1631662 -.0066903 .
laglnst~2000 | .762135 .830845 -.06871 .014887
lagtradegdp | .0178568 .011992 .0058648 .0015393
lagdlgdp | .2601478 .2558113 .0043365 .0052695
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from
xtreg
B = inconsistent under Ha, efficient under Ho; obtained from
xtreg
Test: Ho: difference in coefficients not systematic
chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 22.94
Prob>chi2 = 0.0001
(V_b-V_B is not positive definite)
###End Stata 9.2####
On Mon, May 18, 2009 at 12:26 PM, Steven Archambault
<archstevej at gmail.com>wrote:
Giovani,
Thank you so much for your comments. I am a bit new to R, and to these
mailing lists, so I apologize for being sparse on the details and examples.
I am using Stata 9.2, which might be the answer to my problem, as you
described. I have done quite a bit of internet searching, and did not read
anywhere about the use of a different method for calculating the chi-sq
value, so thanks for that.
One more issue I have been thinking about. I am assuming your Plm package
knows that the FE is the consistient model, as the same results arrive if
the code is phtest(femod, remod) or phtest(remod, femod). The order does
matter in Stata.
For complteness I am going to post my results using the same Grumfeld
dataset for both stata 9.2 (by hand calculation and canned procedure) and
R. I am using the Plm package version 1 1-2.
Regards,
Steve
## begin Stata9.2 output##
xtreg inv value capital, robust re;
Random-effects GLS regression Number of obs =
200
Group variable (i): firmid Number of groups =
10
R-sq: within = 0.7668 Obs per group: min =
20
between = 0.8196 avg =
20.0
overall = 0.8061 max =
20
Random effects u_i ~ Gaussian Wald chi2(3) =
77.70
corr(u_i, X) = 0 (assumed) Prob > chi2 =
0.0000
------------------------------------------------------------------------------
| Robust
invest | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
value | .1097811 .0197587 5.56 0.000 .0710547
.1485076
capital | .308113 .0418387 7.36 0.000 .2261107
.3901153
_cons | -57.83441 24.67795 -2.34 0.019 -106.2023
-9.466507
-------------+----------------------------------------------------------------
sigma_u | 84.20095
sigma_e | 52.767964
rho | .71800838 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. matrix bfe=e(b);
. matrix vfe=e(V);
. estimates store remod;
. xtreg inv value capital, robust fe;
Fixed-effects (within) regression Number of obs =
200
Group variable (i): firmid Number of groups =
10
R-sq: within = 0.7668 Obs per group: min =
20
between = 0.8194 avg =
20.0
overall = 0.8060 max =
20
F(2,188) =
40.23
corr(u_i, Xb) = -0.1517 Prob > F =
0.0000
------------------------------------------------------------------------------
| Robust
invest | Coef. Std. Err. t P>|t| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
value | .1101238 .019378 5.68 0.000 .0718975
.1483501
capital | .3100653 .042795 7.25 0.000 .2256452
.3944854
_cons | -58.74393 23.37422 -2.51 0.013 -104.8534
-12.63449
-------------+----------------------------------------------------------------
sigma_u | 85.732501
sigma_e | 52.767964
rho | .72525012 (fraction of variance due to u_i)
------------------------------------------------------------------------------
###Hausman by hand###
. estimates store femod;
. matrix vre=e(V);
. matrix bre=e(b);
. matrix bdif=bfe-bre;
. matrix list bdif;
bdif[1,3]
value capital _cons
y1 -.00034265 -.00195236 .90952273
. matrix bdifp=bdif';
. matrix dv=vfe-vre;
. matrix dvi=inv(dv);
. matrix list bdif;
bdif[1,3]
value capital _cons
y1 -.00034265 -.00195236 .90952273
. matrix list bdifp;
bdifp[3,1]
y1
value -.00034265
capital -.00195236
_cons .90952273
. matrix list dvi;
symmetric dvi[3,3]
value capital _cons
value -7739.3615
capital 5808.2905 -5305.811
_cons 3.6641311 .98569198 -.00051157
. matrix chisq=bdif*dvi*bdifp;
. matrix list chisq;
symmetric chisq[1,1]
y1
y1 -.01956929
###Hausman canned###
. hausman femod remod;
---- Coefficients ----
| (b) (B) (b-B)
sqrt(diag(V_b-V_B))
| femod remod Difference S.E.
-------------+----------------------------------------------------------------
value | .1101238 .1097811 .0003427 .
capital | .3100653 .308113 .0019524 .0089965
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from
xtreg
B = inconsistent under Ha, efficient under Ho; obtained from
xtreg
Test: Ho: difference in coefficients not systematic
chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= -0.01 chi2<0 ==> model fitted on these
data fails to meet the asymptotic
assumptions of the Hausman test;
see suest for a generalized test ##
end Stata9.2 output ##
##begin Output R, using PLM 1.1-2###
test<-data(Grunfeld, package="Ecdat") fm <- inv~value+capital femod <- plm(fm, Grunfeld, model="within") summary(femod)
Oneway (individual) effect Within Model
Call:
plm(formula = fm, data = Grunfeld, model = "within")
Balanced Panel: n=10, T=20, N=200
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-184.000 -17.600 0.563 19.200 251.000
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
value 0.110124 0.011857 9.2879 < 2.2e-16 ***
capital 0.310065 0.017355 17.8666 < 2.2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Total Sum of Squares: 2244400
Residual Sum of Squares: 523480
F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16
remod <- plm(fm, Grunfeld, model="random") summary(remod)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = fm, data = Grunfeld, model = "random")
Balanced Panel: n=10, T=20, N=200
Effects:
var std.dev share
idiosyncratic 2784.458 52.768 0.282
individual 7089.800 84.201 0.718
theta: 0.86122
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-178.00 -19.70 4.69 19.50 253.00
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
(Intercept) -57.834415 28.898935 -2.0013 0.04536 *
value 0.109781 0.010493 10.4627 < 2e-16 ***
capital 0.308113 0.017180 17.9339 < 2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Total Sum of Squares: 2381400
Residual Sum of Squares: 548900
F-statistic: 328.837 on 2 and 197 DF, p-value: < 2.22e-16
phtest(femod, remod)
Hausman Test
data: fm
chisq = 2.3304, df = 2, p-value = 0.3119
alternative hypothesis: one model is inconsistent
###end Plm###
On Mon, May 18, 2009 at 6:01 AM, Millo Giovanni <
Giovanni_Millo at generali.com> wrote:
Dear Steve,
I got your inquiry courtesy of Christian Kleiber, who brought it to our
attention: please next time you post anything re a given package,
include the maintainer's address. We cannot guarantee to parse all the
daily digests of the R system!
Your problem: can you please provide a reproducible example? Else it is
difficult to help, not knowing your data, your results and even the
Stata version you're using.
In the following I replicate what you might have done on a well-known
dataset.
From Stata10, on the usual Grunfeld data taken from package "Ecdat":
## begin Stata10 output ##
. xtreg inv value capital
Random-effects GLS regression Number of obs =
200
Group variable: firm Number of groups =
10
R-sq: within = 0.7668 Obs per group: min =
20
between = 0.8196 avg =
20.0
overall = 0.8061 max =
20
Random effects u_i ~ Gaussian Wald chi2(2) =
657.67
corr(u_i, X) = 0 (assumed) Prob > chi2 =
0.0000
------------------------------------------------------------------------
------
inv | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------
------
value | .1097811 .0104927 10.46 0.000 .0892159
.1303464
capital | .308113 .0171805 17.93 0.000 .2744399
.3417861
_cons | -57.83441 28.89893 -2.00 0.045 -114.4753
-1.193537
-------------+----------------------------------------------------------
------
sigma_u | 84.20095
sigma_e | 52.767964
rho | .71800838 (fraction of variance due to u_i)
------------------------------------------------------------------------
------
. estimates store remod
. xtreg inv value capital, fe
Fixed-effects (within) regression Number of obs =
200
Group variable: firm Number of groups =
10
R-sq: within = 0.7668 Obs per group: min =
20
between = 0.8194 avg =
20.0
overall = 0.8060 max =
20
F(2,188) =
309.01
corr(u_i, Xb) = -0.1517 Prob > F =
0.0000
------------------------------------------------------------------------
------
inv | Coef. Std. Err. t P>|t| [95% Conf.
Interval]
-------------+----------------------------------------------------------
------
value | .1101238 .0118567 9.29 0.000 .0867345
.1335131
capital | .3100653 .0173545 17.87 0.000 .2758308
.3442999
_cons | -58.74393 12.45369 -4.72 0.000 -83.31086
-34.177
-------------+----------------------------------------------------------
------
sigma_u | 85.732501
sigma_e | 52.767964
rho | .72525012 (fraction of variance due to u_i)
------------------------------------------------------------------------
------
F test that all u_i=0: F(9, 188) = 49.18 Prob > F =
0.0000
. estimates store femod
. hausman femod remod
---- Coefficients ----
| (b) (B) (b-B)
sqrt(diag(V_b-V_B))
| femod remod Difference S.E.
-------------+----------------------------------------------------------
------
value | .1101238 .1097811 .0003427 .0055213
capital | .3100653 .308113 .0019524 .0024516
------------------------------------------------------------------------
------
b = consistent under Ho and Ha; obtained from
xtreg
B = inconsistent under Ha, efficient under Ho; obtained from
xtreg
Test: Ho: difference in coefficients not systematic
chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 2.33
Prob>chi2 = 0.3119
.
## end Stata10 output ##
while from plm I get
## begin R putput ##
data(Grunfeld, package="Ecdat") fm <- inv~value+capital femod <- plm(fm, Grunfeld) remod <- plm(fm, Grunfeld, model="random") phtest(femod, remod)
Hausman Test
data: fm
chisq = 2.3304, df = 2, p-value = 0.3119
alternative hypothesis: one model is inconsistent
## end R output ##
which, besides testifying to the goodness and parsimony of an
object-oriented approach as far as screen output is concerned, looks
rather consistent to me.
I cannot but guess that the problem might stem from different RE
estimates: previous versions of Stata used the Wallace-Hussein method by
default for computing the variance of random effects. Now Stata uses
Swamy-Arora, which has been the default of 'plm' since the beginning.
Yet as plm() allows to choose, you can experiment with different values
for the 'random.method' argument in order to see if you get the Stata
result. I suggest you start by comparing the coefficient estimates you
get from Stata and R: FE should be unambiguous, RE might vary as said
above, and for good reason.
You also didn't tell us whether your by-hand calculation agrees with
phtest() output? (I guess it does not)
Please let us know, possibly with a reproducible example and providing
all the above info
Giovanni
PS please also make sure you're not using any VEEEEERY old version of
'plm' (prior to, say, 0.3): these had a bug in the p-value calculation
which made it depend on the order of models compared (so that in the
wrong case you got p.value=1).
Giovanni Millo
Research Dept.,
Assicurazioni Generali SpA
Via Machiavelli 4,
34132 Trieste (Italy)
tel. +39 040 671184
fax +39 040 671160
---------------------------------------------------------------------- -- Subject: [R-SIG-Finance] Chi-sq Hausman test---R vs Stata From: Steven Archambault <archstevej at gmail.com> Date: Sun, 17 May 2009 23:14:13 -0600 To: r-sig-finance at stat.math.ethz.ch To: r-sig-finance at stat.math.ethz.ch Hi all, I am running a panel time series regression testing Fixed Effects and Random Effects. I decided to calculate the chi-sq value for the Hausman test in both R (Phtest) and Stata. I get different results. Even within Stata, calculating the Chi-sq value with the canned procedure or by hand (using matrices) gives different results. So, the question should come up
there as
well. Does anybody have any insight on how to pick which results to use? I guess the one that gives the result I want? Having different programs give quite different values for the same tests is frustrating me. I'd
be interested in any feedback folks have!
Thanks,
Steve
[[alternative HTML version deleted]]
-------------- next part -------------- An HTML attachment was scrubbed... URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20090518/c6b65272/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: fdi_data.csv Type: application/vnd.ms-excel Size: 174320 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20090518/c6b65272/attachment.xlb>