Skip to content

About calculation of the gravity model in R and STATA software

4 messages · Сергей С., Jeff Newmiller, S Ellison +1 more

#
Dear colleagues!

We spent calculation of the gravity model in R and STATA software.
For calculations we used the standard package glmm in R (with parameter
family = quasipoisson)
and ppml in STATA.

Call the calculation procedure in R:

summary(glmm<-glm(formula=exports ~ ln_GDPimporter + ln_GDPexporter +
ln_GDPimppc + ln_GDPexppc + ln_Distance + ln_Tariff + ln_ExchangeRate +
Contig + Comlang + Colony_CIS + EAEU_CIS + EU_European_Union,
family=quasipoisson(link="log"),data=data_pua))

The results of the calculations in R following:

------------------------------------------------------------------------

Coefficients:
                           Estimate    Std. Error  t value Pr(>|t|)
(Intercept)           -12.53224   15.30072  -0.819  0.41357
ln_GDPimporter       0.10180    0.14988   0.679   0.49765
ln_GDPexporter       0.14612    0.79823    0.183   0.85491
ln_GDPimppc           0.34998    0.30247   1.157   0.24840
ln_GDPexppc           0.65811    0.82189    0.801   0.42409
ln_Distance             0.21838    0.16623    1.314   0.19020
ln_Tariff                 -0.05499    0.04913  -1.119  0.26411
ln_ExchangeRate     -0.11748    0.04275  -2.748  0.00646 **
Contig                      1.48321    0.28684   5.171  4.92e-07 ***
Comlang                  1.50727    0.26199    5.753   2.67e-08 ***
Colony_CIS               2.15272    0.46899   4.590  7.16e-06 ***
EAEU_CIS                -0.94417    0.29315  -3.221  0.00146 **
EU_European_Union  -0.08335    0.76733  -0.109  0.91359
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for quasipoisson family taken to be 2100.979)

    Null deviance: 1886758  on 251  degrees of freedom
Residual deviance:  316332  on 239  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 8

------------------------------------------------------------------------

On the same data, we done calculations in STATA, using ppml procedure.
Call of the calculation procedure in STATA was next:

ppml exports ln_gdpimporter ln_gdpexporter ln_gdpimppc ln_gdpexppc
ln_distance ln_tariff ln_exchangerate contig comlang colony_cis eaeu_cis
eu_european_union

The results of the calculations in STATA were following:

------------------------------------------------------------------------

Iteration 1:   deviance =  425911.3
Iteration 2:   deviance =  327020.8
Iteration 3:   deviance =  316763.3
Iteration 4:   deviance =  316335.1
Iteration 5:   deviance =  316332.3
Iteration 6:   deviance =  316332.3
Iteration 7:   deviance =  316332.3

Number of parameters: 13
Number of observations: 252
Pseudo log-likelihood: -158930.64
R-squared: .75348104
Option strict is: off
-----------------------------------------------------------------------------------
                           |                    Robust
          exports       |      Coef.      Std. Err.          z    P>|z|
   [95% Conf. Interval]
------------------
+----------------------------------------------------------------
   ln_gdpimporter    |   .1018021   .0982091     1.04   0.300
-.0906843    .2942885
   ln_gdpexporter    |   .1461135   1.084255     0.13   0.893
-1.978988    2.271215
      ln_gdpimppc     |    .349982    .201011      1.74   0.082
-.0439924    .7439564
      ln_gdpexppc     |   .6581201   1.098236     0.60   0.549
-1.494383    2.810624
      ln_distance       |   .2183809    .156757     1.39    0.164
-.0888572     .525619
        ln_tariff          |  -.0549914   .0551489    -1.00   0.319
-.1630811    .0530984
  ln_exchangerate    |  -.1174816   .0343881    -3.42   0.001
-.1848812   -.0500821
           contig          |   1.483213    .168467     8.80   0.000
1.153024    1.813402
          comlang       |   1.507272   .2745761     5.49   0.000
.9691126    2.045431
       colony_cis        |   2.152723   .2338133     9.21   0.000
1.694457    2.610988
         eaeu_cis        |  -.9441651   .2469764    -3.82   0.000
-1.42823   -.4601003
eu_european_union |  -.0833477   .4955678    -0.17   0.866     -1.054643
.8879474
            _cons         |  -12.53206   21.18599    -0.59   0.554
-54.05585    28.99172
-----------------------------------------------------------------------------------

As you can see, model coefficients (second column in the results table) are
the same at least until the 4th mark (!)
However, other results (columns in the table of results, since the third)
is not the same.
Could you explain differences in the results?
In particular, why coefficients the same (the first result table columns),
but standard errors is not?
With best regards,
Sergey S.
#
I am not the person to answer your question, but have some suggestions:

Make your examples reproducible so others can confirm your results and explore other issues you may not have seen. [1] [2] 

Post your question on the R-sig-mixed mailing list, where mixed models experts hang out, instead of here where the R language is on topic rather than contributed packages like glmm.

I don't know SAS or the glmm package, but the fact that the top description line for the glmm package says it uses Monte  Carlo suggests to me that it might not be "standard" in SAS or R and that your results may vary depending on how you configure it and how the random number seed is initialized. 

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] See also the Posting Guide mentioned at the bottom of this and every email on this list.
#
R and stata are clearly doing different things.
Look up how stata calculates its standard errors (which I note are described as '_robust_ ') and compare that with the documentation for the R function.
Without knowing anything about stata or your data set, my guess would be that R's quasibinomial family is allowing for overdispersion and stata is not.

S Ellison



*******************************************************************
This email and any attachments are confidential. Any use, copying or
disclosure other than by the intended recipient is unauthorised. If 
you have received this message in error, please notify the sender 
immediately via +44(0)20 8943 7000 or notify postmaster at lgcgroup.com 
and delete this message and any copies from your computer and network. 
LGC Limited. Registered in England 2991879. 
Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK
#
On 16 Mar 2016, at 14:10 , S Ellison <S.Ellison at LGCGroup.com> wrote:

            
With a dispersion parameter of 2100, that would be rather more of a dramatic difference....

More likely, STATA is using some sort of sandwich estimator whereas R is just upscaling the results for a Poisson regression. You might want to check out package "sandwich".

-pd