
Testing general hypotheses on regression coefficients

10 messages · Søren Højsgaard, Bert Gunter, Scott Kostyshak +3 more

#
Hi.

Say I have a model like

y = a + B1*x1 + B2*x2 + B3*x3 + B4*x4 + e

and I want to test

H0: B2/B1 = 0

or

H0: B2/B1=B4/B3

(whatever H1). How can I proceed?

I know about car::linearHypothesis, but I can't figure out a way to do the
tests above.

Any hint?

Thanks.

C
#
AFAICS you are not testing a linear hypothesis (which is of the form Lb=b0 where L is a matrix and b=(a,B1,B2,B3,B4) is the parameter vector).

If, for simplicity, your model is E(y) = a + bx then -a/b is the x-value for which y is zero.

When you turn to estimates then u = -a/b is the ratio of two (typically correlated) normal variables and such a ratio is *not* normal. (Just think of the Cauchy distribution.)

One approach is to calculate the approximate variance of u and then construct a Wald test or similar while hoping for the best. Alternatively one could perhaps try with a parametric bootstrap test. 
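
A minimal sketch of the first idea (approximate variance plus a Wald test), applied to H0: B2/B1 = B4/B3 from the original question. The data are simulated, and the names d, fit, g, grad and the true coefficient values are illustrative assumptions, not anything from the poster's real model. The variance of the ratio difference is approximated to first order from the gradient and vcov(fit).

```r
# Simulated data; coefficients and sample size are illustrative assumptions.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 + 1 * d$x2 + 4 * d$x3 + 2 * d$x4 + rnorm(n)  # H0 holds here

fit <- lm(y ~ x1 + x2 + x3 + x4, data = d)
nm  <- c("x1", "x2", "x3", "x4")
V   <- vcov(fit)[nm, nm]
b1 <- coef(fit)[["x1"]]; b2 <- coef(fit)[["x2"]]
b3 <- coef(fit)[["x3"]]; b4 <- coef(fit)[["x4"]]

# g(B) = B2/B1 - B4/B3, which equals zero under H0
g <- b2 / b1 - b4 / b3

# Gradient of g with respect to (B1, B2, B3, B4)
grad <- c(-b2 / b1^2, 1 / b1, b4 / b3^2, -1 / b3)

# First-order (delta method) standard error and Wald statistic, 1 restriction
se_g <- sqrt(drop(t(grad) %*% V %*% grad))
wald <- (g / se_g)^2
p_F  <- pf(wald, df1 = 1, df2 = df.residual(fit), lower.tail = FALSE)
```

The F reference distribution is only an approximation here, since g is a nonlinear function of the estimates; that is the "hoping for the best" part.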

Just ideas. Good luck.
Søren




-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Chris
Sent: 6. september 2014 04:17
To: r-help at stat.math.ethz.ch
Subject: [R] Testing general hypotheses on regression coefficients

#
Well:

1) 8th grade algebra tells me B2/B1 == 0 <==> B2 =0;

2) I suspect you would need to provide more context for the other, as
you may be going about this entirely incorrectly (have you consulted a
local statistician?):  your nonlinear hypothesis probably can be made
linear under the right parametrization, but context might suggest
something entirely different than the approach that motivated your
query.

3) But forget all that! -- this is a list about the R language, not
statistics -- which seems to be the essence of your query --  although
I grant that the intersection is nonempty. But for statistics help,
you should try a statistics list like stats.stackexchange.com instead.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll
On Fri, Sep 5, 2014 at 7:17 PM, Chris <bonsxanco at yahoo.com> wrote:
#
Hi Chris,
As noted by Bert, think about this.
Take a look at car::deltaMethod. I suggest you study the theory of the
delta method. If you happen to have taken a graduate
statistics/econometrics class it should not be difficult and can
provide some insights. If not, at least consider that the delta method
can lead to misleading estimates (biased standard errors) in many
cases for finite samples. You might want to run some simulations to
get a feel for it.
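
One way such a simulation could look, as a rough sketch: repeatedly generate data from a known model, build a delta-method confidence interval for B2/B1, and record how often it covers the true ratio. The sample size, the coefficients (with a deliberately small B1) and the number of replications are all assumptions chosen for illustration, and the gradient is coded by hand rather than through car::deltaMethod.

```r
set.seed(42)
n_sim <- 500
n     <- 30
B1 <- 0.3; B2 <- 0.5          # small denominator coefficient (assumption)
true_ratio <- B2 / B1

covered <- logical(n_sim)
for (i in seq_len(n_sim)) {
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 1 + B1 * x1 + B2 * x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)
  b1 <- coef(fit)[["x1"]]; b2 <- coef(fit)[["x2"]]
  V  <- vcov(fit)[c("x1", "x2"), c("x1", "x2")]

  est  <- b2 / b1
  grad <- c(-b2 / b1^2, 1 / b1)             # d(est)/d(b1), d(est)/d(b2)
  se   <- sqrt(drop(t(grad) %*% V %*% grad))
  tq   <- qt(0.975, df.residual(fit))
  covered[i] <- (est - tq * se <= true_ratio) && (true_ratio <= est + tq * se)
}

# Share of nominal 95% intervals that contain the true ratio
coverage <- mean(covered)
```

Rerunning with a larger n, or with B1 further from zero, gives a feel for when the approximation can be trusted.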

Best,

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University
#
Hi.

First of all, thanks to all who have replied.
As for B2/B1 = 0 being the same hypothesis as B2 = 0: EViews (an econometrics program) doesn't have the same opinion:

Wald test on my real model (edited):

* H0: B3/B2 = 0 -> F-stat = 37.82497 
* H0: B3 = 0    -> F-stat = 16.31689

The context is this: I'm estimating a model which is:

d(y) = a + B1*y(-1) + B2*X_p(-1) + B3*X_n(-1) + other + error

where X_p and X_n are partial sum decompositions of positive and negative shocks:

X_p(t) = X_p(t-1) + (d(X(t))>0)*d(X(t)) ; X_p(0)=0
X_n(t) = X_n(t-1) + (d(X(t))<0)*d(X(t)) ; X_n(0)=0
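
For reference, a small sketch of how such a decomposition could be built in R, assuming that X_p accumulates the positive changes of an underlying series X, that X_n accumulates the negative changes, and that both start at zero. The function name partial_sums and the example series are made up for illustration.

```r
# Split the changes of x into cumulated positive and negative parts.
partial_sums <- function(x) {
  dx <- diff(x)
  list(pos = c(0, cumsum(pmax(dx, 0))),   # X_p: running sum of increases
       neg = c(0, cumsum(pmin(dx, 0))))   # X_n: running sum of decreases
}

x  <- c(1, 3, 2, 2, 5, 4)
ps <- partial_sums(x)
ps$pos           # 0 2 2 2 5 5
ps$neg           # 0 0 -1 -1 -1 -2
ps$pos + ps$neg  # equals x - x[1]
```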

I think this is enough, but I can provide the full references.

Now, back to the problem: testing B2/B1=0 tells me about the long-term effect, while testing B2/B1=B3/B1 tells me about the equality of the long-term effects of positive and negative shocks.
I just had a quick look and searched for the delta method, but I can't see how it would help in testing the restrictions above. I'll read more about it, though, as it seems interesting, thanks for the pointer.

(Sorry if this e-mail goes out of context, but the first time I sent it through gmane, as I wasn't subscribed.)

Chris
#
Regarding the delta method that Scott suggested, and which I said I couldn't see the use of:
Actually it seems that it should be the way to go: I just noticed under the EViews Wald test window the message "Delta method computed using analytic derivatives."

Anyway, I wonder if there's some easier way to do it.

Best,

Chris
1 day later
#
On 06 Sep 2014, at 12:24 , bonsxanco <bonsxanco at yahoo.com> wrote:

            
And when the econometrics program contradicts what you learned in 8th grade, surely the latter is wrong and the former is right, because it is done by a computer and computers cannot be wrong? ;-)

Probably what this shows most of all is a weakness of the Wald test approach: The s.e. of (b3hat/b2hat) will likely differ from s.e.(b3hat)/b2hat and hence the test statistics will differ even though they really test the same hypothesis. Actually, there are two generic weaknesses: (a) the somewhat arbitrary choice of test statistic and (b) the fact that the s.e. is not calculated at the null value of the parameter, but at the estimate.
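
A small numerical check of this point on simulated data (the model, coefficients and sample size are illustrative assumptions): the Wald statistic for H0: B3 = 0 and the delta-method Wald statistic for H0: B3/B2 = 0 are computed from the same fit and generally come out different.

```r
set.seed(7)
n  <- 60
x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.8 * x2 + 0.4 * x3 + rnorm(n)

fit <- lm(y ~ x2 + x3)
b2  <- coef(fit)[["x2"]]; b3 <- coef(fit)[["x3"]]
V   <- vcov(fit)[c("x2", "x3"), c("x2", "x3")]

# Wald statistic for H0: B3 = 0 (the squared t statistic of x3)
W_lin <- b3^2 / V["x3", "x3"]

# Wald statistic for H0: B3/B2 = 0, with a delta-method variance
g       <- b3 / b2
grad    <- c(-b3 / b2^2, 1 / b2)          # d g / d b2, d g / d b3
W_ratio <- g^2 / drop(t(grad) %*% V %*% grad)

c(linear = W_lin, ratio = W_ratio)
```

The two differ because the ratio version carries extra variance and covariance terms involving b2hat, all evaluated at the estimates rather than at the null.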
1 day later
#
Others have discussed some of the theoretical approaches (delta
method), but as has also been pointed out, this is a mailing list
about R, not theory, so here are some ways to tackle your question
from the perspective of those of us who like programming in R more
than remembering theory.

I assume that one reason you may be interested in B2/B1 is that you
want the confidence interval on the quantity, not just the test of
whether it is 0 (that test being equivalent to B2=0 unless B1 is
exactly equal to 0).  So I will focus more on confidence intervals
(which you can use as tests by seeing if the null value is in the
interval/region or not).

Approach 1, simulation:

If all the assumptions hold for the linear regression, then the
parameter estimates are considered to be multivariate normal.  You can
get the covariance matrix for this normal using the vcov function on
your fitted object.  Now you can use the mvrnorm
function with the estimated means and covariance to generate a bunch
of observations from this multivariate normal and compute B2/B1 or
some combination of B2/B1 and B4/B3 for each observation.  These
values represent the distribution of interest and you can calculate a
confidence interval by finding the quantiles of the values (0.025 and
0.975 for 95%) or finding the HPD interval (minimum width interval);
the emp.hpd function in the TeachingDemos package is one way to do
this.  For your second hypothesis you could look at B2/B1 - B4/B3 = 0
or (B2/B1) / (B4/B3) = 1, or create a joint confidence region on the 2
ratios and see if the x=y line intersects that region.
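
A short sketch of Approach 1, using simulated data in place of the real model. The data frame d, the true coefficients and the number of draws are assumptions for illustration; mvrnorm comes from the MASS package.

```r
library(MASS)  # provides mvrnorm

set.seed(123)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 + 1 * d$x2 + 4 * d$x3 + 2 * d$x4 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3 + x4, data = d)

# Draw coefficient vectors from the estimated sampling distribution
draws <- mvrnorm(10000, mu = coef(fit), Sigma = vcov(fit))

ratio21 <- draws[, "x2"] / draws[, "x1"]
ratio43 <- draws[, "x4"] / draws[, "x3"]

# Percentile intervals for B2/B1 and for B2/B1 - B4/B3
ci_ratio <- quantile(ratio21, c(0.025, 0.975))
ci_diff  <- quantile(ratio21 - ratio43, c(0.025, 0.975))
```

If zero lies outside ci_diff, that speaks against B2/B1 = B4/B3; the emp.hpd function could be applied to the same draws in place of quantile.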

Approach 2, bootstrap:

Bootstrap the whole process, fit the regression model then find the
ratio of the estimates.  Find the bootstrap confidence interval of the
ratio(s), follow above advice.
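
A sketch of Approach 2 with a plain case-resampling bootstrap and a percentile interval. The data are simulated and the helper ratio_stat is a made-up name. Resampling rows assumes independent observations, which is an assumption that may not suit a model with lagged variables.

```r
set.seed(2014)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 + 1 * d$x2 + rnorm(n)

# Refit the regression on a data set and return the ratio B2/B1
ratio_stat <- function(dat) {
  b <- coef(lm(y ~ x1 + x2, data = dat))
  b[["x2"]] / b[["x1"]]
}

B <- 1000
boot_ratios <- replicate(B, {
  idx <- sample(nrow(d), replace = TRUE)   # resample rows with replacement
  ratio_stat(d[idx, ])
})

ci_boot <- quantile(boot_ratios, c(0.025, 0.975))
```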

Approach 3, Bayes:

Fit a Bayesian regression model and look at the posterior distribution
of the ratio(s) of interest, calculate the credible interval/region
(the steps will be similar to the previous approaches).
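
A sketch of Approach 3 that avoids any MCMC package by assuming the standard noninformative prior p(B, sigma^2) proportional to 1/sigma^2. Under that assumption the posterior can be sampled directly: sigma^2 from a scaled inverse chi-square, then the coefficients from a normal given sigma^2. The prior, the simulated data and all names are illustrative choices, not something prescribed in this thread.

```r
library(MASS)  # provides mvrnorm

set.seed(99)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 + 1 * d$x2 + rnorm(n)

fit    <- lm(y ~ x1 + x2, data = d)
bhat   <- coef(fit)
XtXinv <- summary(fit)$cov.unscaled        # (X'X)^{-1}
df     <- df.residual(fit)
s2     <- sum(residuals(fit)^2) / df

n_draw <- 5000
# sigma^2 | y  ~  df * s2 / chi-square(df)
sigma2 <- df * s2 / rchisq(n_draw, df)
# B | sigma^2, y  ~  N(bhat, sigma^2 * (X'X)^{-1})
z    <- mvrnorm(n_draw, mu = rep(0, length(bhat)), Sigma = XtXinv)
beta <- sweep(z * sqrt(sigma2), 2, bhat, "+")   # row i scaled by sqrt(sigma2[i])
colnames(beta) <- names(bhat)

post_ratio <- beta[, "x2"] / beta[, "x1"]
cred_int   <- quantile(post_ratio, c(0.025, 0.975))
```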

Approach 4, simulate from the null:

Fit your regression model under the null hypothesis of interest being
true (for a more complicated null, such as your second, you may need to use
optimization or quadratic programming to allow some values to vary,
but have others dependent on those, then find the least squares
solution).  Now simulate data based on that model, fit the full
regression to the simulated data sets and compare the parameter
estimates (or ratios thereof) to the parameter estimates from the
original data.
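
A rough sketch of Approach 4 for H0: B2/B1 = B4/B3. It assumes the null can be written as B2 = r*B1 and B4 = r*B3 for a common ratio r, so that the model is linear once r is fixed and r can be found by a one-dimensional search. The search interval, the simulated data, the number of replications and the helper names are all assumptions, and optimize may stop at a local minimum.

```r
set.seed(11)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 + 1 * d$x2 + 4 * d$x3 + 2 * d$x4 + rnorm(n)

# Residual sum of squares of the restricted model for a given common ratio r
rss_given_r <- function(r, dat) {
  z1 <- dat$x1 + r * dat$x2
  z3 <- dat$x3 + r * dat$x4
  deviance(lm(dat$y ~ z1 + z3))
}

# Least squares fit under H0, profiling over r
fit_null <- function(dat, interval = c(-5, 5)) {
  r_hat <- optimize(rss_given_r, interval, dat = dat)$minimum
  z1 <- dat$x1 + r_hat * dat$x2
  z3 <- dat$x3 + r_hat * dat$x4
  lm(dat$y ~ z1 + z3)
}

# Test statistic from the unrestricted fit: B2/B1 - B4/B3
ratio_diff <- function(dat) {
  b <- coef(lm(y ~ x1 + x2 + x3 + x4, data = dat))
  b[["x2"]] / b[["x1"]] - b[["x4"]] / b[["x3"]]
}

obs  <- ratio_diff(d)
f0   <- fit_null(d)
mu0  <- fitted(f0)
sig0 <- sqrt(deviance(f0) / (n - 4))   # 4 mean parameters under H0: a, B1, B3, r

# Simulate responses from the fitted null model and refit the full regression
sim <- replicate(200, {
  d_sim   <- d
  d_sim$y <- mu0 + rnorm(n, sd = sig0)
  ratio_diff(d_sim)
})

# Two-sided Monte Carlo p-value
p_value <- (1 + sum(abs(sim) >= abs(obs))) / (length(sim) + 1)
```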


You could try any of these approaches for hypotheses where traditional
linear hypotheses work and compare the results from the traditional
approach to the above approaches to see how they compare (and how many
iterations/samples you will need).
On Fri, Sep 5, 2014 at 8:17 PM, Chris <bonsxanco at yahoo.com> wrote:

  
    
4 days later
#
On Sunday, September 7, 2014 5:47 PM, peter dalgaard <pdalgd at gmail.com> wrote:
I simply thought that there was a "standard" way to do this: EViews and Stata
both give the exact same F statistic for my original problem. Given that these
programs were not developed by the same author (AFAIK), there must be some
specific way to formulate the restriction that makes EViews and Stata give the
same answer.
#
On Monday, September 8, 2014 6:46 PM, Greg Snow <538280 at gmail.com> wrote:

            
Thank you Greg for dedicating some time to my problem and giving
advice on how I can tackle the issue. It is very appreciated.
Unfortunately I think I will use another program for my original
problem. Anyway, I'll go through all your suggestions, time
permitting.

Best,

Chris