
Question on approximations of full logistic regression model

My usual rule is that whatever gives the widest confidence intervals
in a particular problem is most accurate for that problem :-)

Bootstrap percentile intervals tend to be too narrow.
Consider the sample mean. The usual formula-based CI is
    xbar +- t_alpha sqrt( (1/(n-1)) sum((x_i - xbar)^2)) / sqrt(n)
where t_alpha is the t quantile with n-1 degrees of freedom.
The bootstrap percentile interval for symmetric data is roughly
    xbar +- z_alpha sqrt( (1/(n  )) sum((x_i - xbar)^2)) / sqrt(n)
where z_alpha is the corresponding standard normal quantile.
It is narrower than the formula CI because it uses
  * z quantiles rather than t quantiles, and
  * a standard deviation with divisor n rather than (n-1).
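A minimal Python sketch of how much narrower, assuming the two formulas above hold exactly: the width ratio is (z_alpha / t_alpha) * sqrt((n-1)/n). This is not a bootstrap simulation, just that ratio. The t quantile is computed with a hypothetical helper (Simpson's rule plus bisection) so that only the standard library is needed; scipy.stats.t.ppf would do the same job.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """Student t CDF for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper quantile (p > 0.5) of the t distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the root
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def narrowness_factor(n, conf=0.95):
    """Approximate width ratio: percentile interval / t-based formula CI."""
    p = 1 - (1 - conf) / 2
    z = NormalDist().inv_cdf(p)
    t = t_quantile(p, n - 1)
    return (z / t) * math.sqrt((n - 1) / n)


for n in (5, 10, 20, 100):
    print(n, round(narrowness_factor(n), 3))
```

For n = 10 the ratio is roughly 0.82, so a nominal 95% percentile interval is about 18% narrower than the t interval; by n = 100 the two are within a couple of percent.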

In stratified sampling, the narrowness factor depends on the
stratum sizes rather than the overall n, because resampling is done
within each stratum.
In regression, estimates for some quantities may be based on a small
subset of the data (e.g. coefficients related to rare factor levels),
so they can be too narrow even when the overall n is large.
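A minimal Python sketch of the stratified case, isolating only the divisor effect (it ignores z vs. t). It assumes the usual stratified-mean variance sum W_h^2 s_h^2 / n_h, with s_h^2 using divisor n_h - 1, versus the within-stratum bootstrap version that uses divisor n_h. The function name and the example strata are hypothetical.

```python
import math


def stratified_narrowness(stratum_sizes, weights, sds):
    """Ratio of the bootstrap-style SE to the usual SE of a stratified mean.

    usual:     sum W_h^2 s_h^2 / n_h              (s_h with divisor n_h - 1)
    bootstrap: sum W_h^2 s_h^2 (n_h - 1) / n_h^2  (divisor n_h instead)
    """
    rows = list(zip(stratum_sizes, weights, sds))
    usual = sum(w * w * s * s / n for n, w, s in rows)
    boot = sum(w * w * s * s * (n - 1) / (n * n) for n, w, s in rows)
    return math.sqrt(boot / usual)


# 40 observations in 20 strata of size 2, versus 40 in one stratum.
print(stratified_narrowness([2] * 20, [1 / 20] * 20, [1.0] * 20))
print(stratified_narrowness([40], [1.0], [1.0]))
```

With 20 strata of size 2 the factor is sqrt(1/2), about 0.71, even though n = 40; with a single stratum of 40 it is sqrt(39/40), about 0.99.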

This doesn't mean we should give up on the bootstrap.
There are remedies for these bootstrap biases; see e.g.
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling
vs. Smoothing, Proceedings of the Section on Statistics and the
Environment, American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
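A minimal Python sketch of bootknife sampling, as I read the cited paper: omit one observation at random, then draw a sample of size n with replacement from the remaining n-1. For the sample mean, averaging over which observation is omitted makes the resampling variance equal s^2/n with divisor (n-1), which fixes the divisor part of the narrowness (not the z vs. t part). The helper names and the data x are hypothetical, and the check is a Monte Carlo approximation.

```python
import random
from statistics import variance


def bootstrap_sample(x, rng):
    """Ordinary bootstrap: len(x) draws with replacement from x."""
    return rng.choices(x, k=len(x))


def bootknife_sample(x, rng):
    """Bootknife: drop one observation at random, then draw len(x)
    values with replacement from the remaining len(x) - 1."""
    j = rng.randrange(len(x))
    remaining = x[:j] + x[j + 1:]
    return rng.choices(remaining, k=len(x))


def resampled_var_of_mean(x, sampler, B=20000, seed=1):
    """Monte Carlo variance of the resampled means."""
    rng = random.Random(seed)
    means = []
    for _ in range(B):
        s = sampler(x, rng)
        means.append(sum(s) / len(s))
    return variance(means)


x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]  # made-up data
n = len(x)
usual = variance(x) / n  # s^2 / n, with divisor n - 1 in s^2
print("bootstrap / usual:", resampled_var_of_mean(x, bootstrap_sample) / usual)
print("bootknife / usual:", resampled_var_of_mean(x, bootknife_sample) / usual)
```

The ordinary bootstrap ratio should come out near (n-1)/n = 0.9 here, and the bootknife ratio near 1.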

And other methods have their own biases, particularly in nonlinear
applications such as logistic regression.

Tim Hesterberg