Stratified Bootstrap question

Qian An

Thu, Apr 7, 2005 11:23 AM

Dear Tim,

Thank you very much for taking the time to give me advice on my questions. I
talked with my professor about this bootstrapping question: whether to
resample clinics only, or to resample clinics and then resample patients
within each clinic.

I was told that the second method might destroy the correlation structure
between the patients within a clinic. So I am wondering whether it is worth
doing a simulation to compare the two bootstrapping methods. I mean, is
this comparison meaningful, and is it worth doing? What do you think?
Thank you.

Qian

On 1 Apr 2005, Tim Hesterberg wrote:

Qian wrote:

I talked with my advisor yesterday about how to do bootstrapping for my
scenario: random clinics + random subjects within clinic. She suggested that
only clinics are independent units, so I can only resample clinics. But
since subjects are also independent within a clinic, should I also
resample subjects within clinics, which would mean two-stage resampling?
Which one do you think makes sense?

This is a tough issue; I don't have a complete answer.  I'd
appreciate input from other r-help readers.

If you randomly select clinics, then randomly select patients within
the clinics:
  (1) by bootstrapping just clinics, you capture both sources of
  variation -- the between-subject variation is incorporated in the
  results for each clinic.

  (2) by bootstrapping clinics, then subjects within clinics, you
  end up double-counting the between-subject variation.

That argues for resampling just clinics.
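
The two schemes in (1) and (2) can be written out as a rough sketch. This
is not from the original exchange: the function names, the toy data
generator, and the choice of the pooled mean as the statistic are all
assumptions made for illustration, and it uses only the Python standard
library.

```python
import random
import statistics


def resample_clinics(clinics, rng):
    """Scheme (1): draw whole clinics with replacement, patients left intact."""
    return [rng.choice(clinics) for _ in clinics]


def resample_clinics_then_patients(clinics, rng):
    """Scheme (2): draw clinics with replacement, then redraw patients
    with replacement inside each drawn clinic."""
    drawn = [rng.choice(clinics) for _ in clinics]
    return [[rng.choice(patients) for _ in patients] for patients in drawn]


def pooled_mean(clinics):
    """Illustrative statistic: mean over all patients in all clinics."""
    return statistics.fmean(y for patients in clinics for y in patients)


def bootstrap_se(clinics, scheme, n_boot=2000, seed=0):
    """Standard deviation of the statistic over n_boot resamples."""
    rng = random.Random(seed)
    replicates = [pooled_mean(scheme(clinics, rng)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


def make_toy_data(n_clinics=20, n_patients=15, sd_clinic=1.0,
                  sd_patient=2.0, seed=1):
    """Hypothetical data: a random clinic effect plus patient-level noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clinics):
        effect = rng.gauss(0.0, sd_clinic)
        data.append([effect + rng.gauss(0.0, sd_patient)
                     for _ in range(n_patients)])
    return data


toy = make_toy_data()
se_clinics = bootstrap_se(toy, resample_clinics)
se_two_stage = bootstrap_se(toy, resample_clinics_then_patients)
print("clinics only:        ", se_clinics)
print("clinics + patients:  ", se_two_stage)
```

If the double-counting argument holds, the second standard error would be
expected to come out somewhat larger than the first, because the
within-clinic noise enters once through the observed clinic results and
again through the second resampling stage. How large the gap is depends on
the number of clinics, the clinic sizes, and the two variance components.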

By analogy, if you have multiple subjects, and multiple measurements
per subject, you should just resample subjects.

However, I'm not comfortable with this if you have a small number of
clinics and relatively large numbers of patients in each clinic, and
you think that the between-clinic variation should be small.  Then it
seems better to resample both clinics and patients.

I'm leery about resampling just clinics if there are a small number
of clinics.  Bootstrapping isn't particularly effective for small
samples -- it is subject to skewness in small samples, and it
underestimates variances (its advantages over classical methods
really show up with medium-size samples).
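
The size of that variance shortfall is easy to check for the simplest
case, the sample mean. Resampling n values with replacement treats the
sample as the population, so the ideal bootstrap variance of the mean uses
the divisor-n variance, while the classical estimate s^2/n uses divisor
n - 1; the ratio is (n - 1)/n. The sketch below is an illustration only
(the function names are made up here), and it assumes the statistic is a
plain mean of n independent units, such as n clinic-level results.

```python
import statistics


def classical_var_of_mean(sample):
    """s^2 / n, with the usual n - 1 divisor inside s^2."""
    return statistics.variance(sample) / len(sample)


def ideal_bootstrap_var_of_mean(sample):
    """Variance of the bootstrap mean with infinitely many resamples:
    the divisor-n (population) variance of the sample, divided by n."""
    return statistics.pvariance(sample) / len(sample)


for n in (3, 5, 10, 50):
    sample = [float(i) for i in range(n)]  # any non-constant sample works
    ratio = ideal_bootstrap_var_of_mean(sample) / classical_var_of_mean(sample)
    print(n, ratio, (n - 1) / n)
```

With only five clinics as the resampling units, for example, the ratio is
4/5, so the bootstrap variance of a mean of clinic results would be about
20% too small before any correction.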
There are remedies for the small variance; see
    Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap-Bootknife Sampling
    vs. Smoothing", Proceedings of the Section on Statistics and the
    Environment, American Statistical Association, 2924-2930.
    www.insightful.com/Hesterberg/articles/JSM04-bootknife.pdf

Tim Hesterberg

========================================================
| Tim Hesterberg       Research Scientist              |
| timh at insightful.com  Insightful Corp.                |
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|                      www.insightful.com/Hesterberg   |
========================================================
Download the S+Resample library from www.insightful.com/downloads/libraries

***************************************
Qian An
Division of Biostatistics
University of Minnesota
(phone) 612-626-2263
(fax) 612-626-8892
Email: qiana at biostat.umn.edu
