Stratified Bootstrap question
Dear Tim,

Thank you very much for taking the time to give me advice on my questions. I talked with my professor about this bootstrapping question: whether to resample clinics only, or to resample clinics and then resample patients within each clinic. I was told that the second method might destroy the correlation structure between the patients within a clinic. So I am wondering whether it would be worthwhile to run a simulation comparing the two bootstrap methods. Is this comparison meaningful, and is it worth doing? What do you think?

Thank you.
Qian
On 1 Apr 2005, Tim Hesterberg wrote:
Qian wrote:
I talked with my advisor yesterday about how to do bootstrapping for my scenario: random clinics + random subjects within clinic. She suggested that only clinics are independent units, so I can only resample clinics. But I think that since subjects are also independent within a clinic, should I also resample subjects within each clinic, which would mean two-stage resampling? Which one do you think makes sense?
This is a tough issue; I don't have a complete answer. I'd appreciate input from other r-help readers.

If you randomly select clinics, then randomly select patients within the clinics:
(1) by bootstrapping just clinics, you capture both sources of variation -- the between-subject variation is incorporated in the results for each clinic.
(2) by bootstrapping clinics, then subjects within clinics, you end up double-counting the between-subject variation.

That argues for resampling just clinics. By analogy, if you have multiple subjects, and multiple measurements per subject, you should just resample subjects.

However, I'm not comfortable with this if you have a small number of clinics and relatively large numbers of patients in each clinic, and you think that the between-clinic variation should be small. Then it seems better to resample both clinics and patients.

I'm leery about resampling just clinics if there are a small number of clinics. Bootstrapping isn't particularly effective for small samples -- it is subject to skewness in small samples, and it underestimates variances (its advantages over classical methods really show up with medium-size samples). There are remedies for the small variance; see

Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing", Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.
www.insightful.com/Hesterberg/articles/JSM04-bootknife.pdf

Tim Hesterberg

==========================================================
| Tim Hesterberg         Research Scientist              |
| timh at insightful.com  Insightful Corp.                |
| (206)802-2319          1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)    Seattle, WA 98109-3044, U.S.A.  |
|                        www.insightful.com/Hesterberg   |
==========================================================
Download the S+Resample library from www.insightful.com/downloads/libraries
***************************************
Qian An
Division of Biostatistics
University of Minnesota
(phone) 612-626-2263
(fax) 612-626-8892
Email: qiana at biostat.umn.edu
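The two schemes discussed in this thread, (1) resampling clinics only and (2) resampling clinics and then patients within each drawn clinic, can be sketched roughly as below. This is only an illustration in Python using the standard library, not the S+Resample implementation. The data are simulated from a hypothetical random-effects model, and every name and parameter value (simulate_clinics, sd_clinic, sd_patient, 8 clinics of 25 patients, 2000 replicates) is an assumption made for the example.

```python
import random
import statistics


def simulate_clinics(n_clinics, n_patients, sd_clinic, sd_patient, rng):
    """Hypothetical data: one random effect per clinic plus patient noise."""
    data = []
    for _ in range(n_clinics):
        clinic_effect = rng.gauss(0.0, sd_clinic)
        data.append([clinic_effect + rng.gauss(0.0, sd_patient)
                     for _ in range(n_patients)])
    return data


def pooled_mean(clinics):
    """Mean over all patients in all (resampled) clinics."""
    total = sum(sum(c) for c in clinics)
    count = sum(len(c) for c in clinics)
    return total / count


def cluster_resample(data, rng):
    """Scheme (1): draw whole clinics with replacement; patients stay as observed."""
    return rng.choices(data, k=len(data))


def two_stage_resample(data, rng):
    """Scheme (2): draw clinics, then draw patients within each drawn clinic."""
    return [rng.choices(clinic, k=len(clinic))
            for clinic in rng.choices(data, k=len(data))]


def bootstrap_se(data, resample, n_boot, rng):
    """Standard deviation of the pooled mean across bootstrap replicates."""
    means = [pooled_mean(resample(data, rng)) for _ in range(n_boot)]
    return statistics.stdev(means)


# Assumed settings, chosen only for illustration.
rng = random.Random(2005)
data = simulate_clinics(n_clinics=8, n_patients=25,
                        sd_clinic=0.5, sd_patient=3.0, rng=rng)
se_cluster = bootstrap_se(data, cluster_resample, 2000, rng)
se_two_stage = bootstrap_se(data, two_stage_resample, 2000, rng)
print(f"clinics only:       SE = {se_cluster:.3f}")
print(f"clinics + patients: SE = {se_two_stage:.3f}")
```

On a single data set, scheme (2) adds the within-clinic resampling variance on top of what scheme (1) already reflects, so its standard error should come out larger, which is the double-counting described above. To turn this into the proposed simulation, one could repeat the data generation many times and compare each scheme's average standard error with the actual standard deviation of the pooled mean across the simulated data sets, varying the number of clinics and the between-clinic variation.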