Dear List,

I am a beginner in the use of robust methods. Is there a minimum sample size for which the robust analog of a two-sample t-test, using rlm with default parameters and categorical explanatory variables, may be trusted to yield reliable p-values? If so, can you please point me to a reference that treats this problem?

Thanks and best wishes,
Rich

------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist, Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer, Department of Biomedical Informatics (DBMI)
Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824 Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

I am a Bayesian. When I see a multiple-choice question on a test and I don't know the answer I say "eeney-meaney-miney-moe".
Rose Friedman, Age 14

[RsR] minimum sample size for the robust counterpart of the t-test
2 messages · Richard Friedman, Martin Maechler
Richard Friedman <friedman at cancercenter.columbia.edu>
on Wed, 15 Jun 2011 15:10:03 -0400 writes:
> Dear List, I am a beginner in the use of robust methods. Is
> there a minimum sample size for which the robust analog of a two
> sample t-test using rlm with default parameters and categorical
> explanatory variables may be trusted to yield reliable p-values?
> If so, can you please point me at a reference which treats this
> problem.
It's a bit more complicated, because "the" robust analog does
not exist: there are infinitely many possible robust
analogues of the t-test.
Two colleagues of mine have been actively researching this,
not just for the two-sample case but for the general lm() case,
*with* an emphasis on small-sample performance:
Originally (I think), they started from roughly this question:
How do you have to estimate sj^2 := \Var(\hat{\beta}_j) such that
the interval
    \hat{\beta}_j +/- 1.96 * sj
has the correct coverage probability of 95%, also for small
samples (and, of course, generalizing to other levels alpha)?
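
To make the coverage question concrete, here is a minimal simulation sketch
(not taken from the paper). It estimates how often the interval
estimate +/- 1.96 * se from rlm() with default parameters covers the true
group difference in a two-sample design. The function name coverage_rlm, the
Gaussian errors, the true shift delta and the number of replications are all
assumptions chosen for illustration.

```r
library(MASS)  # provides rlm()

# Empirical coverage of the nominal 95% Wald interval for the group effect
# in a two-sample design with n_per_group observations in each group.
# (coverage_rlm is a hypothetical helper, not part of any package.)
coverage_rlm <- function(n_per_group, n_sim = 500, delta = 1, seed = 1) {
  set.seed(seed)
  group <- factor(rep(c("a", "b"), each = n_per_group))
  covered <- logical(n_sim)
  for (i in seq_len(n_sim)) {
    # Assumption: Gaussian errors, i.e. the uncontaminated case
    y <- delta * (group == "b") + rnorm(2 * n_per_group)
    fit <- rlm(y ~ group)                       # default Huber M-estimator
    cf <- summary(fit)$coefficients["groupb", ]
    # Does estimate +/- qnorm(0.975) * se contain the true shift delta?
    covered[i] <- abs(cf[["Value"]] - delta) <= qnorm(0.975) * cf[["Std. Error"]]
  }
  mean(covered)
}

cov_small <- coverage_rlm(5)    # 5 observations per group
cov_large <- coverage_rlm(50)   # 50 observations per group
print(c(n5 = cov_small, n50 = cov_large))
```

Comparing the two proportions with the nominal 0.95 gives a rough idea of
how far small-sample coverage can drift; a serious study would use many more
replications and also contaminated error distributions.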
Here's their main publication:
Koller, M. and Stahel, W.A. (2011), Sharpening Wald-type inference
in robust regression for small samples, _Computational Statistics
& Data Analysis_ *55*(8), 2504-2515.
The good news is that Manuel Koller has implemented
everything in the package robustbase, in lmrob() {which you'd use
instead of rlm()}.
To use the new methods, you use (something like)
summary( fmod <- lmrob(Y ~ ., data=..., setting = 'KS2011') )
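
For the two-sample case from the original question, such a call might look
like the sketch below. The data are simulated and purely hypothetical (two
groups of 8, a true shift of 1, one gross outlier), and the names d, group
and y are chosen only for illustration; the sketch assumes robustbase is
installed.

```r
library(robustbase)  # provides lmrob()

set.seed(42)
# Hypothetical data: two groups of 8 observations, true shift of 1
d <- data.frame(
  group = factor(rep(c("control", "treated"), each = 8)),
  y     = c(rnorm(8, mean = 0), rnorm(8, mean = 1))
)
d$y[1] <- 15  # contaminate the control group with one gross outlier

# Robust fit with the small-sample settings of Koller & Stahel (2011)
fit <- lmrob(y ~ group, data = d, setting = "KS2011")
cf <- coef(summary(fit))  # estimate, std. error, t value, p-value per row
print(cf)

# Classical Welch t-test on the same data, for comparison
print(t.test(y ~ group, data = d))
```

The row "grouptreated" of the coefficient table plays the role of the
two-sample test: its estimate is the robust group difference and its last
column is the corresponding p-value.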
I've CC'ed them here, just in case they are accidentally not yet
subscribed to the R-SIG-Robust mailing list.
I hope this helps,
Martin Maechler, ETH Zurich
> Thanks and best wishes, Rich
> ------------------------------------------------------------
> Richard A. Friedman, PhD Associate Research Scientist,
................
................
> R-SIG-Robust at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-robust