[RsR] minimum sample size for the robust counterpart of the t-test

2 messages · Richard Friedman, Martin Maechler

#
Dear List,

	I am a beginner in the use of robust methods. Is there a minimum
sample size for which the robust analog of a two-sample t-test, using
rlm with default parameters and categorical explanatory variables, may
be trusted to yield reliable p-values?
If so, can you please point me to a reference which treats this problem?

Thanks and best wishes,
Rich
------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

I am a Bayesian. When I see a multiple-choice question on a test and I
don't know the answer I say "eeney-meaney-miney-moe".

Rose Friedman, Age 14
#
    > Dear List, I am a beginner in the use of robust methods. Is
    > there a minimum sample size for which the robust analog of a two
    > sample t-test using rlm with default parameters and categorical
    > explanatory variables may be trusted to yield reliable p-values?
    > If so, can you please point me to a reference which treats this
    > problem.

It's a bit more complicated, because "the" robust analog does
not exist: there are infinitely many possible robust analogues
to the t-test, and my two colleagues have been actively
researching this, not just for the two-sample case but for the
general lm() case, *with* an emphasis on small-sample performance:

Originally (I think), they started by answering, more or less, the question:

  How do you have to estimate  sj^2 := Var(\hat{\beta_j})  such that
    \hat{\beta_j} +/- 1.96 * sj
  has the correct coverage probability of 95%, also for small
  samples (and of course generalizing to other probabilities alpha)?
Here's their main publication :     

 Koller, M. and Stahel, W.A. (2011), Sharpening Wald-type inference
 in robust regression for small samples, _Computational Statistics
 & Data Analysis_ *55*(8), 2504-2515.

and the good news is that Manuel Koller has implemented
everything in the package robustbase, in lmrob() {which you'd use
instead of rlm()}, and to use the new methods, you call
(something like)

   summary( fmod <- lmrob(Y ~ ., data = ..., setting = "KS2011") )
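
For the two-sample case Rich asked about, that would look roughly like
the following (a sketch only; the data frame `d` and its columns `y`
and `group` are made-up names for illustration):

```r
library(robustbase)

## Hypothetical two-group data: 'd', 'y', and 'group' are invented here
set.seed(1)
d <- data.frame(y     = c(rnorm(15), rnorm(15, mean = 1)),
                group = factor(rep(c("A", "B"), each = 15)))

## Robust analogue of a two-sample t-test; setting = "KS2011" switches
## on the Koller & Stahel (2011) small-sample corrections:
fmod <- lmrob(y ~ group, data = d, setting = "KS2011")
summary(fmod)
## The coefficient row for 'groupB' then gives the robust estimate of
## the group difference together with its (small-sample-corrected)
## standard error and p-value.
```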

I've CC'ed them here, just in case they are accidentally not yet
subscribed to the R-SIG-Robust mailing list.

I hope this helps,
Martin Maechler, ETH Zurich


    > Thanks and best wishes, Rich
    > ------------------------------------------------------------
    > Richard A. Friedman, PhD Associate Research Scientist,
    ................
    ................

    > R-SIG-Robust at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-sig-robust