Date: Thu, 12 Jul 2007 16:47:31 -0700 (PDT)
From: bmb at bmbolstad.com
Subject: Re: [Bioc-devel] Bioc-devel Digest, Vol 40, Issue 3
To: "Gordon Smyth" <smyth at wehi.EDU.AU>
Cc: bioc-devel at stat.math.ethz.ch
Note that the current C code does appropriately handle ties (depending on
your definition of appropriate) and has done so for a long time (over 5
years).
Cut from code comment:
" ** Apr 19, 2002 - Update to deal more correctly with ties (equal rank)"
Best,
Ben
Hi Seth & Ben,
thanks for your clarifying comments!
[moved to bioc-devel, where this should have started I think]
Sorry if I have been stepping on toes... the reason for posting to the
bioc user list was that more than once I have (sadly) seen people
looking at histograms such as that of qx shown in my previous post,
and using the suggested "cutoff", e.g. to discriminate between expressed
and un-expressed genes, and the like. I hope that this does not sound too
presumptuous, but I think it is a good thing to educate users to
critically assess such results.
Btw, normalizeQuantiles from the limma package appears to deal with NA
values more gracefully (but it is written in R, hence slower). I think
it assumes that the missingness mechanism is random.
Yes it does.
The reason the R version is a bit slower than C is mainly because of
the need to handle NAs and to treat ties carefully. Without these
considerations, the R implementation is nearly as fast. Try
normalizeQuantiles(ties=FALSE) for more speed.
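
To illustrate why careful tie handling costs extra work, here is a minimal
sketch of quantile normalization in plain Python. It is not the limma or
C implementation: the function name quantile_normalize and its ties
argument are made up for this example, missing values are not handled at
all, and averaging the reference values over a run of tied entries is only
one possible definition of "appropriate".

```python
def quantile_normalize(columns, ties=True):
    """Quantile-normalize equal-length columns (lists of floats, no NAs).

    Hypothetical helper for illustration only; not the limma API.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")

    # Reference distribution: mean of the k-th smallest value across columns.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns)
                 for k in range(n)]

    result = []
    for col in columns:
        # Indices of the column's entries in increasing order of value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        k = 0
        while k < n:
            # Extend the run while the values are tied (only if ties=True).
            end = k + 1
            while ties and end < n and col[order[end]] == col[order[k]]:
                end += 1
            # Tied entries share the mean of the reference values they span.
            shared = sum(reference[k:end]) / (end - k)
            for pos in range(k, end):
                out[order[pos]] = shared
            k = end
        result.append(out)
    return result


if __name__ == "__main__":
    data = [[5.0, 2.0, 2.0, 4.0], [1.0, 3.0, 6.0, 8.0]]
    print(quantile_normalize(data, ties=True))
    print(quantile_normalize(data, ties=False))
```

With ties=False the two equal entries in the first column receive
different normalized values depending only on their sort order; with
ties=True they stay equal, at the price of the extra scan for tied runs.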
Regards
Gordon