Is there any chance that you'd consider having limma depend on a "fixed"
version in preprocessCore rather than having your own separate code?
Note, I am not trying to pick on you specifically since I know there are a
number of other quantile normalization implementations in various
packages. Additionally my personal stance has always been that I am not
going to play policeman on this issue and developers are free to make
their own choices.
In any case, addressing this issue has been pushed to the top of my stack
(likely this upcoming weekend).
Best,
Ben
Ben Bolstad <bmb at bmbolstad.com> writes:
Wolfgang,
The code in preprocessCore for quantile normalization shows its legacy:
it was developed around probe-level Affymetrix data from CEL files,
where NA values are not to be expected. There may or may not be
comments to that effect in the C code documentation (actually there
are, further down in the qnorm.c file, for a slight variation on the
implementation).
If you are willing to make the assumption that the missing data
mechanism is "missing at random", then I think the fix is fairly
simple: just estimate the distribution using the non-missing data. If
it is instead driven by, say, a truncation mechanism, a different fix
would be needed.
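To make the "missing at random" idea concrete, here is a minimal sketch
in Python. It is not the code in preprocessCore or limma; the function
names (quantile_normalize_mar, _interp) are made up for illustration.
It assumes every column has at least two observed values, treats NaN as
the missing marker, and breaks ties by order of appearance rather than
averaging them. Each column's observed values are interpolated onto a
common quantile grid, the grid is averaged across columns to give the
target distribution, and each observed value is then mapped back
through its within-column rank.

```python
import math


def _interp(grid_x, grid_y, x):
    """Linearly interpolate the points (grid_x, grid_y) at x.

    grid_x must be strictly increasing; x is clamped to its range.
    """
    if x <= grid_x[0]:
        return grid_y[0]
    if x >= grid_x[-1]:
        return grid_y[-1]
    hi = 1
    while grid_x[hi] < x:
        hi += 1
    lo = hi - 1
    w = (x - grid_x[lo]) / (grid_x[hi] - grid_x[lo])
    return grid_y[lo] + w * (grid_y[hi] - grid_y[lo])


def quantile_normalize_mar(columns):
    """Quantile-normalize columns (lists of floats), NaN meaning missing.

    Assumption: values are missing at random, so the observed values in
    a column are taken to represent that column's full distribution.
    """
    n = len(columns[0])
    grid = [k / (n - 1) for k in range(n)]

    # Target distribution: average of the per-column quantile functions,
    # each estimated from the non-missing values only.
    target = [0.0] * n
    for col in columns:
        obs = sorted(v for v in col if not math.isnan(v))
        m = len(obs)
        pos = [k / (m - 1) for k in range(m)]
        for k in range(n):
            target[k] += _interp(pos, obs, grid[k]) / len(columns)

    # Map each observed value to the target via its rank among the
    # observed values of its own column; missing entries stay missing.
    result = []
    for col in columns:
        observed = [i for i, v in enumerate(col) if not math.isnan(v)]
        order = sorted(observed, key=lambda i: col[i])
        m = len(order)
        new_col = [math.nan] * n
        for rank, i in enumerate(order):
            new_col[i] = _interp(grid, target, rank / (m - 1))
        result.append(new_col)
    return result


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0], [10.0, math.nan, 30.0, 20.0]]
    for col in quantile_normalize_mar(data):
        print(col)
```

With complete data this reduces to ordinary quantile normalization. A
truncation mechanism would need a different estimate of the target
distribution, which this sketch does not attempt.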
In either case, I don't think the current situation is desirable, and
it should be fixed.
How about:
1. Let's add code to check for and raise an error if any NA's are
found. This should be easy and can be done quickly.
2. Then we could consider adding an argument that allows NA's and
handles things under the missing at random assumption, along with
documentation.
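As a rough illustration of the two steps above, a guard along these
lines would do. This is a Python sketch only: the function name
check_missing and the allow_na argument are hypothetical, not an
existing preprocessCore interface.

```python
import math


def check_missing(columns, allow_na=False):
    """Count missing (NaN) entries in a list of columns.

    Step 1: raise an error if any are found.
    Step 2: with allow_na=True (hypothetical argument), let them
    through so the caller can handle them under a missing-at-random
    assumption.
    """
    n_missing = sum(1 for col in columns for v in col if math.isnan(v))
    if n_missing and not allow_na:
        raise ValueError(
            f"{n_missing} missing value(s) found; "
            "quantile normalization here assumes complete data"
        )
    return n_missing
```

Calling this at the top of the normalization routine turns the silent
failure into an explicit one, and the opt-in argument keeps the default
behaviour strict.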
+ seth
As noted later by Wolfgang, the normalizeQuantiles() function in
limma does exactly this.
Regards
Gordon