[moved to bioc-devel, where this should have started I think]
Ben Bolstad <bmb at bmbolstad.com> writes:
Wolfgang,
The code in preprocessCore for quantile normalization shows its legacy:
it was developed for probe-level Affymetrix data from CEL files, where
NA values are not expected. There may or may not be comments to that
effect in the C code documentation (in fact there are, further down in
the qnorm.c file, for a slight variation on the implementation).
If you are willing to make the assumption that the missing data
mechanism is "missing at random", then I think the fix is fairly
simple: just estimate the distribution using the non-missing data. If
it is instead driven by, say, a truncation mechanism, a different fix
would be needed.
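A minimal Python sketch of the "missing at random" fix, assuming None or a float NaN marks a missing value: each column's distribution is estimated from its non-missing values only, the target is the average of those quantile functions on a common probability grid, and each observed value is mapped to the target at its relative rank within its own column. The function names are hypothetical; this is not preprocessCore's C code, and limma's normalizeQuantiles() may differ in details such as tie handling and interpolation.

```python
import math


def _is_na(v):
    # Assumption of this sketch: None or a float NaN stands in for R's NA.
    return v is None or (isinstance(v, float) and math.isnan(v))


def _interp_quantile(sorted_vals, p):
    """Linearly interpolated quantile of an ascending list at probability p in [0, 1]."""
    n = len(sorted_vals)
    if n == 1:
        return sorted_vals[0]
    pos = p * (n - 1)
    lo = min(int(math.floor(pos)), n - 1)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def quantile_normalize_mar(columns):
    """Quantile-normalize equal-length columns, estimating each column's
    distribution from its non-missing values only (missing at random).
    Missing entries are returned unchanged."""
    n_rows = len(columns[0])
    observed = [sorted(v for v in col if not _is_na(v)) for col in columns]
    if any(not obs for obs in observed):
        raise ValueError("every column needs at least one non-missing value")
    # Common probability grid with one point per row.
    grid = [i / (n_rows - 1) for i in range(n_rows)] if n_rows > 1 else [0.5]
    # Target quantile function: mean of the columns' quantile functions on the grid.
    target = [sum(_interp_quantile(obs, p) for obs in observed) / len(observed)
              for p in grid]
    result = []
    for col, obs in zip(columns, observed):
        # Observed (value, row) pairs in ascending order; ties broken by row.
        order = sorted((v, i) for i, v in enumerate(col) if not _is_na(v))
        out = list(col)
        for rank, (_, i) in enumerate(order):
            # Relative rank among this column's observed values only.
            p = rank / (len(obs) - 1) if len(obs) > 1 else 0.5
            out[i] = _interp_quantile(target, p)
        result.append(out)
    return result
```

With complete data this reduces to ordinary quantile normalization (up to floating-point rounding). A column with fewer observed values is mapped onto the same target by relative rank, which is only justified if the values are missing at random; a truncation mechanism would bias this estimate.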
In either case, I don't think the current situation is desirable, and
it should be fixed.
How about:
1. Let's add code to check for NAs and raise an error if any are
found. This should be easy and can be done quickly.
2. Then we could consider adding an argument that allows NAs and
handles them under the missing-at-random assumption, along with
documentation.
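Step 1 could look roughly like the following: a complete-data quantile normalization that scans for missing values up front and refuses to proceed if it finds any. This is a minimal Python illustration, not preprocessCore's C implementation; the function name and the use of None/NaN to represent NA are assumptions of the sketch.

```python
import math


def _is_na(v):
    # Assumption of this sketch: None or a float NaN stands in for R's NA.
    return v is None or (isinstance(v, float) and math.isnan(v))


def quantile_normalize_checked(columns):
    """Complete-data quantile normalization that refuses missing values.

    columns: list of equal-length lists, one per array/sample.
    """
    # Step 1: fail loudly instead of silently producing wrong output.
    for j, col in enumerate(columns):
        for i, v in enumerate(col):
            if _is_na(v):
                raise ValueError(f"missing value at row {i}, column {j}; "
                                 "this normalization assumes complete data")
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    # Target distribution: mean of the k-th smallest value across columns.
    target = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n_rows)]
    result = []
    for col in columns:
        # Row indices in ascending order of value; ties broken by position.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            out[i] = target[rank]
        result.append(out)
    return result
```

Step 2 would then add an opt-in argument (for example a hypothetical `allow_na=False`) that, when enabled, estimates each column's distribution from its non-missing values instead of raising.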
+ seth
As noted later by Wolfgang, the normalizeQuantiles() function in
limma does exactly this.
Regards
Gordon