Date: Wed, 11 Jul 2007 07:04:48 -0700
From: Seth Falcon <sfalcon at fhcrc.org>
Subject: Re: [Bioc-devel] [BioC] Peculiar behaviour of
normalize.quantiles (in affy, preprocessCore) if there
are NA data
To: Ben Bolstad <bmb at bmbolstad.com>
Cc: bioc-devel <bioc-devel at stat.math.ethz.ch>
[moved to bioc-devel, where this should have started I think]
Ben Bolstad <bmb at bmbolstad.com> writes:
Wolfgang,
The quantile normalization code in preprocessCore shows its legacy: it
was developed around probe-level Affymetrix data straight from CEL
files, where NA values are not expected. There may or may not be
comments to that effect in the C code documentation (actually, there is
one further down in the qnorm.c file, for a slight variation on the
implementation).
If you are willing to assume that the missing data mechanism is
"missing at random", then I think the fix is fairly trivial: just
estimate the distribution using the non-missing data. If the missingness
is instead driven by, say, a truncation mechanism, a different fix would
be needed.
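
To make the "missing at random" idea concrete, here is a minimal sketch in
plain Python. It is not the preprocessCore implementation; the function
names (quantile_normalize_mar, _interp_quantile) and the choice of linear
interpolation between order statistics are assumptions made only for
illustration. Each column's quantiles are estimated from its non-missing
values, the target distribution is their average, and missing entries stay
missing.

```python
import math

def _interp_quantile(sorted_vals, p):
    """Linearly interpolate the p-th quantile (0 <= p <= 1) of sorted values."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def quantile_normalize_mar(columns):
    """Quantile-normalize a list of equal-length columns; NaN marks a missing value.

    Hypothetical helper: assumes values are missing at random, so each
    column's distribution is estimated from its observed values only.
    """
    n_rows = len(columns[0])
    probs = [i / (n_rows - 1) for i in range(n_rows)] if n_rows > 1 else [0.5]

    # Sorted non-missing values per column.
    observed = [sorted(v for v in col if not math.isnan(v)) for col in columns]
    if any(len(obs) == 0 for obs in observed):
        raise ValueError("a column has no observed values")

    # Target distribution: average of the per-column quantile estimates.
    target = [
        sum(_interp_quantile(obs, p) for obs in observed) / len(observed)
        for p in probs
    ]

    result = []
    for col in columns:
        # Row indices of observed values, ordered by value
        # (ties are broken by position, not averaged).
        idx = sorted(
            (i for i, v in enumerate(col) if not math.isnan(v)),
            key=lambda i: col[i],
        )
        m = len(idx)
        out = [math.nan] * n_rows  # missing entries remain missing
        for rank, i in enumerate(idx):
            p = rank / (m - 1) if m > 1 else 0.5
            out[i] = _interp_quantile(target, p)
        result.append(out)
    return result
```

For complete data this reduces to the usual "mean of the sorted columns"
target. A truncation-driven mechanism would need a different estimate of
the distribution, which this sketch does not attempt.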
In either case, I don't think the current situation is desirable, and it
should be fixed.
How about:
1. Let's add code to check for and raise an error if any NA's are
found. This should be easy and can be done quickly.
2. Then we could consider adding an argument that allows NA's and
handles things under the missing at random assumption, along with
documentation.
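
As a rough illustration of step 1 only, here is a sketch in plain Python
rather than the package's C code. The names check_no_missing and
quantile_normalize_strict are hypothetical, and NaN stands in for NA. The
point is simply to fail early with a clear message instead of returning a
silently odd result.

```python
import math

def check_no_missing(columns):
    """Raise ValueError at the first missing (NaN) value found."""
    for j, col in enumerate(columns):
        for i, v in enumerate(col):
            if math.isnan(v):
                raise ValueError(
                    f"missing value at row {i}, column {j}: "
                    "quantile normalization requires complete data"
                )

def quantile_normalize_strict(columns):
    """Quantile-normalize complete data; refuse input containing NaN."""
    check_no_missing(columns)
    n_rows = len(columns[0])

    # Target distribution: mean of the sorted columns, rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    target = [
        sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n_rows)
    ]

    result = []
    for col in columns:
        # Row indices ordered by value; the value of rank r gets target[r].
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            out[i] = target[rank]
        result.append(out)
    return result
```

Step 2 would then relax this check behind an explicit argument, so that
callers have to opt in to the missing at random assumption.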
+ seth