[Bioc-devel] medianpolish for Affymetrix genechip probesets

On 3/9/07, Ben Bolstad <bmb at bmbolstad.com> wrote:
I fully support your idea of providing one code base for all "basic"
modeling and operators used in BioC.   I think affyPLM is a good step
towards this.

May I suggest taking this one step further and providing a low-level
package for single-step modeling / transformations operating on "as
basic data types as possible"?  This package should be used by
developers only to be incorporated inside other functions.  By keeping
the API to use basic data types only (e.g. vectors, matrices, data
frames, lists), it will also be possible to keep it stable over a very
long time.  Such an API relies much less on the current designs
(e.g. AffyBatch, eSet etc) and other packages (minimal number of
packages in DESCRIPTION's Depends).  It will therefore provide a
common code base that is more stable over time.

Advantages with a low-level API package for single-step methods:
* More likely to be reused.
* Easier to define/document exactly what is done and what is changed
between versions.
* Easier to compare algorithms and verify consistency between versions.
* Bug fixes in one package, not many.

This would apply to more packages than affyPLM.  Several of the
probe-level model (PLM) functions in affyPLM could be written so they
accept matrices only, and on a probeset-by-probeset basis.  For
example, fitPLM.matrix(X, flavor=c("rlm", "median.polish"), ...) could
accept a matrix of probe signals for a single probeset.  This would be
the API closest to the internal code (i.e. the .Call() calls).
Higher-level packages could then build up/add fitPLM.AffyBatch(),
fitPLM.eSet() etc.  These higher level packages would evolve with BioC
as new classes/structures are introduced, but fitPLM.matrix() would
remain the same. [I'm using S3 notation because it is shorter to
write; analogously for S4].
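
To make the layering concrete, here is a minimal sketch of what I have
in mind.  It is not affyPLM code: only the names fitPLM.matrix() and
'flavor' come from the text above.  The generic fitPLM(), the list
method, the returned field names and the use of stats::medpolish() are
my assumptions for illustration, and the "rlm" flavor is deliberately
left unimplemented.

```r
# Hypothetical generic; dispatches on the class of X.
fitPLM <- function(X, ...) UseMethod("fitPLM")

# Lowest level: X is a plain probes-by-arrays matrix of (log-scale)
# signals for ONE probeset.  No Bioconductor classes are involved.
fitPLM.matrix <- function(X, flavor = c("rlm", "median.polish"), ...) {
  flavor <- match.arg(flavor)
  if (flavor == "median.polish") {
    # Assumption: base R's median polish stands in for the internal code.
    fit <- stats::medpolish(X, trace.iter = FALSE, ...)
    list(chipEffects  = fit$overall + fit$col,  # one value per array
         probeEffects = fit$row,                # one value per probe
         residuals    = fit$residuals)
  } else {
    stop("flavor '", flavor, "' is not implemented in this sketch")
  }
}

# Higher level (hypothetical): a list of per-probeset matrices.  Methods
# for AffyBatch, eSet etc. would be built the same way in other packages,
# by extracting matrices and delegating to fitPLM.matrix().
fitPLM.list <- function(X, ...) lapply(X, fitPLM, ...)

# Toy probeset: 3 probes x 3 arrays, purely additive signals.
X <- outer(c(-1, 0, 1), c(8, 9, 10), "+")
fit  <- fitPLM(X, flavor = "median.polish")
fits <- fitPLM(list(a = X, b = X + 1), flavor = "median.polish")
```

The point of the design is that only the thin wrapper methods need to
change when new container classes appear; the matrix method does not.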

I am aware that the above is somewhat the idea of Biobase, but I would
like to take it one step further and make it "more low-level".  For
instance, methods like rowQ(), rowMedians(), and rowMax() in Biobase
could be lifted out to an even lower-level package.  What I am
suggesting should basically never be used explicitly by the end user,
but will be very handy for developers.
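
As a rough illustration of why basic data types help with comparing
implementations, here are two row-median helpers that take a plain
numeric matrix and can be checked against each other in one line.  The
function names are hypothetical and neither is the Biobase
implementation; the sketch assumes no missing values and at least two
columns.

```r
# Reference version: straightforward, built on stats::median().
rowMediansRef <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  apply(x, 1L, stats::median)
}

# Alternative version: sort each row and pick the middle value(s).
rowMediansAlt <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), ncol(x) >= 2L, !anyNA(x))
  n <- ncol(x)
  s <- t(apply(x, 1L, sort))        # rows of x, each sorted ascending
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    s[, half]                        # odd number of columns
  } else {
    (s[, half] + s[, half + 1L]) / 2 # even: mean of the two middle values
  }
}

# Consistency check between the two versions on the same plain matrix.
set.seed(1)
xOdd  <- matrix(rnorm(20), nrow = 4, ncol = 5)
xEven <- matrix(rnorm(24), nrow = 4, ncol = 6)
sameOdd  <- isTRUE(all.equal(rowMediansRef(xOdd),  rowMediansAlt(xOdd)))
sameEven <- isTRUE(all.equal(rowMediansRef(xEven), rowMediansAlt(xEven)))
```

The same kind of check could be run between package versions, since
the inputs and outputs are ordinary matrices and vectors.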

Cheers

Henrik

PS. Ben, we've chatted about this offline before, but I would like to
bring it up in public too. DS.