The first task view rendering can now be seen at
http://www.biostat.harvard.edu/~carey/top.html
Comments welcome. This was a fairly haphazard construction
of the topic set and its mapping to packages.
Are you trying to put each package into one and only one category? I think
that this approach tends to emphasize fragmentation rather than unity and
doesn't do justice to packages with vertical integration. Topics like
"Bayes", "linear modelling", "factorial design", "differential expression"
and "multiple testing", for example, don't split into non-overlapping
topics. Rather they are all aspects of the same thing. Pre-processing and
normalization are somewhat more separate, but even these interact deeply
with the same topics.
I suggest that a grouping into larger and not mutually exclusive groups
might be more helpful.
Let's think about the objectives. My gut feeling is that the
multiplicity of packages that perform effectively the same
tasks is a source of confusion and lost energy for users.
One finds that several packages perform quantile normalization
or MA plots, possibly slightly differently, and then one has to
figure out which one is the most desirable for a given activity.
To me it is relevant to the project to minimize this sort of
duplication. It may not be possible to eliminate it, but
reducing it is worthwhile, and doing so requires collaboration
among developers.
The task view set was started in part to help us identify
redundancies in package functionality. Larger overlapping
groups are less helpful for this aspect of the aim.
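To make the redundancy check concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here is Bioconductor code). It assumes a table recording which fine-grained tasks each package performs, then inverts it to list the tasks claimed by more than one package. The package names pkgA, pkgB and pkgC and their task assignments are placeholders, not statements about any real package.

```python
from collections import defaultdict

# Hypothetical package-to-task assignments; the names are placeholders,
# not a statement about what any real package provides.
package_tasks = {
    "pkgA": {"quantile normalization", "MA plot"},
    "pkgB": {"quantile normalization", "linear modelling"},
    "pkgC": {"MA plot", "multiple testing"},
}


def invert(mapping):
    """Turn package -> tasks into task -> sorted list of packages."""
    by_task = defaultdict(set)
    for package, tasks in mapping.items():
        for task in tasks:
            by_task[task].add(package)
    return {task: sorted(pkgs) for task, pkgs in by_task.items()}


def redundant_tasks(mapping):
    """Return only the tasks that more than one package claims to perform."""
    return {t: p for t, p in invert(mapping).items() if len(p) > 1}


overlaps = redundant_tasks(package_tasks)
for task in sorted(overlaps):
    print(task, "->", ", ".join(overlaps[task]))
```

In this sketch a package may appear under any number of tasks, and the overlap report is computed from the fine-grained task labels rather than from the grouping itself.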
To go out on a limb: I believe the package system is very
good for developers, but somewhat cumbersome for users. We
want users to be concerned with identifying and carrying out
tasks, not identifying and loading packages. We need some
packages explicitly loaded of course, but as MLInterfaces has
demonstrated, the namespace mechanism can be used to get
access to software without requiring manual loading of
enclosing packages. This has led to my suggestion that
a few task wrappers be put in Biobase, possibly named
Preproc, DiffExp, LinMod, ReportGen, which have relatively
simple calling sequences with parameters that select approaches
in software that ultimately comes from different packages.
Developers go on as before, and users can
always go directly to the package level if they want.
Developers will communicate with the Biobase maintainer to
make sure that options are available that call their particular
wares.
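The wrapper idea above could look roughly like the following sketch, written in Python purely to show the dispatch pattern; it is not Biobase code, and the names summarize, register and _SUMMARIZERS are hypothetical. Standard-library modules stand in for the separate packages that would supply the real methods, and each one is imported only when its option is requested, which is the loose analogue of reaching software through a namespace without loading the enclosing package by hand.

```python
import importlib

# Hypothetical registry: option name -> (module, function). The standard-library
# entries stand in for routines that would live in separate packages.
_SUMMARIZERS = {
    "mean": ("statistics", "mean"),
    "median": ("statistics", "median"),
}


def register(name, module, function):
    """Let a package developer expose a routine under a wrapper option."""
    _SUMMARIZERS[name] = (module, function)


def summarize(values, method="median"):
    """Task-level entry point: the caller picks an approach, not a package."""
    try:
        module_name, function_name = _SUMMARIZERS[method]
    except KeyError:
        known = ", ".join(sorted(_SUMMARIZERS))
        raise ValueError(f"unknown method {method!r}; choose one of: {known}") from None
    # The providing module is imported only when its option is requested.
    module = importlib.import_module(module_name)
    return getattr(module, function_name)(values)


print(summarize([1, 2, 3, 10], method="median"))
print(summarize([1, 2, 3, 10], method="mean"))

# A developer adds a new option without the user-facing call changing.
register("total", "math", "fsum")
print(summarize([1, 2, 3, 10], method="total"))
```

The register call corresponds to developers asking the wrapper's maintainer to add an option for their software; users who want more control can still call the underlying module directly.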
I believe that teaching about Bioconductor would be greatly
simplified if we could say: to do preprocessing, use the
Preproc function of Biobase -- here are the various parameter
settings. Presently we have to distinguish marray, limma,
vsn, affy, etc., as packages, in addition to distinguishing
them as methodologies. An integrated interface like
limmaGUI could draw its preprocessing workflow component
from the Biobase specification of preprocessing.
Here are some specific problems:
The limma package isn't mentioned under "Bayesian", "FactorialDesigns",
"TimeSeries" or "MultipleTesting", even though it is probably the most used
package in each of these categories.
Limma isn't mentioned anywhere in any of the Preprocessing categories,
although it is one of the most used packages for non-affy pre-processing.
The preprocessing package arrayMagic, for example, calls limma
functions to do normalization and pre-processing.
And does limma call marray anymore?
How are "ErrorModels" different from statistical models used to assess
differential expression? The LPE and plgem packages should surely get a
mention under differential expression rather than here. The LPE package is
very similar in spirit to the empirical Bayes approach for differential
expression, and should be grouped somewhere with packages like limma,
EBayes, siggenes etc.
I have not had enough time to use these packages to understand
how they fit in the spectrum of solutions. Your input here
is most welcome. I am particularly concerned with simply exposing
these things to potential users so that their particular virtues are
available for assessment.
As Robert has suggested, once we get a set of view terms, developers
can incorporate these terms into their package metadata or perhaps into
the "Title" fields of DESCRIPTION, so that view maintenance is simpler.
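As a rough illustration of how view terms carried in package metadata might be read back out, here is a small Python sketch. It assumes a DESCRIPTION-style "Field: value" record with indented continuation lines; the package name examplePkg and the "Views" field are hypothetical, invented for this example, and no real tool is implied.

```python
# A DESCRIPTION-style record; the package name and the "Views" field
# are hypothetical, invented for this illustration.
DESCRIPTION_TEXT = """\
Package: examplePkg
Title: Example preprocessing tools
Views: Preprocessing, Visualization,
    MultipleTesting
"""


def parse_fields(text):
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            current, value = line.split(":", 1)
            current = current.strip()
            fields[current] = value.strip()
    return fields


def view_terms(text, field="Views"):
    """Return the comma-separated view terms declared in the given field."""
    raw = parse_fields(text).get(field, "")
    return [term.strip() for term in raw.split(",") if term.strip()]


print(view_terms(DESCRIPTION_TEXT))
```

With the terms declared by each package, the view pages could be regenerated from the metadata instead of being maintained by hand.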
twilight is another package which seems to me to be doing differential
expression, but is currently only listed under "MultipleTesting".
What sort of graphics are included under "Visualization"? Many of the
packages with pre-processing functionality, including marray, limma and
arrayMagic, also provide plotting functions.
Should these all be providing their own visualization software? Can
there be a centralized package for QC visualizations for two channel
arrays? Would this be good for developers/users? I don't want to
constrain developers in any way, but I would like the question to
be considered. Can a good set of task views help us set developmental
agendas and reduce duplication of effort?