[Bioc-devel] eSet questions

Thu, Jan 11, 2007 9:04 AM

small "s" for Dataset ... threw me for a minute

I am not sure annotatedDataset is going anywhere.  It probably should
be removed.


Now to the real question.  While trying to think about how to handle a
couple of data sets, I've started to become convinced that the current
design of an eSet could be improved.  As it stands now, the design makes
some assumptions about how the data should be stored and interpreted
that I think are unnecessary and make it harder to generalize to other
data types.

I have three use cases in mind:
[1] A vanilla two-color mRNA microarray expression data set, but one
that is not quantified with a software package currently recognized by
either limma or marray.
[2] A MINiML format file containing glass array data from GEO
[3] Reverse phase protein array (RPPA) data

In the first two cases, I'd like to be able to get all the raw data
files into R as quickly as possible, and work there to figure out which
columns represent red and green foreground and background, after which I
can convert from the input format to something where I can use limma or
marray.
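A minimal sketch of that workflow in base R, assuming a tab-delimited export from an unrecognized quantification package. The column names (Ch1_Mean, Ch1_Bkg, Ch2_Mean, Ch2_Bkg), the toy values, and the channel_map lookup are all hypothetical; the point is only that the raw table is read first and the red/green assignment is decided afterwards, inside R.

```r
# Hypothetical raw export; in practice this would come from a file on disk.
raw_text <- "SpotID\tCh1_Mean\tCh1_Bkg\tCh2_Mean\tCh2_Bkg
s1\t1200\t200\t700\t200
s2\t500\t100\t900\t100
s3\t300\t100\t300\t100"

# Step 1: get everything into R without interpreting the columns yet.
raw <- read.delim(text = raw_text, stringsAsFactors = FALSE)

# Step 2: decide, after inspection, which columns play which role.
# This mapping is an assumption made by the analyst, not read from the file.
channel_map <- c(Rf = "Ch1_Mean", Rb = "Ch1_Bkg",
                 Gf = "Ch2_Mean", Gb = "Ch2_Bkg")

# Step 3: background-subtract and form the usual log-ratio (M) and
# average log-intensity (A) for each spot.
R <- raw[[channel_map[["Rf"]]]] - raw[[channel_map[["Rb"]]]]
G <- raw[[channel_map[["Gf"]]]] - raw[[channel_map[["Gb"]]]]
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2
```

Once the four roles are identified, the same columns could be handed to limma or marray in whatever form those packages expect.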

In the RPPA case, the notions of "featureData" and "phenoData" are
reversed.  Lysates of individual patient samples are spotted on the
array, which is then probed with a mono-specific antibody targeting one
protein. (See, for example, Tibes et al., Mol Cancer Ther 2006; 5:2512-21.)

One way to handle all three cases would be in something I'm tentatively
calling an "ArrayCube", which should correspond fairly closely to a set
of files on a hard drive.  Each file holds a two-dimensional table,
where the rows correspond to spots on an array and the columns
correspond to various things measured by a quantification software
package.  An ArrayCube can be thought of conceptually as a list of these
two-dimensional objects, where this third (list) dimension corresponds
to whatever label-producing stuff was hybridized or incubated on the array.
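To make the "list of two-dimensional tables" picture concrete, here is a rough base R sketch. It is not a proposal for the actual class; the helper make_table, the hybridization names, and the measurement column names are hypothetical placeholders for whatever a quantification package writes out.

```r
# Hypothetical stand-in for reading one quantification file:
# rows are spots, columns are things the software measured.
make_table <- function(seed, n_spots = 5) {
  set.seed(seed)
  data.frame(
    spot     = seq_len(n_spots),
    red_fg   = runif(n_spots, 500, 5000),
    red_bg   = runif(n_spots, 50, 200),
    green_fg = runif(n_spots, 500, 5000),
    green_bg = runif(n_spots, 50, 200)
  )
}

# The "cube" as a list: one table per hybridization (the third dimension).
cube <- list(hyb1 = make_table(1), hyb2 = make_table(2))

# The tables only form a cube if they share the same layout.
same_layout <- sapply(cube, function(tab) {
  identical(names(tab), names(cube[[1]])) && nrow(tab) == nrow(cube[[1]])
})
stopifnot(all(same_layout))

# Stack the measurement columns into a spots x measurements x
# hybridizations array.
measure_cols <- setdiff(names(cube[[1]]), "spot")
mats <- lapply(cube, function(tab) as.matrix(tab[, measure_cols]))
arr <- array(
  unlist(mats),
  dim = c(nrow(mats[[1]]), length(measure_cols), length(mats)),
  dimnames = list(NULL, measure_cols, names(mats))
)
```

With this layout, arr[, "red_fg", ] is a spots-by-hybridizations matrix for one measurement, and arr[, , "hyb1"] recovers the table for one hybridization.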

Given this description, one might attempt a design something like

library(Biobase)  # provides AssayData, MIAME, and AnnotatedDataFrame

setClass("ArrayCube", representation = representation(
	rawData = "AssayData",
	experimentData = "MIAME",
	featureData = "AnnotatedDataFrame",
	hybridizationData = "AnnotatedDataFrame",
	measurementData = "AnnotatedDataFrame"
))

This obviously looks a lot like an eSet.  The differences are

It seems to me that you don't want to adopt the "AssayData"-"phenoData"
relationship documented in the eSet man page.  So the above is
not like an eSet, and the conflict is mostly with the AssayData
structure.

This constraint (the columns of every AssayData element must correspond
to the rows of phenoData) is quite important for all the applications of
eSet in use, so abandoning it suggests designing another class.

I have not had time to think at length about the RPPA data structure.
It seems possible to use the eSet design to represent it, but there
is substantial reorganization of the data relative to its physical
origins.  There are costs and benefits to shoehorning the data into
an ExpressionSet-like structure and I don't know how to weigh
them at the moment.  The real question seems to me to be whether it
is valuable to request X[G, S] for any of these data structures, where
X is the basic container, G is a predicate identifying a gene selection
and S is a predicate identifying a sample selection.  If you want that,
AND you want to inherit the infrastructure available for ExpressionSets
to get it, then it makes sense for you to try to extend what we have in
Biobase to cover what you are dealing with.  It seems to me that you might
want to combine AssayData and AnnotatedDataFrame components in a
structure that does not extend eSet to get what you want.
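For reference, this is roughly what X[G, S] looks like with an ExpressionSet. The sketch assumes a current Biobase installation; the toy matrix and the annotation columns "chromosome" and "group" are made up for illustration.

```r
library(Biobase)

# Hypothetical toy data: 4 features (rows) by 3 samples (columns).
exprs_mat <- matrix(
  as.numeric(1:12), nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)
pheno <- data.frame(
  group = c("tumor", "normal", "tumor"),
  row.names = colnames(exprs_mat)
)
feat <- data.frame(
  chromosome = c("1", "2", "1", "X"),
  row.names = rownames(exprs_mat)
)

# Rows of phenoData must match the columns of the assay matrix;
# rows of featureData must match its rows.
eset <- ExpressionSet(
  assayData   = exprs_mat,
  phenoData   = AnnotatedDataFrame(pheno),
  featureData = AnnotatedDataFrame(feat)
)

G <- fData(eset)$chromosome == "1"   # predicate on genes
S <- pData(eset)$group == "tumor"    # predicate on samples
sub <- eset[G, S]                    # assay data and annotation subset together
```

The single call eset[G, S] keeps the expression matrix, phenoData, and featureData in step, which is the behaviour a new container would have to reimplement if it does not extend eSet.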

My reaction, based on very brief contemplation, is that you'll be designing
a structure that does not extend eSet but shares some components and some
functionality.  If the ability to represent, e.g., RPPA and expression
arrays in a single container type becomes important, we'll consider how
the eSet constraints need to evolve.  Thus far they seem to be effective
for the most common types of high-throughput data encountered.

The folks who actually designed the key Biobase containers may well have
different views of this situation.  This is just my personal reaction.
