[Bioc-devel] Biobase eSet and exprSet validation

9 messages · Martin Morgan, Rafael A. Irizarry, Seth Falcon +2 more

#
Bioconductors,

A recent change to Biobase checks that eSet and exprSet objects have
the correct format, and in particular that the 'sampleNames' are
identical to the row names of the phenoData. This consistency is
important, because it is what allows coordination between the
expression and phenotype data.
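
For maintainers checking their own objects, the consistency rule described above can be sketched in base R. This is only an illustration under stated assumptions: the matrix and data frame are stand-ins for the expression and phenotype data, and `check_sample_names` is a hypothetical helper, not the Biobase validity method.

```r
# Stand-in objects in base R; these are NOT Biobase classes.
exprs_mat <- matrix(1:6, nrow = 2,
                    dimnames = list(c("g1", "g2"), c("A", "B", "C")))
pheno <- data.frame(sex = c("F", "M", "F"),
                    row.names = c("A", "B", "C"))

# Hypothetical helper: returns TRUE when the sample (column) names match
# the phenotype row names exactly, otherwise a message string. This mirrors
# the TRUE-or-character convention used by S4 validity functions.
check_sample_names <- function(mat, pd) {
  if (identical(colnames(mat), rownames(pd))) {
    TRUE
  } else {
    "sampleNames differ from the row names of phenoData"
  }
}

ok_result <- check_sample_names(exprs_mat, pheno)

# Renaming one phenotype row breaks the coordination between the two tables.
bad_pheno <- pheno
rownames(bad_pheno)[3] <- "D"
bad_result <- check_sample_names(exprs_mat, bad_pheno)
```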

The change creates a certain amount of havoc with other
packages. Often the consequences are minor (e.g., creating an invalid
object in a vignette), but it would be great to have these cleaned up
in the next couple of weeks. Please let me or the list know if we can
help, or if the object validation needs tweaking.

Now might also be a good time to replace any use of the deprecated
eset data with the new sample.eSet and sample.exprSet data sets.

Errors during build (e.g., functions constructing invalid objects)

GEOquery
LMGene
affycoretools
ecolitk
matchprobes
metaArray
sscore

Errors during check (e.g., examples, vignettes)

MergeMaid
affy
convert
edd
macat

Cheers!
#
how about converting all serialized exprSets to eSets?  there is an "as" operator
to help with this in Biobase...

should we try to eliminate all reliance on exprSets?  it can only be
confusing to newcomers that there are two basic containers
doing the same thing.
#
On 2 Feb 2006, stvjc at channing.harvard.edu wrote:
I think it would be beneficial to solidify and simplify the basic
container(s).  

A first step, IMO, is to update the vignette that describes eSets.
The vignette should probably be renamed and the provisional comments
removed. 

A second step would be to update the Biobase vignette to reflect that
eSets are the main show in town.

If there is consensus that eSets are "ready", then at the same time as
the above two steps, package maintainers could start to convert
vignettes and examples to using eSets and ensure that all functions
know how to deal with eSets.

But for functions that currently know how to process exprSets, what
should the behavior be when processing an eSet which may have multiple
expression matrices?  Should the function check that there is only
one?  Should the function gain an arg that specifies which matrix to
operate on?  Should functions be changed so that they always return a
list of results and always do operations on all matrices present?

I think we need to hash out some answers and examples to the above
before we can start calling as(foo, "eSet").  Am I making a mountain
out of a mole hill?

Best Wishes,

+ seth
#
i agree that we should get rid of exprSet. the oligo package uses the eSet 
exclusively. but we need to be very careful not to lose functionality. for 
example, i know that exprs<- does not exist for eSet. i suspect this 
method is used in various places
On Thu, 2 Feb 2006, Vincent Carey 525-2265 wrote:

            
#
On 2 Feb 2006, ririzarr at jhsph.edu wrote:
I'm a bit confused about what is going on in eSet.  I see that there
is an exprs method that is now Deprecated in favor of assayData.  

For using eSets in place of exprSets, I wonder if we want to revive
that interface and make it work like this:

   es <-  ## an eSet instance

   exprs(es) - If es@assayData has length 1, return the first element,
   otherwise, error.  Same for replacement.

   exprs(es, n) - Return the n-th element of assayData.  This won't
   work if assayData is an environment unless we store some additional
   info that defines the order.
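
A rough base-R sketch of the semantics proposed above, assuming assayData is an ordinary list of matrices. `get_exprs` is a hypothetical name used only for illustration; it is not the Biobase `exprs` method.

```r
# Hypothetical accessor over a plain list standing in for assayData.
get_exprs <- function(assay_data, n = NULL) {
  stopifnot(is.list(assay_data))
  if (is.null(n)) {
    # No index given: only unambiguous when there is exactly one element.
    if (length(assay_data) != 1L) {
      stop("assayData has ", length(assay_data),
           " elements; specify which one with n")
    }
    return(assay_data[[1L]])
  }
  # Positional lookup; this relies on the list having a defined order.
  assay_data[[n]]
}

single <- list(exprs = matrix(1:4, nrow = 2))
two <- list(green = matrix(1:4, nrow = 2),
            red = matrix(5:8, nrow = 2))

first_only <- get_exprs(single)   # the sole matrix
second <- get_exprs(two, 2)       # the matrix stored as "red"
# get_exprs(two) would signal an error because the choice is ambiguous.
```

The positional form is where an environment causes trouble: an environment has no element order, so `n` would have to be resolved through some separately stored ordering.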

Also, I'm confused about the constraint on the number of columns in
the expression matrices stored in assayData.  Can they be different?
My glance at the validEset function gives me the feeling that they
cannot, but then I'm confused about why ncol should report ncol for
each if they are constrained to be the same.

+ seth
#
we need use cases.  i don't think anyone ever tried to work with
multicolor arrays in this context.  i wrote the initial validity
conditions i think because no one had any objections to that constraint
(common dimensions)

eSet means (perhaps) "everything" set, and there is no justification to
interpret a single assayData matrix as expression.  so making the exprs
method look for something named "exprs" makes some sense.
#
On 2/2/06 6:57 PM, "Vincent Carey 525-2265" <stvjc at channing.harvard.edu>
wrote:
Going with the MIAME thought process, it makes sense to have an exprs()
matrix and an se.exprs() matrix that are treated as "special" (the "derived
measurement values" and the "reliability indicator" in the MIAME 1.1 draft).
I think having some standard here is important for general development
purposes (other packages need to know where to get data for processing).
Then, the other matrices in assayData are simply "raw data" and could be
obtained using standard list accessors.

Sean
#
On 2/2/06 4:55 PM, "Seth Falcon" <sfalcon at fhcrc.org> wrote:

            
What happens with subsetting if they are not the same?  Seems like a reason
to keep the constraint.  I didn't look, but is the number of rows also
constrained to be the same?
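
To make the subsetting concern concrete, here is a small base-R sketch, assuming assayData is a plain list of matrices. `same_dims` and `subset_samples` are hypothetical helpers, not Biobase functions.

```r
# Two stand-in assay matrices with common dimensions and sample names.
assay_data <- list(
  green = matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("A", "B", "C"))),
  red = matrix(7:12, nrow = 2,
               dimnames = list(c("g1", "g2"), c("A", "B", "C")))
)

# TRUE when every matrix has the same dim() as the first one.
same_dims <- function(ad) {
  dims <- lapply(ad, dim)
  all(vapply(dims, identical, logical(1), dims[[1L]]))
}

# Column (sample) subsetting applied to every matrix at once.
# It is only well defined when the dimensions agree, hence the check.
subset_samples <- function(ad, j) {
  stopifnot(same_dims(ad))
  lapply(ad, function(m) m[, j, drop = FALSE])
}

sub <- subset_samples(assay_data, c("A", "C"))
```

Without the common-dimension constraint, a single index `j` would not necessarily refer to the same samples in each matrix.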

Sean
#
On 3 Feb 2006, sdavis2 at mail.nih.gov wrote:
ok, I was missing some concepts and am in agreement that we don't want
to force an eSet with a length one assayData slot to be an exprSet
equivalent.
So it sounds like the original implementation in which special names
are recognized for identifying elements of the assayData is sensible?
If we put this back in, I propose more verbose names:

expressionValues 
expressionStandardErrors

The method names should not change: exprs() and se.exprs(),
respectively.
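
A minimal base-R sketch of lookup by special name, assuming the element names proposed above. The functions `get_assay_element`, `exprs_sketch` and `se_exprs_sketch` are hypothetical stand-ins for illustration, not the Biobase methods.

```r
# Look up one element by name; `[[` with a character index works for
# both lists and environments, and yields NULL when the name is absent.
get_assay_element <- function(ad, name) {
  value <- ad[[name]]
  if (is.null(value)) {
    stop("assayData has no element named '", name, "'")
  }
  value
}

# Accessors keyed on the proposed verbose element names.
exprs_sketch <- function(ad) get_assay_element(ad, "expressionValues")
se_exprs_sketch <- function(ad) get_assay_element(ad, "expressionStandardErrors")

# The same accessors work whether assayData is a list or an environment.
ad_list <- list(expressionValues = matrix(1:4, nrow = 2),
                expressionStandardErrors = matrix(0.1, nrow = 2, ncol = 2))
ad_env <- list2env(ad_list, envir = new.env())

vals_from_list <- exprs_sketch(ad_list)
vals_from_env <- exprs_sketch(ad_env)
```

Because the lookup is by name rather than by position, it does not depend on element order, which sidesteps the ordering problem with environments.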

One thing I'm wondering, however, is whether these deserve their own
slots?  The magic list/env element names make me nervous --- going
down that road, we could start putting more and more special names in
the list/env and it really starts to look like an object without any
real definition.

+ seth