Hi Florian -- Seth had forwarded an email of his to you about eSets. I
wanted to make sure you were aware of some last minute changes that
had to be made, as well as the overall revisions to the
class. Attached is a summary of changes made. You might find the
example at the end (extending eSet) to be useful for your
situation. Let me know if I can be of any assistance.
Martin
--------
Biobase/eSet developers,
Here is a brief summary of the version of eSet to be included in
this release; the code builds and checks without error, though missing
documentation (to be corrected within the week) means that there are
still warning messages during check. The most recent changes are in
svn.
There is one very recent change, to the overall class structure, that
we agonized over a great deal before making at the last moment. We
recognize that this is very unfortunate timing, and that it will cause
needless work for bioconductors; we will help out as much as possible.
There are three major changes:
1. Change in class structure.
eSet -- VIRTUAL
    ExpressionSet
    SnpSet
    (TilingSet -- not implemented)
The main functionality of eSet is to coordinate assayData, phenoData,
experimentData, and the annotation. eSet is also a generalized
container, with high-throughput data stored in the assayData
slot. eSet is a VIRTUAL class; if you want to store and manipulate a
consistent set of elements in the assay data slot you should create a
subclass of eSet. An example of how to do this is below.
ExpressionSet requires that the assayData slot contain a matrix element
'exprs'; other elements (of dimension identical to exprs) are
permitted. as(exprSet, "ExpressionSet") coerces exprSet objects to
ExpressionSet, perhaps issuing warnings if ambiguities arise.
library(Biobase)
data(sample.exprSet)
obj <- as(sample.exprSet, "ExpressionSet")
obj
SnpSet is meant to contain SNP data in a manner analogous to
ExpressionSet; 'call' and 'callProbability' are required assayData
elements providing information on the call and a statement of
confidence in the call. The exact structure of these matrices is not
specified, but the idea is that 'call' encodes diploid genotypes.
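To make the "required elements of identical dimension" rule concrete,
here is a minimal base R sketch of such a check. The helper
check_members is hypothetical and written only for illustration; it is
not Biobase's own validity code, and it assumes the assay data is a
plain list or environment of matrices.

```r
# Hypothetical helper (not part of Biobase). Returns TRUE, or a message
# describing the first problem found, in the style of an S4 validity function.
check_members <- function(assay, required) {
    # Works for either a list or an environment
    present <- if (is.environment(assay)) ls(assay) else names(assay)
    absent <- setdiff(required, present)
    if (length(absent) > 0)
        return(paste("missing required element(s):",
                     paste(absent, collapse = ", ")))
    # Every element must share the dimensions of the first required element
    ref <- dim(assay[[required[1]]])
    same <- vapply(present,
                   function(nm) identical(dim(assay[[nm]]), ref),
                   logical(1))
    if (!all(same))
        return(paste("dimension mismatch in:",
                     paste(present[!same], collapse = ", ")))
    TRUE
}

# Stand-in for SnpSet-like assay data: 4 features by 3 samples
snp_like <- list(call = matrix(0L, 4, 3),
                 callProbability = matrix(0, 4, 3))
ok  <- check_members(snp_like, c("call", "callProbability"))
bad <- check_members(snp_like["call"], c("call", "callProbability"))
```

Here ok is TRUE, while bad is a character string naming the missing
element.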
2. Change in assayData storage
The assayData slot is an AssayData class union of 'list' and
'environment'; as a class union, there is no 'initialize'
method. Instead, the list or environment can be populated with
elements using a call to assayDataNew(...).
An innovation is the storageMode method, which can be used to change
how elements in assayData are stored. In particular the storageMode
can be 'lockedEnvironment', and indeed this is the default. An
environment is locked in the sense that new elements cannot be added
to the environment, and existing elements cannot be changed. This
means that the pass-by-reference semantics of environments will not
catch users off-guard:
obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
storageMode(obj) <- "environment"
obj1 <- obj
exprs(obj1) <- exprs(obj1)[1:10,1:5]
dims(obj) # yikes! obj exprs dimensions changed!
obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
storageMode(obj) <- "environment"
obj1 <- obj
exprs(obj1) <- log(exprs(obj))
identical(exprs(obj1),exprs(obj)) # TRUE: yikes again!
obj <- as(sample.exprSet,"ExpressionSet") # default: lockedEnvironment
obj1 <- obj
exprs(obj1) <- log(exprs(obj1))
identical(exprs(obj1),exprs(obj)) # FALSE: good!
Note that attempts to directly change elements in locked environments
cause an error:
assayData(obj1)$exprs <- NULL
Error: cannot change value of a locked binding.
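The same behaviour can be seen with plain environments, independent of
Biobase. The sketch below uses only base R (new.env, lockEnvironment,
try); the variable names are illustrative, and it is meant to show the
mechanism rather than how eSet is implemented.

```r
# Base R only; no Biobase objects are involved.
env <- new.env()
env$exprs <- matrix(1:6, nrow = 2)

alias <- env                        # copies the reference, not the contents
alias$exprs <- alias$exprs[, 1:2]
dim(env$exprs)                      # 2 2: the "original" changed too

# Locking the environment and its bindings blocks both kinds of change
lockEnvironment(env, bindings = TRUE)
res_change <- try(env$exprs <- NULL, silent = TRUE)  # existing binding is locked
res_add    <- try(env$other <- 1, silent = TRUE)     # no new bindings allowed
inherits(res_change, "try-error")   # TRUE
inherits(res_add, "try-error")      # TRUE
```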
The setReplaceMethod for exprs (and assayData) succeeds by performing
a deep copy of the entire environment. Because this is very
inefficient, the recommended paradigm to update an element in a
lockedEnvironment is to extract it, make many changes, and then
reassign it, e.g.,
ex <- exprs(obj1)
# many changes, ex <- log(ex), ...
exprs(obj1) <- ex
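One way to picture why each replacement is costly, and why batching
changes helps, is the base R sketch below. The helper replace_in_locked
is hypothetical: it copies every element into a fresh environment,
swaps in one new value, and locks the copy. It is an illustration of
the copy-then-replace idea, not the actual Biobase replacement method.

```r
# Hypothetical helper, not Biobase code: "replace" one element of a locked
# environment by building a modified, locked copy of the whole thing.
replace_in_locked <- function(env, name, value) {
    copy <- list2env(as.list(env, all.names = TRUE), envir = new.env())
    assign(name, value, envir = copy)
    lockEnvironment(copy, bindings = TRUE)
    copy
}

old <- new.env()
old$exprs <- matrix(c(1, 10, 100, 1000), nrow = 2)
old$se    <- matrix(0.1, nrow = 2, ncol = 2)
lockEnvironment(old, bindings = TRUE)

# Extract once, make all changes on the plain matrix, reassign once
ex <- old$exprs
ex <- log10(ex)
ex <- ex + 1
updated <- replace_in_locked(old, "exprs", ex)

old$exprs[2, 2]       # still 1000: the original is untouched
updated$exprs[2, 2]   # log10(1000) + 1
```

Two separate replacements here would have copied 'se' twice as well;
doing the work on ex first means the whole environment is copied only
once.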
lockedEnvironment offers some efficiency in copying objects, because
the environment is not copied during function calls. This is not
completely satisfactory, though:
func <- function(assayData) # good: contents of env will not be copied
    max(exprs(assayData)) # not so good: exprs copied from environment
3. Changes in other slots
Other slots have been changed to treat variable metadata more
efficiently (in the AnnotatedDataFrame class of slot phenoData) and to
simplify the type of data stored as experimentData. These changes are
mostly in line with the web discussions.
In making these changes, I have tried not to break the existing
interface beyond what is necessary for the new functionality (e.g.,
pData still returns the 'data' part of phenoData). One difference,
though, is that the methods dim, ncol, etc. return a vector of
dimensions reflecting the shared dimensionality of the assayData
members; dims returns an array of dimensions of each element.
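The distinction between the shared dimensions and the per-element
dimensions can be sketched in base R. The list below is only a
stand-in for assayData, and the row labels are illustrative
assumptions; the real methods may label or arrange their results
differently.

```r
# Stand-in for assayData: a plain list of equally sized matrices (assumption)
assay <- list(R = matrix(0, 5, 3), G = matrix(0, 5, 3))

# Per-element view: one column per element, one row per dimension
per_element <- sapply(assay, dim)
rownames(per_element) <- c("Features", "Samples")   # labels are illustrative
per_element

# Shared view: meaningful only when every element agrees
shared <- per_element[, 1]
stopifnot(all(per_element == shared))
shared
```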
These changes affect eSets; any difficulties you might have with
exprSet probably reflect changes made several months ago to validity
checking.
Please let me know of any feedback,
Martin
--
The original 'sample.eSet' contains four elements in the assayData
slot: R, G, Rb, Gb. To derive a class from eSet for this data, create
a class, and provide initialization and validation
methods. Optionally, update previous eSet data structures to your new
class. For instance,
setClass("SwirlSet", contains="eSet")
setMethod("initialize", "SwirlSet",
          function(.Object,
                   phenoData = new("AnnotatedDataFrame"),
                   experimentData = new("MIAME"),
                   annotation = character(),
                   R = new("matrix"),
                   G = new("matrix"),
                   Rb = new("matrix"),
                   Gb = new("matrix"),
                   ... ) {
              callNextMethod(.Object,
                             assayData = assayDataNew(
                               R=R, G=G, Rb=Rb, Gb=Gb,
                               ...),
                             phenoData = phenoData,
                             experimentData = experimentData,
                             annotation = annotation)
          })
setValidity("SwirlSet", function(object) {
    assayDataValidMembers(assayData(object), c("R", "G", "Rb", "Gb"))
})
data(sample.eSet)
obj <- updateOldESet(sample.eSet,"SwirlSet")
Seth Falcon <sfalcon at fhcrc.org> writes:
Hi Florian,
As you may have noticed, the prada package is not building against the
new Biobase code.
My apologies for not keeping you more in the loop regarding the
refactoring of Biobase and eSet in particular.
Here's the story:
There has been a consensus for a while that exprSet is not general
enough to handle the new chip technologies that are emerging. eSet
was proposed (a while ago) as a replacement, but its design has been
provisional. I know you have been using it.
We've recently had time to revisit the design. For the gory details,
see the discussion here:
http://wiki.fhcrc.org/bioc/Core_Bioconductor_Classes_Discussion
Martin Morgan (cc'd) has implemented a refactored eSet along with some
subclasses. See the latest Biobase svn for details.
Briefly, the idea is that eSet is now an abstract superclass and that
for each technology we will have concrete subclasses.
So to make prada work, I suspect you will need to create a subclass of
eSet of your own, unless one of Martin's subclasses will work for you.
I realize this may not have been the news you were hoping for.
Please have a look at the changes and feel free to ask me or Martin
any questions (might be good to send the questions to bioc-devel,
however).
Thanks,
--
+ seth