After receiving your collective feedback on the recent eSet class definition modifications I made, we changed our design to better reflect the reality microarray data analysts face. In the newly updated Biobase package we take the view that there are two main curators of covariates (be they phenotypic, genotypic, experimental, etc.): the experimenter and the measuring device, whose covariates are typically stored as header information in a data file. In terms of the eSet class and its derivatives, experimenter-curated covariates should be housed in the phenoData (AnnotatedDataFrame) slot and manufacturer-curated covariates are to be stored in a new protocolData (AnnotatedDataFrame) slot.
The intent is that the new protocolData slot is modified only by a data file read operation, such as affy's read.affybatch function. Giving read functions ownership of the protocolData slot assures the developer that they are not stomping on the user's data, and gives the end user a clean representation of the metadata contained in the original data files. The end user can always copy the protocolData information into the phenoData slot to make analysis easier. The power of the new protocolData slot will depend on the maintainers of packages that read in microarray data, since a fair amount of metadata can be harvested from data file headers that, by standard convention, is currently being ignored.
As part of this change, the scanDate slot has been removed from the eSet class. To make this transition smoother, I spent a day or so updating all the serialized eSet objects I could find so that they have the protocolData slot and pass a validObject() check.
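To make the split between the two slots concrete, here is a minimal sketch in R, assuming a Biobase version that includes the protocolData slot. The toy expression values, the sample names, and the ScanDate column are all made up for illustration; in real use the protocolData slot would be filled by a read function rather than assigned by hand as it is here.

```r
library(Biobase)

# Toy expression matrix: 3 features x 2 samples (made-up values).
exprs_mat <- matrix(c(1.2, 3.4, 5.6, 2.1, 4.3, 6.5), nrow = 3,
                    dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))

# Experimenter-curated covariates belong in phenoData.
pheno <- AnnotatedDataFrame(
  data.frame(treatment = c("control", "drug"), row.names = c("s1", "s2")))
eset <- ExpressionSet(assayData = exprs_mat, phenoData = pheno)

# Device-recorded covariates belong in protocolData. "ScanDate" is a
# hypothetical column name; a read function would normally set this slot.
proto <- AnnotatedDataFrame(
  data.frame(ScanDate = c("2009-06-01", "2009-06-02"),
             row.names = c("s1", "s2")))
protocolData(eset) <- proto

# The object should still pass the validity check.
validObject(eset)

# For convenience, the end user can copy a protocolData column into phenoData.
eset$ScanDate <- pData(protocolData(eset))$ScanDate
varLabels(eset)
```

The copy in the last step leaves protocolData untouched, so the record of what the data files contained stays separate from anything the user later edits in phenoData.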
From my examination, all is well in the BioC 2.5 branch, but the BioC 2.4 branch now has a number of build failures. We don't fork data experiment packages, so data experiment packages containing newly serialized eSet objects will not work with the release branch. These package build failures do not affect end users of BioC 2.4, since bioconductor.org has versions of the data packages that contain the old serialized eSet objects. The main problem we have now is that if you wish to patch a BioC 2.4 package and your package does not build due to a dependency on a data experiment package, we will need to build and push your package by hand.
This issue may get us to rethink our policy of not forking data experiment packages in svn. There is a cost to forking the data experiment packages, and so far it has outweighed the benefits we would receive. If we find the benefits rising, we will adopt the same svn forking approach we use for the software packages.
Thanks again for your feedback the first go around.
- The Biocore Team
[Bioc-devel] Modified eSet class definition, updated serialized eSet objects, and broken BioC 2.4 builds
1 message · Patrick Aboyoun