[Bioc-devel] file registry - feedback
On 03/11/2014 09:57 AM, Valerie Obenchain wrote:
Hi Herve, On 03/10/2014 10:31 PM, Hervé Pagès wrote:
Hi Val, I think it would help to understand the motivation behind this proposal if you could give an example of a method where the user cannot supply a file name but has to create a 'File' (or 'FileList') object first, and explain how the file registry proposal below would help. It looks like you have such an example in the GenomicFileViews package. Do you think you could give more details?
The most recent motivating use case was creating subclasses of the GenomicFileViews class (BamFileViews, BigWigFileViews, etc.). We wanted a general constructor, something like GenomicFileViews(), that would create the appropriate subclass. However, to create the correct subclass we needed to know whether the files were BAM, BigWig, FASTA, etc. Recognizing the file type by its extension would let us do this with no further input from the user.
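A minimal base-R sketch of the extension-based dispatch described above, assuming the general constructor only needs to know which subclass name to build. The extension table, the FaFileViews entry and the helper viewsClassForFiles() are hypothetical illustrations, not GenomicFileViews code:

```r
## Hypothetical: map file extensions to GenomicFileViews subclasses.
## This table is an illustrative assumption, not package code.
.extToViewsClass <- c(bam = "BamFileViews",
                      bw = "BigWigFileViews",
                      fa = "FaFileViews", fasta = "FaFileViews")

## Return the single subclass name implied by the files' extensions.
viewsClassForFiles <- function(files) {
    ext <- tolower(tools::file_ext(files))
    cls <- unname(.extToViewsClass[ext])   # NA for unknown extensions
    if (anyNA(cls))
        stop("unrecognized file type: ",
             paste(files[is.na(cls)], collapse = ", "))
    cls <- unique(cls)
    if (length(cls) != 1L)                 # one views object, one file type
        stop("files map to several subclasses: ",
             paste(cls, collapse = ", "))
    cls
}
```

For example, viewsClassForFiles(c("a.bam", "b.bam")) returns "BamFileViews", while mixing .bam and .bw files stops with an error, since in this sketch one views object holds a single file type.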
That helps, thanks! This kind of general constructor does sound useful. It would also be an opportunity to put all these *File classes (the 22 RTLFile subclasses defined in rtracklayer and the 5 RsamtoolsFile subclasses defined in Rsamtools) under the same umbrella (i.e. a parent virtual class) and to use the name of this virtual class (e.g. File) for the general constructor. Adding a registration mechanism to extend what this File() constructor knows about is an implementation detail, and I don't see much benefit to it. Only a package that implements a concrete File subclass would ever need to register it, and it seems easy enough to ask whoever has commit access to the File() code to add the new subclass there. Such an update might also require adding the package that implements the new File subclass to the Depends/Imports/Suggests of the package where File() lives, which is something a registration mechanism cannot do. H.
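To make the alternative concrete, here is a sketch of a parent virtual class plus a general constructor that hard-codes the extension-to-subclass mapping. The class names File, DemoBamFile and DemoBigWigFile are hypothetical stand-ins, not the actual rtracklayer or Rsamtools classes:

```r
library(methods)

## Hypothetical parent virtual class for all concrete file classes.
setClass("File", representation("VIRTUAL", path = "character"))

## Hypothetical concrete subclasses standing in for BamFile, BigWigFile, ...
setClass("DemoBamFile", contains = "File")
setClass("DemoBigWigFile", contains = "File")

## General constructor: the extension-to-class mapping lives in the code,
## so supporting a new subclass means editing this switch().
File <- function(path) {
    stopifnot(is.character(path), length(path) == 1L)
    cls <- switch(tolower(tools::file_ext(path)),
                  bam = "DemoBamFile",
                  bw = , bigwig = "DemoBigWigFile",
                  stop("no File subclass known for '", path, "'"))
    new(cls, path = path)
}
```

Here File("reads.bam") returns a DemoBamFile that is also a File; supporting another format means editing File() itself, which is the trade-off discussed above.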
Val
Thanks, H. On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
Hi all, I'm soliciting feedback on the idea of a general file 'registry' that would identify file types by their extensions. This is similar in spirit to FileForFormat() in rtracklayer but is a more general abstraction that could be used across packages. The goal is to allow a user to supply only file name(s) to a method instead of first creating a 'File' object such as a BamFile, FaFile, BigWigFile, etc. A first attempt at this is in the GenomicFileViews package (https://github.com/Bioconductor/GenomicFileViews). A registry (lookup) is created as an environment at load time:

    .fileTypeRegistry <- new.env(parent=emptyenv())

Files are registered with an information triplet consisting of the class, the package and a regular expression that identifies the extension. In GenomicFileViews we register FaFileList, BamFileList and BigWigFileList, but any 'File' class that has a constructor of the same name can be registered.

    .onLoad <- function(libname, pkgname) {
        registerFileType("FaFileList", "Rsamtools", "\\.fa$")
        registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
        registerFileType("BamFileList", "Rsamtools", "\\.bam$")
        registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
    }

The makeFileType() helper creates an object of the appropriate class. It is used behind the scenes to do the lookup and coerce to the correct 'File' class.
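A self-contained sketch of such a registry in base R, assuming entries are keyed by their regular expression. registerFileType() takes the triplet described above; lookupFileType() and makeFileTypeSketch() are hypothetical helpers, and the actual GenomicFileViews code may differ:

```r
## Registry: an environment keyed by regular expression, each entry
## holding the (class, package, pattern) triplet.
.fileTypeRegistry <- new.env(parent = emptyenv())

registerFileType <- function(class, package, pattern) {
    assign(pattern, list(class = class, package = package, pattern = pattern),
           envir = .fileTypeRegistry)
    invisible(NULL)
}

## Hypothetical helper: resolve file names to a single registered entry.
lookupFileType <- function(files) {
    patterns <- ls(.fileTypeRegistry, all.names = TRUE)
    entries <- lapply(files, function(f) {
        hit <- patterns[vapply(patterns, grepl, logical(1), x = f)]
        if (length(hit) != 1L)
            stop(length(hit), " registered patterns match '", f, "'")
        get(hit, envir = .fileTypeRegistry)
    })
    classes <- unique(vapply(entries, `[[`, character(1), "class"))
    if (length(classes) != 1L)
        stop("files resolve to several classes: ",
             paste(classes, collapse = ", "))
    entries[[1L]][c("class", "package")]
}

## Hypothetical stand-in for makeFileType(): call the constructor that
## shares the class name (requires that package to be installed).
makeFileTypeSketch <- function(files) {
    ft <- lookupFileType(files)
    ctor <- getExportedValue(ft$package, ft$class)
    ctor(files)
}

registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
```

Splitting lookup from construction keeps the registry free of package dependencies; only makeFileTypeSketch() needs the registered package to be installed, because it fetches the constructor with getExportedValue().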
> makeFileType(c("foo.bam", "bar.bam"))
BamFileList of length 2
names(2): foo.bam bar.bam

New types can be added at any time with registerFileType():

    registerFileType("NewClass", "NewPackage", "\\.NewExtension$")

Thoughts:

(1) If this sounds generally useful, where should it live? rtracklayer, GenomicFileViews or elsewhere? Alternatively it could be its own lightweight package (FileRegister) that creates the registry and provides the helpers. It would be up to the authors of packages that depend on FileRegister to register their own file types at load time.

(2) To avoid potential ambiguities, maybe searching should be by regex and package name.

Still a work in progress.

Valerie
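For point (2), one possible sketch keeps every registration as a row, so two packages can claim the same extension, and lets the caller narrow the match with a package name. addFileType(), findFileType() and "otherPkg" are hypothetical names used only for illustration:

```r
## Hypothetical variant: store every registration as a row of a table
## held in an environment, instead of one entry per regex.
.fileTypeTable <- new.env(parent = emptyenv())
.fileTypeTable$rows <- data.frame(class = character(), package = character(),
                                  pattern = character(),
                                  stringsAsFactors = FALSE)

addFileType <- function(class, package, pattern) {
    row <- data.frame(class = class, package = package, pattern = pattern,
                      stringsAsFactors = FALSE)
    .fileTypeTable$rows <- rbind(.fileTypeTable$rows, row)
    invisible(NULL)
}

## Match by regex, optionally restricted to one package; refuse to guess
## when several (class, package) pairs remain.
findFileType <- function(file, package = NULL) {
    rows <- .fileTypeTable$rows
    rows <- rows[vapply(rows$pattern, grepl, logical(1), x = file), ,
                 drop = FALSE]
    if (!is.null(package))
        rows <- rows[rows$package == package, , drop = FALSE]
    rows <- unique(rows[, c("class", "package"), drop = FALSE])
    if (nrow(rows) == 0L)
        stop("no registered file type matches '", file, "'")
    if (nrow(rows) > 1L)
        stop("'", file, "' matches several registered types; ",
             "supply 'package' to disambiguate")
    list(class = rows$class, package = rows$package)
}

addFileType("BamFileList", "Rsamtools", "\\.bam$")
addFileType("BigWigFileList", "rtracklayer", "\\.bw$")
addFileType("OtherBigWigList", "otherPkg", "\\.bw$")   # hypothetical clash
```

Here findFileType("cov.bw") stops because two packages claim .bw, while findFileType("cov.bw", package = "rtracklayer") resolves to BigWigFileList.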
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319