
[Bioc-devel] on to AnnotationTrack with rtracklayer [was Re: IGV VCF demo, other suggestions? [was Re: IGV - a new package in preparation]]

I have now implemented VCF tracks for IGV, supporting both 

   a local VCF object read and filtered by the VariantAnnotation package, and
   a remote webserver-hosted vcf file.   

In normal use I expect (and recommend) that the local VCF object will be relatively small (< 1 MB, < 50 samples - or some tradeoff of those approximate numbers), and that the remote, genome-scale VCF file is accompanied by an index.

I am now turning to annotation tracks: bed, bed9, gff, gff3, gtf.  rtracklayer provides a good set of importers for these formats, and S4 classes to represent them (apparently all are subclasses of GenomicRanges): 
 
   BEDFile (3 required fields, up to 9 optional fields - https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
   GFFFile (includes gff, gff3, gtf)

I propose to support four different representations of these data in R:

   data.frame
   the two rtracklayer classes
   a url pointing to a web-hosted and indexed annotation

The AnnotationTrack constructor accepts all four in the "annotation" parameter. A simple version of the call (with many parameters defaulted) is:

  track <- AnnotationTrack(trackName, annotation, color, displayMode)

The annotation parameter will be inspected by the constructor: is it a data.frame? a BEDFile?  a GFFFile?  a url?
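
A minimal base-R sketch of how that inspection might look. The helper name annotationKind is hypothetical and not part of the package. BEDFile and GFFFile are the rtracklayer class names mentioned above; they are checked with inherits() so the sketch runs even without rtracklayer loaded. Treating any single "http://" or "https://" string as a url is also an assumption.

```r
# Hypothetical helper (not part of the package): report which of the four
# supported representations the 'annotation' argument uses.
annotationKind <- function(annotation) {
  if (is.data.frame(annotation)) return("data.frame")
  if (inherits(annotation, "BEDFile")) return("BEDFile")
  if (inherits(annotation, "GFFFile")) return("GFFFile")
  # Assumption: a remote annotation is given as one http(s) url string.
  if (is.character(annotation) && length(annotation) == 1 &&
      grepl("^https?://", annotation)) return("url")
  stop("unsupported annotation type: ", class(annotation)[1])
}

annotationKind(data.frame(chrom = "chr1", start = 0L, end = 10L))
annotationKind("https://example.org/annotation.bed.gz")
```

The real constructor could equally use S4 dispatch on the class of the argument; the explicit checks here just make the four cases visible.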

The local data is reformatted as needed into a file with a format igv.js understands - native bed and gff text files - then passed to igv as a local url. Remote urls are transmitted without change.
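
For the data.frame case, the reformatting step could be sketched as below. The helper name writeBedFile and the example column names are hypothetical. The sketch assumes the data.frame columns are already in BED order (chrom, start, end, then optional fields), and it leaves out how the resulting file is exposed to igv as a local url.

```r
# Hypothetical helper (not part of the package): write a BED-like data.frame
# to a plain tab-delimited text file and return the file path.
writeBedFile <- function(tbl, path = tempfile(fileext = ".bed")) {
  # BED needs at least chrom, start, end.
  stopifnot(is.data.frame(tbl), ncol(tbl) >= 3)
  # No header, no quoting, no row names.
  write.table(tbl, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

# Integer coordinates avoid scientific notation in the written text.
bed <- data.frame(chrom = c("chr1", "chr1"),
                  start = c(1000L, 5000L),
                  end   = c(2000L, 6000L),
                  name  = c("featureA", "featureB"),
                  stringsAsFactors = FALSE)
bedPath <- writeBedFile(bed)
readLines(bedPath)
```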

Does this sound right?  If you have a minute to comment, now is a good time to offer critique and suggestions on annotation tracks.

Next up after the AnnotationTrack class will be alignment (bam) tracks and, if I get to it before the package submission date, a "seg" track for segmented copy number data.

Last week Gabe asked:
I hope I don't sound disrespectful by describing these shorter methods as only syntactic simplifications with a little S4 dispatch thrown in. They have value, for sure, but are they not just a relatively thin layer on top of the classes I am writing now? *If* that description is accurate, then I'd rather consider adding them later, after the nuts and bolts and basic operations are all written, tested, and subjected to a few months of user QC. (I admit that I also prefer the greater operational clarity which, for me, with my plodding brain, comes from using explicit data types and explicit constructors.)

 - Paul