[Bioc-devel] IGV - a new package in preparation - Bioc-devel

Wed, Mar 7, 2018 2:15 PM #

To avoid duplication of effort, and perhaps to attract some early reviewers, I figured I'd let this group know that I plan to submit a new package, "IGV", for inclusion in the next Bioconductor release.

The package will provide an interface to the excellent and quite new browser-based genome viewer written by Jim Robinson and colleagues, igv.js:

   https://github.com/igvteam/igv.js

IGV depends upon RStudio's httpuv websocket library for passing JSON messages between an R session and igv.js running in the browser.  Communication goes both ways - both ends are fully and independently interactive.
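
A minimal sketch of what two-way JSON messaging over an httpuv websocket could look like. The message shape (`cmd` plus `payload`), the handler names, and the port are assumptions made for illustration; they are not taken from the IGV or BrowserViz packages. The server is only started in an interactive session with httpuv and jsonlite installed.

```r
# Hypothetical message convention (not from IGV or BrowserViz):
# every message is a list with a `cmd` string and a `payload`.
handlers <- list(
  ping = function(payload) list(cmd = "pong", payload = payload)
)

# Look up the handler for an incoming message and return the reply message.
dispatchMessage <- function(msg, handlers) {
  cmd <- msg$cmd
  handler <- if (is.character(cmd) && length(cmd) == 1) handlers[[cmd]] else NULL
  if (is.null(handler))
    return(list(cmd = "error", payload = "unknown or missing cmd"))
  handler(msg$payload)
}

# Wire the dispatcher to a websocket only in an interactive session with
# both packages installed, so sourcing this file never opens a port.
if (interactive() &&
    requireNamespace("httpuv", quietly = TRUE) &&
    requireNamespace("jsonlite", quietly = TRUE)) {
  app <- list(
    # Plain HTTP requests get a trivial response.
    call = function(req) {
      list(status = 200L,
           headers = list("Content-Type" = "text/plain"),
           body = "ok")
    },
    # Each websocket message is parsed, dispatched, and answered as JSON.
    onWSOpen = function(ws) {
      ws$onMessage(function(binary, message) {
        reply <- dispatchMessage(jsonlite::fromJSON(message), handlers)
        ws$send(jsonlite::toJSON(reply, auto_unbox = TRUE, null = "null"))
      })
    }
  )
  server <- httpuv::startServer("127.0.0.1", 9454, app)  # arbitrary port
  # httpuv::stopServer(server) when finished
}
```

Keeping the dispatch logic separate from the socket wiring means the R side can be exercised without a browser attached.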

My goal with IGV is to support all of the tracks mentioned here

   https://github.com/igvteam/igv.js/wiki/Tracks

Note that though igv.js typically gets its track data from CORS/indexed webservers, the IGV package will also support locally created R data.frames describing either bed or wig tracks - annotation and quantitative, respectively - without any need to host those tracks on a pre-existing webserver.  httpuv includes a minimal webserver which can adequately serve the temporary files IGV creates from your data.frames.
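
As a rough illustration of the data.frame-to-temporary-file step, the sketch below writes a small annotation data.frame as a BED file using base R only. The function name `writeBedTrack` and the column names are hypothetical, and serving the file over httpuv is left out.

```r
# Write a data.frame with chrom/start/end (and optionally name) columns
# to a temporary, tab-delimited BED file; returns the file path.
writeBedTrack <- function(tbl, path = tempfile(fileext = ".bed")) {
  stopifnot(is.data.frame(tbl),
            all(c("chrom", "start", "end") %in% colnames(tbl)))
  columns <- c("chrom", "start", "end",
               if ("name" %in% colnames(tbl)) "name")
  write.table(tbl[, columns], file = path, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  path
}

tbl <- data.frame(chrom = c("chr1", "chr1"),
                  start = c(1000L, 5000L),   # BED starts are 0-based
                  end   = c(2000L, 5600L),   # BED ends are exclusive
                  name  = c("featureA", "featureB"),
                  stringsAsFactors = FALSE)
bedFile <- writeBedTrack(tbl)
readLines(bedFile)
```

Integer coordinates are used so that write.table does not switch to scientific notation for large positions.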

In the years since the first appearance of my RCyjs package (which has a similar design, and the same base class, using websockets to communicate between R and the browser), RStudio and Hector Corrada Bravo have added async web socket support for Windows to httpuv.  This means IGV (and RCyjs also) will run on all platforms.  A refactored BrowserViz package (which might be useful to anyone wishing to do similar R-to-browser communication) will accompany my submission.

For javascript development, I have adopted commonly used strategies and tools, using npm and webpack to build a single, all-libraries-included html/js/css file for loading into the browser.  This allows us to control library versioning and to improve browser load times.  The single combined html/js/css file is created, not as part of R CMD build, but with a prior and separate, developer-only makefile maintained in the package's inst/browserCode directory.  Only that combined html/js/css file is included in the package tarball, along with configuration files to rebuild it, but not including all of the usually large number of node_modules that contributed to its construction.

Comments and suggestions welcome.

 - Paul

Gabriel Becker

Wed, Mar 7, 2018 2:40 PM #

Paul,

Sounds cool! My one note after a quick first pass is that here:

On Wed, Mar 7, 2018 at 2:15 PM, Paul Shannon <pshannon at systemsbiology.org>
wrote:

It seems to me that those data.frames should be replaced with the core
Bioconductor object classes which represent the types of information being
displayed.  You might look to epivizr for inspiration here, which (IIRC)
allows "tracks" within epiviz to be backed by Bioconductor objects.

Best,
~G

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


Paul Shannon

Wed, Mar 7, 2018 3:29 PM #

Thanks, Gabe.   

You make an excellent point: bioc objects get first class support.  In some instances, base R data types deserve that also, and data.frames lead the list for me, being useful, concise, universally available, expressive.

So perhaps not "data.frames replaced by" but "accompanied by" appropriate bioc data types?

 - Paul


Levi Waldron

Fri, Mar 9, 2018 4:15 AM #

On Thu, Mar 8, 2018 at 12:29 AM, Paul Shannon <pshannon at systemsbiology.org>
wrote:

Definitely +1 for supporting GenomicRanges, including what's in genome()
and mcols(). There's a demo of an rtracklayer -> GRanges -> UCSC genome
browser workflow in the rtracklayer vignette
<http://bioconductor.org/packages/release/bioc/vignettes/rtracklayer/inst/doc/rtracklayer.pdf>
that I've made use of. I wouldn't necessarily say *don't* support data.frame,
but I would certainly encourage Bioc users to import data with rtracklayer
instead of generic read* functions, and to take advantage of the vast
AnnotationHub and OrganismDbi-based annotations which provide GenomicRanges
objects.

Thanks and looking forward to it!

Paul Shannon

Fri, Mar 9, 2018 9:59 AM #

Thanks, Levi. Your comments, and Gabe's, are very helpful, getting me to consider things I have overlooked.

Support for GenomicRanges is essential, as you and Gabe point out.

In all cases IGV will convert a GRanges object to an appropriate track, then write it out as a temporary file. igv supports bed, gff, gff3, gtf, wig, bigWig, bedGraph, bam, vcf, and seg formats, and a variety of sources: files via http, Google Cloud Storage, GA4GH; recent limited support has been provided for direct JavaScript data. Maybe someday AnnotationHub?

GenomicRanges as I understand them are very flexible, not subclassed into types as are track formats. So I propose that in many cases it will be the user's responsibility to specify track type, call the appropriate constructor, and maybe specify column names so that the right scores can be extracted from the mcols - whose names, so far as I know, are not standardized.
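
A sketch of how a user-specified score column might be pulled out and written as a bedGraph file. It works on the data.frame form of the ranges (columns `seqnames`, `start`, `end`, plus metadata columns, as `as.data.frame()` produces for a GRanges object), so it runs with base R alone. The function name `rangesToBedGraph` and the `log2FC` column are hypothetical.

```r
# Convert a table of 1-based, closed ranges into a bedGraph file
# (0-based, half-open), taking scores from the column the user names.
rangesToBedGraph <- function(tbl, scoreColumn,
                             path = tempfile(fileext = ".bedGraph")) {
  coordinateColumns <- c("seqnames", "start", "end", "width", "strand")
  if (!scoreColumn %in% colnames(tbl))
    stop("no column named '", scoreColumn, "'; available: ",
         paste(setdiff(colnames(tbl), coordinateColumns), collapse = ", "))
  out <- data.frame(chrom = as.character(tbl$seqnames),
                    start = as.integer(tbl$start) - 1L,  # shift to 0-based
                    end   = as.integer(tbl$end),
                    score = tbl[[scoreColumn]])
  write.table(out, file = path, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  path
}

# Stand-in for as.data.frame(gr), where gr is a GRanges with an mcols
# column called log2FC.
tbl <- data.frame(seqnames = c("chr2", "chr2"),
                  start = c(101L, 201L), end = c(200L, 300L),
                  width = c(100L, 100L), strand = c("*", "*"),
                  log2FC = c(1.5, -0.25))
bedGraphFile <- rangesToBedGraph(tbl, scoreColumn = "log2FC")
readLines(bedGraphFile)
```

Failing with a list of the available metadata columns is one way to cope with mcols names not being standardized.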

If the GRanges object is too big - greater than a densely packed megabase, for instance - igv works best if the track file is indexed and served up by an index- and CORS-savvy webserver. Thus IGV should politely fail - or at least issue a warning - when it encounters big tracks. This "too big" threshold may change over time.
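
One way the "politely fail, or at least warn" behaviour could be sketched. The function name `checkTrackSize` and the feature-count limit are placeholders chosen for illustration, not a measured threshold; the limit is a parameter because the cutoff is expected to change.

```r
# Warn (or stop, when strict = TRUE) if a track has more features than
# the configured limit. The default limit is an arbitrary placeholder.
checkTrackSize <- function(featureCount, maxFeatures = 100000L,
                           strict = FALSE) {
  if (featureCount <= maxFeatures)
    return(invisible(TRUE))
  msg <- sprintf(paste("track has %d features, more than the current",
                       "limit of %d; consider an indexed file on a",
                       "CORS-enabled webserver"),
                 as.integer(featureCount), as.integer(maxFeatures))
  if (strict) stop(msg, call. = FALSE)
  warning(msg, call. = FALSE)
  invisible(FALSE)
}

checkTrackSize(2500L)                       # small track: silent
# checkTrackSize(250000L)                   # large track: warning
# checkTrackSize(250000L, strict = TRUE)    # large track: error
```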

Reading through Michael's rtracklayer vignette I came across this:

The rtracklayer package currently interfaces with the UCSC web-based genome browser.
Other packages may provide drivers for other genome browsers through a plugin system.

Can anyone (maybe Michael himself?) comment on how I can evaluate an rtracklayer plugin strategy for igv?

- Paul

Michael Lawrence

Fri, Mar 9, 2018 11:48 AM #

Couple of things:

1) Check out epivizr and the surrounding infrastructure (maybe Hector can
chime in). It's able to serve up data directly from R; would be nice if we
could do that with IGV, instead of writing out to files. That would require
it to talk to some standard API, like the old DAS.

2) The rtracklayer API is in rtracklayer/R/browser.R. See ucsc.R for how
that is implemented for UCSC.


Paul Shannon

Fri, Mar 9, 2018 11:57 AM #

Thanks, Michael.

httpuv, to which Hector made crucial contributions, makes it easy to send data directly between R and the browser, using websockets.   I resort to files, however, because when the data, rendered as json, exceeds 500k, the websocket hangs.  I never identified the weak spot.   Some Jupyter developers recently had good luck with binary websocket data exchange.  I am cautious, though, about pushing limits and using the latest websocket extension, and found the fallback to local files quite adequate for now.
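
The fallback described above could be sketched roughly as follows. The function name `chooseTransport` is hypothetical, and reading "500k" as 500,000 bytes is an assumption; the JSON is taken as an already-serialized string so the sketch needs only base R.

```r
# Decide how to deliver a serialized JSON payload: directly over the
# websocket when small, otherwise via a temporary file the browser can
# fetch. The 500000-byte default is an assumed reading of "500k".
chooseTransport <- function(jsonText, limitBytes = 500000L) {
  size <- nchar(jsonText, type = "bytes")
  if (size <= limitBytes)
    return(list(mode = "websocket", payload = jsonText))
  path <- tempfile(fileext = ".json")
  writeLines(jsonText, path)
  list(mode = "file", path = path)
}

small <- chooseTransport('{"cmd":"ping"}')
big   <- chooseTransport(strrep("x", 600000L))  # stand-in for a large payload
small$mode
big$mode
```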

I'll look at ucsc.R.

- Paul


Michael Lawrence

Fri, Mar 9, 2018 12:11 PM #

Yea I wouldn't use JSON, particularly "row-oriented" JSON, as a means of
scalable data transmission.
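
To make the row-oriented versus column-oriented distinction concrete, here is a small base R sketch that serializes the same numeric table both ways. The helper names `rowJson` and `colJson` are hypothetical and handle numeric columns only; in practice a JSON library would do this (jsonlite's `toJSON` has a `dataframe` argument for choosing the layout).

```r
# Row-oriented: one JSON object per row, so every key is repeated per row.
rowJson <- function(df) {
  rows <- vapply(seq_len(nrow(df)), function(i) {
    fields <- sprintf('"%s":%s', names(df), unlist(df[i, ]))
    paste0("{", paste(fields, collapse = ","), "}")
  }, character(1))
  paste0("[", paste(rows, collapse = ","), "]")
}

# Column-oriented: one JSON array per column, so each key appears once.
colJson <- function(df) {
  cols <- vapply(names(df), function(n) {
    sprintf('"%s":[%s]', n, paste(df[[n]], collapse = ","))
  }, character(1))
  paste0("{", paste(cols, collapse = ","), "}")
}

tbl <- data.frame(start = c(100L, 200L, 300L),
                  end   = c(150L, 250L, 350L))
rowJson(tbl)
colJson(tbl)
nchar(rowJson(tbl)) > nchar(colJson(tbl))
```

The repeated keys are why the row-oriented form grows faster with the number of rows.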


Cook, Malcolm

Fri, Mar 9, 2018 12:33 PM #

Hello,

Jumping in the conversation perhaps late.

If it helps the effort, below are some IGV-related R functions I've used in the past to good effect, communicating with IGV running on a local or remote host and issuing GOTO and Save Snapshot commands.

They use utils::write.socket.

One thing that helped my experience greatly was ensuring the socket always gets closed.

################################################################################
### IGV Interaction
################################################################################

library(functional)

OKCmd.socket<-function(socket,cmds,OK="^OK\\s*"){
  ## PURPOSE: write each cmd in <cmds>, a character vector, to the
  ## <socket> (appending newlines), warning for those that don't
  ## return <OK> (such as IGV responds)
  lapply(cmds,function(cmd) {
    write.socket(socket,paste0(cmd,"\n")) ## paste0: no stray space before the newline
    result=read.socket(socket)
    if(1 != regexpr(OK,result))
      warning(OK,' was expected in OKCmd.socket while executing [',cmd,'] but received [',result,']')
    result
  })
}

IGV.make.socket<-
  ## PURPOSE: a version of make.socket which abides to IGV's default
  ## port convention.
  Curry(make.socket,port=60151)

IGV.tell<-function(cmds,...) {
  ## PURPOSE: send all the <cmds> to IGV on port determined by <...>,
  ## warning if unexpected IGV response, and being careful to cleanup
  ## by closing the socket.
  message(cmds)
  with(list(s=IGV.make.socket(...)),{
    on.exit(close.socket(s))
    OKCmd.socket(s,cmds)
  })}

attr(IGV.tell,'ex')<-function(){
  IGV.tell('goto B52','myComputer.myInstitute.org')
  IGV.tell('goto chr4:1234-12345')
  }

sanitize.path<-function(path) {
   ## YMMV
    path<-gsub('%','',path,perl=TRUE)
    path<-gsub('\\|',' by ',path,perl=TRUE) # good choice?
    path<-gsub('[\\,]','',path,perl=TRUE)
    path<-gsub('\\/','\\;',path,perl=TRUE)
    path<-gsub('\\n',' ',path,perl=TRUE)
    path
}

IGVSaveSnapshots<-structure(
    function (region
              ,dir='./'
              ,filename=gsub(':','@',sanitize.path(region))
              ,type='png'
              ,filepath=sprintf('%s.%s',filename,type)
              ,...) {
        ## PURPOSE: create snapshots of <region> in <dir> in IGV socket
        ## behind host/port in ....  NB: the dir must be relative to the
        ## host on which IGV is running, which may not exist on localhost.
        IGV.cmds.gosnap<-function(x,filepath)
            c(sprintf("goto %s",x)
              ,sprintf("snapshot %s" ,filepath)
              )
        IGV.tell(c(
            sprintf("snapshotDirectory %s",dir)
            ,mapply(IGV.cmds.gosnap,region,filepath)
            )
           ,...)
      sprintf("%s/%s",dir,filepath)
    },ex=function(){
        ## 1) Take two snapshots using IGV running on remote host
        IGVSaveSnapshots(dir='/Volumes/Users/lab_project/SR_Prot_projects/iClip/myGenes/'
                         ,c('ns1'        # can be a gene name...
                            ,'X:123-1234' # ...or locus
                            )
                         ,host='M0050U1ZE6') # named, so it passes through ... instead of matching <filename>
        IGVSaveSnapshots(dir='\\\\ion\\projects\\mec\\ShilatifardLab\\analysis\\fec\\triptolide\\fig'
                         ,region=c(
                              ## can be...:
                              'chrX:123-1234' # ...a locus
                                        #,'ASNS'         # .. a gene name...
                              )
                         ,type='svg'
                         ,host='LA10MJDPKM5.sgc.loc') # named, so it passes through ... instead of matching <filename>
    })

~malcolm_cook at stowers.org


Cook, Malcolm

Fri, Mar 9, 2018 12:36 PM #

Hi,


 > Michael Lawrence wrote:
 >
 > Couple of things:
 > 
 > 1) Check out epivizr and the surrounding infrastructure (maybe Hector can
 > chime in). It's able to serve up data directly from R; would be nice if we
 > could do that with IGV, instead of writing out to files. That would require
 > it to talk to some standard API, like the old DAS.

One value of writing to files is that if IGV is running on a remote host, retrieval via byte-range requests continues to just work.

Of course this is dependent upon what you are trying to do.
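
For readers unfamiliar with byte-range retrieval, the sketch below shows the underlying idea with base R: only the requested slice of a file is read, which is what a webserver does when it answers a range request against an indexed track file. The function name `readByteRange` and the sample data are hypothetical, and no HTTP is involved here.

```r
# Read bytes first..last (0-based, inclusive at both ends, as in an
# HTTP "Range: bytes=first-last" header) from a file.
readByteRange <- function(path, first, last) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = first, origin = "start")
  readBin(con, what = "raw", n = last - first + 1L)
}

# Two BED lines; the second line occupies bytes 11 to 22, without its
# trailing newline.
trackFile <- tempfile(fileext = ".bed")
writeBin(charToRaw("chr1\t0\t100\nchr1\t200\t300\n"), trackFile)
rawToChar(readByteRange(trackFile, 11L, 22L))
```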

~malcolm_cook at stowers.org


Michael Lawrence

Fri, Mar 9, 2018 1:00 PM #

On Fri, Mar 9, 2018 at 12:36 PM, Cook, Malcolm <MEC at stowers.org> wrote:

Sure, and we'd want the API to support that as well (like epiviz does now).

~malcolm_cook at stowers.org

 >
 > 2) The rtracklayer API is in rtracklayer/R/browser.R. See ucsc.R for how
 > that is implemented for UCSC.
 >
 > On Fri, Mar 9, 2018 at 9:59 AM, Paul Shannon
 > <pshannon at systemsbiology.org> wrote:
 >
 > > Thanks, Levi. Your comments, and Gabe's, are very helpful, getting me to
 > > consider things I have overlooked.
 > >
 > > Support for GenomicRanges is essential, as you and Gabe point out.
 > >
 > > In all cases IGV will convert a GRanges object to an appropriate track,
 > > then write it out as a temporary file.  igv supports bed, gff, gff3, gtf,
 > > wig, bigWig, bedGraph, bam, vcf, and seg formats, and a variety of
 > > sources:  files via http, google cloud storage, GA4GH; recent limited
 > > support has been provided for direct javascript data.   Maybe someday
 > > AnnotationHub?
 > >
 > > GenomicRanges as I understand them are very flexible, not subclassed into
 > > types as are track formats.  So I propose that in many cases it will be the
 > > user's responsibility to specify track type, call the appropriate
 > > constructor, maybe specify column names so that the right scores can be
 > > extracted from the mcols - whose names, so far as I know, are not
 > > standardized.
 > >
 > > If the GRanges object is too big - greater than a densely packed megabase,
 > > for instance - igv works best if the track file is indexed and served up by
 > > an index- and CORS-savvy webserver.   Thus IGV should politely fail -
 > > or at least issue a warning - when it encounters big tracks.  This "too big"
 > > threshold may change over time.
 > >
 > > Reading through Michael's rtracklayer vignette I came across this:
 > >
 > >    The rtracklayer package currently interfaces with the UCSC web-based
 > >    genome browser.
 > >    Other packages may provide drivers for other genome browsers through a
 > >    plugin system.
 > >
 > > Can anyone (maybe Michael himself?) comment on how I can evaluate an
 > > rtracklayer plugin strategy for igv?
 > >
 > >  - Paul
 > >
 > >

 > > > On Mar 9, 2018, at 4:15 AM, Levi Waldron <lwaldron.research at gmail.com>
 > > > wrote:
 > > >
 > > > On Thu, Mar 8, 2018 at 12:29 AM, Paul Shannon <
 > > > pshannon at systemsbiology.org> wrote:
 > > > Thanks, Gabe.
 > > >
 > > > You make an excellent point: bioc objects get first class support.  In
 > > > some instance, base R data types deserve that also, and data.frames lead
 > > > the list for me, being useful, concise, universally available, expressive.
 > > >
 > > > So perhaps not "data.frames replaced by" but "accompanied by"
 > > > appropriate bioc data types?
 > > >
 > > >  - Paul
 > > >
 > > > Definitely +1 for supporting GenomicRanges, including what's in genome()
 > > > and mcols(). There's a demo of an rtracklayer -> GRanges -> UCSC genome
 > > > browser workflow in the rtracklayer vignette that I've made use of. I
 > > > wouldn't necessarily say *don't* support data.frame, but I would certainly
 > > > encourage Bioc users to import data with rtracklayer instead of generic
 > > > read* functions, and to take advantage of the vast AnnotationHub and
 > > > OrganismDbi-based annotations which provide GenomicRanges objects.
 > > >
 > > > Thanks and looking forward to it!
 > > >

 > >

Paul Shannon

Mon, Mar 12, 2018 5:20 PM #

Gabe and Levi made a good case for supporting GRanges in IGV.   Looking at the GenomicRanges vignettes, it appears that many of Herve's introductory examples have GC content as the mcols column of interest.   Would that be a good test and demo for IGV?  Or perhaps some other genomic quantity, one for which sample data is already present in some Bioconductor package's extdata?

The IGV VCF track now works, using GenomicRanges and VariantAnnotation.  It might be of interest, and may lead to useful suggestions, which would be good for me to hear at this stage.   Here is a code chunk using default parameters for colors, track height, etc.  Homozygous non-reference calls are rendered in light blue, heterozygous in dark blue, reference in gray.

library(IGV)
library(VariantAnnotation)

# start a session, choose the genome, label the browser window
igv <- IGV(portRange=9000:9020)
setGenome(igv, "hg19")
setBrowserWindowTitle(igv, "VCF demo")

# sample VCF shipped with VariantAnnotation, restricted to a small region
f <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
chrom <- "22"
start <- 50586118
end   <- 50633733

rng <- GRanges(seqnames=chrom, ranges=IRanges(start=start, end=end))
vcf.sub <- readVcf(f, "hg19", param=rng)

# build the track with default display parameters, then show it with 1 kb of padding
track <- VariantTrack("chr22-tiny", vcf.sub)
displayTrack(igv, track)
showGenomicRegion(igv, sprintf("chr22:%d-%d", start-1000, end+1000))

Suggestions?

Michael Lawrence

Mon, Mar 12, 2018 7:29 PM #

You could look at the rtracklayer API. For example, using gets functions
like track<-() and range<-() to set track and region may be more natural to
R users. Then again, if there were endomorphic functions add_track() and
set_range(), the API would support chaining. There should be no need to
explicitly construct a track; just rely on dispatch and class semantics,
i.e., passing a VCF object to add_track() would create a variant track
automatically.
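
The three ideas above (a replacement function, endomorphic functions that can be chained, and dispatch on the data class) can be sketched with toy S4 classes. This is only an illustration under stated assumptions: ToySession, ToyVcf, add_track(), set_range() and region<-() are hypothetical names, not part of IGV or rtracklayer, and a real session would talk to a browser rather than fill in slots.

```r
library(methods)

# Hypothetical stand-ins for a browser session and a VCF object.
setClass("ToySession", slots = c(tracks = "list", range = "character"))
setClass("ToyVcf", slots = c(path = "character"))

setGeneric("add_track", function(session, x, ...) standardGeneric("add_track"))
setGeneric("set_range", function(session, range) standardGeneric("set_range"))
setGeneric("region<-", function(session, value) standardGeneric("region<-"))

# Dispatch on the data class: a ToyVcf becomes a variant track automatically.
setMethod("add_track", signature("ToySession", "ToyVcf"),
          function(session, x, ...) {
              track <- list(type = "variant", source = x@path)
              session@tracks <- c(session@tracks, list(track))
              session  # endomorphic: returns the class it was given
          })

setMethod("set_range", signature("ToySession", "character"),
          function(session, range) {
              session@range <- range
              session
          })

# Replacement ("gets") form of the same operation.
setReplaceMethod("region", "ToySession",
                 function(session, value) {
                     session@range <- value
                     session
                 })

s <- new("ToySession", tracks = list(), range = character(0))

# Because each call returns a session, the calls nest (or pipe) cleanly.
s <- set_range(add_track(s, new("ToyVcf", path = "chr22.vcf.gz")),
               "chr22:50585118-50634733")

# The replacement function reads more like base R assignment.
region(s) <- "chr22:50586118-50633733"
```

The replacement form reads naturally for a single change, while the endomorphic form composes better when several steps are applied in a row.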




On Mon, Mar 12, 2018 at 5:20 PM, Paul Shannon <

paul.thurmond.shannon at gmail.com> wrote:


Paul Shannon

Wed, Mar 14, 2018 12:40 PM #

Hi Michael,

Set me straight if I got this wrong.   You suggest:

But wouldn't

   displayTrack(vcf)

preclude any easy specification of options? These vary across track types, and a set of track constructors makes them straightforward to manage and check.

Two examples:

   displayTrack(VariantTrack(vcf, title="mef2c eqtl", height="300", homrefColor="lightGray",
                             homVarColor="darkRed", hetVarColor="lightRed"))

   displayTrack(AlignmentTrack(x, title="bam 32", viewAsPairs=TRUE, insertionColor="black"))


So I suggest that the visualization of tracks has lots of track-type-specific settings which the user will want to control, and which would be messy to handle with an open-ended set of optional "..." args to a dispatch-capable single "displayTrack" method.

 - Paul

Michael Lawrence

Wed, Mar 14, 2018 1:05 PM #

Agreed about encapsulating plot parameters. I was thinking in terms of user
convenience, relying on defaults.

On Wed, Mar 14, 2018 at 12:40 PM, Paul Shannon <

paul.thurmond.shannon at gmail.com> wrote:

Gabriel Becker

Wed, Mar 14, 2018 1:18 PM #

Paul,

I don't think these are necessarily in conflict. If myigv represents the
IGV session/state, then add_track(myigv, vcfobj) could call down to
add_track(myigv, VariantTrack(vcfobj)) so you'd get the default behaviors. You
could also support add_track(myigv, vcfobj, title = "bla", homVarColor =
"whateverman"), which would call down to add_track(myigv, VariantTrack(vcfobj,
title = "bla", homVarColor = "whateverman")).

This is easy to do (I'm assuming the IGVSession class name, but replace it
with whatever class add_track is endomorphic in...):

setMethod("add_track", signature = c("IGVSession", "VCF"), function(igv,
track, ...) add_track(igv, VariantTrack(track, ...)))

setMethod("add_track", signature = c("IGVSession", "BAM"), function(igv,
track, ...) add_track(igv, AlignmentTrack(track, ...)))

This would, as Michael points out, give you the default values of the
parameters when you just call add_track(myigv, vcfobj).
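
For anyone who wants to try the forwarding pattern without the package, here is a self-contained sketch using toy classes. DemoSession, DemoVcf and DemoVariantTrack are hypothetical stand-ins (not the IGV or VariantAnnotation classes), and the default title and color are invented for the example.

```r
library(methods)

# Hypothetical classes, for illustration only.
setClass("DemoSession", slots = c(tracks = "list"))
setClass("DemoVcf", slots = c(path = "character"))
setClass("DemoVariantTrack",
         slots = c(vcf = "DemoVcf", title = "character",
                   homVarColor = "character"))

# Explicit constructor: the one place where track-specific options
# and their defaults are declared.
DemoVariantTrack <- function(vcf, title = "variants",
                             homVarColor = "lightblue") {
    new("DemoVariantTrack", vcf = vcf, title = title,
        homVarColor = homVarColor)
}

setGeneric("add_track", function(igv, track, ...) standardGeneric("add_track"))

# Base method: store an already constructed track.
setMethod("add_track", signature("DemoSession", "DemoVariantTrack"),
          function(igv, track, ...) {
              igv@tracks <- c(igv@tracks, list(track))
              igv
          })

# Convenience method: build the track from raw data, passing any
# options through "..." to the constructor.
setMethod("add_track", signature("DemoSession", "DemoVcf"),
          function(igv, track, ...) {
              add_track(igv, DemoVariantTrack(track, ...))
          })

igv <- new("DemoSession", tracks = list())
vcf <- new("DemoVcf", path = "chr22.vcf.gz")

with_defaults <- add_track(igv, vcf)
with_options  <- add_track(igv, vcf, title = "mef2c eqtl",
                           homVarColor = "darkRed")
```

Because "..." is handed straight to the constructor, a misspelled option still fails there with an "unused argument" error, so the checking stays in one place.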

Does that make sense?

~G


On Wed, Mar 14, 2018 at 12:40 PM, Paul Shannon <

paul.thurmond.shannon at gmail.com> wrote:

Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


Paul Shannon

Tue, Mar 20, 2018 4:09 PM #

I have now implemented VCF tracks for IGV, supporting both 

   a local VCF object read and filtered by the VariantAnnotation package, and
   a remote webserver-hosted vcf file.   

In normal use I expect (and recommend) that the local VCF object will be relatively small (< 1Mb, < 50 samples - or some tradeoff of those approximate numbers), and that the genome scale vcf file is accompanied by an index.  

I am now turning to annotation tracks: bed, bed9, gff, gff3, gtf.  rtracklayer provides a good set of importers for these formats, and S4 classes to represent them (apparently all are subclasses of GenomicRanges): 
 
   BEDFile (3 required fields, up to 9 optional fields - https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
   GFFFile (includes gff, gff3, gtf)

I propose to support four different representations of these data in R:

   data.frame
   the two rtracklayer classes
   a url pointing to a web-hosted and indexed annotation

The AnnotationTrack constructor accepts all four in the "annotation" parameter, a simple version of which (with many parameters defaulted) is:

  track <- AnnotationTrack(trackName, annotation, color, displayMode)

The annotation parameter will be inspected by the constructor: is it a data.frame? a BEDFile?  a GFFFile?  a url?

The local data is reformatted as needed into a file with a format igv.js understands - native bed and gff text files - then passed to igv as a local url. Remote urls are transmitted without change.
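
A rough sketch of that inspection step, in base R only. The helper name annotation_source() and its return value are hypothetical, the data.frame is assumed to be in BED column order already, and the rtracklayer branch is left as a placeholder, not a real conversion.

```r
# Hypothetical helper: decide how an "annotation" argument should be served.
annotation_source <- function(annotation) {
    if (is.data.frame(annotation)) {
        # Assumes BED-like column order: chrom, start, end, then optional fields.
        stopifnot(ncol(annotation) >= 3)
        bed <- tempfile(fileext = ".bed")
        write.table(annotation, file = bed, sep = "\t", quote = FALSE,
                    row.names = FALSE, col.names = FALSE)
        list(kind = "data.frame", path = bed)
    } else if (inherits(annotation, "BEDFile") ||
               inherits(annotation, "GFFFile")) {
        # Placeholder: converting an rtracklayer file object is omitted here.
        list(kind = class(annotation)[1], path = NA_character_)
    } else if (is.character(annotation) && length(annotation) == 1 &&
               grepl("^https?://", annotation)) {
        # Remote urls are passed through unchanged.
        list(kind = "url", path = annotation)
    } else {
        stop("unsupported annotation type: ", class(annotation)[1])
    }
}

tbl <- data.frame(chrom = "chr22",
                  start = c(50586118L, 50600000L),
                  end   = c(50586500L, 50600400L),
                  stringsAsFactors = FALSE)

from_df  <- annotation_source(tbl)
from_url <- annotation_source("https://example.org/annotations.bed.gz")
```

Returning a small list with the kind and the path keeps the later step (handing a local or remote url to igv.js) independent of how the data arrived.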

Does this sound right?  If you have a minute to comment, now is a good time to offer critique and suggestions on annotation tracks.

Next up after the AnnotationTrack class will be alignment (bam) tracks and, if I get to it before the package submission date, a "seg" track for segmented copy number data.

Last week Gabe asked:

I hope I don't sound disrespectful by describing these shorter methods as only syntactic simplifications with a little S4 dispatch thrown in.    They have value, for sure, but are they not just a relatively thin layer on top of the classes I am writing now?   *If* that description is accurate, then I'd rather consider adding them later, after the nuts and bolts and basic operations are all written, tested, and subjected to a few months of user QC.  I admit that I also prefer the greater operational clarity which for me, with my plodding brain, comes from using explicit data types and explicit constructors.

 - Paul