To avoid duplication of effort, and perhaps to attract some early reviewers, I figured I'd let this group know that I plan to submit a new package, "IGV", for inclusion in the next Bioconductor release. The package will provide an interface to the excellent and quite new browser-based genome viewer written by Jim Robinson and colleagues, igv.js: https://github.com/igvteam/igv.js

IGV depends upon RStudio's httpuv websocket library for passing JSON messages between an R session and igv.js running in the browser. Communication goes both ways - both ends are fully and independently interactive.

My goal with IGV is to support all of the tracks mentioned here: https://github.com/igvteam/igv.js/wiki/Tracks

Note that though igv.js typically gets its track data from CORS/indexed webservers, the IGV package will also support locally created R data.frames describing either bed or wig tracks - annotation and quantitative, respectively - without any need to host those tracks on a pre-existing webserver. httpuv includes a minimal webserver which can adequately serve the temporary files IGV creates from your data.frames.

In the years since the first appearance of my RCyjs package (which has a similar design, and the same base class, using websockets to communicate between R and the browser), RStudio and Hector Corrada Bravo have added async web socket support for Windows to httpuv. This means IGV (and RCyjs also) will run on all platforms. A refactored BrowserViz package (which might be useful to anyone wishing to do similar R-to-browser communication) will accompany my submission.

For javascript development, I have adopted commonly used strategies and tools, using npm and webpack to build a single, all-libraries-included html/js/css file for loading into the browser. This allows us to control library versioning and to improve browser load times. The single combined html/js/css file is created, not as part of R CMD build, but with a prior and separate, developer-only makefile maintained in the package's inst/browserCode directory. Only that combined html/js/css file is included in the package tarball, along with configuration files to rebuild it, but not including all of the usually large number of node_modules that contributed to its construction.

Comments and suggestions welcome.

- Paul
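As a minimal sketch of the data.frame-to-track step described above, the following base R code writes a data.frame as a tab-separated BED file in a temporary directory. The helper name `writeBedTrack` and the column names `chrom`, `start`, `end` are assumptions made for illustration; they are not the IGV package's API, and serving the file over httpuv is left out.

```r
# Hypothetical helper, not part of the IGV package: write a data.frame
# as a headerless, tab-separated BED file and return the file path.
writeBedTrack <- function(tbl, dir = tempdir()) {
  required <- c("chrom", "start", "end")
  stopifnot(all(required %in% colnames(tbl)))
  # BED columns are positional, so the three required ones go first.
  ordered <- tbl[, c(required, setdiff(colnames(tbl), required)), drop = FALSE]
  path <- tempfile(pattern = "track", tmpdir = dir, fileext = ".bed")
  write.table(ordered, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

tbl <- data.frame(name  = c("featA", "featB"),
                  chrom = c("chr1", "chr1"),
                  start = c(100L, 500L),
                  end   = c(200L, 650L),
                  stringsAsFactors = FALSE)
bedFile <- writeBedTrack(tbl)
bedLines <- readLines(bedFile)
```

The returned path is what a local webserver would then expose to the browser; the columns are reordered because BED readers identify fields by position rather than by name.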
[Bioc-devel] IGV - a new package in preparation
17 messages · Levi Waldron, Paul Shannon, Cook, Malcolm +2 more
Paul, Sounds cool! My one note after a quick first pass is that here: On Wed, Mar 7, 2018 at 2:15 PM, Paul Shannon <pshannon at systemsbiology.org> wrote:
Note that though igv.js typically gets its track data from CORS/indexed webservers, the IGV package will also support locally created R data.frames describing either bed or wig tracks - annotation and quantitative, respectively - without any need to host those tracks on a pre-existing webserver. httpuv includes a minimal webserver which can adequately serve the temporary files IGV creates from your data.frames.
It seems to me that those data.frames should be replaced with the core Bioconductor object classes which represent the types of information being displayed. You might look to epivizr for inspiration here, which (IIRC) allows "tracks" within epiviz to be backed by bioconductor objects. Best, ~G
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
Thanks, Gabe. You make an excellent point: bioc objects get first class support. In some instances, base R data types deserve that also, and data.frames lead the list for me, being useful, concise, universally available, expressive. So perhaps not "data.frames replaced by" but "accompanied by" appropriate bioc data types? - Paul
1 day later
On Thu, Mar 8, 2018 at 12:29 AM, Paul Shannon <pshannon at systemsbiology.org> wrote:
> Thanks, Gabe. You make an excellent point: bioc objects get first class support. In some instances, base R data types deserve that also, and data.frames lead the list for me, being useful, concise, universally available, expressive. So perhaps not "data.frames replaced by" but "accompanied by" appropriate bioc data types? - Paul
Definitely +1 for supporting GenomicRanges, including what's in genome() and mcols(). There's a demo of an rtracklayer -> GRanges -> UCSC genome browser workflow in the rtracklayer vignette <http://bioconductor.org/packages/release/bioc/vignettes/rtracklayer/inst/doc/rtracklayer.pdf> that I've made use of. I wouldn't necessarily say *don't* support data.frame, but I would certainly encourage Bioc users to import data with rtracklayer instead of generic read* functions, and to take advantage of the vast AnnotationHub and OrganismDbi-based annotations which provide GenomicRanges objects. Thanks and looking forward to it!
Thanks, Levi. Your comments, and Gabe's, are very helpful, getting me to consider things I have overlooked.

Support for GenomicRanges is essential, as you and Gabe point out.

In all cases IGV will convert a GRanges object to an appropriate track, then write it out as a temporary file. igv supports bed, gff, gff3, gtf, wig, bigWig, bedGraph, bam, vcf, and seg formats, and a variety of sources: files via http, google cloud storage, GA4GH; recent limited support has been provided for direct javascript data. Maybe someday AnnotationHub?

GenomicRanges as I understand them are very flexible, not subclassed into types as are track formats. So I propose that in many cases it will be the user's responsibility to specify track type, call the appropriate constructor, maybe specify column names so that the right scores can be extracted from the mcols - whose names, so far as I know, are not standardized.

If the GRanges object is too big - greater than a densely packed megabase, for instance - igv works best if the track file is indexed and served up by an index- and CORS-savvy webserver. Thus IGV should politely fail - or at least issue a warning - when it encounters big tracks. This "too big" threshold may change over time.

Reading through Michael's rtracklayer vignette I came across this:

> The rtracklayer package currently interfaces with the UCSC web-based genome browser. Other packages may provide drivers for other genome browsers through a plugin system.

Can anyone (maybe Michael himself?) comment on how I can evaluate an rtracklayer plugin strategy for igv?

- Paul
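The conversion described above (user names the score column, oversized tracks trigger a warning) could be sketched roughly as follows. To stay runnable in base R, the input is a data.frame shaped like the result of `as.data.frame()` on a GRanges object; the function name `rangesToBedGraph`, the `maxRows` parameter, and its default are all assumptions for illustration, not the package's design.

```r
# Hypothetical sketch: turn range data plus a user-chosen score column
# into a bedGraph-style table, warning when the track looks too big.
rangesToBedGraph <- function(ranges, scoreColumn, maxRows = 100000L) {
  # `ranges` mimics as.data.frame(<GRanges>): seqnames, start, end,
  # plus one column per metadata (mcols) entry.
  stopifnot(all(c("seqnames", "start", "end") %in% colnames(ranges)))
  if (!scoreColumn %in% colnames(ranges))
    stop("score column '", scoreColumn, "' not found")
  if (nrow(ranges) > maxRows)
    warning("track has ", nrow(ranges),
            " rows; consider an indexed file on a CORS-enabled server")
  data.frame(chrom = as.character(ranges$seqnames),
             start = ranges$start - 1L,  # 1-based closed -> 0-based half-open
             end   = ranges$end,
             score = ranges[[scoreColumn]],
             stringsAsFactors = FALSE)
}

ranges <- data.frame(seqnames = c("chr1", "chr1"),
                     start    = c(101L, 201L),
                     end      = c(150L, 260L),
                     signal   = c(0.5, 1.25),
                     stringsAsFactors = FALSE)
track <- rangesToBedGraph(ranges, scoreColumn = "signal")
```

The row-count threshold stands in for the "too big" test; a real check might instead measure genomic span or file size, and the limit would need tuning over time.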
Couple of things:

1) Check out epivizr and the surrounding infrastructure (maybe Hector can chime in). It's able to serve up data directly from R; would be nice if we could do that with IGV, instead of writing out to files. That would require it to talk to some standard API, like the old DAS.

2) The rtracklayer API is in rtracklayer/R/browser.R. See ucsc.R for how that is implemented for UCSC.
Thanks, Michael. httpuv, to which Hector made crucial contributions, makes it easy to send data directly between R and the browser, using websockets. I resort to files, however, because when the data, rendered as json, exceeds 500k, the websocket hangs. I never identified the weak spot. Some Jupyter developers recently had good luck with binary websocket data exchange. I am cautious, though, about pushing limits and using the latest websocket extension, and found the fallback to local files quite adequate for now. I'll look at ucsc.R. - Paul
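A rough sketch of the size-based fallback described above: small payloads go inline over the websocket, larger ones are written to a temporary file so that only the path needs to be sent. The function name `chooseTransport` and the 500000-byte default are assumptions that mirror the "500k" figure in the message; this is not the package's actual code.

```r
# Hypothetical dispatch: decide between inline websocket delivery and a
# temporary file, based on the size of the serialized payload in bytes.
chooseTransport <- function(jsonText, limit = 500000L, dir = tempdir()) {
  size <- nchar(jsonText, type = "bytes")
  if (size <= limit)
    return(list(via = "websocket", payload = jsonText))
  # Too large to send inline: write it out and hand back the path.
  path <- tempfile(pattern = "payload", tmpdir = dir, fileext = ".json")
  writeLines(jsonText, path, useBytes = TRUE)
  list(via = "file", payload = path)
}

small <- chooseTransport('{"chrom":"chr1","start":100,"end":200}')
large <- chooseTransport(paste(rep("x", 600000L), collapse = ""))
```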
Yea I wouldn't use JSON, particularly "row-oriented" JSON, as a means of scalable data transmission.
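To illustrate the remark about "row-oriented" JSON: row-oriented output repeats every field name once per record, while a column-oriented layout states each name once. The sketch below builds both forms by hand in base R (no JSON library is assumed) and compares their sizes; the function names and the example table are hypothetical.

```r
# Row-oriented: one object per record, keys repeated in every row.
toRowJson <- function(tbl) {
  rows <- sprintf('{"chrom":"%s","start":%d,"end":%d,"score":%s}',
                  tbl$chrom, tbl$start, tbl$end, as.character(tbl$score))
  paste0("[", paste(rows, collapse = ","), "]")
}

# Column-oriented: one array per field, each key appears once.
toColumnJson <- function(tbl) {
  paste0('{"chrom":["', paste(tbl$chrom, collapse = '","'), '"],',
         '"start":[', paste(tbl$start, collapse = ","), '],',
         '"end":[', paste(tbl$end, collapse = ","), '],',
         '"score":[', paste(tbl$score, collapse = ","), ']}')
}

n <- 1000L
tbl <- data.frame(chrom = rep("chr1", n),
                  start = seq_len(n),
                  end   = seq_len(n) + 10L,
                  score = round(seq(0, 1, length.out = n), 3),
                  stringsAsFactors = FALSE)
rowBytes <- nchar(toRowJson(tbl), type = "bytes")
colBytes <- nchar(toColumnJson(tbl), type = "bytes")
```

For this table `rowBytes` exceeds `colBytes`, and the gap grows with the number of rows, since the per-record key overhead is paid on every row.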
Hello,

Jumping into the conversation, perhaps late.

If it helps the effort, below are some IGV-related R functions I've used in the past to good effect, communicating with IGV running on a local or remote host and issuing goto and snapshot commands.

They use utils::write.socket.

One thing that helped my experience greatly was ensuring the socket always gets closed.
################################################################################
### IGV Interaction
################################################################################
library(functional)  # provides Curry()

OKCmd.socket <- function(socket, cmds, OK = "^OK\\s*") {
  ## PURPOSE: write each cmd in <cmds>, a character vector, to the
  ## <socket> (appending newlines), warning for those that don't
  ## return <OK> (such as IGV responds)
  lapply(cmds, function(cmd) {
    write.socket(socket, paste0(cmd, "\n"))  # paste0: no stray space before "\n"
    result <- read.socket(socket)
    if (1 != regexpr(OK, result))
      warning(OK, ' was expected in OKCmd.socket while executing [', cmd,
              '] but received [', result, ']')
    result
  })
}

IGV.make.socket <-
  ## PURPOSE: a version of make.socket which abides by IGV's default
  ## port convention.
  Curry(make.socket, port = 60151)

IGV.tell <- function(cmds, ...) {
  ## PURPOSE: send all the <cmds> to IGV on port determined by <...>,
  ## warning if unexpected IGV response, and being careful to clean up
  ## by closing the socket.
  message(paste(cmds, collapse = "\n"))
  s <- IGV.make.socket(...)
  on.exit(close.socket(s))  # runs even if a command fails
  OKCmd.socket(s, cmds)
}
attr(IGV.tell, 'ex') <- function() {
  IGV.tell('goto B52', 'myComputer.myInstitute.org')
  IGV.tell('goto chr4:1234-12345')
}

sanitize.path <- function(path) {
  ## YMMV
  path <- gsub('%', '', path, perl = TRUE)
  path <- gsub('\\|', ' by ', path, perl = TRUE)  # good choice?
  path <- gsub('[\\,]', '', path, perl = TRUE)
  path <- gsub('\\/', '\\;', path, perl = TRUE)
  path <- gsub('\\n', ' ', path, perl = TRUE)
  path
}

IGVSaveSnapshots <- structure(
  function(region,
           dir = './',
           filename = gsub(':', '@', sanitize.path(region)),
           type = 'png',
           filepath = sprintf('%s.%s', filename, type),
           ...) {
    ## PURPOSE: create snapshots of <region> in <dir> in IGV socket
    ## behind host/port in .... NB: the dir must be relative to the
    ## host on which IGV is running, which may not exist on localhost.
    IGV.cmds.gosnap <- function(x, filepath)
      c(sprintf("goto %s", x),
        sprintf("snapshot %s", filepath))
    IGV.tell(c(sprintf("snapshotDirectory %s", dir),
               mapply(IGV.cmds.gosnap, region, filepath)),
             ...)
    sprintf("%s/%s", dir, filepath)
  }, ex = function() {
    ## Take snapshots using IGV running on a remote host.
    ## NB: host must be passed by name; an unnamed value would be
    ## matched positionally to <filename> instead of reaching <...>.
    IGVSaveSnapshots(dir = '/Volumes/Users/lab_project/SR_Prot_projects/iClip/myGenes/',
                     region = c('ns1',          # can be a gene name...
                                'X:123-1234'),  # ...or locus
                     host = 'M0050U1ZE6')
    IGVSaveSnapshots(dir = '\\\\ion\\projects\\mec\\ShilatifardLab\\analysis\\fec\\triptolide\\fig',
                     region = c(
                       ## can be...:
                       'chrX:123-1234'  # ...a locus
                       #,'ASNS'         # ...a gene name...
                     ),
                     type = 'svg',
                     host = 'LA10MJDPKM5.sgc.loc')
  })
~malcolm_cook at stowers.org
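For readers without a running IGV instance, the command sequence that a snapshot helper of this kind sends can be previewed without opening a socket. The sketch below is a standalone illustration: `snapshotCommands` is a hypothetical name, and the only IGV batch commands it emits are the three used in the functions above (snapshotDirectory, goto, snapshot).

```r
# Hypothetical, socket-free preview of the batch commands for a set of
# regions: one snapshotDirectory line, then goto/snapshot pairs.
snapshotCommands <- function(regions, dir, type = "png") {
  files <- sprintf("%s.%s", gsub(":", "@", regions), type)
  # rbind() then as.vector() interleaves column by column:
  # goto 1, snapshot 1, goto 2, snapshot 2, ...
  pairs <- as.vector(rbind(sprintf("goto %s", regions),
                           sprintf("snapshot %s", files)))
  c(sprintf("snapshotDirectory %s", dir), pairs)
}

cmds <- snapshotCommands(c("chr1:100-200", "ASNS"), dir = "/tmp/snaps")
```

Building the vector separately from sending it makes the ordering easy to inspect or log before anything is written to the socket.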
Hi,

On Friday, March 09, 2018 1:49 PM, Michael Lawrence wrote:

> 1) Check out epivizr and the surrounding infrastructure (maybe Hector can
> chime in). It's able to serve up data directly from R; would be nice if we
> could do that with IGV, instead of writing out to files. That would require
> it to talk to some standard API, like the old DAS.

One value of writing to files is that if IGV is running on a remote host then retrieval via byte-range encoding continues to just work. Of course this is dependent upon what you are trying to do.

~malcolm_cook at stowers.org
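As a local analogy for the byte-range retrieval mentioned above, the sketch below reads only a slice of a file, which is what an HTTP range request asks a server to return. The function names are hypothetical, the file is a throwaway example, and no network request is made; the header string follows the standard `Range: bytes=first-last` form with 0-based, inclusive offsets.

```r
# Hypothetical illustration: read bytes first..last (0-based, inclusive)
# from a file, the same slice an HTTP range request would return.
readByteRange <- function(path, first, last) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = first)
  readBin(con, what = "raw", n = last - first + 1L)
}

# The corresponding request header, shown only as a string.
rangeHeader <- function(first, last) sprintf("Range: bytes=%d-%d", first, last)

path <- tempfile(fileext = ".bin")
writeBin(charToRaw("0123456789"), path)
slice <- rawToChar(readByteRange(path, 2L, 5L))
header <- rangeHeader(2L, 5L)
```

Indexed formats depend on this kind of access: the index says which byte offsets cover a genomic region, so only those bytes need to be fetched.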
On Fri, Mar 9, 2018 at 12:36 PM, Cook, Malcolm <MEC at stowers.org> wrote:
Hi,
> -----Original Message-----
> From: Bioc-devel <bioc-devel-bounces at r-project.org> On Behalf Of Michael Lawrence
> Sent: Friday, March 09, 2018 1:49 PM
> To: Paul Shannon <pshannon at systemsbiology.org>
> Cc: Gabe Becker <becker.gabe at gene.com>; bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] IGV - a new package in preparation
>
> Couple of things:
>
> 1) Check out epivizr and the surrounding infrastructure (maybe Hector can
> chime in). It's able to serve up data directly from R; would be nice if we
> could do that with IGV, instead of writing out to files. That would require
> it to talk to some standard API, like the old DAS.
One value of writing to files is that if IGV is running on remote host then retrieval via byte-range encoding continues to just work. Of course this is dependent upon what you are trying to do.
Sure, and we'd want the API to support that as well (like epiviz does now).
~malcolm_cook at stowers.org
> 2) The rtracklayer API is in rtracklayer/R/browser.R. See ucsc.R for how that is implemented for UCSC.
3 days later
Gabe and Levi made a good case for supporting GRanges in IGV. Looking at the GenomicRanges vignettes, it appears that many of Herve's introductory examples have GC content as the mcols column of interest. Would that be a good test and demo for IGV? Or perhaps some other genomic quantity, one for which sample data is already present in some Bioconductor package's extdata?
The IGV VCF track now works, using GenomicRanges and VariantAnnotation. It might be of interest, and may lead to more useful suggestions, which would be good for me to hear at this stage. Here is a code chunk using default parameters for colors, track height, etc. Homozygous non-reference calls are rendered in light blue, heterozygous in dark blue, reference in gray.
library(IGV)
library(VariantAnnotation)
igv <- IGV(portRange=9000:9020)
setGenome(igv, "hg19")
setBrowserWindowTitle(igv, "VCF demo")
f <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
chrom <- "22"
start <- 50586118
end <- 50633733
rng <- GRanges(seqnames=chrom, ranges=IRanges(start=start, end=end))
vcf.sub <- readVcf(f, "hg19", param=rng)
track <- VariantTrack("chr22-tiny", vcf.sub)
displayTrack(igv, track)
showGenomicRegion(igv, sprintf("chr22:%d-%d", start-1000, end+1000))
Suggestions?
On Mar 9, 2018, at 4:15 AM, Levi Waldron <lwaldron.research at gmail.com> wrote: Definitely +1 for supporting GenomicRanges, including what's in genome() and mcols(). There's a demo of an rtracklayer -> GRanges -> UCSC
You could look at the rtracklayer API. For example, using gets functions like track<-() and range<-() to set track and region may be more natural to R users. Then again, if there were endomorphic functions add_track() and set_range(), the API would support chaining. There should be no need to explicitly construct a track; just rely on dispatch and class semantics, i.e., passing a VCF object to add_track() would create a variant track automatically.
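Editorial sketch: the two API styles mentioned in the reply above, a replacement ("gets") function and endomorphic functions that can be chained, can be compared on a toy session object. Everything here is hypothetical: ToySession is a plain list standing in for a real browser session, and viewRange<- is used instead of range<- only to avoid shadowing base R's range().

```r
# Toy stand-in for a browser session: a plain list with value semantics.
# A real session would hold a websocket connection instead.
new_session <- function() {
  structure(list(range = NULL, tracks = list()), class = "ToySession")
}

# Endomorphic functions: take a session, return a session.
set_range <- function(session, range) {
  session$range <- range
  session
}
add_track <- function(session, name, data) {
  session$tracks[[name]] <- data
  session
}

# Replacement ("gets") function, so that `viewRange(s) <- value` works.
`viewRange<-` <- function(session, value) set_range(session, value)

s <- new_session()
viewRange(s) <- "chr22:50586118-50633733"

# Chaining with the native pipe (needs R >= 4.1).
s2 <- new_session() |>
  set_range("chr22:50586118-50633733") |>
  add_track("demo", data.frame(start = 1:3))

names(s2$tracks)
```

Both styles end in the same state here. The difference is mostly one of taste: replacement functions read like ordinary R assignment, while functions that return the session compose into pipelines.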
1 day later
Hi Michael, Set me straight if I got this wrong. You suggest:
There should be no need to explicitly construct a track; just rely on dispatch and class semantics, i.e., passing a VCF object to add_track() would create a variant track automatically.
But wouldn't
displayTrack(vcf)
preclude any easy specification of options - which vary across track types - which are straightforward, easily managed and checked, by a set of track constructors?
Two examples:
displayTrack(VariantTrack(vcf, title="mef2c eqtl", height="300", homrefColor="lightGray",
homVarColor="darkRed", hetVarColor="lightRed"))
displayTrack(AlignmentTrack(x, title="bam 32", viewAsPairs=TRUE, insertionColor="black"))
So I suggest that the visualization of tracks has lots of track-type-specific settings which the user will want to control, and which would be messy to handle with an open-ended set of optional "..." args to a dispatch-capable single "displayTrack" method.
- Paul
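Editorial sketch: the argument for explicit constructors is that each one can name and check its own options. A minimal base R illustration follows. VariantTrackSketch is a hypothetical name, the checks are illustrative, and the default colors are taken from the earlier description of the VCF demo (light blue, dark blue, gray).

```r
# Hypothetical constructor: every track-specific option is a named,
# checked argument rather than something passed through `...`.
VariantTrackSketch <- function(vcf, title = "variants", height = 300,
                               homVarColor = "lightblue",
                               hetVarColor = "darkblue",
                               homRefColor = "gray") {
  stopifnot(is.character(title), length(title) == 1L)
  stopifnot(is.numeric(height), length(height) == 1L, height > 0)
  stopifnot(is.character(homVarColor), is.character(hetVarColor),
            is.character(homRefColor))
  colors <- c(homVar = homVarColor, hetVar = hetVarColor, homRef = homRefColor)
  structure(list(data = vcf, title = title, height = height, colors = colors),
            class = "VariantTrackSketch")
}

trk <- VariantTrackSketch(vcf = NULL, title = "mef2c eqtl", height = 300)

# A misspelled option fails immediately instead of vanishing into `...`.
res <- tryCatch(VariantTrackSketch(NULL, hieght = 200),
                error = function(e) "rejected")
res
```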
Agreed about encapsulating plot parameters. I was thinking in terms of user convenience, relying on defaults.
Paul,
I don't think these are necessarily in conflict. If myigv represents the
IGV session/state, then add_track(myigv, vcfobj) could call down to
add_track(myigv, VariantTrack(vcfobj)) so you'd get the default behaviors.
You could also support add_track(myigv, vcfobj, title = "bla", homVarColor =
"whateverman"), which would call down to add_track(myigv,
VariantTrack(vcfobj, title = "bla", homVarColor = "whateverman")).
This is easy to do (I'm assuming the IGVSession class name, but replace it
with whatever class add_track is endomorphic in...):
setMethod("add_track", signature = c("IGVSession", "VCF"),
          function(igv, track, ...) add_track(igv, VariantTrack(track, ...)))
setMethod("add_track", signature = c("IGVSession", "BAM"),
          function(igv, track, ...) add_track(igv, AlignmentTrack(track, ...)))
This would, as Michael points out, give you the default values of the
parameters when you just call add_track(myigv, vcfobj).
Does that make sense?
~G
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
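Editorial sketch: the forwarding pattern proposed above can be tried end to end with toy S4 classes, using only the methods package. ToyVCF, ToyVariantTrack and ToyIGVSession are stand-ins invented for this example. The real VCF class lives in VariantAnnotation, and the session and track classes of the package under discussion may look different.

```r
library(methods)

# Toy stand-ins; none of these classes come from a released package.
setClass("ToyVCF", representation(nVariants = "integer"))
setClass("ToyVariantTrack",
         representation(data = "ToyVCF", title = "character",
                        homVarColor = "character"))
setClass("ToyIGVSession", representation(tracks = "list"))

# Explicit constructor: owns the defaults and the track-specific options.
ToyVariantTrack <- function(vcf, title = "variants", homVarColor = "lightblue") {
  new("ToyVariantTrack", data = vcf, title = title, homVarColor = homVarColor)
}

setGeneric("add_track", function(igv, track, ...) standardGeneric("add_track"))

# Base method: a ready-made track is appended to the session.
setMethod("add_track", signature("ToyIGVSession", "ToyVariantTrack"),
          function(igv, track, ...) {
            igv@tracks <- c(igv@tracks, list(track))
            igv
          })

# Convenience method: raw data is wrapped by the constructor, and any
# extra arguments are forwarded to it through `...`.
setMethod("add_track", signature("ToyIGVSession", "ToyVCF"),
          function(igv, track, ...) add_track(igv, ToyVariantTrack(track, ...)))

igv <- new("ToyIGVSession", tracks = list())
vcf <- new("ToyVCF", nVariants = 12L)

igv <- add_track(igv, vcf)                                          # defaults
igv <- add_track(igv, vcf, title = "bla", homVarColor = "darkred")  # overrides
vapply(igv@tracks, function(trk) trk@title, character(1))
```

Because the convenience method only forwards to the constructor, option checking stays in one place, and a misspelled option still raises an "unused argument" error from the constructor.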
6 days later
I have now implemented VCF tracks for IGV, supporting both a local VCF object read and filtered by the VariantAnnotation package, and a remote webserver-hosted vcf file. In normal use I expect (and recommend) that the local VCF object will be relatively small (< 1Mb, < 50 samples - or some tradeoff of those approximate numbers), and that the genome-scale vcf file is accompanied by an index.

I am now turning to annotation tracks: bed, bed9, gff, gff3, gtf. rtracklayer provides a good set of importers for these formats, and S4 classes to represent them (apparently all are subclasses of GenomicRanges):

   BEDFile (3 required fields, up to 9 optional fields - https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
   GFFFile (includes gff, gff3, gtf)

I propose to support four different representations of these data in R:

   data.frame
   the two rtracklayer classes
   a url pointing to a web-hosted and indexed annotation

The AnnotationTrack constructor accepts all of these in the "annotation" parameter, a simple version of which (with many parameters defaulted) is:

   track <- AnnotationTrack(trackName, annotation, color, displayMode)

The annotation parameter will be inspected by the constructor: is it a data.frame? a BEDFile? a GFFFile? a url? The local data is reformatted as needed into a file with a format igv.js understands - native bed and gff text files - then passed to igv as a local url. Remote urls are transmitted without change.

Does this sound right? If you have a minute to comment, now is a good time to offer critique and suggestions on annotation tracks.

Next up after the AnnotationTrack class will be alignment (bam) tracks and, if I get to it before the package submission date, a "seg" track for segmented copy number data.

Last week Gabe asked:
If myigv represents the IGV session/state, then add_track(myigv, vcfobj) could call down to add_track(myigv, VariantTrack(vcfobj)) so you'd get the default behaviors. You could also support add_track(myigv, vcfobj, title = "bla", homVarColor = "whateverman"), which would call down to add_track(myigv, VariantTrack(vcfobj, title = "bla", homVarColor = "whateverman")).
This is easy to do (I'm assuming the IGVSession class name, but replace it with whatever class add_track is endomorphic in...):
setMethod("add_track", signature = c("IGVSession", "VCF"), function(igv, track, ...) add_track(igv, VariantTrack(track, ...)))
setMethod("add_track", signature = c("IGVSession", "BAM"), function(igv, track, ...) add_track(igv, AlignmentTrack(track, ...)))
This would, as Michael points out, give you the default values of the parameters when you just call add_track(myigv, vcfobj).
I hope I don't sound disrespectful by describing these shorter methods as only syntactic simplifications with a little S4 dispatch thrown in. They have value, for sure, but are they not just a relatively thin layer on top of the classes I am writing now? *If* that description is accurate, then I'd rather consider adding them later, after the nuts and bolts and basic operations are all written, tested, and subjected to a few months of user QC. (I admit that I also prefer the greater operational clarity which for me, with my plodding brain, comes from using explicit data types and explicit constructors.)
- Paul
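Editorial sketch: the inspection of the "annotation" parameter described above could be factored into a small helper like the one below. classifyAnnotation is a hypothetical name. BEDFile and GFFFile are the rtracklayer class names mentioned in the message, tested here with inherits() so that the sketch runs without rtracklayer installed. The URL test is a simple assumption (an http or https prefix) and the example URL is a placeholder.

```r
# Hypothetical helper: decide how an `annotation` argument should be handled.
classifyAnnotation <- function(annotation) {
  if (is.data.frame(annotation)) return("data.frame")
  if (inherits(annotation, "BEDFile")) return("BEDFile")
  if (inherits(annotation, "GFFFile")) return("GFFFile")
  if (is.character(annotation) && length(annotation) == 1L &&
      grepl("^https?://", annotation)) return("url")
  stop("unsupported annotation type: ", class(annotation)[1])
}

classifyAnnotation(data.frame(chrom = "chr1", start = 1, end = 10))
classifyAnnotation("https://example.org/tracks/genes.bed.gz")
```

A constructor could switch on the returned label: write local data to a temporary bed or gff file for the first three cases, and pass a url through unchanged.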