Skip to content

[Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

25 messages · Bhagwat, Aditya, Kasper Daniel Hansen, Michael Lawrence +2 more

#
Dear Bioc devel,

Is it possible to import the BSgenome class without attaching BiocGenerics (to keep a clean namespace during the development of multicrispr<https://gitlab.gwdg.de/loosolab/software/multicrispr>).

BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')

(Posted earlier on BioC support<https://support.bioconductor.org/p/124442/> and redirected here following Martin's suggestion)

Thankyou :-)

Aditya
#
The way to keep a "clean namespace" is to selectively import symbols
into your namespace, not to import _nothing_ into your namespace.
Otherwise, your code will fill with namespace qualifications that
distract from what is more important to communicate: the intent of the
code. And no, there's no way to define method signatures using
anything other than simple class names.

It would be interesting to explore alternative ways of specifying
method signatures. One way would be if every package exported a "class
reference" (class name with package attribute, at least)  for each of
its classes. Those could be treated like any other exported object,
and referenced via namespace qualification. It would require major
changes to the methods package but that should probably happen anyway
to support disambiguation when two packages define a class of the same
name. It would be nice to get away from the exportClasses() and
importClasses() stuff. File that under the "rainy year" category.

Michael

On Fri, Sep 6, 2019 at 3:39 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de> wrote:

  
    
#
Thank you Michael,

Appreciate your time for helping me fill the gaps in my understanding of the S4 flow :-).

It all started when I defined (in my multicrispr package) the S4 coercer :
methods::setAs( "BSgenome",
"GRanges",
function(from) as(GenomeInfoDb::seqinfo(from), "GRanges")

When building, I noticed the message
in method for 'coerce' with signature '"BSgenome","GRanges"': no definition for class "BSgenome"

So, I added
BSgenome <- methods::getClassDef('BSgenome', package = 'BSgenome')

That loads all these dependencies.
I guess because these dependencies are needed to provide for all required S4 methods for the BSgenome class, am I right?

Is there a way to define a methods::setAs without loading the class definition?

Aditya
#
I noticed the unfriendly indentation and formatting of my response , so I updated my original question on BioC support (with a much more eye-friendly formatting):

https://support.bioconductor.org/p/124442
#
It sounds like you are trying to defer loading of namespaces in order
to save time when they are unnecessary? That's probably going to end
up a losing battle.

On Fri, Sep 6, 2019 at 5:47 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de> wrote:

  
    
#
There are
  importMethodsFrom(PACKAGE, NAME_OF_METHODS)
  importClassesFrom(PACKAGE, NAME_OF_METHODS)
to help with selective importing S4 methods/classes.  See section 1.5.6 of
WRE.

On Fri, Sep 6, 2019 at 9:24 AM Michael Lawrence via Bioc-devel <
bioc-devel at r-project.org> wrote:

            

  
    
#
Thanks Kasper and Michael.

The importClassesFrom  sounds like something that would allow an attachment-free S4 class import, will check them out.
Michael, the current auto-attach is causing 66 namespace clashes<https://support.bioconductor.org/p/124442/> ? not feeling very comfortable about that, so trying to avoid them.

I also think there?s something about S4 coercion that I don?t yet fully understand.
For instance: the S4Vectors package has three different versions of a S4Vectors::Vector -> data.frame coercer. Why? Any useful pointers?

setAs("Vector", "data.frame", function(from) as.data.frame(from))

as.data.frame.Vector <- function(x, row.names=NULL, optional=FALSE, ...) {
    as.data.frame(x, row.names=NULL, optional=optional, ...)
}

setMethod("as.data.frame", "Vector",
          function(x, row.names=NULL, optional=FALSE, ...)
          {
              x <- as.vector(x)
              as.data.frame(x, row.names=row.names, optional=optional, ...)
          })



From: Kasper Daniel Hansen <kasperdanielhansen at gmail.com>
Sent: Freitag, 6. September 2019 17:49
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: Bhagwat, Aditya <Aditya.Bhagwat at mpi-bn.mpg.de>; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

There are
  importMethodsFrom(PACKAGE, NAME_OF_METHODS)
  importClassesFrom(PACKAGE, NAME_OF_METHODS)
to help with selective importing S4 methods/classes.  See section 1.5.6 of WRE.
On Fri, Sep 6, 2019 at 9:24 AM Michael Lawrence via Bioc-devel <bioc-devel at r-project.org<mailto:bioc-devel at r-project.org>> wrote:
It sounds like you are trying to defer loading of namespaces in order
to save time when they are unnecessary? That's probably going to end
up a losing battle.

On Fri, Sep 6, 2019 at 5:47 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de<mailto:Aditya.Bhagwat at mpi-bn.mpg.de>> wrote:
--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla at gene.com<mailto:michafla at gene.com>

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

_______________________________________________
Bioc-devel at r-project.org<mailto:Bioc-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Best,
Kasper
#
There's never a need to attach a package to satisfy the dependencies
of another package. That would defeat the purpose of namespaces.

The three coercion functions gets at the heart of the S3/S4 mess.
setAs() provides a dynamic coercion mechanism, so is useful for as(x,
class) when class can be anything. as.data.frame() is an S3 generic
defined by the base package, so every package sees it. Something
promotes as.data.frame() to an S4 generic, but only packages that
import the generic can see it. That likely excludes the vast majority
of CRAN packages. Thus, we define an S3 method for calls to the S3
generic. The S4 generic will fall back to the S3 methods; however, it
will first check all S4 methods. Defining as.data.frame,Vector()
defends against the definition of a method above it, such as
as.data.frame,Annotated(), which would intercept dispatch to the S3
as.data.frame.Vector().

Michael

On Fri, Sep 6, 2019 at 10:08 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de> wrote:

  
    
#
Waaw, thanks Michael, that is really clarifying.
3 days later
#
Hi Aditya,

It feels that a coercion method from BSgenome to GRanges should rather 
be defined in the BSgenome package itself. Patch/PR welcome on GitHub. 
More generally speaking, coercion methods should be defined in a place 
that is "as close as possible" to the "from" or "to" classes rather than 
in a package that doesn't own any of the 2 classes involved.

Is this what you have in mind for this coercion?

   > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
   GRanges object with 7 ranges and 0 metadata columns:
            seqnames     ranges strand
               <Rle>  <IRanges>  <Rle>
       chrI     chrI 1-15072423      *
      chrII    chrII 1-15279345      *
     chrIII   chrIII 1-13783700      *
      chrIV    chrIV 1-17493793      *
       chrV     chrV 1-20924149      *
       chrX     chrX 1-17718866      *
       chrM     chrM    1-13794      *
     -------
     seqinfo: 7 sequences (1 circular) from ce10 genome

Thanks,
H.
On 9/6/19 03:39, Bhagwat, Aditya wrote:

  
    
#
Hi Herve,
:-)
Owkies. What pull/fork/check/branch protocol to be followed?
Yes.

Perhaps also useful to share the wider context, allowing your and others feedback for improved software design.
I wanted to subset a <https://support.bioconductor.org/p/124367> BSgenome (without the _random or _unassigned), but Lori explained this is not possible.<https://support.bioconductor.org/p/124367>
Instead Lori suggested to coerce a BSgenome into a GRanges<https://support.bioconductor.org/p/123489>, which is a useful solution, but for which currently no exported S4 method exists<https://support.bioconductor.org/p/124416>
So I defined an S4 coercer in my multicrispr package, making sure to properly import the Bsgenome class<https://support.bioconductor.org/p/124442>.
Then, after coercing a BSgenome into a GRanges, I can extract the chromosomes, after properly importing IRanges::`%in%`<https://support.bioconductor.org/p/124367>
Which I can then on end to karyoploteR<https://support.bioconductor.org/p/124328>, for genome-wide plots of crispr target sites.

A good moment also to say thank you to all of you who helped me out, it helps me to make multicrispr fit nicely into the BioC ecosystem.

Speeking of BioC design philosophy, can any of you suggest concise and to-the-point reading material to deepen my understanding of the core BioC software design philosophy?
I am trying to understand that better (which was the context for asking recently why there are three Vector -> data.frame coercers in S4Vectors<https://support.bioconductor.org/p/124491>)

Cheers,

Aditya
#
Hi Aditya,
On 9/11/19 01:31, Bhagwat, Aditya wrote:
Looks like you don't need to coerce the BSgenome object to GRanges. See 
https://support.bioconductor.org/p/123489/#124581

H.

  
    
#
Hi Herve,

Thank you for your responses.
The karyoploteR use case still requires it, though, to allow plotting of only the chromosomal BSgenome portions:

    chromranges <- as(bsegenome, "GRanges")
    kp <- karyoploteR::plotKaryotype(chromranges)
    karyoploteR::kpPlotRegions(kp, crispr_target_sites)

Or do you see any alternative for this purpose too?

Aditya
#
I'm pretty surprised that the karyoploteR package does not accept a
Seqinfo since it is plotting chromosomes. But again, please consider
just doing as(seqinfo(bsgenome), "GRanges").

On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de> wrote:

  
    
#
Thanks Michael, 

The important detail is that I want to plot the relevant chromosomes only

    relevant_chromosomes <- GenomeInfoDb::seqnames(grangeslist)  %>% 
                            S4Vectors::runValue() %>% 
                            Reduce(union, .) %>% 
                            unique()

    genomeranges <- GenomeInfoDb::seqinfo(grangeslist) %>% 
                    as('GRanges') %>% 
                   (function(gr){
                       gr [ as.character(GenomeInfoDb::seqnames(gr)) %in% 
                            relevant_chromosomes ]
                   })

    kp <- karyoploteR::plotKaryotype(genomeranges)
    karyoploteR::kpPlotRegions(kp, grangeslist) # grangeslist contains crispr target sites


And, this process required as("GRanges")

    #' Convert BSgenome into GRanges 
    #' @param from BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
    #' @examples 
    #' require(magrittr)
    #' BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10 %>%
    #' as('GRanges')
    #' @importClassesFrom BSgenome BSgenome
    #' @export
    methods::setAs( "BSgenome", 
                    "GRanges", 
                    function(from)  from %>% 
                                    GenomeInfoDb::seqinfo() %>% 
                                    as('GRanges'))

Thankyou for feedback,

Aditya
#
So why not just do:

as(seqinfo(bsgenome)[unique(unlist(seqnames(grl)))], "GRanges")

Michael

On Wed, Sep 11, 2019 at 5:55 AM Bhagwat, Aditya
<Aditya.Bhagwat at mpi-bn.mpg.de> wrote:

  
    
#
The unique seqnames is what we call the seqlevels. So just:

   as(seqinfo(bsgenome)[seqlevels(grl)], "GRanges")

H.
On 9/11/19 07:42, Michael Lawrence wrote:

  
    
#
Or more accurately:

   as(seqinfo(bsgenome)[seqlevelsInUse(grl)], "GRanges")

since not all seqlevels are necessarily "in use" (i.e. not necessarily 
represented in seqnames(grl)).

H.
On 9/11/19 08:26, Herv? Pag?s wrote:

  
    
#
Good call. Didn't know about seqlevelsInUse().
On Wed, Sep 11, 2019 at 8:29 AM Pages, Herve <hpages at fredhutch.org> wrote:

  
    
#
Hi all,

I'm the developer of karyoploteR.

@Michael: I never though about using seqinfo as the source for the 
genome information. I'll add this as an option to define the genome. 
Thanks for the suggestion.

@Aditya: If you want to plot just your relevant chromosomes, you don't 
need to alter the genome. You can use the "chromosomes" parameter to 
give a vector of chromosome names. Is it not working for you for some 
reason?

Bernat


El 9/11/19 a las 2:31 PM, Michael Lawrence via Bioc-devel escribi?:
#
Oh, and Aditya, take into account taht if you give karyoploteR a custom 
genome as you are planning to do, it will not paint the cytobands by 
default, you'll have to get them yourself and give them to plotKaryotype.

If possible, I would recommend giving the genome by name ("hg19") and 
selecting the chromosomes to plot using "chromosomes".

Bernat




El 9/12/19 a las 8:47 AM, Bernat Gel Moreno escribi?:
#
Thanks Michael and Herve, 

Will do that then. 

I extract from this discussion that exporting a function in a core BioC package is reserved for functions 
(1) whose name unambiguously communicates what they do
(2) has the potential to be broadly used

And that as(BSgenome, 'GRanges') is being felt not not comply to these.

Thanks for all  the feedback - has been very helpful.

Aditya
#
I have updated karyoploteR and it's now (from version 1.11.9 in devel) 
possible to use a BSgenome object or a seqinfo object as genome 
definitions in plotKaryotype. In both cases, if possible, it will by 
default automatically filter the chromosomes to the canonical ones (if 
defined) and retrieve the cytobands for the genome. Or you can specify 
the exact chromosomes you want to plot. I think this should help with 
the specific question at hand.

Bernat


El 9/12/19 a las 10:09 AM, Bernat Gel Moreno escribi?:
#
Thankyou Bernat, 

Saw your email just now - since I have the "digest" option. Good that you brought the chromosomes parameter under my attention, I must be able to use that!

Aditya


-----Original Message-----
From: Bernat Gel Moreno <bgel at igtp.cat> 
Sent: Donnerstag, 12. September 2019 08:47
To: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

Hi all,

I'm the developer of karyoploteR.

@Michael: I never though about using seqinfo as the source for the genome information. I'll add this as an option to define the genome. 
Thanks for the suggestion.

@Aditya: If you want to plot just your relevant chromosomes, you don't need to alter the genome. You can use the "chromosomes" parameter to give a vector of chromosome names. Is it not working for you for some reason?

Bernat


El 9/11/19 a las 2:31 PM, Michael Lawrence via Bioc-devel escribi?:
#
Thankyou Bernat!

-----Original Message-----
From: Bernat Gel Moreno <bgel at igtp.cat> 
Sent: Donnerstag, 12. September 2019 11:26
To: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

I have updated karyoploteR and it's now (from version 1.11.9 in devel) possible to use a BSgenome object or a seqinfo object as genome definitions in plotKaryotype. In both cases, if possible, it will by default automatically filter the chromosomes to the canonical ones (if
defined) and retrieve the cytobands for the genome. Or you can specify the exact chromosomes you want to plot. I think this should help with the specific question at hand.

Bernat


El 9/12/19 a las 10:09 AM, Bernat Gel Moreno escribi?: