Skip to content

[Bioc-devel] as.list of a GRanges

18 messages · Hervé Pagès, Cook, Malcolm, Michael Lawrence +4 more

#
Hi,

I'm having an error in the devel version of my package karyoploteR due 
to an error when converting a GRanges into a list of GRanges with "as.list".

This used to be possible and it was working a few weeks ago.

Am I suposed to use a different approach or is there a problem somewhere?

Thanks!

Bernat

Here's an example

rr <- GRanges(seqnames = c("chr1", "chr2"), ranges = IRanges(start=c(1, 
2), end=c(11, 12)))
as.list(rr)

And the error message

Error in (function (classes, fdef, mtable)? :
 ? unable to find an inherited method for function 'getListElement' for 
signature '"GRanges"'


and the sessionInfo

 > sessionInfo()
R Under development (unstable) (2018-02-14 r74250)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /software/debian-8/general/R-Bioc-Devel/current/lib/R/lib/libRblas.so
LAPACK: 
/software/debian-8/general/R-Bioc-Devel/current/lib/R/lib/libRlapack.so

locale:
 ?[1] LC_CTYPE=en_US.UTF-8????? LC_NUMERIC=C LC_TIME=C???????????????? 
LC_COLLATE=en_US.utf8
 ?[5] LC_MONETARY=en_US.utf8??? LC_MESSAGES=en_US.utf8 
LC_PAPER=es_ES.UTF-8????? LC_NAME=C
 ?[9] LC_ADDRESS=C????????????? LC_TELEPHONE=C LC_MEASUREMENT=en_US.utf8 
LC_IDENTIFICATION=C

attached base packages:
[1] parallel? stats4??? stats???? graphics? grDevices utils datasets? 
methods?? base

other attached packages:
[1] GenomicRanges_1.31.20 GenomeInfoDb_1.15.5 IRanges_2.13.26?????? 
S4Vectors_0.17.32???? BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] zlibbioc_1.25.0??????? compiler_3.5.0 XVector_0.19.8???????? 
tools_3.5.0??????????? GenomeInfoDbData_1.1.0
[6] RCurl_1.95-4.10??????? bitops_1.0-6
#
Hi Bernat,

Use as(gr, "GRangesList") instead of as.list() on your GRanges objects.

Most of the times (e.g. for passing to lapply() or mclapply()), you
don't need an ordinary list, but, if you really want one, you can call
as.list() on the GRangesList object returned by as(gr, "GRangesList").

Hope this helps,
H.
On 02/15/2018 12:05 AM, Bernat Gel wrote:

  
    
#
So is as.list() no longer supported for GRanges objects? I have found it
useful in places.
On Thu, Feb 15, 2018 at 8:30 AM, Herv? Pag?s <hpages at fredhutch.org> wrote:

            

  
  
#
On 02/15/2018 08:37 AM, Michael Lawrence wrote:
Very few places. I found a dozen of them in the entire software repo.
Now you should use as.list(as(gr, "GRangesList")) instead.
as.list() was behaving inconsistently on IRanges and GRanges objects,
which is blocking new developments. It will come back with a consistent
behavior. More generally speaking IRanges and GRanges will behave
consistently as far as their "list interpretation" is concerned.

I'll send more details later.

H.

  
    
#
Hi,

Can I ask, is this change under discussion in current release or so far in Bioconductor devel only (my assumption)?
> On 02/15/2018 08:37 AM, Michael Lawrence wrote:
> > So is as.list() no longer supported for GRanges objects? I have found it
 > > useful in places.
 > 
 > Very few places. I found a dozen of them in the entire software repo.

However there are probably more in the wild...

 > Now you should use as.list(as(gr, "GRangesList")) instead.
 > as.list() was behaving inconsistently on IRanges and GRanges objects,
 > which is blocking new developments. It will come back with a consistent
 > behavior. More generally speaking IRanges and GRanges will behave
 > consistently as far as their "list interpretation" is concerned.

Can we please be assured to be reminded of this prominently in release notes?

Thanks!

~malcolm
#
On Thu, Feb 15, 2018 at 11:53 AM, Cook, Malcolm <MEC at stowers.org> wrote:

            
Yes, this is what I meant. The Bioc package corpus is not very
representative for some of these entry points. Please keep that in mind
when making refactoring decisions.
Please be careful with these changes in behavior. Given the amount of code
that now depends on this software, consistency with existing behavior needs
to be weighted heavily.

  
  
#
On 02/15/2018 11:53 AM, Cook, Malcolm wrote:
Bioconductor devel only.
What as.list() was doing on a GRanges object was not documented. Relying
on some kind of obscure undocumented feature is never a good idea.
The changes will be announced and described on this list and in the
NEWS files of the IRanges and GenomicRanges packages.

H.

  
    
#
On Thu, Feb 15, 2018 at 1:45 PM, Herv? Pag?s <hpages at fredhutch.org> wrote:

            
There's just too much that is documented implicitly through inherited
behaviors, or where we say things like "this data structure behaves as one
would expect given base R". It's not fair to claim that those features are
undocumented. Our documentation is not complete enough to use it as an
excuse.

  
  
#
On 02/15/2018 01:57 PM, Michael Lawrence wrote:
It's not fair to suggest that this is a widely used feature either.

I've identified all the places in the 1500 software packages where
this was used, and, as I said, there were very few places. BTW I
fixed most of them but my plan is to fix all of them. Some of the
code that is outside the Bioc package corpus might be affected but
it's fair to assume that this will be a very rare occurence. This can
be mitigated by temporary restoring as.list() on GRanges, with a
deprecation message, and wait 1 more devel cycle to replace it with
the new behavior. I chose to disable it for now, on purpose, so I can
identify packages that break (the build report is a great tool for
that) and fix them.

I'm not using the fact that as.list() on a GRanges is not documented
as an excuse for anything. Only to help those with concerns to
relativize and relax.

H.

  
    
#
Hi Herv? and others,

Thanks for the responses.

I woudn't call as.list() of a GRanges an "obscure behaviour" but more a 
"works as expected, even if not clearly documented" behaviour.

In any case I can change the code to as(gr, "GRangesList") as suggested.

Thanks again for the responses and discussion :)

Bernat


*Bernat Gel Moreno*
Bioinformatician

Hereditary Cancer Program
Program of Predictive and Personalized Medicine of Cancer (PMPPC)
Germans Trias i Pujol Research Institute (IGTP)

Campus Can Ruti
Carretera de Can Ruti, Cam? de les Escoles s/n
08916 Badalona, Barcelona, Spain

Tel: (+34) 93 554 3068
Fax: (+34) 93 497 8654
08916 Badalona, Barcelona, Spain
bgel at igtp.cat <mailto:bgel at igtp.cat>
www.germanstrias.org <http://www.germanstrias.org/>

<http://www.germanstrias.org/>







El 02/15/2018 a las 11:19 PM, Herv? Pag?s escribi?:
#
For what it's worth, my package (LOLA) was one that used as.list on a 
GRanges or GRangesList, and those calls were broken by changes to devel. 
Since I was also pushing changes at the time, I assumed the devel build 
errors were due to my updates -- I spent quite a bit of time trying to 
figure out what was wrong before I realized this breakage was not caused 
by my updates, but by upstream changes in GRanges...eventually I tracked 
down errors to as.list (and ultimately, found other errors, which we 
discussed earlier on this list), but my conclusion from this was that, 
from my perspective, using the deployed bioc devel as a way to test for 
what refactoring will break doesn't seem like the ideal way to go -- I 
assumed that generally, other package changes wouldn't typically be 
pushed that would break my package's build, so it devalued the role of 
the dev builds and reduced my confidence in using that (now when I see 
error I may assume it's something else, and wait a few days, instead of 
diving right in to try to solve the problem).

I like the idea of temporarily restoring as.list with a deprecation 
message -- also, as a general development philosophy going forward in 
terms of testing on devel. This would have saved me a lot of time 
troubleshooting in this instance.

Just my 2 cents.

-Nathan
On 02/16/2018 02:57 AM, Bernat Gel wrote:
#
FWIW, this change also affects code that don't call as.list() explicitly.

such as calling Reduce(union, granges), Reduce is implemented on base, and
will call as.list() if the predicate isn't a vector already.

I understand it wasn't intended to be used this way, but with this in mind
there are more packages potentially affected by the change.

On Fri, Feb 16, 2018 at 1:25 PM, Nathan Sheffield <nathan at code.databio.org>
wrote:

  
  
#
This is one reason why I strongly advocate keeping NEWS up to date on the
devel branch. Not that it would necessarily be easy for Nathan to track it
down based on NEWS on all the packages he depends on.

On Fri, Feb 16, 2018 at 10:25 AM, Nathan Sheffield <nathan at code.databio.org>
wrote:

  
  
#
Hi Bernat,
On 02/15/2018 11:57 PM, Bernat Gel wrote:
Most users/developers will probably agree that as.list() worked
as expected on a GRanges object. But then they'll be surprised
and confused when they use it on an IRanges object and discover
that it does something completely different. The current effort
is to bring more consistency between GRanges and IRanges objects
and to have their list-like semantics aligned and documented so
there will be no more such surprise.
I went ahead and fixed karyoploteR. This is karyoploteR 1.5.2. Make
sure to resync your GitHub repo by following the instructions here:

 
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

Note that the loop on the GRanges object (via the call to Map())
was not needed and could be replaced with a solution that uses
proper vectorization.

Best,
H.

  
    
2 days later
#
Hi Herv?,

I completely agree with the goal of having the semantics of list-like 
operations standardised and documented to avoid surprises, and if to do 
so, the current use of as.list must be changed I'm pefectly ok with 
that. I had not seen the strange behaviour with IRanges, so I was not 
aware of the problem.

In any case, thanks for fixing (and simplifying) karyoploteR. In 
retrospective I don't know why I didn't use simple vectorization! So, thanks


Bernat

*Bernat Gel Moreno*
Bioinformatician

Hereditary Cancer Program
Program of Predictive and Personalized Medicine of Cancer (PMPPC)
Germans Trias i Pujol Research Institute (IGTP)

Campus Can Ruti
Carretera de Can Ruti, Cam? de les Escoles s/n
08916 Badalona, Barcelona, Spain

Tel: (+34) 93 554 3068
Fax: (+34) 93 497 8654
08916 Badalona, Barcelona, Spain
bgel at igtp.cat <mailto:bgel at igtp.cat>
www.germanstrias.org <http://www.germanstrias.org/>

<http://www.germanstrias.org/>







El 02/17/2018 a las 04:19 AM, Herv? Pag?s escribi?:
#
On Mon, Feb 19, 2018 at 2:10 AM, Bernat Gel <bgel at igtp.cat> wrote:

            
Just want to point out that it's important to keep in mind that many of our
users never use IRanges directly, so consistency is not an absolute
requirement.

so I was not aware of the problem.

  
  
#
On 02/19/2018 06:43 AM, Michael Lawrence wrote:
Even if you only use GRanges objects, it's confusing that lapply()
works on them but not mapply(). The undergoing changes will also
address inconsistencies within the GRanges API, not just the
inconsistencies between the GRanges and IRanges APIs.

H.

  
    
#
Hi Renan,

Most packages affected by these changes are packages that loop on
the individual ranges of a GRanges object. They generally don't
call as.list() directly but use something like lapply(), vapply(),
sapply(), Map(), Reduce(), etc... All these functions indeed call
as.list() internally on the supplied object before looping on it.
Just to clarify, when I say I found a dozen of Bioconductor packages
in the entire software repo where as.list() was used on a GRanges
object, I'm counting all the packages that use it explicitly or
implicitly. This includes signeR, which I had on my list of packages
to fix.

BTW in this particular instance, I would recommend doing

     reduce(granges, drop.empty.ranges=TRUE)

instead of

     Reduce(union, as(granges, "GRangesList"))

reduce() walks on the individual ranges of the supplied object at
the C level so is much faster than performing a binary union in
an R loop. It should also be more memory efficient.

Cheers,
H.
On 02/16/2018 09:02 AM, Renan Valieris wrote: