
[Bioc-devel] "patches" for Gviz: utr plotting support and direct BamFile plotting

7 messages · Steve Lianoglou, Martin Morgan, Kasper Daniel Hansen +2 more

#
Hi Florian (and other interested Gviz'ers),

I thought I'd use Gviz to whip up pretty plots for my thesis (yay!)
where I need to plot lots of NGS data over 3'UTRs.

I wanted to tackle the "more standard" drawing of UTRs (thin exons
(vs. thick coding)) in gene regions as well as making repeated
plotting of the same data over different regions easier -- there is
also an unplanned-for increase in plotting speed of ~5-8x
(unscientific benchmark) when plotting gene regions using my TxDbTrack
vs. GeneRegionTrack.

I have a more thorough summary of what I did here:
http://cbio.mskcc.org/~lianos/files/bioc/Gviz/Gviz-enhancement-1.html

With the relevant pics at the bottom. It's still a work in progress
but I thought I'd put it out there now to see if you think it'd be
useful for patching back into Gviz -- I'd be happy to groom things
further to make it easier to add back into Gviz, or change things to
make the approach more "inline" with the coding style/philosophy of
the package (which I tried to stick to).

Thanks (again) for this package -- it's really great.

-steve
#
Steve and I exchanged a little email about S4 initialize methods that it 
might help to share. Steve created an initialization method

setMethod("initialize", "BamTrack",
function(.Object, bam, cache=new.env(), range.strict=FALSE, ...) {
   if (missing(bam) || !is(bam, "BamFile") || !file.exists(path(bam))) {
     stop("bam required during initialize,BamTrack")
   }
   cache$bam <- bam
   .Object@cache <- cache
   callNextMethod(.Object=.Object, ...)
})

Steve tells me that this is similar to Gviz coding style. I think there 
are several issues here.

The first is that creating a sub-class actually tries to create an 
instance of the parent class, and to do that new("BamTrack") has to 
succeed. It doesn't because 'bam' does not have a default value

 > setClass("BamSubtrack", contains="BamTrack")
Error in .local(.Object, ...) : bam required during initialize,BamTrack

A second issue that comes up involves validation, which is what the 
check for a missing(bam) etc., is. It makes more sense to place this in 
the object validity method so that the code can be re-used, perhaps 
providing a prototype to initialize the bam field properly.

While on validity and prototypes, a weird thing is that the class 
definition can specify an invalid prototype and, since validity is only 
checked if the user provides additional arguments to 'new' / 
'initialize', it's possible to create invalid objects

   setClass("A", representation(x="numeric"))
   setValidity("A", function(object) {
       if (length(object@x) != 1L) "'x' must be length 1" else NULL
   })

and then

 > a = new("A")

seems to work but

 > validObject(a)
Error in validObject(a) : invalid class "A" object: 'x' must be length 1

the solution is to provide a prototype that creates a valid object

   setClass("A", representation(x="numeric"), prototype=prototype(x=1))

and the acid test is validObject(new("A")) == TRUE

A third and even more obscure issue is that 'initialize' is advertised 
to take unnamed arguments as instances of parent classes that are used 
to initialize derived classes, so it makes sense to avoid accidentally 
capturing un-named arguments by placing 'bam' and friends _after_ ... 
Let's see...

   setClass("A", representation(x="numeric"))
   setClass("B", representation(y="numeric"), contains="A")

and then

 >   new("B", new("A", x=2))
An object of class "B"
Slot "y":
numeric(0)

Slot "x":
[1] 2

but...

   setMethod(initialize, "B", function(.Object, y, ...)
       callNextMethod(.Object, y=y, ...))

and now the copy constructor is broken.

 >   new("B", new("A", x=2))
Error in validObject(.Object) :
   invalid class "B" object: invalid object for slot "y" in class "B": 
got class "A", should be or extend class "numeric"
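
One way to keep the copy constructor working is to put the named slot argument after '...', as suggested above. This is only a minimal sketch reusing the toy classes "A" and "B"; the default y=numeric() is an assumption made for illustration, and note that it resets 'y' whenever 'y' is not supplied.

```r
library(methods)

setClass("A", representation(x="numeric"))
setClass("B", representation(y="numeric"), contains="A")

## 'y' comes after '...', so an unnamed parent instance is no longer
## captured as 'y'; it travels through '...' to the default method.
setMethod("initialize", "B", function(.Object, ..., y=numeric()) {
    callNextMethod(.Object, ..., y=y)
})

b  <- new("B", new("A", x=2))        # parent instance passed unnamed
b2 <- new("B", new("A", x=2), y=5)   # parent instance plus a named slot
```

Here b@x is taken from the unnamed "A" instance, and b2 additionally gets y from the named argument.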

Another point about initialize as a copy constructor is that it updates 
multiple slots in a (relatively) efficient way -- only 1 copy of the 
object, rather than once for each slot assignment

   removeMethod("initialize", "B")

and then

 > b = new("B")
 > tracemem(b)
[1] "<0x53d74e0>"
 > b@x = 1
tracemem[0x53d74e0 -> 0x54096c8]:
 > b@y = 2
tracemem[0x54096c8 -> 0x540b0c0]:

so a copy on each slot assignment, vs.

 > b1 = new("B"); tracemem(b1)
[1] "<0x540e968>"
 > initialize(b1, x=1, y=2)
tracemem[0x540e968 -> 0x541c628]: initialize initialize
An object of class "B"
Slot "y":
[1] 2

Slot "x":
[1] 1

Combined, these are enough to make one want to think very carefully 
about writing initialize methods; often a 'Constructor' is the right 
place to do argument coercion, etc., (although sometimes I think the 
constructor is avoiding some of its responsibility, e.g., BamFile() 
fails, but validObject(new("BamFile")) == TRUE) and validity methods the 
correct place to check validity.
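
A minimal sketch of that division of labour, using a hypothetical "FileTrack" class with a character 'path' slot standing in for BamTrack and its 'bam' field (these names are assumptions for illustration, not Gviz code): the prototype is valid, the validity method does the checking, and the constructor only coerces its argument.

```r
library(methods)

## Hypothetical stand-in for BamTrack; 'path' plays the role of 'bam'.
## The prototype supplies a value that passes validity.
setClass("FileTrack", representation(path="character"),
         prototype=prototype(path=NA_character_))

## Checks live in the validity method, so they are re-used by
## new(), initialize() and validObject().
setValidity("FileTrack", function(object) {
    if (length(object@path) != 1L) "'path' must be length 1" else TRUE
})

## The constructor only coerces arguments and delegates to new().
FileTrack <- function(path, ...) {
    new("FileTrack", path=as.character(path), ...)
}

## Sub-classing works because new("FileTrack") succeeds without arguments.
setClass("FileSubtrack", contains="FileTrack")

trk <- FileTrack("reads.bam")
bad <- tryCatch(FileTrack(c("a", "b")), error=function(e) e)
```

With this layout no initialize method is needed at all, and 'bad' holds the validity error rather than an invalid object.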

Martin
On 08/21/2012 11:12 PM, Steve Lianoglou wrote:

  
    
#
Hi,

And since we're already getting pretty deep into the woods, I guess it
can't hurt to keep going:

On Wed, Aug 22, 2012 at 4:11 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
With the exception of cases where your class has slots that are environments.

If it's true that some things will still call new("YourClass", ...)
that aren't your constructor, then you will be surprised:

setClass("A", representation=representation(cache="environment"),
prototype=prototype(cache=new.env()))
ctr <- function(cache=new.env()) new("A", cache=cache)

## This is the behavior you probably expect:
a = ctr()
b = ctr()
a@cache[['a']] = 1
b@cache[['a']]
NULL

## This isn't
y = new("A")
z = new("A")
y@cache[['a']] = 1
z@cache[['a']]
[1] 1    ## Woops!


But if you set an appropriate initialize method:

setMethod(initialize, "A", function(.Object, ..., cache=new.env()) {
  .Object@cache <- cache
  callNextMethod(.Object, ...)
})

All is well:
y = new("A")
z = new("A")
y@cache[['a']] = 1
z@cache[['a']]
NULL


I think ReferenceClasses replaces (most(?)) of the use cases that I
use the `cache` idiom for, although I'm not sure about the gotchas
with them because I haven't tried to grok RefClasses yet.
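
For comparison, a rough sketch of the same cache idea with a reference class. The class name "Cache", the field 'store' and the two methods are made up for illustration; the point is only that each instance carries its own mutable state, so no initialize method is needed to avoid sharing.

```r
library(methods)

## Hypothetical reference class; fields are modified in place with <<-.
Cache <- setRefClass("Cache",
    fields=list(store="list"),
    methods=list(
        set_value=function(key, value) {
            store[[key]] <<- value
            invisible(.self)
        },
        get_value=function(key) store[[key]]
    ))

y <- Cache$new()
z <- Cache$new()
y$set_value("a", 1)
z$get_value("a")   # NULL: z has its own, still empty, store
```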

Still ... thought I'd point this out (I think it was actually one of
you two, who must have alerted me to this years ago (perhaps on
R-devel)).

-steve
#
Uh, uh,
Seems like I am the bad guy still using the dreaded initialize methods
around here :-(
I do agree with most of what you guys say, but still want to put my two
cents here. My lawyers are preparing a more complete statement at this
point :-)

Having a constructor function to me somewhat implies that you want objects
from that class to be created by the user in a manual process. In more
complex class hierarchies you don't really want that, but rather you want
to pass through all the parent class' instantiations to fill the relevant
slots appropriately. Whenever you need something more complicated than
foo@a=b in these cases I do not see a way around the initializer. For
instance, in Gviz I have a whole bunch of classes that inherit from each
other, each of them grabbing the arguments to fill their slots while
objects are instantiated. The bottom-most of these classes will gobble up
all the arguments that are left over and stick them into a plotting
parameters object. I guess I could have explicit constructors for all of
those classes, and in those explicitly call the parent constructor, thus
walking through the hierarchy, doing whatever magic I need to do to make
things work. Now that doesn't strike me as particularly elegant either,
and I can't see how that would help with the code copying issue.
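
To make the pattern described above concrete, here is a toy sketch with two hypothetical classes ("BaseTrack" and "RangeTrack" are invented names, not the actual Gviz classes): each initializer grabs the arguments it knows about, and the bottom-most one collects whatever is left over as plotting parameters.

```r
library(methods)

## Hypothetical hierarchy, for illustration only.
setClass("BaseTrack", representation(params="list"))
setClass("RangeTrack", representation(start="numeric"), contains="BaseTrack")

## Bottom-most class: everything left in '...' becomes a plotting parameter.
setMethod("initialize", "BaseTrack", function(.Object, ...) {
    .Object@params <- list(...)
    .Object
})

## Derived class: take 'start', pass the remainder down the hierarchy.
setMethod("initialize", "RangeTrack", function(.Object, ..., start=numeric()) {
    .Object@start <- start
    callNextMethod(.Object, ...)
})

trk <- new("RangeTrack", start=10, col="red", lwd=2)
```

In this sketch unnamed arguments would also end up in 'params', so the copy-constructor behaviour of the default initialize method is given up.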

Another remark regarding validation. For classes with a large memory
footprint I am very much worried about unnecessary copies of the data. For
objects with very light content that are created very often however I care
much more about fast object instantiation. Running through a validation
method each time you create an object adds quite some overhead to this
(and we all know that building S4 methods even without validators is not
cheap at all). I remember there were times when the use of validation
methods for classes was not recommended. And personally I am no big fan of
them for the reasons pointed out by Kasper and Martin before.

That being said, I will take a closer look at my package to figure out a
way to code everything without the initialize methods and report back to
you guys about my success. I do not generally advertise the use of
initializers (as a matter of fact I am far far away from that), I just
want to stand up here for these cuddly little creatures, threatened by
extinction and make the point that they still do have their rightful place
in our Bioconductor ecosystem.
Cheers,
Florian
#
Hi Florian,
On 08/22/2012 11:47 PM, Hahne, Florian wrote:
You don't have to. You can either use setValidity2/new2 from the
IRanges package:

   setClass("A", representation(x="numeric"))
   setValidity2("A", function(object) {cat("validating ... OK\n"); TRUE})

   > a <- new("A", x=2.34)
   validating ... OK

   > a <- new2("A", x=2.34, check=FALSE)

or bring the case in front of the R-devel court for adding some kind of
'check' argument to new(). Take all your lawyers with you.

H.