[Bioc-devel] show method for CompressedVRangesList-class
I checked in a fix for the splitting to CompressedVRangesList. The slowness of creating a SimpleVRangesList is due to the cost of extracting a VRanges for each sample. Depending your exact use case, it might be better to pay that cost up-front, instead of deferring it to when the user wants to extract an element, which happens with the compressed list. As long as the number of samples is small, the memory overhead should be minimal. Michael
On Wed, Feb 25, 2015 at 9:59 AM, Michael Lawrence <michafla at gene.com> wrote:
Yea, I know, just need to get around to that one. Technically, it works, but it's obviously not ideal. On Wed, Feb 25, 2015 at 8:52 AM, Gabe Becker <becker.gabe at gene.com> wrote:
Why does splitting a VRanges give a GRangesList with VRanges objects as elements? Seems like it should return a VRangesList.
spl = split(vr, sampleNames(vr)) class(spl)
[1] "GRangesList" attr(,"package") [1] "GenomicRanges"
class(spl[[1]])
[1] "VRanges" attr(,"package") [1] "VariantAnnotation" ~G On Wed, Feb 25, 2015 at 8:39 AM, Michael Lawrence < lawrence.michael at gene.com> wrote:
Construction will take longer; the savings are in the accessing of the elements. But this seems like too much longer, so I will look into it. On Wed, Feb 25, 2015 at 8:12 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
my current reason to prefer a CompressedVRangesList object over a SimpleVRangesList object is that i find one order of magnitude
difference
in creation time in each of these classes of objects:
library(VariantAnnotation)
fl <- system.file("extdata", "CEUtrio.vcf.bgz",
package="VariantFiltering")
vcf <- readVcf(fl, genome="hg19")
vr <- as(vcf, "VRanges")
length(vr)
[1] 15000
## create a VRangesList object
system.time(vrl <- do.call("VRangesList", split(vr, sampleNames(vr))))
user system elapsed
0.247 0.004 0.252
## create a CompressedVRangesList object
system.time(cvrl <- new("CompressedVRangesList", split(vr,
sampleNames(vr))))
user system elapsed
0.019 0.000 0.019
0.252/0.019
[1] 13.26316
with a larger vcf differences increase:
[... load vcf, coerce to VRanges ...]
length(vr)
[1] 25916
system.time(vrl <- do.call("VRangesList", split(vr, sampleNames(vr))))
user system elapsed
2.672 0.000 2.676
system.time(cvrl <- new("CompressedVRangesList", split(vr,
sampleNames(vr))))
user system elapsed
0.014 0.000 0.014
2.676 / 0.014
[1] 191.1429
so maybe i'm using the wrong way to construct a VRangesList object, but
according to our last conversation about this, there was no obvious
default
fast way to do it, starting from a VRanges object: https://stat.ethz.ch/pipermail/bioc-devel/2015-January/006905.html it would be great if there's a fast way to do this kind of
construction.
thanks, robert. On 02/25/2015 04:42 PM, Michael Lawrence wrote:
If you're storing data on a relatively small number of individuals
(say,
hundreds), you should use SimpleVRangesList, not
CompressedVRangesList.
On Wed, Feb 25, 2015 at 7:10 AM, Robert Castelo <
robert.castelo at upf.edu
<mailto:robert.castelo at upf.edu>> wrote:
i see you point, the logic i was thinking about is to use a list
of
VRanges objects to hold separately the variants of multiple
individuals, with one VRanges object per individual.
if i type the name of such a list object on the R shell, having
the
GRangesList show method, i feel i do not see much information
because the screen just scrolls up tens or hundreds of lines
specifiying variants per individual. however, the concise
appearance
of something like a VRangesList:
> vrl
VRangesList of length 10
names(32): S1 S2 S3 S4 ... S7 S8 S9 S10
at least suggests the user that the object holding the variants
has
information for 10 samples and belongs to the class 'VRangesList'.
i thought this made general sense but i'm fine if you feel this
interpretation does not warrant such a change.
cheers,
robert.
On 02/25/2015 01:25 AM, Michael Lawrence wrote:
Why not have the SimpleVRangesList be shown like
CompressedVRangesList,
for consistency with GRangesList? In other words, the opposite
of what
you propose. A strong argument could also be made that a
SimpleGenomicRangesList should be shown like a GRangesList.
Unless there
is some aversion to the more verbose output....
On Tue, Feb 24, 2015 at 2:36 PM, Robert Castelo
<robert.castelo at upf.edu <mailto:robert.castelo at upf.edu>
<mailto:robert.castelo at upf.edu
<mailto:robert.castelo at upf.edu>__>> wrote:
so, yes, but IMO rather than inheriting the show method
from
a
GRangesList, i think that the show method for
CompressedVRangesList
objects should be inherited from a VRangesList object.
right now
this is the situation:
library(VariantAnnotation)
example(VRangesList)
vrl
VRangesList of length 2
names(2): sampleA sampleB
cvrl <- new("CompressedVRangesList", split(vr,
sampleNames(vr)))
cvrl
CompressedVRangesList object of length 2:
$a
VRanges object with 1 range and 1 metadata column:
seqnames ranges strand ref
alt
totalDepth refDepth altDepth
<Rle> <IRanges> <Rle> <character> <characterOrRle>
<integerOrRle>
<integerOrRle> <integerOrRle>
[1] chr1 [1, 5] + T
C 12 5 7
sampleNames softFilterMatrix | tumorSpecific
<factorOrRle> <matrix> | <logical>
[1] a TRUE | FALSE
$b
VRanges object with 1 range and 1 metadata column:
seqnames ranges strand ref alt totalDepth
refDepth
altDepth
sampleNames softFilterMatrix |
[1] chr2 [10, 20] + A T 17
10
6 b FALSE |
tumorSpecific
[1] TRUE
-------
seqinfo: 2 sequences from an unspecified genome; no
seqlengths
would it be possible to have the VRangesList show method
for
CompressedVRangesList objects?
robert.
On 2/24/15 7:24 PM, Michael Lawrence wrote:
I think you might be missing an import. It should
inherit the
method for GRangesList.
On Tue, Feb 24, 2015 at 9:53 AM, Robert Castelo
<robert.castelo at upf.edu <mailto:robert.castelo at upf.edu>
<mailto:robert.castelo at upf.edu
<mailto:robert.castelo at upf.edu>__>> wrote:
hi,
i'm using the CompressedVRangesList class in
VariantFiltering
to hold variants and their annotations across
multiple samples
and found that there was no show method for this
class (unless
i'm missing the right import here) so i made one
within
VariantFiltering by copying&pasting from other
similar classes:
setMethod("show",
signature(object="__CompressedVRangesList"),
function(object) {
lo <- length(object)
cat(classNameForDisplay(__object), "
of
length ",
lo, "\n",
sep = "")
if (!is.null(names(object)))
cat(BiocGenerics:::__
labeledLine("names",
names(object)))
})
i guess, however, that the right home for this
would
be
VariantAnnotation. let me know if you consider
adding it there
(or somewhere else) and i'll remove it from
VariantFiltering.
thanks,
robert.
_________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.__org
<mailto:Bioc-devel at r-project.org>>
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel <https://stat.ethz.ch/mailman/listinfo/bioc-devel> -- Robert Castelo, PhD Associate Professor Dept. of Experimental and Health Sciences Universitat Pompeu Fabra (UPF) Barcelona Biomedical Research Park (PRBB) Dr Aiguader 88 E-08003 Barcelona, Spain telf: +34.933.160.514 <tel:%2B34.933.160.514> fax: +34.933.160.550 <tel:%2B34.933.160.550>
-- Robert Castelo, PhD Associate Professor Dept. of Experimental and Health Sciences Universitat Pompeu Fabra (UPF) Barcelona Biomedical Research Park (PRBB) Dr Aiguader 88 E-08003 Barcelona, Spain telf: +34.933.160.514 fax: +34.933.160.550
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- Gabriel Becker, Ph.D Computational Biologist Genentech Research