[Bioc-devel] Subsetting Lists by Lists

12 messages · Michael Lawrence, Cook, Malcolm, Ryan +2 more

#
In the meantime,

lapply(x, `[`, 1:5)

??

 >-----Original Message-----
 >From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Michael Lawrence
 >Sent: Tuesday, April 01, 2014 9:21 AM
 >To: bioc-devel at r-project.org
 >Subject: [Bioc-devel] Subsetting Lists by Lists
 >
 >Mostly to Herve:
 >
 >Sometimes we want to pluck the first 1, or 10, or whatever elements from
 >each element of a list. If I had a list 'x', I thought I could do this with:
 >
 >x[IntegerList(1:5)]
 >
 >But it only gives elements 1:5 from x[[1]], not from each element of 'x'. In
 >other words, I thought the index would be recycled along 'x'. Instead, 'x' is
 >subset to the length of 'i', and I'm not sure if that makes sense?
 >
 >But maybe what we really want are pluckHead/Tail, which would be robust to
 >the case where an element has fewer than n elements. And of course a more
 >general pluck(x, i) to select 'i' from each element, but I wanted the line
 >above to do that.
 >
 >Michael
#
That won't work if any vector has fewer than 5 elements. Maybe

lapply(x, head, n=5)

would work?
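
To make the difference concrete, here is a minimal base-R sketch on a toy list (the list 'x' below is made up for illustration). On plain vectors an out-of-range subscript pads with NA; my understanding is that Bioconductor vector-like classes such as Rle usually raise an error instead, so treat the NA behaviour as specific to base R.

```r
# Toy list: the second and third elements have fewer than 5 values.
x <- list(a = 1:8, b = 1:3, c = integer(0))

# Subsetting each element with 1:5 pads short elements with NA.
by_index <- lapply(x, `[`, 1:5)

# head() simply stops at the end of each element.
by_head <- lapply(x, head, n = 5)

lengths(by_index)  # every element now has length 5
lengths(by_head)   # elements keep at most 5 of their own values
```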
On Tue Apr 1 09:24:51 2014, Cook, Malcolm wrote:
#
Hi all,

The following is tangentially related, but hopefully the answer will be useful to others (both directly and via my package, which prompts this)...

Suppose I do this:

dat <- GRangesList(
  lapply(bigWigFileNames, import,
         selection=someRanges))

Now I have a GRangesList of values, some of which have 0 ranges, some of which may have hundreds or thousands. I would like to aggregate and smooth over various subgroups of these for Gviz/trackViewer plots, so I was thinking of getting an RleList or similar out of the mcols() from each of the GRangesList atoms. 

However,

1) What might the "right" (fast, idiomatic, future-safe) way to extract and combine these ranges into columns of Rles be?  I found something not-dissimilar in SpliceGraphs (or was it spliceGrapher?) but I imagine this is a common operation with some efficient "one true way" to do it.

2) Should I be sucking these into a genoset or SummarizedExperiment instead?  I'll take the hit if I have to, once, but I don't want it to eat up all available RAM since I eventually wish to make the plotting process at least somewhat interactive (even if that means calling IGV or interactiveDisplay to do it).

Thanks for any guidance you may have to offer,

--t
#
On 04/01/2014 10:17 AM, Ryan wrote:
Yes. Note that you can use endoapply() to preserve the class of the
original object:

   > endoapply(cvg, head, n=5)
   RleList of length 3
   $chr1
   integer-Rle of length 5 with 2 runs
     Lengths: 4 1
     Values : 1 2

   $chr2
   integer-Rle of length 5 with 4 runs
     Lengths: 1 1 1 2
     Values : 0 1 2 3

   $chr3
   integer-Rle of length 5 with 1 run
     Lengths: 5
     Values : 0

But lapply- or endoapply-based solutions are slower than a '['-based
solution. Unfortunately, the latter requires quite a bit of munging to
get the subscript right:

   ## parallel seq_len()
   pseq_len <- function(eltlens)
   {
     ans_skeleton <- PartitioningByWidth(eltlens)
     tmp <- relist(seq_len(sum(eltlens)), ans_skeleton)
     tmp - start(ans_skeleton) + 1L
   }

Then:

   > pseq_len(c(5, 1, 0, 2))
   IntegerList of length 4
   [[1]] 1 2 3 4 5
   [[2]] 1
   [[3]] integer(0)
   [[4]] 1 2

   > cvg[pseq_len(pmin(elementLengths(cvg), 5))]
   RleList of length 3
   $chr1
   integer-Rle of length 5 with 2 runs
     Lengths: 4 1
     Values : 1 2

   $chr2
   integer-Rle of length 5 with 4 runs
     Lengths: 1 1 1 2
     Values : 0 1 2 3

   $chr3
   integer-Rle of length 5 with 1 run
     Lengths: 5
     Values : 0
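
For readers without IRanges at hand, the shape of that subscript can be sketched in base R on a plain list. This is only an analogue, not the IRanges code: pseq_len_base() is a hypothetical name, it returns an ordinary list rather than an IntegerList, and the final Map() call still loops over elements, so it shows the subscript but not the speed-up.

```r
# Plain-list analogue of pseq_len(): one seq_len() per element length.
pseq_len_base <- function(eltlens) {
  eltlens <- as.integer(eltlens)  # also drops any names
  # Group label for every position; empty groups are kept via 'levels'.
  grp <- factor(rep(seq_along(eltlens), eltlens),
                levels = seq_along(eltlens))
  unname(split(sequence(eltlens), grp))
}

# Toy stand-in for 'cvg' (made-up values).
x <- list(chr1 = c(1, 1, 1, 1, 2, 2),
          chr2 = c(0, 1, 2, 3, 3, 3),
          chr3 = numeric(0))

# Subscript that asks for at most 5 values per element.
i <- pseq_len_base(pmin(lengths(x), 5))

# Apply the i-th subscript to the i-th element.
heads <- Map(`[`, x, i)
```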

H.

  
    
#
On 04/01/2014 02:43 PM, Michael Lawrence wrote:
The pseq_len() utility I sent previously solves your pluckHead()
problem:

   pluckHead <- function(x, n=6)
   {
     x[pseq_len(pmin(elementLengths(x), n))]
   }

or, using the non-exported utility IRanges:::fancy_mseq():

   pluckHead <- function(x, n=6)
   {
     x_eltlens <- unname(elementLengths(x))
     i_eltlens <- pmin(x_eltlens, n)
     i_skeleton <- PartitioningByEnd(cumsum(i_eltlens), names=names(x))
     unlisted_i <- IRanges:::fancy_mseq(i_eltlens)
     i <- relist(unlisted_i, i_skeleton)
     x[i]
   }

For pluckTail():

   pluckTail <- function(x, n=6)
   {
     x_eltlens <- unname(elementLengths(x))
     i_eltlens <- pmin(x_eltlens, n)
     i_skeleton <- PartitioningByEnd(cumsum(i_eltlens), names=names(x))
     offset <- x_eltlens - i_eltlens
     unlisted_i <- IRanges:::fancy_mseq(i_eltlens, offset)
     i <- relist(unlisted_i, i_skeleton)
     x[i]
   }

For both, 'n' can be of length > 1 and is recycled to the length of 'x'.
Negative values in 'n' are not supported but that should be easy to
add.
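
The offset idea behind pluckTail() can be sketched in base R on a plain list. Everything here is an assumption for illustration: pluck_base() and its 'from_tail' argument are hypothetical, the index is computed with sequence() rather than IRanges:::fancy_mseq(), and the result is an ordinary list. The point is that a single subscript into the unlisted data selects every head (or tail) at once.

```r
# Sketch of a vectorized head/tail on a non-empty plain list
# (hypothetical helper, not part of IRanges).
pluck_base <- function(x, n = 6, from_tail = FALSE) {
  x_eltlens <- lengths(x, use.names = FALSE)
  n <- rep_len(n, length(x))        # recycle 'n' along 'x'
  i_eltlens <- pmin(x_eltlens, n)   # never ask for more than is there
  # Number of unlisted values that precede each element.
  starts <- cumsum(x_eltlens) - x_eltlens
  # For tails, skip the leading values that are not selected.
  offset <- if (from_tail) x_eltlens - i_eltlens else 0
  # One flat subscript into the unlisted data.
  idx <- sequence(i_eltlens) + rep(starts + offset, i_eltlens)
  flat <- unlist(x, use.names = FALSE)
  grp <- factor(rep(seq_along(x), i_eltlens), levels = seq_along(x))
  setNames(split(flat[idx], grp), names(x))
}

x <- list(a = 1:8, b = 1:3, c = integer(0))
first5 <- pluck_base(x, 5)
last5  <- pluck_base(x, 5, from_tail = TRUE)
mixed  <- pluck_base(x, c(2, 1, 4))  # a different 'n' per element
```

On this toy input the results agree with mapply(head, x, n, SIMPLIFY=FALSE) and mapply(tail, x, n, SIMPLIFY=FALSE), which is the equivalence the helpers are meant to preserve.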

So I could add these 2 functions to IRanges; however, I'm not totally
convinced by the names. What about phead() and ptail() ("p" for
"parallel"), or vhead() and vtail() ("v" for "vectorized"), or mhead()
and mtail() (they're just fast equivalents of 'mapply(head, x, n)' and
'mapply(tail, x, n)'), or...?

Thanks,
H.

  
    
#
Hi Tim,

There is probably too much guesswork for me to really be able to
help... However, and FWIW, in Bioc-devel the 'asRle' argument of
import() has been replaced by the 'as' argument and it can be set
to "GRanges", "RleList", or "NumericList". Be aware that, surprisingly,
if you specify a 'selection', then using as="RleList" vs
as="NumericList" not only changes the returned type but also the
semantics of the function: the returned List object has 1 list element
per chromosome for the former and 1 list element per range in
'selection' for the latter.

(IMO it would probably be less confusing and less error-prone if
switching between these 2 semantics was decoupled from choosing
the returned type.)

Anyway, in your case maybe you want to use as="RleList".

Then for averaging the score over each of the ranges in your initial
'selection', maybe you'll find the examples section of the tileGenome()
function helpful (this is in the GenomicRanges package).

Am I on the right track with this?

Cheers,
H.
On 04/01/2014 10:40 AM, Tim Triche, Jr. wrote:

  
    
#
On 04/01/2014 10:40 AM, Tim Triche, Jr. wrote:
BTW, as the maintainer of the SplicingGraphs package, I'm curious about
this. Would be great if you could remember what you've seen exactly and
where you've seen it so I could go check and make sure that I'm using
the "one true way".

Thanks,
H.

  
    
2 days later
#
Added in IRanges 1.21.41.

H.
On 04/01/2014 06:15 PM, Michael Lawrence wrote: