[Bioc-devel] Subsetting Lists by Lists
Added in IRanges 1.21.41. H.
On 04/01/2014 06:15 PM, Michael Lawrence wrote:
I like phead/ptail. I was going to write them, so thanks for taking care
of it!
Michael
On Tue, Apr 1, 2014 at 3:24 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
On 04/01/2014 02:43 PM, Michael Lawrence wrote:
Thanks Herve. It might not be so bad to have rep out in the unnamed case
(think of NULL names meaning wildcard). If we had:
i <- IntegerList(1:5)
x[i]
The 'i' does not really identify any one element in 'x'. If both 'i' and
'x' had names, then there would be a matching, but otherwise, truncating
'x' to length(i) is surprising, and it's hard to imagine a use-case for
it. In some ways, this is analogous to logical indexing, which is recycled.
But that said, my use case is really more of a pluckHead/Tail. Don't
worry about this release.
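For comparison, base R recycles a logical subscript along the vector but does not recycle an integer one, which is the analogy drawn above. A minimal base-R illustration on a plain vector (not on an IRanges List object; the variable names are only for this sketch):

```r
x <- c(10, 20, 30, 40, 50, 60)
recycled <- x[c(TRUE, FALSE)]  # logical subscript is recycled to length(x)
positions <- x[1:2]            # integer subscript selects exactly those positions
print(recycled)                # 10 30 50
print(positions)               # 10 20
```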
The pseq_len() utility I sent previously solves your pluckHead() problem:
pluckHead <- function(x, n=6)
{
    x[pseq_len(pmin(elementLengths(x), n))]
}
or, using the non-exported utility IRanges:::fancy_mseq():
pluckHead <- function(x, n=6)
{
    x_eltlens <- unname(elementLengths(x))
    i_eltlens <- pmin(x_eltlens, n)
    i_skeleton <- PartitioningByEnd(cumsum(i_eltlens),
                                    names=names(x))
    unlisted_i <- IRanges:::fancy_mseq(i_eltlens)
    i <- relist(unlisted_i, i_skeleton)
    x[i]
}
For pluckTail():
pluckTail <- function(x, n=6)
{
    x_eltlens <- unname(elementLengths(x))
    i_eltlens <- pmin(x_eltlens, n)
    i_skeleton <- PartitioningByEnd(cumsum(i_eltlens),
                                    names=names(x))
    offset <- x_eltlens - i_eltlens
    unlisted_i <- IRanges:::fancy_mseq(i_eltlens, offset)
    i <- relist(unlisted_i, i_skeleton)
    x[i]
}
For both, 'n' can be of length > 1 and is recycled to the length of 'x'.
Negative values in 'n' are not supported but that should be easy to
add.
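The index arithmetic behind pluckTail() can be sketched in base R alone. This is only an illustration of the offset idea under the assumption that 'n' is non-negative; tail_index_flat() is a hypothetical helper, not an IRanges function, and it does not claim to reproduce what IRanges:::fancy_mseq() does internally.

```r
# Flat (unlisted) tail subscripts for list elements of the given lengths.
# tail_index_flat() is a hypothetical helper used only for illustration.
tail_index_flat <- function(x_eltlens, n = 6L) {
    x_eltlens <- as.integer(x_eltlens)
    n <- rep_len(as.integer(n), length(x_eltlens))  # recycle 'n' along 'x'
    i_eltlens <- pmin(x_eltlens, n)                 # how many to keep per element
    offset <- x_eltlens - i_eltlens                 # how many to skip per element
    # 1..k within each element, shifted by that element's offset
    sequence(i_eltlens) + rep.int(offset, i_eltlens)
}

# Elements of length 10, 3 and 0, keeping the last 5 of each:
tail_index_flat(c(10, 3, 0), 5)
# 6 7 8 9 10 1 2 3
```

The result is the concatenation of the per-element subscripts, which is the shape that relist() then cuts back into one subscript per list element.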
So I could add these 2 functions to IRanges; however, I'm not totally
convinced by the names. What about phead() and ptail() ("p" for
"parallel"), or vhead() and vtail() ("v" for "vectorized"), or mhead()
and mtail() (they're just fast equivalents of 'mapply(head, x, n)' and
'mapply(tail, x, n)'), or...?
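The intended semantics can be written down as a slow base-R reference on an ordinary list, following the mapply() equivalence mentioned above. The names phead_ref() and ptail_ref() are hypothetical and exist only in this sketch; negative 'n' is rejected here because the functions discussed do not support it.

```r
# Slow reference semantics for a "parallel" head/tail on an ordinary list.
# phead_ref() and ptail_ref() are hypothetical names for this sketch only.
phead_ref <- function(x, n = 6L) {
    n <- rep_len(n, length(x))    # recycle 'n' to the length of 'x'
    stopifnot(all(n >= 0))        # negative 'n' is not handled in this sketch
    mapply(head, x, n, SIMPLIFY = FALSE)
}
ptail_ref <- function(x, n = 6L) {
    n <- rep_len(n, length(x))
    stopifnot(all(n >= 0))
    mapply(tail, x, n, SIMPLIFY = FALSE)
}

x <- list(a = 1:10, b = 1:3, c = integer(0))
h <- phead_ref(x, 5)           # a: 1..5, b: 1..3, c: empty
t <- ptail_ref(x, c(2, 1, 4))  # a: 9 10, b: 3,   c: empty
```

Elements shorter than 'n' are returned whole rather than padded, which is the robustness asked for in the original request.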
Thanks,
H.
Michael
On Tue, Apr 1, 2014 at 12:06 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
On 04/01/2014 10:17 AM, Ryan wrote:
That won't work if any vector has fewer than 5 elements. Maybe
lapply(x, head, n=5)
would work?
Yes. Note that you can use endoapply() to preserve the class of the
original object:
> endoapply(cvg, head, n=5)
RleList of length 3
$chr1
integer-Rle of length 5 with 2 runs
Lengths: 4 1
Values : 1 2
$chr2
integer-Rle of length 5 with 4 runs
Lengths: 1 1 1 2
Values : 0 1 2 3
$chr3
integer-Rle of length 5 with 1 run
Lengths: 5
Values : 0
But lapply- or endoapply-based solutions are slower than a [ based
solution. Unfortunately the latter requires too much munging to get
the subscript right:
## parallel seq_len()
pseq_len <- function(eltlens)
{
    ans_skeleton <- PartitioningByWidth(eltlens)
    tmp <- relist(seq_len(sum(eltlens)), ans_skeleton)
    tmp - start(ans_skeleton) + 1L
}
Then:
> pseq_len(c(5, 1, 0, 2))
IntegerList of length 4
[[1]] 1 2 3 4 5
[[2]] 1
[[3]] integer(0)
[[4]] 1 2
> cvg[pseq_len(pmin(elementLengths(cvg), 5))]
RleList of length 3
$chr1
integer-Rle of length 5 with 2 runs
Lengths: 4 1
Values : 1 2
$chr2
integer-Rle of length 5 with 4 runs
Lengths: 1 1 1 2
Values : 0 1 2 3
$chr3
integer-Rle of length 5 with 1 run
Lengths: 5
Values : 0
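The same "parallel seq_len()" idea can be tried without IRanges on a plain list. The sketch below is a base-R analogue, not the IRanges implementation; pseq_len_ref() is a hypothetical name, and the per-element subsetting uses Map() rather than a vectorized [ method, so it has none of the speed benefit discussed here.

```r
# Base-R analogue of a parallel seq_len() that returns an ordinary list.
# pseq_len_ref() is a hypothetical name, not an IRanges function.
pseq_len_ref <- function(eltlens) {
    eltlens <- as.integer(eltlens)
    flat <- sequence(eltlens)                     # 1..n1, 1..n2, ... concatenated
    grp <- factor(rep.int(seq_along(eltlens), eltlens),
                  levels = seq_along(eltlens))    # keeps zero-length groups
    unname(split(flat, grp))
}

idx <- pseq_len_ref(c(5, 1, 0, 2))
# idx is list(1:5, 1L, integer(0), 1:2)

# Head of at most 5 values from each element of a plain list:
x <- list(chr1 = c(1, 1, 1, 1, 2, 2, 3), chr2 = c(0, 1), chr3 = numeric(0))
heads <- Map(`[`, x, pseq_len_ref(pmin(lengths(x), 5)))
```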
H.
On Tue Apr 1 09:24:51 2014, Cook, Malcolm wrote:
In the meantime,
lapply(x, `[`, 1:5)
??
>-----Original Message-----
>From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Michael Lawrence
>Sent: Tuesday, April 01, 2014 9:21 AM
>To: bioc-devel at r-project.org
>Subject: [Bioc-devel] Subsetting Lists by Lists
>
>Mostly to Herve:
>
>Sometimes we want to pluck the first 1, or 10, or whatever elements from
>each element of a list. If I had a list 'x', I thought I could do this with:
>
>x[IntegerList(1:5)]
>
>But it only gives elements 1:5 from x[[1]], not each element of 'x'. In
>other words, I thought the index would be repped out. Instead, 'x' is
>subset to the length of 'i', and I'm not sure if that makes sense?
>
>But maybe what we really want are pluckHead/Tail, which would be robust to
>the case that < n elements are in an element. And of course a more general
>pluck(x, i) to select 'i' from each element, but I wanted the line above to
>do that.
>
>Michael
>
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319