An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130917/4487b6fe/attachment.pl>
[Bioc-devel] deprecation of keepSeqlevels and renameSeqlevels
14 messages · Vincent Carey, Tim Triche, Jr., Michael Lawrence +2 more
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130917/4d8548ca/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130917/49c2277f/attachment.pl>
For sorting the chromosome names in "natural" order, you can use
the makeSeqnameIds() helper function in GenomicRanges:
seqlevels(gr) <- seqlevels(gr)[makeSeqnameIds(seqlevels(gr))]
It recognizes several naming conventions (chr-prefixed or not,
roman, etc...) e.g.:
> makeSeqnameIds(c("Y", "1", "9", "M", "2"))
[1] 4 1 3 5 2
> makeSeqnameIds(c("chrY", "chrI", "chrIX", "chrM", "chrII"))
[1] 4 1 3 5 2
I still need to improve the documentation of this function, put
more examples, and add unit tests. It's used internally by the
makeTranscriptDb* functions in GenomicFeatures to assign internal
ids to the chromosomes so that all the TranscriptDb extractors
can then return GRanges and GRangesList objects with the seqlevels
in the "natural" order.
Cheers,
H.
On 09/17/2013 10:23 AM, Kasper Daniel Hansen wrote:
Actually, what I (and I think several others) just want is fixSeqnames() which sorts and fixes them to some convention. I don't really care which one, but I want to do some easy harmonization, preferably in line with other bioc tools. I know this is hard to do for all organisms, but having something that deals with the major ones (human, mouse, fruit fly, yeast, some monkeys) would be extremely valuable. If this function existed, I could just throw all my GRanges at it. Kasper On Tue, Sep 17, 2013 at 1:14 PM, Michael Lawrence <lawrence.michael at gene.com
wrote:
When I hit this I usually just do:
seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y"))
It would be nice if there were a function for sorting chromosome names
according to a specified naming convention. Like sortSeqlevels(gr,
UCSCNaming()) or the alternative replacement syntax: seqinfo(gr) <-
sort(seqinfo(gr), UCSCNaming()). Although sort is only single-dispatch.
Michael
On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey
<stvjc at channing.harvard.edu>wrote:
I am replying to this as Michael mentions that this is a powerful operation. Here's my problem that I believe is related, but I do not see a straightforward solution
bhmm19uni
GRanges with 559456 ranges and 4 metadata columns:
seqnames ranges strand | name
score
<Rle> <IRanges> <Rle> | <character>
<numeric>
1 chr1 [ 67229025, 67341224] * | 13_Heterochrom/lo
0
2 chr1 [203057955, 203060154] * | 12_Repressed
0
3 chr1 [ 8458427, 8486226] * | 13_Heterochrom/lo
0
4 chr1 [ 16904427, 16904826] * | 5_Strong_Enhancer
0
5 chr1 [ 25289427, 25293426] * | 11_Weak_Txn
0
... ... ... ... ... ...
...
570766 chr22 [51100931, 51101330] * | 2_Weak_Promoter
0
570767 chr22 [51101331, 51101530] * | 6_Weak_Enhancer
0
570768 chr22 [51101531, 51101730] * | 2_Weak_Promoter
0
570769 chr22 [51101731, 51178130] * | 13_Heterochrom/lo
0
570770 chr22 [51178131, 51178330] * | 15_Repetitive/CNV
0
thick numcode
<IRanges> <character>
1 [ 67001613, 67113812] 13
2 [201324578, 201326777] 12
3 [ 8381014, 8408813] 13
4 [ 16777014, 16777413] 5
5 [ 25162014, 25166013] 11
... ... ...
570766 [49447797, 49448196] 2
570767 [49448197, 49448396] 6
570768 [49448397, 49448596] 2
570769 [49448597, 49524996] 13
570770 [49524997, 49525196] 15
---
seqlengths:
chr1 chr10 chr11 chr5 ... chr2 chr21
chr4
249250621 135534747 135006516 180915260 ... 243199373 48129895
191154276
If I try to make a karyogram with ggbio, the chromosomes are in an
unnatural order. What is the right approach to getting the seqinfo in a
natural order with respect to chromosome names and lengths? Reassigning
seqlevels seems dangerous but I haven't experimented fully. It must be a
common use case but I do not see it in doc.
On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence <
lawrence.michael at gene.com> wrote:
On Mon, May 20, 2013 at 10:36 PM, Herv? Pag?s <hpages at fhcrc.org> wrote:
Michael, On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>> wrote:
Hi,
On 05/20/2013 03:15 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM, Michael Lawrence wrote:
While it's cool that seqlevels<- has become so
flexible,
I still claim
that
rename/keep/drop would be a lot easier to read,
because
they describe the
high-level operation, and there's no reason to
deprecate
them. They're
also
easier to remember. For example, for dropping with
seqlevels<-, the user
needs
to remember that "force" is necessary to drop. Much
easier to just say
"dropSeqlevels(), please". And reimplementing
keepSeqlevels is still too
complicated. Is it such a maintenance burden to have
these simple
wrappers sit
on top of the low-level manipulators?
Another suggestion: renameSeqlevels should not
require
a
named vector (in
cases
where it is unnamed, require that it is parallel to
the
existing
seqlevels, as
with seqlevels<-).
I didn't really indicate what drove my desire to see
keepSeqlevels
deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq
were
developed more
or less independently, and have different contracts. The
different
contracts are confusing to the user, as is creating a
usable
help system
when there are multiple end points for what is a common
operation. The help
pages of each were inadequate in different ways. And
because
the code was
developed independently, support for different objects
was
inconsistent. So
actually, yes, the maintenance (and use) burden for the
previous state of
affairs was substantial.
On the other hand, I agree that keepSeqlevels is
convenient
as a simple
wrapper around seqlevels<-, in the same way that setNames
and names<- are
both useful.
So we could iterate to
keepSeqlevels <-
function(x, value, ...)
{
seqlevels(x, force=TRUE) <- value
x
}
but I'd be less enthusiastic about maintaining the
original
contract of
keepSeqlevels, where keepSeqlevels(gr1, gr2), would have
worked for two
GRanges objects.
Why would this be called keepSeqlevels() if, depending on how
it's
used,
it will either add, drop, rename, or permute the seqlevels?
Couldn't this be called setSeqlevels?
I thought that new2old was necessary for permuting.
The seqlevels setter has no 'new2old' arg. You're right that
adding should be disallowed for keepSeqlevels(). Adding seqlevels is
not
a common operation. The two common operations are: * Permuting, usually because the data were imported in non-canonical order (seqnameStyle was designed to address this, no?).
Permuting is straightforward with seqlevels<-:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> seqlevels(gr) <- rev(seqlevels(gr))
> seqlevels(gr)
[1] "chr3" "chr2" "chr1" * Subsetting, either by keeping or dropping.
Also straightforward:
> seqlevels(gr, force=TRUE) <- "chr2"
> seqlevels(gr)
[1] "chr2" Two main reasons: taking a
small slice of the data for prototyping, or removing problematic chromosomes (sex, circular). This goes back to bsapply and the
exclude
argument.
There are other important use cases:
- Dropping seqlevels that are *not* in use. This happens for example
when subsetting a BAM file with Rsamtools::filterBam to keep only
the alignments located on a given chromosome. The entire sequence
dictionary (in the header) is propagated to the resulting BAM file
so when you read the file with readGAlignments() you end up with a
bunch of useless seqlevels that you generally start by dropping.
You don't want to drop any alignment so force=FALSE is your
friend.
This is sort of tangential to the discussion, but do you really want to
do
this? I would preserve the universe as given by the BAM.
- An even more common operation is renaming: 90% of the times that's
what I use seqlevels<- for, and that's what I tell people to use
when they need to rename their chromosomes. Also straightforward:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
> seqlevels(gr)
[1] "1" "2" "3" "M"
> seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
> seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to use than renameSeqlevels() which
always required you to build the entire right value as a named
vector.
Sure, renaming is a common use case. Not sure how I forgot that. As I wrote earlier in the thread, renameSeqlevels should be changed so
as
not to require naming of the vector.
So the added-value of keepSeqlevels() seems to boil down to: (a) it always uses force=TRUE (b) it preserves the original object and returns the modified object If you want to restrict the use of keepSeqlevels() to permuting and dropping, you'll also have to disallow renaming (in addition to
adding).
Then its name will more or less reflect what it does (if "keep" means "permute" or "drop"). The final result will be something that does
less
than setSeqlevels() and that is not necessarily easier to read: both will set on the object whatever vector of seqlevels they are supplied, except that one will fail when doing so would actually result in
adding
or renaming seqlevels.
So it looks like seqlevels<- is pretty powerful. Now I see that if I scroll down to the examples in the man page, I find the actual documentation. It's cool to learn that we can replace seqlevels on TxDb objects. My argument has always been that since these low-level operations are so powerful, it's nice to have high-level operations to clarify the code. We have to think hard to know what the RHS will do in that replacement. It's probably one of the most complex replacement operations; far more complex than the
typical
one, including levels<- on factor. There's nothing wrong with that; it's just complexity that we should be able to hide. Michael H.
Michael
H.
Well, I wasn't even aware of that feature, so you've made
your
point about
the documentation ;) Sounds like a good compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May 18, 2013 at 6:00 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org <mailto:
mtmorgan at fhcrc.org
wrote:
On 05/18/2013 05:42 PM, Martin Morgan wrote:
Some of the most common operations are
straight-forward, e.g.,
> gr = GRanges(paste0("chr", c(1:22,
"X", "Y")), IRanges(1,
100))
> seqlevels(gr) = sub("chr", "ch",
seqlevels(gr))
To get a more comprehensive example I should
have
followed my own
advice and
grabbed from the help page!
## Rename:
seqlevels(txdb) <- sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__****__txdb)
==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational Biology / Fred Hutchinson Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
<tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
[[alternative HTML version deleted]]
______________________________**___________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.**org<
Bioc-devel at r-project.org>
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel< https://stat.ethz.ch/mailman/listinfo/bioc-devel> -- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> -- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319 [[alternative HTML version deleted]] _______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130917/d66b6c33/attachment.pl>
On 09/17/2013 11:30 AM, Michael Lawrence wrote:
Sounds great. Could we have a more intuitive name? Like naturalOrder().
That would be confusing because order() does something different:
> order(c("chrY", "chr1", "chr9", "chr2"))
[1] 2 4 3 1
> makeSeqnameIds(c("chrY", "chr1", "chr9", "chr2"))
[1] 4 1 3 2
makeSeqnameIds() is more like rank() except on how it deals with
duplicates:
> makeSeqnameIds(c("chrY", "chr1", "chr9", "chr2", "chr9"))
[1] 4 1 3 2 3
Whatever 'ties.method' you use, rank() will put "chrY" at rank 5.
IIRC makeSeqnameIds() needs to support duplicates because of how it's
used internally in the makeTranscriptDb* functions.
Alternatively we could add sortSeqlevels() as a wrapper to
makeSeqnameIds() so you would be able to do:
seqlevels(gr) <- sortSeqlevels(seqlevels(gr))
Does that work?
Thanks,
H.
And then the method to fix the GRanges could be restoreNaturalOrder().
Just some suggestions...
On Tue, Sep 17, 2013 at 10:58 AM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>> wrote:
For sorting the chromosome names in "natural" order, you can use
the makeSeqnameIds() helper function in GenomicRanges:
seqlevels(gr) <- seqlevels(gr)[makeSeqnameIds(__seqlevels(gr))]
It recognizes several naming conventions (chr-prefixed or not,
roman, etc...) e.g.:
> makeSeqnameIds(c("Y", "1", "9", "M", "2"))
[1] 4 1 3 5 2
> makeSeqnameIds(c("chrY", "chrI", "chrIX", "chrM", "chrII"))
[1] 4 1 3 5 2
I still need to improve the documentation of this function, put
more examples, and add unit tests. It's used internally by the
makeTranscriptDb* functions in GenomicFeatures to assign internal
ids to the chromosomes so that all the TranscriptDb extractors
can then return GRanges and GRangesList objects with the seqlevels
in the "natural" order.
Cheers,
H.
On 09/17/2013 10:23 AM, Kasper Daniel Hansen wrote:
Actually, what I (and I think several others) just want is
fixSeqnames()
which sorts and fixes them to some convention. I don't really
care which
one, but I want to do some easy harmonization, preferably in
line with
other bioc tools. I know this is hard to do for all organisms,
but having
something that deals with the major ones (human, mouse, fruit
fly, yeast,
some monkeys) would be extremely valuable. If this function
existed, I
could just throw all my GRanges at it.
Kasper
On Tue, Sep 17, 2013 at 1:14 PM, Michael Lawrence
<lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>
wrote:
When I hit this I usually just do:
seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y"))
It would be nice if there were a function for sorting
chromosome names
according to a specified naming convention. Like
sortSeqlevels(gr,
UCSCNaming()) or the alternative replacement syntax:
seqinfo(gr) <-
sort(seqinfo(gr), UCSCNaming()). Although sort is only
single-dispatch.
Michael
On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey
<stvjc at channing.harvard.edu
<mailto:stvjc at channing.harvard.edu>>__wrote:
I am replying to this as Michael mentions that this is a
powerful
operation. Here's my
problem that I believe is related, but I do not see a
straightforward
solution
bhmm19uni
GRanges with 559456 ranges and 4 metadata columns:
seqnames ranges strand |
name
score
<Rle> <IRanges> <Rle> |
<character>
<numeric>
1 chr1 [ 67229025, 67341224] * |
13_Heterochrom/lo
0
2 chr1 [203057955, 203060154] * |
12_Repressed
0
3 chr1 [ 8458427, 8486226] * |
13_Heterochrom/lo
0
4 chr1 [ 16904427, 16904826] * |
5_Strong_Enhancer
0
5 chr1 [ 25289427, 25293426] * |
11_Weak_Txn
0
... ... ... ... ...
...
...
570766 chr22 [51100931, 51101330] * |
2_Weak_Promoter
0
570767 chr22 [51101331, 51101530] * |
6_Weak_Enhancer
0
570768 chr22 [51101531, 51101730] * |
2_Weak_Promoter
0
570769 chr22 [51101731, 51178130] * |
13_Heterochrom/lo
0
570770 chr22 [51178131, 51178330] * |
15_Repetitive/CNV
0
thick numcode
<IRanges> <character>
1 [ 67001613, 67113812] 13
2 [201324578, 201326777] 12
3 [ 8381014, 8408813] 13
4 [ 16777014, 16777413] 5
5 [ 25162014, 25166013] 11
... ... ...
570766 [49447797, 49448196] 2
570767 [49448197, 49448396] 6
570768 [49448397, 49448596] 2
570769 [49448597, 49524996] 13
570770 [49524997, 49525196] 15
---
seqlengths:
chr1 chr10 chr11 chr5 ...
chr2 chr21
chr4
249250621 135534747 135006516 180915260 ...
243199373 48129895
191154276
If I try to make a karyogram with ggbio, the chromosomes
are in an
unnatural order. What is the right approach to getting
the seqinfo in a
natural order with respect to chromosome names and
lengths? Reassigning
seqlevels seems dangerous but I haven't experimented
fully. It must be a
common use case but I do not see it in doc.
On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence <
lawrence.michael at gene.com
<mailto:lawrence.michael at gene.com>> wrote:
On Mon, May 20, 2013 at 10:36 PM, Herv? Pag?s
<hpages at fhcrc.org <mailto:hpages at fhcrc.org>> wrote:
Michael,
On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Herv? Pag?s
<hpages at fhcrc.org <mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>> wrote:
Hi,
On 05/20/2013 03:15 PM, Michael
Lawrence wrote:
On Mon, May 20, 2013 at 3:11 PM,
Martin Morgan
<mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM, Michael
Lawrence wrote:
While it's cool that
seqlevels<- has become so
flexible,
I still claim
that
rename/keep/drop would be
a lot easier to read,
because
they describe the
high-level operation, and
there's no reason to
deprecate
them. They're
also
easier to remember. For
example, for dropping with
seqlevels<-, the user
needs
to remember that "force"
is necessary to drop. Much
easier to just say
"dropSeqlevels(), please".
And reimplementing
keepSeqlevels is still too
complicated. Is it such a
maintenance burden to have
these simple
wrappers sit
on top of the low-level
manipulators?
Another suggestion:
renameSeqlevels should not
require
a
named vector (in
cases
where it is unnamed,
require that it is parallel to
the
existing
seqlevels, as
with seqlevels<-).
I didn't really indicate what
drove my desire to see
keepSeqlevels
deprecated. keepSeqlevels,
seqlevels<-, and isActiveSeq
were
developed more
or less independently, and
have different contracts. The
different
contracts are confusing to the
user, as is creating a
usable
help system
when there are multiple end
points for what is a common
operation. The help
pages of each were inadequate
in different ways. And
because
the code was
developed independently,
support for different objects
was
inconsistent. So
actually, yes, the maintenance
(and use) burden for the
previous state of
affairs was substantial.
On the other hand, I agree
that keepSeqlevels is
convenient
as a simple
wrapper around seqlevels<-, in
the same way that setNames
and names<- are
both useful.
So we could iterate to
keepSeqlevels <-
function(x, value, ...)
{
seqlevels(x,
force=TRUE) <- value
x
}
but I'd be less enthusiastic
about maintaining the
original
contract of
keepSeqlevels, where
keepSeqlevels(gr1, gr2), would have
worked for two
GRanges objects.
Why would this be called
keepSeqlevels() if, depending on how
it's
used,
it will either add, drop, rename, or
permute the seqlevels?
Couldn't this be called setSeqlevels?
I thought that new2old was necessary for
permuting.
The seqlevels setter has no 'new2old' arg.
You're right that
adding should be disallowed for
keepSeqlevels(). Adding seqlevels is
not
a common operation. The two common
operations are:
* Permuting, usually because the data were
imported in non-canonical
order (seqnameStyle was designed to address
this, no?).
Permuting is straightforward with seqlevels<-:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> seqlevels(gr) <- rev(seqlevels(gr))
> seqlevels(gr)
[1] "chr3" "chr2" "chr1"
* Subsetting, either by keeping or dropping.
Also straightforward:
> seqlevels(gr, force=TRUE) <- "chr2"
> seqlevels(gr)
[1] "chr2"
Two main reasons: taking a
small slice of the data for prototyping, or
removing problematic
chromosomes (sex, circular). This goes back
to bsapply and the
exclude
argument.
There are other important use cases:
- Dropping seqlevels that are *not* in use.
This happens for example
when subsetting a BAM file with
Rsamtools::filterBam to keep only
the alignments located on a given
chromosome. The entire sequence
dictionary (in the header) is propagated
to the resulting BAM file
so when you read the file with
readGAlignments() you end up with a
bunch of useless seqlevels that you
generally start by dropping.
You don't want to drop any alignment so
force=FALSE is your
friend.
This is sort of tangential to the discussion, but do
you really want to
do
this? I would preserve the universe as given by the BAM.
- An even more common operation is renaming:
90% of the times that's
what I use seqlevels<- for, and that's
what I tell people to use
when they need to rename their
chromosomes. Also straightforward:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(gr) <- sub("^chr", "",
seqlevels(gr))
> seqlevels(gr)
[1] "1" "2" "3" "M"
> seqlevels(gr)[seqlevels(gr) == "M"] <-
"MT"
> seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to use than
renameSeqlevels() which
always required you to build the entire
right value as a named
vector.
Sure, renaming is a common use case. Not sure how I
forgot that.
As I wrote earlier in the thread, renameSeqlevels
should be changed so
as
not to require naming of the vector.
So the added-value of keepSeqlevels() seems to
boil down to:
(a) it always uses force=TRUE
(b) it preserves the original object and
returns the modified object
If you want to restrict the use of
keepSeqlevels() to permuting and
dropping, you'll also have to disallow renaming
(in addition to
adding).
Then its name will more or less reflect what it
does (if "keep" means
"permute" or "drop"). The final result will be
something that does
less
than setSeqlevels() and that is not necessarily
easier to read: both
will set on the object whatever vector of
seqlevels they are supplied,
except that one will fail when doing so would
actually result in
adding
or renaming seqlevels.
So it looks like seqlevels<- is pretty powerful. Now
I see that if I
scroll
down to the examples in the man page, I find the
actual documentation.
It's
cool to learn that we can replace seqlevels on TxDb
objects. My argument
has always been that since these low-level
operations are so powerful,
it's
nice to have high-level operations to clarify the
code. We have to think
hard to know what the RHS will do in that
replacement. It's probably one
of
the most complex replacement operations; far more
complex than the
typical
one, including levels<- on factor. There's nothing
wrong with that; it's
just complexity that we should be able to hide.
Michael
H.
Michael
H.
Well, I wasn't even aware of that
feature, so you've made
your
point about
the documentation ;) Sounds like a
good compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May 18, 2013 at
6:00 PM, Martin Morgan
<mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org> <mailto:
mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
wrote:
On 05/18/2013 05:42
PM, Martin Morgan wrote:
Some of the most
common operations are
straight-forward, e.g.,
> gr =
GRanges(paste0("chr", c(1:22,
"X", "Y")), IRanges(1,
100))
>
seqlevels(gr) = sub("chr", "ch",
seqlevels(gr))
To get a more
comprehensive example I should
have
followed my own
advice and
grabbed from the
help page!
## Rename:
seqlevels(txdb) <- sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__**__**__txdb)
==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational
Biology / Fred Hutchinson Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024
Seattle, WA 98109
Location: Arnold
Building M1 B861
Phone: (206)
667-2793 <tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
[[alternative HTML
version deleted]]
________________________________**___________________
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.org>
<mailto:Bioc-devel at r-project.
<mailto:Bioc-devel at r-project.>*__*org<
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.org>>
mailing list
https://stat.ethz.ch/mailman/___**_listinfo/bioc-devel
<https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel><
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>
<https://stat.ethz.ch/mailman/__**listinfo/bioc-devel <https://stat.ethz.ch/mailman/**listinfo/bioc-devel><
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>
Phone: (206) 667-5791
<tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
Fax: (206) 667-1319
<tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
[[alternative HTML version deleted]]
_________________________________________________
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
[[alternative HTML version deleted]]
_________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
[[alternative HTML version deleted]]
_________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
I have been doing this in a kludgey fashion for a long long time, and would favor restoreNaturalOrder() as a clear name for what it does. It would be great to have this in BioC-2.13 release. Thanks Michael and Herve and Kasper for bringing up and solving this cleanly. --t
On Sep 17, 2013, at 11:30 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
Sounds great. Could we have a more intuitive name? Like naturalOrder(). And then the method to fix the GRanges could be restoreNaturalOrder(). Just some suggestions... On Tue, Sep 17, 2013 at 10:58 AM, Herv? Pag?s <hpages at fhcrc.org> wrote:
For sorting the chromosome names in "natural" order, you can use the makeSeqnameIds() helper function in GenomicRanges: seqlevels(gr) <- seqlevels(gr)[makeSeqnameIds(**seqlevels(gr))] It recognizes several naming conventions (chr-prefixed or not, roman, etc...) e.g.:
makeSeqnameIds(c("Y", "1", "9", "M", "2"))
[1] 4 1 3 5 2
makeSeqnameIds(c("chrY", "chrI", "chrIX", "chrM", "chrII"))
[1] 4 1 3 5 2 I still need to improve the documentation of this function, put more examples, and add unit tests. It's used internally by the makeTranscriptDb* functions in GenomicFeatures to assign internal ids to the chromosomes so that all the TranscriptDb extractors can then return GRanges and GRangesList objects with the seqlevels in the "natural" order. Cheers, H. On 09/17/2013 10:23 AM, Kasper Daniel Hansen wrote:
Actually, what I (and I think several others) just want is fixSeqnames() which sorts and fixes them to some convention. I don't really care which one, but I want to do some easy harmonization, preferably in line with other bioc tools. I know this is hard to do for all organisms, but having something that deals with the major ones (human, mouse, fruit fly, yeast, some monkeys) would be extremely valuable. If this function existed, I could just throw all my GRanges at it. Kasper On Tue, Sep 17, 2013 at 1:14 PM, Michael Lawrence < lawrence.michael at gene.com
wrote:
When I hit this I usually just do:
seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y"))
It would be nice if there were a function for sorting chromosome names
according to a specified naming convention. Like sortSeqlevels(gr,
UCSCNaming()) or the alternative replacement syntax: seqinfo(gr) <-
sort(seqinfo(gr), UCSCNaming()). Although sort is only single-dispatch.
Michael
On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey
<stvjc at channing.harvard.edu>**wrote:
I am replying to this as Michael mentions that this is a powerful
operation. Here's my
problem that I believe is related, but I do not see a straightforward
solution
bhmm19uni
GRanges with 559456 ranges and 4 metadata columns:
seqnames ranges strand | name
score
<Rle> <IRanges> <Rle> | <character>
<numeric>
1 chr1 [ 67229025, 67341224] * | 13_Heterochrom/lo
0
2 chr1 [203057955, 203060154] * | 12_Repressed
0
3 chr1 [ 8458427, 8486226] * | 13_Heterochrom/lo
0
4 chr1 [ 16904427, 16904826] * | 5_Strong_Enhancer
0
5 chr1 [ 25289427, 25293426] * | 11_Weak_Txn
0
... ... ... ... ... ...
...
570766 chr22 [51100931, 51101330] * | 2_Weak_Promoter
0
570767 chr22 [51101331, 51101530] * | 6_Weak_Enhancer
0
570768 chr22 [51101531, 51101730] * | 2_Weak_Promoter
0
570769 chr22 [51101731, 51178130] * | 13_Heterochrom/lo
0
570770 chr22 [51178131, 51178330] * | 15_Repetitive/CNV
0
thick numcode
<IRanges> <character>
1 [ 67001613, 67113812] 13
2 [201324578, 201326777] 12
3 [ 8381014, 8408813] 13
4 [ 16777014, 16777413] 5
5 [ 25162014, 25166013] 11
... ... ...
570766 [49447797, 49448196] 2
570767 [49448197, 49448396] 6
570768 [49448397, 49448596] 2
570769 [49448597, 49524996] 13
570770 [49524997, 49525196] 15
---
seqlengths:
chr1 chr10 chr11 chr5 ... chr2 chr21
chr4
249250621 135534747 135006516 180915260 ... 243199373 48129895
191154276
If I try to make a karyogram with ggbio, the chromosomes are in an
unnatural order. What is the right approach to getting the seqinfo in a
natural order with respect to chromosome names and lengths? Reassigning
seqlevels seems dangerous but I haven't experimented fully. It must be
a
common use case but I do not see it in doc.
On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence <
lawrence.michael at gene.com> wrote:
On Mon, May 20, 2013 at 10:36 PM, Herv? Pag?s <hpages at fhcrc.org>
wrote: Michael,
On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>> wrote:
Hi,
On 05/20/2013 03:15 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM, Michael Lawrence wrote:
While it's cool that seqlevels<- has become so
flexible,
I still claim
that
rename/keep/drop would be a lot easier to read,
because
they describe the
high-level operation, and there's no reason to
deprecate
them. They're
also
easier to remember. For example, for dropping with
seqlevels<-, the user
needs
to remember that "force" is necessary to drop. Much
easier to just say
"dropSeqlevels(), please". And reimplementing
keepSeqlevels is still too
complicated. Is it such a maintenance burden to have
these simple
wrappers sit
on top of the low-level manipulators?
Another suggestion: renameSeqlevels should not
require
a
named vector (in
cases
where it is unnamed, require that it is parallel to
the
existing
seqlevels, as
with seqlevels<-).
I didn't really indicate what drove my desire to see
keepSeqlevels
deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq
were
developed more
or less independently, and have different contracts. The
different
contracts are confusing to the user, as is creating a
usable
help system
when there are multiple end points for what is a common
operation. The help
pages of each were inadequate in different ways. And
because
the code was
developed independently, support for different objects
was
inconsistent. So
actually, yes, the maintenance (and use) burden for the
previous state of
affairs was substantial.
On the other hand, I agree that keepSeqlevels is
convenient
as a simple
wrapper around seqlevels<-, in the same way that
setNames
and names<- are
both useful.
So we could iterate to
keepSeqlevels <-
function(x, value, ...)
{
seqlevels(x, force=TRUE) <- value
x
}
but I'd be less enthusiastic about maintaining the
original
contract of
keepSeqlevels, where keepSeqlevels(gr1, gr2), would have
worked for two
GRanges objects.
Why would this be called keepSeqlevels() if, depending on how
it's
used,
it will either add, drop, rename, or permute the seqlevels?
Couldn't this be called setSeqlevels?
I thought that new2old was necessary for permuting.
The seqlevels setter has no 'new2old' arg. You're right that
adding should be disallowed for keepSeqlevels(). Adding seqlevels is
not
a common operation. The two common operations are:
* Permuting, usually because the data were imported in non-canonical order (seqnameStyle was designed to address this, no?).
Permuting is straightforward with seqlevels<-:
seqlevels(gr)
[1] "chr1" "chr2" "chr3"
seqlevels(gr) <- rev(seqlevels(gr))
seqlevels(gr)
[1] "chr3" "chr2" "chr1" * Subsetting, either by keeping or dropping. Also straightforward:
seqlevels(gr, force=TRUE) <- "chr2"
seqlevels(gr)
[1] "chr2" Two main reasons: taking a
small slice of the data for prototyping, or removing problematic chromosomes (sex, circular). This goes back to bsapply and the
exclude
argument.
There are other important use cases:
- Dropping seqlevels that are *not* in use. This happens for
example
when subsetting a BAM file with Rsamtools::filterBam to keep only
the alignments located on a given chromosome. The entire sequence
dictionary (in the header) is propagated to the resulting BAM
file
so when you read the file with readGAlignments() you end up with
a
bunch of useless seqlevels that you generally start by dropping.
You don't want to drop any alignment so force=FALSE is your
friend.
This is sort of tangential to the discussion, but do you really want
to
do
this? I would preserve the universe as given by the BAM.
- An even more common operation is renaming: 90% of the times
that's
what I use seqlevels<- for, and that's what I tell people to use
when they need to rename their chromosomes. Also straightforward:
seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
seqlevels(gr)
[1] "1" "2" "3" "M"
seqlevels(gr)[seqlevels(gr) == "M"] <- "MT" seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to use than renameSeqlevels() which
always required you to build the entire right value as a named
vector.
Sure, renaming is a common use case. Not sure how I forgot that.
As I wrote earlier in the thread, renameSeqlevels should be changed so
as
not to require naming of the vector.
So the added-value of keepSeqlevels() seems to boil down to:
(a) it always uses force=TRUE (b) it preserves the original object and returns the modified object If you want to restrict the use of keepSeqlevels() to permuting and dropping, you'll also have to disallow renaming (in addition to
adding).
Then its name will more or less reflect what it does (if "keep" means
"permute" or "drop"). The final result will be something that does
less
than setSeqlevels() and that is not necessarily easier to read: both
will set on the object whatever vector of seqlevels they are supplied, except that one will fail when doing so would actually result in
adding
or renaming seqlevels.
So it looks like seqlevels<- is pretty powerful. Now I see that if I
scroll down to the examples in the man page, I find the actual documentation. It's cool to learn that we can replace seqlevels on TxDb objects. My argument has always been that since these low-level operations are so powerful, it's nice to have high-level operations to clarify the code. We have to think hard to know what the RHS will do in that replacement. It's probably one of the most complex replacement operations; far more complex than the
typical
one, including levels<- on factor. There's nothing wrong with that; it's
just complexity that we should be able to hide. Michael H.
Michael
H.
Well, I wasn't even aware of that feature, so you've made
your
point about
the documentation ;) Sounds like a good compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May 18, 2013 at 6:00 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org <mailto:
mtmorgan at fhcrc.org
wrote:
On 05/18/2013 05:42 PM, Martin Morgan wrote:
Some of the most common operations are
straight-forward, e.g.,
gr = GRanges(paste0("chr", c(1:22,
"X", "Y")), IRanges(1,
100))
seqlevels(gr) = sub("chr", "ch",
seqlevels(gr))
To get a more comprehensive example I should
have
followed my own
advice and
grabbed from the help page!
## Rename:
seqlevels(txdb) <- sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__****
**__txdb)
==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational Biology / Fred Hutchinson Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
<tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
[[alternative HTML version deleted]]
______________________________****___________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.***
*org<
Bioc-devel at r-project.org>
mailing list
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
--
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
[[alternative HTML version deleted]]
______________________________**_________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
[[alternative HTML version deleted]]
______________________________**_________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
[[alternative HTML version deleted]]
______________________________**_________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 09/17/2013 10:58 AM, Herv? Pag?s wrote:
For sorting the chromosome names in "natural" order, you can use the makeSeqnameIds() helper function in GenomicRanges: seqlevels(gr) <- seqlevels(gr)[makeSeqnameIds(seqlevels(gr))]
I meant:
seqlevels(gr) <- seqlevels(gr)[order(makeSeqnameIds(seqlevels(gr)))]
Probably the source of Michael's confusion... sorry.
sortSeqlevels() just added to GenomicRanges 1.13.44:
seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet",
"chrM", "chrXHet", "chr2LHet", "chrU",
"chr3L", "chr3R", "chr2R", "chrX")
Then:
> sortSeqlevels(seqlevels)
[1] "chr2R" "chr3L" "chr3R" "chr4" "chrX"
"chrU"
[7] "chrM" "chr2LHet" "chr2RHet" "chrXHet" "chrYHet"
"chrUextra"
> sortSeqlevels(seqlevels, X.is.sexchrom=TRUE)
[1] "chr2R" "chr3L" "chr3R" "chr4" "chrX"
"chrU"
[7] "chrM" "chr2LHet" "chr2RHet" "chrXHet" "chrYHet"
"chrUextra"
H.
It recognizes several naming conventions (chr-prefixed or not, roman, etc...) e.g.:
> makeSeqnameIds(c("Y", "1", "9", "M", "2"))
[1] 4 1 3 5 2
> makeSeqnameIds(c("chrY", "chrI", "chrIX", "chrM", "chrII"))
[1] 4 1 3 5 2 I still need to improve the documentation of this function, put more examples, and add unit tests. It's used internally by the makeTranscriptDb* functions in GenomicFeatures to assign internal ids to the chromosomes so that all the TranscriptDb extractors can then return GRanges and GRangesList objects with the seqlevels in the "natural" order. Cheers, H. On 09/17/2013 10:23 AM, Kasper Daniel Hansen wrote:
Actually, what I (and I think several others) just want is fixSeqnames() which sorts and fixes them to some convention. I don't really care which one, but I want to do some easy harmonization, preferably in line with other bioc tools. I know this is hard to do for all organisms, but having something that deals with the major ones (human, mouse, fruit fly, yeast, some monkeys) would be extremely valuable. If this function existed, I could just throw all my GRanges at it. Kasper On Tue, Sep 17, 2013 at 1:14 PM, Michael Lawrence <lawrence.michael at gene.com
wrote:
When I hit this I usually just do:
seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y"))
It would be nice if there were a function for sorting chromosome names
according to a specified naming convention. Like sortSeqlevels(gr,
UCSCNaming()) or the alternative replacement syntax: seqinfo(gr) <-
sort(seqinfo(gr), UCSCNaming()). Although sort is only single-dispatch.
Michael
On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey
<stvjc at channing.harvard.edu>wrote:
I am replying to this as Michael mentions that this is a powerful operation. Here's my problem that I believe is related, but I do not see a straightforward solution
bhmm19uni
GRanges with 559456 ranges and 4 metadata columns:
seqnames ranges strand | name
score
<Rle> <IRanges> <Rle> | <character>
<numeric>
1 chr1 [ 67229025, 67341224] * | 13_Heterochrom/lo
0
2 chr1 [203057955, 203060154] * | 12_Repressed
0
3 chr1 [ 8458427, 8486226] * | 13_Heterochrom/lo
0
4 chr1 [ 16904427, 16904826] * | 5_Strong_Enhancer
0
5 chr1 [ 25289427, 25293426] * | 11_Weak_Txn
0
... ... ... ... ... ...
...
570766 chr22 [51100931, 51101330] * | 2_Weak_Promoter
0
570767 chr22 [51101331, 51101530] * | 6_Weak_Enhancer
0
570768 chr22 [51101531, 51101730] * | 2_Weak_Promoter
0
570769 chr22 [51101731, 51178130] * | 13_Heterochrom/lo
0
570770 chr22 [51178131, 51178330] * | 15_Repetitive/CNV
0
thick numcode
<IRanges> <character>
1 [ 67001613, 67113812] 13
2 [201324578, 201326777] 12
3 [ 8381014, 8408813] 13
4 [ 16777014, 16777413] 5
5 [ 25162014, 25166013] 11
... ... ...
570766 [49447797, 49448196] 2
570767 [49448197, 49448396] 6
570768 [49448397, 49448596] 2
570769 [49448597, 49524996] 13
570770 [49524997, 49525196] 15
---
seqlengths:
chr1 chr10 chr11 chr5 ... chr2 chr21
chr4
249250621 135534747 135006516 180915260 ... 243199373 48129895
191154276
If I try to make a karyogram with ggbio, the chromosomes are in an
unnatural order. What is the right approach to getting the seqinfo
in a
natural order with respect to chromosome names and lengths?
Reassigning
seqlevels seems dangerous but I haven't experimented fully. It must
be a
common use case but I do not see it in doc.
On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence <
lawrence.michael at gene.com> wrote:
On Mon, May 20, 2013 at 10:36 PM, Herv? Pag?s <hpages at fhcrc.org> wrote:
Michael, On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>> wrote:
Hi,
On 05/20/2013 03:15 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM, Michael Lawrence wrote:
While it's cool that seqlevels<- has become so
flexible,
I still claim
that
rename/keep/drop would be a lot easier to read,
because
they describe the
high-level operation, and there's no reason to
deprecate
them. They're
also
easier to remember. For example, for dropping with
seqlevels<-, the user
needs
to remember that "force" is necessary to drop. Much
easier to just say
"dropSeqlevels(), please". And reimplementing
keepSeqlevels is still too
complicated. Is it such a maintenance burden to
have
these simple
wrappers sit
on top of the low-level manipulators?
Another suggestion: renameSeqlevels should not
require
a
named vector (in
cases
where it is unnamed, require that it is parallel to
the
existing
seqlevels, as
with seqlevels<-).
I didn't really indicate what drove my desire to see
keepSeqlevels
deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq
were
developed more
or less independently, and have different contracts.
The
different
contracts are confusing to the user, as is creating a
usable
help system
when there are multiple end points for what is a common
operation. The help
pages of each were inadequate in different ways. And
because
the code was
developed independently, support for different objects
was
inconsistent. So
actually, yes, the maintenance (and use) burden for the
previous state of
affairs was substantial.
On the other hand, I agree that keepSeqlevels is
convenient
as a simple
wrapper around seqlevels<-, in the same way that
setNames
and names<- are
both useful.
So we could iterate to
keepSeqlevels <-
function(x, value, ...)
{
seqlevels(x, force=TRUE) <- value
x
}
but I'd be less enthusiastic about maintaining the
original
contract of
keepSeqlevels, where keepSeqlevels(gr1, gr2), would
have
worked for two
GRanges objects.
Why would this be called keepSeqlevels() if, depending on how
it's
used,
it will either add, drop, rename, or permute the seqlevels?
Couldn't this be called setSeqlevels?
I thought that new2old was necessary for permuting.
The seqlevels setter has no 'new2old' arg. You're right that
adding should be disallowed for keepSeqlevels(). Adding seqlevels is
not
a common operation. The two common operations are: * Permuting, usually because the data were imported in non-canonical order (seqnameStyle was designed to address this, no?).
Permuting is straightforward with seqlevels<-:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> seqlevels(gr) <- rev(seqlevels(gr))
> seqlevels(gr)
[1] "chr3" "chr2" "chr1" * Subsetting, either by keeping or dropping.
Also straightforward:
> seqlevels(gr, force=TRUE) <- "chr2"
> seqlevels(gr)
[1] "chr2" Two main reasons: taking a
small slice of the data for prototyping, or removing problematic chromosomes (sex, circular). This goes back to bsapply and the
exclude
argument.
There are other important use cases:
- Dropping seqlevels that are *not* in use. This happens for
example
when subsetting a BAM file with Rsamtools::filterBam to keep
only
the alignments located on a given chromosome. The entire
sequence
dictionary (in the header) is propagated to the resulting BAM
file
so when you read the file with readGAlignments() you end up
with a
bunch of useless seqlevels that you generally start by dropping.
You don't want to drop any alignment so force=FALSE is your
friend.
This is sort of tangential to the discussion, but do you really want to
do
this? I would preserve the universe as given by the BAM.
- An even more common operation is renaming: 90% of the times
that's
what I use seqlevels<- for, and that's what I tell people to use
when they need to rename their chromosomes. Also
straightforward:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
> seqlevels(gr)
[1] "1" "2" "3" "M"
> seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
> seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to use than renameSeqlevels() which
always required you to build the entire right value as a named
vector.
Sure, renaming is a common use case. Not sure how I forgot that. As I wrote earlier in the thread, renameSeqlevels should be changed so
as
not to require naming of the vector.
So the added-value of keepSeqlevels() seems to boil down to: (a) it always uses force=TRUE (b) it preserves the original object and returns the modified object If you want to restrict the use of keepSeqlevels() to permuting and dropping, you'll also have to disallow renaming (in addition to
adding).
Then its name will more or less reflect what it does (if "keep" means "permute" or "drop"). The final result will be something that does
less
than setSeqlevels() and that is not necessarily easier to read: both will set on the object whatever vector of seqlevels they are supplied, except that one will fail when doing so would actually result in
adding
or renaming seqlevels.
So it looks like seqlevels<- is pretty powerful. Now I see that if I scroll down to the examples in the man page, I find the actual documentation. It's cool to learn that we can replace seqlevels on TxDb objects. My argument has always been that since these low-level operations are so powerful, it's nice to have high-level operations to clarify the code. We have to think hard to know what the RHS will do in that replacement. It's probably one of the most complex replacement operations; far more complex than the
typical
one, including levels<- on factor. There's nothing wrong with that; it's just complexity that we should be able to hide. Michael H.
Michael
H.
Well, I wasn't even aware of that feature, so you've made
your
point about
the documentation ;) Sounds like a good compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May 18, 2013 at 6:00 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org <mailto:
mtmorgan at fhcrc.org
wrote:
On 05/18/2013 05:42 PM, Martin Morgan wrote:
Some of the most common operations are
straight-forward, e.g.,
> gr = GRanges(paste0("chr",
c(1:22,
"X", "Y")), IRanges(1,
100))
> seqlevels(gr) = sub("chr", "ch",
seqlevels(gr))
To get a more comprehensive example I should
have
followed my own
advice and
grabbed from the help page!
## Rename:
seqlevels(txdb) <- sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__****__txdb)
==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational Biology / Fred Hutchinson
Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
<tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
[[alternative HTML version deleted]]
______________________________**___________________
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.**org<
Bioc-devel at r-project.org>
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel< https://stat.ethz.ch/mailman/listinfo/bioc-devel> -- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> -- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319 [[alternative HTML version deleted]] _______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130917/ad0aca9b/attachment.pl>
OK I just made sortSeqlevels() a generic with a method for character
vectors and a default method. So you can do sortSeqlevels() directly on
a Seqinfo object, a GRanges object and anything that contains a Seqinfo
component:
> gr
GRanges with 0 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
---
seqlengths:
chr2RHet chr4 chrUextra chrYHet ... chr3R chr2R
chrX
NA NA NA NA ... NA NA
NA
> sortSeqlevels(gr)
GRanges with 0 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
---
seqlengths:
chr2R chr3L chr3R chr4 ... chrXHet chrYHet
chrUextra
NA NA NA NA ... NA NA
NA
H.
On 09/17/2013 01:05 PM, Michael Lawrence wrote:
Thanks, Herv?, that's great.
Would it make sense to have a sort or sortSeqlevels method for Seqinfo
and then a sortSeqlevels,ANY method that just does seqinfo(x) <-
sort(seqinfo(x))?
Michael
On Tue, Sep 17, 2013 at 12:56 PM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>> wrote:
On 09/17/2013 10:58 AM, Herv? Pag?s wrote:
For sorting the chromosome names in "natural" order, you can use
the makeSeqnameIds() helper function in GenomicRanges:
seqlevels(gr) <- seqlevels(gr)[makeSeqnameIds(__seqlevels(gr))]
I meant:
seqlevels(gr) <-
seqlevels(gr)[order(__makeSeqnameIds(seqlevels(gr)))__]
Probably the source of Michael's confusion... sorry.
sortSeqlevels() just added to GenomicRanges 1.13.44:
seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet",
"chrM", "chrXHet", "chr2LHet", "chrU",
"chr3L", "chr3R", "chr2R", "chrX")
Then:
> sortSeqlevels(seqlevels)
[1] "chr2R" "chr3L" "chr3R" "chr4" "chrX" "chrU"
[7] "chrM" "chr2LHet" "chr2RHet" "chrXHet" "chrYHet"
"chrUextra"
> sortSeqlevels(seqlevels, X.is.sexchrom=TRUE)
[1] "chr2R" "chr3L" "chr3R" "chr4" "chrX" "chrU"
[7] "chrM" "chr2LHet" "chr2RHet" "chrXHet" "chrYHet"
"chrUextra"
H.
It recognizes several naming conventions (chr-prefixed or not,
roman, etc...) e.g.:
> makeSeqnameIds(c("Y", "1", "9", "M", "2"))
[1] 4 1 3 5 2
> makeSeqnameIds(c("chrY", "chrI", "chrIX", "chrM", "chrII"))
[1] 4 1 3 5 2
I still need to improve the documentation of this function, put
more examples, and add unit tests. It's used internally by the
makeTranscriptDb* functions in GenomicFeatures to assign internal
ids to the chromosomes so that all the TranscriptDb extractors
can then return GRanges and GRangesList objects with the seqlevels
in the "natural" order.
Cheers,
H.
On 09/17/2013 10:23 AM, Kasper Daniel Hansen wrote:
Actually, what I (and I think several others) just want is
fixSeqnames()
which sorts and fixes them to some convention. I don't
really care which
one, but I want to do some easy harmonization, preferably in
line with
other bioc tools. I know this is hard to do for all
organisms, but
having
something that deals with the major ones (human, mouse,
fruit fly, yeast,
some monkeys) would be extremely valuable. If this function
existed, I
could just throw all my GRanges at it.
Kasper
On Tue, Sep 17, 2013 at 1:14 PM, Michael Lawrence
<lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>
wrote:
When I hit this I usually just do:
seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y"))
It would be nice if there were a function for sorting
chromosome names
according to a specified naming convention. Like
sortSeqlevels(gr,
UCSCNaming()) or the alternative replacement syntax:
seqinfo(gr) <-
sort(seqinfo(gr), UCSCNaming()). Although sort is only
single-dispatch.
Michael
On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey
<stvjc at channing.harvard.edu
<mailto:stvjc at channing.harvard.edu>>__wrote:
I am replying to this as Michael mentions that this
is a powerful
operation. Here's my
problem that I believe is related, but I do not see
a straightforward
solution
bhmm19uni
GRanges with 559456 ranges and 4 metadata columns:
seqnames ranges strand
| name
score
<Rle> <IRanges> <Rle>
| <character>
<numeric>
1 chr1 [ 67229025, 67341224] *
| 13_Heterochrom/lo
0
2 chr1 [203057955, 203060154] *
| 12_Repressed
0
3 chr1 [ 8458427, 8486226] *
| 13_Heterochrom/lo
0
4 chr1 [ 16904427, 16904826] *
| 5_Strong_Enhancer
0
5 chr1 [ 25289427, 25293426] *
| 11_Weak_Txn
0
... ... ... ...
... ...
...
570766 chr22 [51100931, 51101330] *
| 2_Weak_Promoter
0
570767 chr22 [51101331, 51101530] *
| 6_Weak_Enhancer
0
570768 chr22 [51101531, 51101730] *
| 2_Weak_Promoter
0
570769 chr22 [51101731, 51178130] *
| 13_Heterochrom/lo
0
570770 chr22 [51178131, 51178330] *
| 15_Repetitive/CNV
0
thick numcode
<IRanges> <character>
1 [ 67001613, 67113812] 13
2 [201324578, 201326777] 12
3 [ 8381014, 8408813] 13
4 [ 16777014, 16777413] 5
5 [ 25162014, 25166013] 11
... ... ...
570766 [49447797, 49448196] 2
570767 [49448197, 49448396] 6
570768 [49448397, 49448596] 2
570769 [49448597, 49524996] 13
570770 [49524997, 49525196] 15
---
seqlengths:
chr1 chr10 chr11 chr5 ...
chr2 chr21
chr4
249250621 135534747 135006516 180915260 ...
243199373 48129895
191154276
If I try to make a karyogram with ggbio, the
chromosomes are in an
unnatural order. What is the right approach to
getting the seqinfo
in a
natural order with respect to chromosome names and
lengths?
Reassigning
seqlevels seems dangerous but I haven't experimented
fully. It must
be a
common use case but I do not see it in doc.
On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence <
lawrence.michael at gene.com
<mailto:lawrence.michael at gene.com>> wrote:
On Mon, May 20, 2013 at 10:36 PM, Herv? Pag?s
<hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
wrote:
Michael,
On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Herv?
Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>> wrote:
Hi,
On 05/20/2013 03:15 PM, Michael
Lawrence wrote:
On Mon, May 20, 2013 at 3:11
PM, Martin Morgan
<mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM,
Michael Lawrence wrote:
While it's cool that
seqlevels<- has become so
flexible,
I still claim
that
rename/keep/drop would
be a lot easier to read,
because
they describe the
high-level operation,
and there's no reason to
deprecate
them. They're
also
easier to remember.
For example, for dropping with
seqlevels<-, the user
needs
to remember that
"force" is necessary to drop. Much
easier to just say
"dropSeqlevels(),
please". And reimplementing
keepSeqlevels is still too
complicated. Is it
such a maintenance burden to
have
these simple
wrappers sit
on top of the
low-level manipulators?
Another suggestion:
renameSeqlevels should not
require
a
named vector (in
cases
where it is unnamed,
require that it is parallel to
the
existing
seqlevels, as
with seqlevels<-).
I didn't really indicate
what drove my desire to see
keepSeqlevels
deprecated. keepSeqlevels,
seqlevels<-, and isActiveSeq
were
developed more
or less independently, and
have different contracts.
The
different
contracts are confusing to
the user, as is creating a
usable
help system
when there are multiple
end points for what is a common
operation. The help
pages of each were
inadequate in different ways. And
because
the code was
developed independently,
support for different objects
was
inconsistent. So
actually, yes, the
maintenance (and use) burden for the
previous state of
affairs was substantial.
On the other hand, I agree
that keepSeqlevels is
convenient
as a simple
wrapper around
seqlevels<-, in the same way that
setNames
and names<- are
both useful.
So we could iterate to
keepSeqlevels <-
function(x, value,
...)
{
seqlevels(x,
force=TRUE) <- value
x
}
but I'd be less
enthusiastic about maintaining the
original
contract of
keepSeqlevels, where
keepSeqlevels(gr1, gr2), would
have
worked for two
GRanges objects.
Why would this be called
keepSeqlevels() if, depending on how
it's
used,
it will either add, drop, rename,
or permute the seqlevels?
Couldn't this be called setSeqlevels?
I thought that new2old was necessary for
permuting.
The seqlevels setter has no 'new2old' arg.
You're right that
adding should be disallowed for
keepSeqlevels(). Adding seqlevels is
not
a common operation. The two common
operations are:
* Permuting, usually because the data
were imported in non-canonical
order (seqnameStyle was designed to
address this, no?).
Permuting is straightforward with seqlevels<-:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> seqlevels(gr) <- rev(seqlevels(gr))
> seqlevels(gr)
[1] "chr3" "chr2" "chr1"
* Subsetting, either by keeping or dropping.
Also straightforward:
> seqlevels(gr, force=TRUE) <- "chr2"
> seqlevels(gr)
[1] "chr2"
Two main reasons: taking a
small slice of the data for prototyping,
or removing problematic
chromosomes (sex, circular). This goes
back to bsapply and the
exclude
argument.
There are other important use cases:
- Dropping seqlevels that are *not* in
use. This happens for
example
when subsetting a BAM file with
Rsamtools::filterBam to keep
only
the alignments located on a given
chromosome. The entire
sequence
dictionary (in the header) is
propagated to the resulting BAM
file
so when you read the file with
readGAlignments() you end up
with a
bunch of useless seqlevels that you
generally start by dropping.
You don't want to drop any alignment
so force=FALSE is your
friend.
This is sort of tangential to the discussion,
but do you really
want to
do
this? I would preserve the universe as given by
the BAM.
- An even more common operation is
renaming: 90% of the times
that's
what I use seqlevels<- for, and that's
what I tell people to use
when they need to rename their
chromosomes. Also
straightforward:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(gr) <- sub("^chr", "",
seqlevels(gr))
> seqlevels(gr)
[1] "1" "2" "3" "M"
> seqlevels(gr)[seqlevels(gr) ==
"M"] <- "MT"
> seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to use than
renameSeqlevels() which
always required you to build the
entire right value as a named
vector.
Sure, renaming is a common use case. Not sure
how I forgot that.
As I wrote earlier in the thread,
renameSeqlevels should be changed so
as
not to require naming of the vector.
So the added-value of keepSeqlevels() seems
to boil down to:
(a) it always uses force=TRUE
(b) it preserves the original object and
returns the modified
object
If you want to restrict the use of
keepSeqlevels() to permuting and
dropping, you'll also have to disallow
renaming (in addition to
adding).
Then its name will more or less reflect what
it does (if "keep" means
"permute" or "drop"). The final result will
be something that does
less
than setSeqlevels() and that is not
necessarily easier to read: both
will set on the object whatever vector of
seqlevels they are
supplied,
except that one will fail when doing so
would actually result in
adding
or renaming seqlevels.
So it looks like seqlevels<- is pretty powerful.
Now I see that if I
scroll
down to the examples in the man page, I find the
actual documentation.
It's
cool to learn that we can replace seqlevels on
TxDb objects. My
argument
has always been that since these low-level
operations are so powerful,
it's
nice to have high-level operations to clarify
the code. We have to
think
hard to know what the RHS will do in that
replacement. It's
probably one
of
the most complex replacement operations; far
more complex than the
typical
one, including levels<- on factor. There's
nothing wrong with that;
it's
just complexity that we should be able to hide.
Michael
H.
Michael
H.
Well, I wasn't even aware of
that feature, so you've made
your
point about
the documentation ;) Sounds
like a good compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May 18, 2013
at 6:00 PM, Martin Morgan
<mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org> <mailto:
mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
wrote:
On 05/18/2013
05:42 PM, Martin Morgan wrote:
Some of the
most common operations are
straight-forward, e.g.,
> gr =
GRanges(paste0("chr",
c(1:22,
"X", "Y")), IRanges(1,
100))
>
seqlevels(gr) = sub("chr", "ch",
seqlevels(gr))
To get a more
comprehensive example I should
have
followed my own
advice and
grabbed from the
help page!
## Rename:
seqlevels(txdb) <- sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__**__**__txdb)
==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational
Biology / Fred Hutchinson
Cancer
Research Center
1100 Fairview
Ave. N.
PO Box 19024
Seattle, WA 98109
Location: Arnold
Building M1 B861
Phone: (206)
667-2793 <tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
[[alternative HTML
version deleted]]
________________________________**___________________
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.org>
<mailto:Bioc-devel at r-project.
<mailto:Bioc-devel at r-project.>*__*org<
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.org>>
mailing list
https://stat.ethz.ch/mailman/___**_listinfo/bioc-devel
<https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel><
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>
<https://stat.ethz.ch/mailman/__**listinfo/bioc-devel <https://stat.ethz.ch/mailman/**listinfo/bioc-devel><
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>
Phone: (206) 667-5791
<tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
Fax: (206) 667-1319
<tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
[[alternative HTML version deleted]]
_________________________________________________
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
[[alternative HTML version deleted]]
_________________________________________________
Bioc-devel at r-project.org
<mailto:Bioc-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
[[alternative HTML version deleted]]
_________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130917/e9f523b9/attachment.pl>
Hi Kasper,
On 09/17/2013 06:05 PM, Kasper Daniel Hansen wrote:
Nice. I thought you had made something like this. What would be extremely useful is a function that just takes a GRanges and return a fixed version - less typing, and easier to understand what goes on.
sortSeqlevels() only sorts the seqlevels. Fixing them (if by this you mean make them adhere to some naming covention) is another story and a more complicated one. We tried to tackled this with the seqnameStyle() which is only a half-baked solution at the moment. We need to revisit the current seqnameStyle() approach. Yes in the future we could have something like fixSeqlevels() for doing both: sorting and renaming. Cheers, H.
Best,
Kasper
On Tue, Sep 17, 2013 at 4:22 PM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>> wrote:
OK I just made sortSeqlevels() a generic with a method for character
vectors and a default method. So you can do sortSeqlevels() directly on
a Seqinfo object, a GRanges object and anything that contains a Seqinfo
component:
> gr
GRanges with 0 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
---
seqlengths:
chr2RHet chr4 chrUextra chrYHet ... chr3R
chr2R chrX
NA NA NA NA ... NA
NA NA
> sortSeqlevels(gr)
GRanges with 0 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
---
seqlengths:
chr2R chr3L chr3R chr4 ... chrXHet
chrYHet chrUextra
NA NA NA NA ... NA
NA NA
H.
On 09/17/2013 01:05 PM, Michael Lawrence wrote:
Thanks, Herv?, that's great.
Would it make sense to have a sort or sortSeqlevels method for
Seqinfo
and then a sortSeqlevels,ANY method that just does seqinfo(x) <-
sort(seqinfo(x))?
Michael
On Tue, Sep 17, 2013 at 12:56 PM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>> wrote:
On 09/17/2013 10:58 AM, Herv? Pag?s wrote:
For sorting the chromosome names in "natural" order,
you can use
the makeSeqnameIds() helper function in GenomicRanges:
seqlevels(gr) <-
seqlevels(gr)[makeSeqnameIds(____seqlevels(gr))]
I meant:
seqlevels(gr) <-
seqlevels(gr)[order(____makeSeqnameIds(seqlevels(gr)))____]
Probably the source of Michael's confusion... sorry.
sortSeqlevels() just added to GenomicRanges 1.13.44:
seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet",
"chrM", "chrXHet", "chr2LHet", "chrU",
"chr3L", "chr3R", "chr2R", "chrX")
Then:
> sortSeqlevels(seqlevels)
[1] "chr2R" "chr3L" "chr3R" "chr4"
"chrX" "chrU"
[7] "chrM" "chr2LHet" "chr2RHet" "chrXHet"
"chrYHet"
"chrUextra"
> sortSeqlevels(seqlevels, X.is.sexchrom=TRUE)
[1] "chr2R" "chr3L" "chr3R" "chr4"
"chrX" "chrU"
[7] "chrM" "chr2LHet" "chr2RHet" "chrXHet"
"chrYHet"
"chrUextra"
H.
It recognizes several naming conventions (chr-prefixed
or not,
roman, etc...) e.g.:
> makeSeqnameIds(c("Y", "1", "9", "M", "2"))
[1] 4 1 3 5 2
> makeSeqnameIds(c("chrY", "chrI", "chrIX", "chrM",
"chrII"))
[1] 4 1 3 5 2
I still need to improve the documentation of this
function, put
more examples, and add unit tests. It's used internally
by the
makeTranscriptDb* functions in GenomicFeatures to
assign internal
ids to the chromosomes so that all the TranscriptDb
extractors
can then return GRanges and GRangesList objects with
the seqlevels
in the "natural" order.
Cheers,
H.
On 09/17/2013 10:23 AM, Kasper Daniel Hansen wrote:
Actually, what I (and I think several others) just
want is
fixSeqnames()
which sorts and fixes them to some convention. I don't
really care which
one, but I want to do some easy harmonization,
preferably in
line with
other bioc tools. I know this is hard to do for all
organisms, but
having
something that deals with the major ones (human, mouse,
fruit fly, yeast,
some monkeys) would be extremely valuable. If this
function
existed, I
could just throw all my GRanges at it.
Kasper
On Tue, Sep 17, 2013 at 1:14 PM, Michael Lawrence
<lawrence.michael at gene.com
<mailto:lawrence.michael at gene.com>
<mailto:lawrence.michael at gene.__com
<mailto:lawrence.michael at gene.com>>
wrote:
When I hit this I usually just do:
seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y"))
It would be nice if there were a function for
sorting
chromosome names
according to a specified naming convention. Like
sortSeqlevels(gr,
UCSCNaming()) or the alternative replacement
syntax:
seqinfo(gr) <-
sort(seqinfo(gr), UCSCNaming()). Although sort
is only
single-dispatch.
Michael
On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey
<stvjc at channing.harvard.edu
<mailto:stvjc at channing.harvard.edu>
<mailto:stvjc at channing.__harvard.edu
<mailto:stvjc at channing.harvard.edu>>>__wrote:
I am replying to this as Michael mentions
that this
is a powerful
operation. Here's my
problem that I believe is related, but I do
not see
a straightforward
solution
bhmm19uni
GRanges with 559456 ranges and 4 metadata
columns:
seqnames ranges
strand
| name
score
<Rle> <IRanges>
<Rle>
| <character>
<numeric>
1 chr1 [ 67229025, 67341224]
*
| 13_Heterochrom/lo
0
2 chr1 [203057955, 203060154]
*
| 12_Repressed
0
3 chr1 [ 8458427, 8486226]
*
| 13_Heterochrom/lo
0
4 chr1 [ 16904427, 16904826]
*
| 5_Strong_Enhancer
0
5 chr1 [ 25289427, 25293426]
*
| 11_Weak_Txn
0
... ... ...
...
... ...
...
570766 chr22 [51100931, 51101330]
*
| 2_Weak_Promoter
0
570767 chr22 [51101331, 51101530]
*
| 6_Weak_Enhancer
0
570768 chr22 [51101531, 51101730]
*
| 2_Weak_Promoter
0
570769 chr22 [51101731, 51178130]
*
| 13_Heterochrom/lo
0
570770 chr22 [51178131, 51178330]
*
| 15_Repetitive/CNV
0
thick numcode
<IRanges> <character>
1 [ 67001613, 67113812] 13
2 [201324578, 201326777] 12
3 [ 8381014, 8408813] 13
4 [ 16777014, 16777413] 5
5 [ 25162014, 25166013] 11
... ... ...
570766 [49447797, 49448196] 2
570767 [49448197, 49448396] 6
570768 [49448397, 49448596] 2
570769 [49448597, 49524996] 13
570770 [49524997, 49525196] 15
---
seqlengths:
chr1 chr10 chr11
chr5 ...
chr2 chr21
chr4
249250621 135534747 135006516
180915260 ...
243199373 48129895
191154276
If I try to make a karyogram with ggbio, the
chromosomes are in an
unnatural order. What is the right approach to
getting the seqinfo
in a
natural order with respect to chromosome
names and
lengths?
Reassigning
seqlevels seems dangerous but I haven't
experimented
fully. It must
be a
common use case but I do not see it in doc.
On Tue, May 21, 2013 at 11:00 AM, Michael
Lawrence <
lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>
<mailto:lawrence.michael at gene.__com
<mailto:lawrence.michael at gene.com>>> wrote:
On Mon, May 20, 2013 at 10:36 PM, Herv?
Pag?s
<hpages at fhcrc.org
<mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>>
wrote:
Michael,
On 05/20/2013 08:44 PM, Michael
Lawrence wrote:
On Mon, May 20, 2013 at 4:13
PM, Herv?
Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>>> wrote:
Hi,
On 05/20/2013 03:15 PM,
Michael
Lawrence wrote:
On Mon, May 20, 2013
at 3:11
PM, Martin Morgan
<mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM,
Michael Lawrence wrote:
While it's
cool that
seqlevels<- has become so
flexible,
I still claim
that
rename/keep/drop would
be a lot easier to read,
because
they describe the
high-level
operation,
and there's no reason to
deprecate
them. They're
also
easier to
remember.
For example, for dropping with
seqlevels<-,
the user
needs
to remember that
"force" is necessary to drop. Much
easier to
just say
"dropSeqlevels(),
please". And reimplementing
keepSeqlevels
is still too
complicated.
Is it
such a maintenance burden to
have
these simple
wrappers sit
on top of the
low-level manipulators?
Another
suggestion:
renameSeqlevels should not
require
a
named vector (in
cases
where it is
unnamed,
require that it is parallel to
the
existing
seqlevels, as
with
seqlevels<-).
I didn't really
indicate
what drove my desire to see
keepSeqlevels
deprecated.
keepSeqlevels,
seqlevels<-, and isActiveSeq
were
developed more
or less
independently, and
have different contracts.
The
different
contracts are
confusing to
the user, as is creating a
usable
help system
when there are
multiple
end points for what is a common
operation. The help
pages of each were
inadequate in different ways. And
because
the code was
developed
independently,
support for different objects
was
inconsistent. So
actually, yes, the
maintenance (and use) burden
for the
previous state of
affairs was
substantial.
On the other
hand, I agree
that keepSeqlevels is
convenient
as a simple
wrapper around
seqlevels<-, in the same way that
setNames
and names<- are
both useful.
So we could
iterate to
keepSeqlevels <-
function(x, value,
...)
{
seqlevels(x,
force=TRUE) <- value
x
}
but I'd be less
enthusiastic about maintaining the
original
contract of
keepSeqlevels, where
keepSeqlevels(gr1, gr2), would
have
worked for two
GRanges objects.
Why would this be called
keepSeqlevels() if, depending
on how
it's
used,
it will either add, drop,
rename,
or permute the seqlevels?
Couldn't this be called
setSeqlevels?
I thought that new2old was
necessary for
permuting.
The seqlevels setter has no
'new2old' arg.
You're right that
adding should be disallowed for
keepSeqlevels(). Adding
seqlevels is
not
a common operation. The two common
operations are:
* Permuting, usually because
the data
were imported in non-canonical
order (seqnameStyle was designed to
address this, no?).
Permuting is straightforward with
seqlevels<-:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> seqlevels(gr) <-
rev(seqlevels(gr))
> seqlevels(gr)
[1] "chr3" "chr2" "chr1"
* Subsetting, either by keeping
or dropping.
Also straightforward:
> seqlevels(gr, force=TRUE) <-
"chr2"
> seqlevels(gr)
[1] "chr2"
Two main reasons: taking a
small slice of the data for
prototyping,
or removing problematic
chromosomes (sex, circular).
This goes
back to bsapply and the
exclude
argument.
There are other important use cases:
- Dropping seqlevels that are
*not* in
use. This happens for
example
when subsetting a BAM file with
Rsamtools::filterBam to keep
only
the alignments located on a given
chromosome. The entire
sequence
dictionary (in the header) is
propagated to the resulting BAM
file
so when you read the file with
readGAlignments() you end up
with a
bunch of useless seqlevels
that you
generally start by dropping.
You don't want to drop any
alignment
so force=FALSE is your
friend.
This is sort of tangential to the
discussion,
but do you really
want to
do
this? I would preserve the universe as
given by
the BAM.
- An even more common operation is
renaming: 90% of the times
that's
what I use seqlevels<- for,
and that's
what I tell people to use
when they need to rename their
chromosomes. Also
straightforward:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(gr) <-
sub("^chr", "",
seqlevels(gr))
> seqlevels(gr)
[1] "1" "2" "3" "M"
>
seqlevels(gr)[seqlevels(gr) ==
"M"] <- "MT"
> seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to
use than
renameSeqlevels() which
always required you to build the
entire right value as a named
vector.
Sure, renaming is a common use case.
Not sure
how I forgot that.
As I wrote earlier in the thread,
renameSeqlevels should be changed so
as
not to require naming of the vector.
So the added-value of
keepSeqlevels() seems
to boil down to:
(a) it always uses force=TRUE
(b) it preserves the original
object and
returns the modified
object
If you want to restrict the use of
keepSeqlevels() to permuting and
dropping, you'll also have to disallow
renaming (in addition to
adding).
Then its name will more or less
reflect what
it does (if "keep" means
"permute" or "drop"). The final
result will
be something that does
less
than setSeqlevels() and that is not
necessarily easier to read: both
will set on the object whatever
vector of
seqlevels they are
supplied,
except that one will fail when doing so
would actually result in
adding
or renaming seqlevels.
So it looks like seqlevels<- is pretty
powerful.
Now I see that if I
scroll
down to the examples in the man page, I
find the
actual documentation.
It's
cool to learn that we can replace
seqlevels on
TxDb objects. My
argument
has always been that since these low-level
operations are so powerful,
it's
nice to have high-level operations to
clarify
the code. We have to
think
hard to know what the RHS will do in that
replacement. It's
probably one
of
the most complex replacement
operations; far
more complex than the
typical
one, including levels<- on factor. There's
nothing wrong with that;
it's
just complexity that we should be able
to hide.
Michael
H.
Michael
H.
Well, I wasn't even
aware of
that feature, so you've made
your
point about
the documentation ;)
Sounds
like a good compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May
18, 2013
at 6:00 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>>>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org
<mailto:mtmorgan at fhcrc.org>> <mailto:
mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>>
wrote:
On
05/18/2013
05:42 PM, Martin Morgan wrote:
Some of the
most common operations are
straight-forward, e.g.,
> gr =
GRanges(paste0("chr",
c(1:22,
"X", "Y")),
IRanges(1,
100))
>
seqlevels(gr) = sub("chr", "ch",
seqlevels(gr))
To get
a more
comprehensive example I should
have
followed my own
advice and
grabbed
from the
help page!
## Rename:
seqlevels(txdb) <- sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__**____**__txdb)
==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational
Biology / Fred Hutchinson
Cancer
Research Center
1100
Fairview
Ave. N.
PO Box
19024
Seattle, WA 98109
Location: Arnold
Building M1 B861
Phone:
(206)
667-2793 <tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin
Morgan, PhD
Fred Hutchinson
Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024
Seattle, WA 98109
[[alternative HTML
version deleted]]
__________________________________**___________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
<mailto:Bioc-devel at r-project.__org
<mailto:Bioc-devel at r-project.org>>
<mailto:Bioc-devel at r-project
<mailto:Bioc-devel at r-project>.
<mailto:Bioc-devel at r-project
<mailto:Bioc-devel at r-project>.>__*__*org<
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
<mailto:Bioc-devel at r-project.__org
<mailto:Bioc-devel at r-project.org>>>
mailing list
https://stat.ethz.ch/mailman/_____**_listinfo/bioc-devel
<https://stat.ethz.ch/mailman/___**_listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/___**_listinfo/bioc-devel
<https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel>><
https://stat.ethz.ch/mailman/______listinfo/bioc-devel
<https://stat.ethz.ch/mailman/____listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>>
<https://stat.ethz.ch/mailman/____**listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__**listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__**listinfo/bioc-devel
<https://stat.ethz.ch/mailman/**listinfo/bioc-devel>><
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>>
--
Herv? Pag?s
Program in Computational
Biology
Division of Public Health
Sciences
Fred Hutchinson Cancer
Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>>
Phone: (206) 667-5791
<tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
Fax: (206) 667-1319
<tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
<mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org
<mailto:hpages at fhcrc.org>>
Phone: (206) 667-5791
<tel:%28206%29%20667-5791> <tel:%28206%29%20667-5791>
Fax: (206) 667-1319
<tel:%28206%29%20667-1319> <tel:%28206%29%20667-1319>
[[alternative HTML version
deleted]]
___________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
<mailto:Bioc-devel at r-project.__org
<mailto:Bioc-devel at r-project.org>> mailing list
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
[[alternative HTML version deleted]]
___________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
<mailto:Bioc-devel at r-project.__org
<mailto:Bioc-devel at r-project.org>> mailing list
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
[[alternative HTML version deleted]]
___________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
<mailto:Bioc-devel at r-project.__org
<mailto:Bioc-devel at r-project.org>>
mailing list
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
<mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130918/28b8095a/attachment.pl>
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/bioc-devel/attachments/20130918/6d570d27/attachment.pl>