[Bioc-devel] deprecation of keepSeqlevels and renameSeqlevels
Michael,
On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Herv? Pag?s <hpages at fhcrc.org
<mailto:hpages at fhcrc.org>> wrote:
Hi,
On 05/20/2013 03:15 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM, Michael Lawrence wrote:
While it's cool that seqlevels<- has become so flexible,
I still claim
that
rename/keep/drop would be a lot easier to read, because
they describe the
high-level operation, and there's no reason to deprecate
them. They're
also
easier to remember. For example, for dropping with
seqlevels<-, the user
needs
to remember that "force" is necessary to drop. Much
easier to just say
"dropSeqlevels(), please". And reimplementing
keepSeqlevels is still too
complicated. Is it such a maintenance burden to have
these simple
wrappers sit
on top of the low-level manipulators?
Another suggestion: renameSeqlevels should not require a
named vector (in
cases
where it is unnamed, require that it is parallel to the
existing
seqlevels, as
with seqlevels<-).
I didn't really indicate what drove my desire to see
keepSeqlevels
deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq were
developed more
or less independently, and have different contracts. The
different
contracts are confusing to the user, as is creating a usable
help system
when there are multiple end points for what is a common
operation. The help
pages of each were inadequate in different ways. And because
the code was
developed independently, support for different objects was
inconsistent. So
actually, yes, the maintenance (and use) burden for the
previous state of
affairs was substantial.
On the other hand, I agree that keepSeqlevels is convenient
as a simple
wrapper around seqlevels<-, in the same way that setNames
and names<- are
both useful.
So we could iterate to
keepSeqlevels <-
function(x, value, ...)
{
seqlevels(x, force=TRUE) <- value
x
}
but I'd be less enthusiastic about maintaining the original
contract of
keepSeqlevels, where keepSeqlevels(gr1, gr2), would have
worked for two
GRanges objects.
Why would this be called keepSeqlevels() if, depending on how it's used,
it will either add, drop, rename, or permute the seqlevels?
Couldn't this be called setSeqlevels?
I thought that new2old was necessary for permuting.
The seqlevels setter has no 'new2old' arg.
You're right that adding should be disallowed for keepSeqlevels(). Adding seqlevels is not a common operation. The two common operations are: * Permuting, usually because the data were imported in non-canonical order (seqnameStyle was designed to address this, no?).
Permuting is straightforward with seqlevels<-: > seqlevels(gr) [1] "chr1" "chr2" "chr3" > seqlevels(gr) <- rev(seqlevels(gr)) > seqlevels(gr) [1] "chr3" "chr2" "chr1"
* Subsetting, either by keeping or dropping.
Also straightforward: > seqlevels(gr, force=TRUE) <- "chr2" > seqlevels(gr) [1] "chr2"
Two main reasons: taking a small slice of the data for prototyping, or removing problematic chromosomes (sex, circular). This goes back to bsapply and the exclude argument.
There are other important use cases:
- Dropping seqlevels that are *not* in use. This happens for example
when subsetting a BAM file with Rsamtools::filterBam to keep only
the alignments located on a given chromosome. The entire sequence
dictionary (in the header) is propagated to the resulting BAM file
so when you read the file with readGAlignments() you end up with a
bunch of useless seqlevels that you generally start by dropping.
You don't want to drop any alignment so force=FALSE is your friend.
- An even more common operation is renaming: 90% of the times that's
what I use seqlevels<- for, and that's what I tell people to use
when they need to rename their chromosomes. Also straightforward:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
> seqlevels(gr)
[1] "1" "2" "3" "M"
> seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
> seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to use than renameSeqlevels() which
always required you to build the entire right value as a named
vector.
So the added-value of keepSeqlevels() seems to boil down to:
(a) it always uses force=TRUE
(b) it preserves the original object and returns the modified object
If you want to restrict the use of keepSeqlevels() to permuting and
dropping, you'll also have to disallow renaming (in addition to adding).
Then its name will more or less reflect what it does (if "keep" means
"permute" or "drop"). The final result will be something that does less
than setSeqlevels() and that is not necessarily easier to read: both
will set on the object whatever vector of seqlevels they are supplied,
except that one will fail when doing so would actually result in adding
or renaming seqlevels.
H.
Michael
H.
Well, I wasn't even aware of that feature, so you've made your
point about
the documentation ;) Sounds like a good compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May 18, 2013 at 6:00 PM, Martin Morgan
<mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
<mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>>>
wrote:
On 05/18/2013 05:42 PM, Martin Morgan wrote:
Some of the most common operations are
straight-forward, e.g.,
> gr = GRanges(paste0("chr", c(1:22,
"X", "Y")), IRanges(1,
100))
> seqlevels(gr) = sub("chr", "ch",
seqlevels(gr))
To get a more comprehensive example I should have
followed my own
advice and
grabbed from the help page!
## Rename:
seqlevels(txdb) <- sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH", seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__**__txdb) ==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational Biology / Fred Hutchinson Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793 <tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
[[alternative HTML version deleted]]
_________________________________________________
Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
--
Herv? Pag?s
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319