Dear list,
The function subset("AnnDbBimap", ...) raises an error whenever
the resulting subset would be the empty set.
Example:
library(mouse4302.db)
subset(mouse4302SYMBOL, Rkeys="foo")
returns:
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
  value for "foo" not found
This happens with either Lkeys or Rkeys.
The man page does say:
  Lkeys: The new Lkeys (must be a subset of the current Lkeys).
  Rkeys: The new Rkeys (must be a subset of the current Rkeys).
but this limits the usefulness of the function and encourages use of
the environment-like API, even though that API is documented as
provided "for backward compatibility".
Wouldn't it be good to either:
- have subset return without an error, or
- at least have functions such as hasLkey and hasRkey?
L.
[Bioc-devel] Problem with subset("AnnDbBimap", ...) ?
6 messages · Hervé Pagès, Laurent Gautier
Hi Laurent,
All this is consistent. An important part of the contract for
subset(), Lkeys<-, Rkeys<- and [ is that they behave like
endomorphisms, i.e., they return an instance of the same class as
the original object.
mouse4302SYMBOL is an AnnDbBimap object so any of the functions above
must return a (valid) AnnDbBimap object.
The keys of a valid AnnDbBimap object cannot be arbitrary. For example,
if 'x' is a mapping from probeset ids to Entrez ids, the left keys
must be valid probeset ids (for this chip) and the right keys must be
valid Entrez ids.
What kind of AnnDbBimap object would be returned by
subset(mouse4302SYMBOL, Rkeys="foo")? Or, equivalently, what
kind of AnnDbBimap object would 'x' become after
x <- mouse4302SYMBOL; Rkeys(x) <- "foo"?
It would be an AnnDbBimap object with junk keys, but valid
AnnDbBimap objects don't support this.
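To make the endomorphism argument concrete, here is a minimal sketch in plain R using the methods package. The class ToyKeySet and its slots are hypothetical stand-ins, not AnnotationDbi's AnnDbBimap: a validity method rejects any key outside a fixed universe, so an object holding the junk key "foo" cannot pass validObject().

```r
library(methods)

## Hypothetical toy class (NOT AnnotationDbi's AnnDbBimap): 'keys' must be
## drawn from 'universe', standing in for "valid ids for this chip".
setClass("ToyKeySet",
         slots = c(keys = "character", universe = "character"),
         validity = function(object) {
             junk <- setdiff(object@keys, object@universe)
             if (length(junk) == 0L) TRUE
             else paste("junk keys:", paste(junk, collapse = ", "))
         })

ks <- new("ToyKeySet", keys = c("GeneA", "GeneB"),
          universe = c("GeneA", "GeneB", "GeneC"))

## Restricting to a valid key yields another valid ToyKeySet.
ks_ok <- ks
ks_ok@keys <- "GeneA"
validObject(ks_ok)

## Restricting to a junk key yields an object that fails validation
## (slot assignment with @<- only checks the slot's class, not validity).
ks_junk <- ks
ks_junk@keys <- "foo"
res <- tryCatch(validObject(ks_junk), error = function(e) conditionMessage(e))
print(res)
```

The point is only that a method promising to return the same class cannot hand back an object that the class's own validity rules reject.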
I added these functions when I worked on faking the environment
interface for SQLite-based annotations. Note that they are not
part of the environment-like API. They are low-level
functions that I first wrote and used internally to make it
easier for me to build the environment-like API (mget, get,
ls, etc.). My first intention was not to export them, but then
I realized they had value of their own, so I exported and
documented them. Since they are not part of the environment-like
API, I had no backward-compatibility constraint, which was nice
because I could make them do what I considered the
right thing. On the other hand, I had to make the environment-like
API backward compatible, and that's why you can use junk keys in
mget (provided that you specify ifnotfound=NA):
> mget("foo", mouse4302SYMBOL, ifnotfound=NA)
$foo
[1] NA
mget() returns a list, not an AnnDbBimap instance (it's not an
endomorphism) so it can return a list with anything in it without
breaking any rule.
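The same mget() semantics can be tried without any annotation package on an ordinary R environment. A minimal sketch, with toy probe ids and symbols invented for illustration: with ifnotfound=NA an unknown key maps to NA, while without it the lookup fails.

```r
## Toy environment standing in for an environment-like annotation map;
## the probe ids and symbols are invented for illustration.
symbol_env <- new.env()
assign("probe_1", "GeneA", envir = symbol_env)
assign("probe_2", "GeneB", envir = symbol_env)

## Unknown keys come back as NA when ifnotfound = NA.
hits <- mget(c("probe_1", "foo"), envir = symbol_env, ifnotfound = NA)
str(hits)

## Without 'ifnotfound', looking up "foo" is an error.
miss <- tryCatch(mget("foo", envir = symbol_env), error = function(e) e)
inherits(miss, "error")
```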
We could add hasLkey() and hasRkey(), but since they would be
equivalent to "foo" %in% Lkeys(x) and "foo" %in% Rkeys(x), I'm not
sure they would add much value. The performance of "foo" %in% Rkeys(x)
should be good enough, especially the second time you do this on 'x',
because Rkeys() (like Lkeys() and other low-level functions in
AnnotationDbi) caches its result (in a hidden environment).
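Should such helpers be wanted, they are one-liners over the %in% idiom described above. The sketch below is hypothetical: hasLkey() and hasRkey() are illustrative names, and the code assumes the Lkeys()/Rkeys() accessors are on the search path (for example after loading AnnotationDbi or a *.db package).

```r
## Hypothetical helpers (illustrative names only). They assume the Lkeys()
## and Rkeys() accessors are available, e.g. via AnnotationDbi.
## Both are vectorized over 'keys' and return one logical per key.
hasLkey <- function(x, keys) keys %in% Lkeys(x)
hasRkey <- function(x, keys) keys %in% Rkeys(x)

## Typical use (requires mouse4302.db): drop unknown symbols up front.
## syms <- c("foo", "bar")
## syms <- syms[hasRkey(mouse4302SYMBOL, syms)]
```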
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________
Bioc-devel at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hi Hervé,
Thanks for the detailed answer.
I understand that the choice was made to forbid adding new
[L|R]keys to a Bimap (what you call "junk keys"). I suspect that
implementation concerns weighed in the decision, but that's for a
separate thread.
Unless the message is that the environment-like API is not around only
for backward-compatibility reasons, the main (consistency) problem I
have with subset("AnnDbBimap", ...) remains after reading your
explanations, although it may come from how the base R function
subset() works on data frames.
I'll illustrate it with an example:
subset(CO2, Treatment == "unfair")
[1] Plant Type Treatment conc uptake
<0 rows> (or 0-length row.names)
The function subset("data.frame", ...) is then no less endomorphic than
subset("AnnDbBimap", ...), yet it returns an empty data.frame rather
than raising an error such as 'no "unfair" Treatment'.
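The data frame example can be checked in base R alone (CO2 ships with the datasets package). This sketch shows that the result is still a data frame with all five columns, just with zero rows.

```r
## Filter on a level that does not exist: no row matches.
res <- subset(datasets::CO2, Treatment == "unfair")

nrow(res)                                    # 0
inherits(res, "data.frame")                  # still a data frame
identical(names(res), names(datasets::CO2))  # same columns
```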
In its current form, subset("AnnDbBimap", ...) might be pushing
complexity onto the user for use cases such as:
"I have a list of arbitrary gene symbols, and I'd like to get the
probes/probesets associated with those".
The current (as of today) implementation for subset("Bimap", ...) is:
setMethod("subset", "Bimap",
    function(x, Lkeys=NULL, Rkeys=NULL)
    {
        Lkeys(x) <- Lkeys
        Rkeys(x) <- Rkeys
        x
    }
)
while it could be something like:
setMethod("subset", "Bimap",
    function(x, Lkeys=NULL, Rkeys=NULL, quiet=FALSE)
    {
        if (quiet) {
            ## silently keep only the requested keys that exist in 'x'
            Lkeys(x) <- Lkeys[Lkeys %in% Lkeys(x)]
            Rkeys(x) <- Rkeys[Rkeys %in% Rkeys(x)]
        } else {
            ## current behavior: invalid keys raise an error
            Lkeys(x) <- Lkeys
            Rkeys(x) <- Rkeys
        }
        x
    }
)
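The filtering step in the proposal above relies on a small idiom worth spelling out. A minimal base R sketch, using a hypothetical helper name keep_valid_keys(): it keeps only the requested keys found among the valid ones, an all-invalid request yields an empty vector rather than an error, and a NULL request (the default for both arguments in the proposal) stays NULL, because NULL %in% y is logical(0) and NULL[logical(0)] is NULL.

```r
## Hypothetical helper isolating the filtering step of the proposal:
## keep the requested keys that are valid, in the requested order.
keep_valid_keys <- function(requested, valid) requested[requested %in% valid]

valid <- c("GeneA", "GeneB", "GeneC")
keep_valid_keys(c("GeneB", "foo"), valid)  # "GeneB"
keep_valid_keys("foo", valid)              # character(0): empty, not an error
keep_valid_keys(NULL, valid)               # NULL
```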
Just a thought,
L.
2 days later
Hi Laurent,
OK, I have to admit that I added "subset" as a convenient way
of reducing the sets of Lkeys and Rkeys in a single call,
but I didn't really take the time to check exactly what "subset"
does on data frames, and I agree that it should
behave as consistently as possible across data structures.
I'll add the 'quiet' argument to subset("AnnDbBimap", ...).
You suggest using 'quiet=FALSE' as the default, so by default
it won't behave as it does for data frames, but I guess at this point
we don't want to make a change that could make code written by
people who expect "subset" to check the validity of their keys
less safe.
I'll let you know when the change is ready.
Thanks for the feedback!
H.
Hi all,
I added the 'drop.invalid.keys' arg to the "subset" methods for Bimap
and AnnDbBimap objects in AnnotationDbi (devel and release):
> library(hgu95av2.db)
> subset(hgu95av2SYMBOL, Rkeys="foo")
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
  value for "foo" not found
> mymap <- subset(hgu95av2SYMBOL, Rkeys="foo", drop.invalid.keys=TRUE)
> mymap
SYMBOL submap for chip hgu95av2 (object of class "AnnDbBimap")
> summary(mymap)
SYMBOL submap for chip hgu95av2 (object of class "AnnDbBimap")
|
| Lkeyname: probe_id (Ltablename: probes)
| Lkeys: "1000_at", "1001_at", ... (total=12625/mapped=0)
|
| Rkeyname: symbol (Rtablename: gene_info)
| Rkeys:
|
| direction: L --> R
(Note that an argument named 'quiet' is generally not expected to
modify what a function is doing, only what the function is reporting,
hence the choice of 'drop.invalid.keys' for this arg.)
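The behavior described in this message (strict by default, invalid keys dropped only on request) can be sketched outside AnnotationDbi. Everything below is hypothetical: ToyBimap, toy_subset() and restrict_keys() are illustrative stand-ins, not the actual AnnotationDbi implementation, and they ignore the mapping between left and right keys entirely; they only mirror the drop.invalid.keys semantics and the "same class in, same class out" contract.

```r
library(methods)

## Hypothetical stand-in for a bimap: just its two key sets.
setClass("ToyBimap", slots = c(Lkeys = "character", Rkeys = "character"))

## Restrict 'current' to 'requested'; in this toy, NULL means "no restriction".
restrict_keys <- function(current, requested, drop.invalid.keys) {
    if (is.null(requested))
        return(current)
    invalid <- setdiff(requested, current)
    if (length(invalid) > 0L && !drop.invalid.keys)
        stop("invalid keys: ", paste(invalid, collapse = ", "))
    requested[requested %in% current]
}

## Endomorphic toy subset: always returns a ToyBimap; strict by default.
toy_subset <- function(x, Lkeys = NULL, Rkeys = NULL,
                       drop.invalid.keys = FALSE) {
    x@Lkeys <- restrict_keys(x@Lkeys, Lkeys, drop.invalid.keys)
    x@Rkeys <- restrict_keys(x@Rkeys, Rkeys, drop.invalid.keys)
    x
}

m <- new("ToyBimap", Lkeys = c("probe_1", "probe_2"),
         Rkeys = c("GeneA", "GeneB"))
try(toy_subset(m, Rkeys = "foo"))     # error with the default
empty <- toy_subset(m, Rkeys = "foo", drop.invalid.keys = TRUE)
empty@Rkeys                           # character(0)
```

Keeping the strict behavior as the default means existing code that relies on the key check keeps failing loudly on bad keys; only callers who opt in get the data-frame-like empty result.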
The new version of AnnotationDbi will propagate to the public repos
in about 24 hours.
Cheers,
H.
hpages at fhcrc.org wrote:
(Note that an argument named 'quiet' is generally not expected to
modify what a function is doing, only what the function is reporting,
hence the choice of 'drop.invalid.keys' for this arg.)
Indeed. The name for the parameter was not very well chosen... not
chosen at all for the long term, in fact; it was just to illustrate
with an example what subset() could do.
Thanks for the quick response,
L.