RFC: getifexists() {was [Bug 16065] "exists" ...}
For what it's worth, I think we would need a new function if the default behavior changes. Since we already have "get" and "mget", maybe "cget" for "conditional get"? "if get", "safe get", ... I like the idea of keeping the original "not found" behavior if the "if.not.found" arg is missing. However, it will be important to keep the number of arguments down. (I noticed that Martin's example lacks a "frame" argument.) I've heard rumors that there are plans to reduce the function call overhead, so perhaps this matters less now. I like Luke's idea of making exists/get/etc. .Primitives. I think that will be necessary in order to go fast. For my two cents, I also think get/assign should just be synonyms for the "[[" .Primitive. That could actually simplify things a bit. One might add "inherits=FALSE" and "if.not.found" arguments to the environment "[[" code, for example. Regards, Pete Pete ____________________ Peter M. Haverty, Ph.D. Genentech, Inc. phaverty at gene.com
On Thu, Jan 8, 2015 at 11:57 AM, <luke-tierney at uiowa.edu> wrote:
On Thu, 8 Jan 2015, Michael Lawrence wrote: If we do add an argument to get(), then it should be named consistently
with the ifnotfound argument of mget(). As mentioned, the possibility of a NULL value is problematic. One solution is a sentinel value that indicates an unbound value (like R_UnboundValue).
A null default is fine -- it's a default; if it isn't right for a particular case you can provide something else.
But another idea (and one pretty similar to John's) is to follow the
SYMSXP
design at the C level, where there is a structure that points to the name
and a value. We already have SYMSXPs at the R level of course (name
objects) but they do not provide access to the value, which is typically
R_UnboundValue. But this does not even need to be implemented with SYMSXP.
The design would allow something like:
binding <- getBinding("x", env)
if (hasValue(binding)) {
x <- value(binding) # throws an error if none
message(name(binding), "has value", x)
}
That I think it is a bit verbose but readable and could be made fast. And
I
think binding objects would be useful in other ways, as they are
essentially a "named object". For example, when iterating over an
environment.
This would need a lot more thought. Directly exposing the internals is definitely not something we want to do as we may well want to change that design. But there are lots of other corner issues that would have to be thought through before going forward, such as what happens if an rm occurs between obtaining a binding object and doing something with it. Serialization would also need thinking through. This doesn't seem like a worthwhile place to spend our efforts to me. Adding getIfExists, or .get, or get0, or whatever seems fine. Adding an argument to get() with missing giving current behavior may be OK too. Rewriting exists and get as .Primitives may be sufficient though. Best, luke Michael
On Thu, Jan 8, 2015 at 6:03 AM, John Nolan <jpnolan at american.edu> wrote: Adding an optional argument to get (and mget) like
val <- get(name, where, ..., value.if.not.found=NULL ) (*)
would be useful for many. HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist. If that matters, the user
would need to be careful about specifying a value.if.not.found that
cannot
be confused with a valid value of x.
To avoid this difficulty, perhaps we want both: have Martin's
getifexists(
)
return a list with two values:
- a boolean variable 'found' # = value returned by exists( )
- a variable 'value'
Then implement get( ) as:
get <- function(x,...,value.if.not.found ) {
if( missing(value.if.not.found) ) {
a <- getifexists(x,... )
if (!a$found) error("x not found")
} else {
a <- getifexists(x,...,value.if.not.found )
}
return(a$value)
}
Note that value.if.not.found has no default value in above.
It behaves exactly like current get does if value.if.not.found
is not specified, and if it is specified, it would be faster
in the common situation mentioned below:
if(exists(x,...)) { get(x,...) }
John
P.S. if you like dromedaries call it valueIfNotFound ...
..............................................................
John P. Nolan
Math/Stat Department
227 Gray Hall, American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050
jpnolan at american.edu voice: 202.885.3140
web: academic2.american.edu/~jpnolan
..............................................................
-----"R-devel" <r-devel-bounces at r-project.org> wrote: -----
To: Martin Maechler <maechler at stat.math.ethz.ch>, R-devel at r-project.org
From: Duncan Murdoch
Sent by: "R-devel"
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
On 08/01/2015 4:16 AM, Martin Maechler wrote:
In November, we had a "bug repository conversation" with Peter Hagerty and myself: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065 where the bug report title started with --->> "exists" is a bottleneck for dispatch and package loading, ... Peter proposed an extra simplified and henc faster version of exists(), and I commented
> --- Comment #2 from Martin Maechler <maechler at stat.math.ethz.ch>
---
> I'm very grateful that you've started exploring the bottlenecks
of loading
> packages with many S4 classes (and methods)...
> and I hope we can make real progress there rather sooner than
later.
> OTOH, your `summaryRprof()` in your vignette indicates that
exists() may use
> upto 10% of the time spent in library(reportingTools), and your
speedup
> proposals of exist() may go up to ca 30% which is good and well
worth
> considering, but still we can only expect 2-3% speedup for
package loading
> which unfortunately is not much.
> Still I agree it is worth looking at exists() as you did ... and
> consider providing a fast simplified version of it in addition to
current
> exists() [I think].
> BTW, as we talk about enhancements here, maybe consider a further
possibility:
> My subjective guess is that probably more than half of exists()
uses are of the
> form
> if(exists(name, where, .......)) {
> get(name, whare, ....)
> ..
> } else {
> NULL / error() / .. or similar
> }
> i.e. many exists() calls when returning TRUE are immediately
followed by the
> corresponding get() call which repeats quite a bit of the lookup
that exists()
> has done.
> Instead, I'd imagine a function, say getifexists(name, ...) that
does both at
> once in the "exists is TRUE" case but in a way we can easily keep
the if(.) ..
> else clause above. One already existing approach would use
> if(!inherits(tryCatch(xx <- get(name, where, ...),
error=function(e)e), "error")) {
> ... (( work with xx )) ...
> } else {
> NULL / error() / .. or similar
> }
> but of course our C implementation would be more efficient and
use more concise
> syntax {which should not look like error handling}. Follow ups
to this idea
> should really go to R-devel (the mailing list).
and now I do follow up here myself :
I found that 'getifexists()' is actually very simple to implement,
I have already tested it a bit, but not yet committed to R-devel
(the "R trunk" aka "master branch") because I'd like to get
public comments {RFC := Request For Comments}.
I don't like the name -- I'd prefer getIfExists. As Baath (2012, R Journal) pointed out, R names are very inconsistent in naming conventions, but lowerCamelCase is the most common choice. Second most common is period.separated, so an argument could be made for get.if.exists, but there's still the possibility of confusion with S3 methods, and users of other languages where "." is an operator find it a little strange. If you don't like lowerCamelCase (and a lot of people don't), then I think underscore_separated is the next best choice, so would use get_if_exists. Another possibility is to make no new name at all, and just add an optional parameter to get() (which if present acts as your value.if.not parameter, if not present keeps the current "object not found" error). Duncan Murdoch
My version of the help file {for both exists() and getifexists()}
rendered in text is
---------------------- help(getifexists) ------------------------------
-
Is an Object Defined?
Description:
Look for an R object of the given name and possibly return it
Usage:
exists(x, where = -1, envir = , frame, mode = "any",
inherits = TRUE)
getifexists(x, where = -1, envir = as.environment(where),
mode = "any", inherits = TRUE, value.if.not = NULL)
Arguments:
x: a variable name (given as a character string).
where: where to look for the object (see the details section); if
omitted, the function will search as if the name of the
object appeared unquoted in an expression.
envir: an alternative way to specify an environment to look in, but
it is usually simpler to just use the 'where' argument.
frame: a frame in the calling list. Equivalent to giving 'where' as
'sys.frame(frame)'.
mode: the mode or type of object sought: see the 'Details' section.
inherits: should the enclosing frames of the environment be searched?
value.if.not: the return value of 'getifexists(x, *)' when 'x' does not
exist.
Details:
The 'where' argument can specify the environment in which to look
for the object in any of several ways: as an integer (the position
in the 'search' list); as the character string name of an element
in the search list; or as an 'environment' (including using
'sys.frame' to access the currently active function calls). The
'envir' argument is an alternative way to specify an environment,
but is primarily there for back compatibility.
This function looks to see if the name 'x' has a value bound to it
in the specified environment. If 'inherits' is 'TRUE' and a value
is not found for 'x' in the specified environment, the enclosing
frames of the environment are searched until the name 'x' is
encountered. See 'environment' and the 'R Language Definition'
manual for details about the structure of environments and their
enclosures.
*Warning:* 'inherits = TRUE' is the default behaviour for R but
not for S.
If 'mode' is specified then only objects of that type are sought.
The 'mode' may specify one of the collections '"numeric"' and
'"function"' (see 'mode'): any member of the collection will
suffice. (This is true even if a member of a collection is
specified, so for example 'mode = "special"' will seek any type of
function.)
Value:
'exists():' Logical, true if and only if an object of the correct
name and mode is found.
'getifexists():' The object-as from 'get(x, *)'- if 'exists(x, *)'
is true, otherwise 'value.if.not'.
Note:
With 'getifexists()', instead of the easy to read but somewhat
inefficient
if (exists(myVarName, envir = myEnvir)) {
r <- get(myVarName, envir = myEnvir)
## ... deal with r ...
}
you now can use the more efficient (and slightly harder to read)
if (!is.null(r <- getifexists(myVarName, envir = myEnvir))) {
## ... deal with r ...
}
References:
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
Language_. Wadsworth & Brooks/Cole.
See Also:
'get'. For quite a different kind of "existence" checking, namely
if function arguments were specified, 'missing'; and for yet a
different kind, namely if a file exists, 'file.exists'.
Examples:
## Define a substitute function if necessary:
if(!exists("some.fun", mode = "function"))
some.fun <- function(x) { cat("some.fun(x)\n"); x }
search()
exists("ls", 2) # true even though ls is in pos = 3
exists("ls", 2, inherits = FALSE) # false
## These are true (in most circumstances):
identical(ls, getifexists("ls"))
identical(NULL, getifexists(".foo.bar.")) # default value.if.not =
NULL(!)
----------------- end[ help(getifexists) ]
-----------------------------
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel