
The default behaviour of a missing entry in an environment

15 messages · Henrik Bengtsson, Trishank Karthik Kuppusamy, Gabor Grothendieck +3 more

#
Greetings everyone,

I have a question about the default behaviour of a missing entry in an environment.
Let us look at the following sequence of R statements:
[1] 1
NULL
I think I understand the logic for returning NULL for a missing entry in an environment,
but I do not think that it is fully justified.
I am sure that the R developers must have seen this argument before,
but I wish to call for attention to this problem again,
because I think that it is important to the default safety of the R programming language.
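
For readers following along, here is a minimal sketch of the behaviour in question. The entry name `a` is an assumption chosen for illustration; `e` and `b` match the names used in the replies.

```r
e <- new.env()
e$a <- 1          # define a single entry
e$a               # an existing entry is returned as usual
e$b               # a missing entry yields NULL, with no error or warning
is.null(e$b)      # TRUE
```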

I suppose that one could argue that a good R programmer must be careful
not to use NULL in any of his environment entries,
but I think it is better to remove this burden from the programmer altogether
and simply raise a good, old-fashioned exception when the "$" operator
encounters a missing entry in an environment.
The biggest advantage is that it would eliminate a whole class of programming errors.
The biggest disadvantage is that it is not backwards-compatible with old R programs.
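
The ambiguity behind this argument can be sketched as follows. With "$" alone, an entry that holds NULL looks the same as an entry that was never defined. The names `b` and `c` are illustrative only.

```r
e <- new.env()
assign("b", NULL, envir = e)   # "b" exists and holds NULL

is.null(e$b)   # TRUE: the entry exists, but its value is NULL
is.null(e$c)   # TRUE: the entry does not exist at all

# exists() can tell the two cases apart; inherits = FALSE restricts
# the lookup to e itself
exists("b", envir = e, inherits = FALSE)   # TRUE
exists("c", envir = e, inherits = FALSE)   # FALSE
```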

I suppose a personal solution would be to simply redefine the "$" operator in my programs.
However, I really do think that the default safety of an R environment matters very much.
At the very least, it would be nice to be able to configure the safety of a new environment,
perhaps through a parameter.
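
One way such a personal solution might look, without touching "$" itself, is a small strict accessor. This is only a sketch: `strict_get()` is a hypothetical helper, not part of base R, and it assumes that only the environment itself (not its parents) should be searched.

```r
# strict_get() is a hypothetical helper, not part of base R
strict_get <- function(env, name) {
  # look only in env itself, never in its enclosing environments
  if (!exists(name, envir = env, inherits = FALSE)) {
    stop("no such entry in environment: ", name, call. = FALSE)
  }
  get(name, envir = env, inherits = FALSE)
}

e <- new.env()
e$a <- 1
strict_get(e, "a")                              # 1
res <- try(strict_get(e, "b"), silent = TRUE)   # raises an error, caught here
inherits(res, "try-error")                      # TRUE
```

The cost is that callers must write `strict_get(e, "a")` instead of `e$a`, so the check is opt-in rather than a default.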

-Trishank
#
On 11/13/2009 2:03 PM, Trishank Karthik Kuppusamy wrote:
You get the same behaviour when asking for a nonexistent element of a 
list, or a nonexistent attribute.   If you want stricter checking, don't 
use $, use get():

 > get("b", e)
Error in get("b", e) : object 'b' not found

or check first with exists():

 > exists("b", e)
[1] FALSE
But then it would be inconsistent with what it does in other situations.

Duncan Murdoch
#
Hello Duncan,

Thanks for your reply.
On Nov 13, 2009, at 2:27 PM, Duncan Murdoch wrote:

            
Yes, this is a solution. However, if we agree that "$" is (as it should be) syntactic sugar for get(), then why do we have different behaviour
for what should essentially be the same operation, the former merely being easier to read and write than the latter?
Or is my premise mistaken, and the whole point is that "$" and get() are not identical?
I am afraid that I did not fully understand this point. What would the inconsistencies be in other situations?

-Trishank
#
On 11/13/2009 2:39 PM, Trishank Karthik Kuppusamy wrote:
Inconsistent with what happens for lists:

 > x <- list()
 > x$b
NULL

and attributes:

 > attr(x, "b")
NULL

It is already a little stricter than $ on a list:

 > x$longname <- 1
 > x$long
[1] 1
 > e$longname <- 1
 > e$long
NULL

so I suppose we could make it even more strict, but there is an awful 
lot of code out there that uses tests like

if (!is.null(x <- e$b)) { do something with x }

and all of that would break.
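
A small sketch of why that idiom depends on the current behaviour. The function `describe_b()` and the entry name `b` are hypothetical, used here for illustration only.

```r
# describe_b() is a hypothetical function built around the idiom above
describe_b <- function(env) {
  # relies on env$b being NULL (rather than an error) when "b" is absent
  if (!is.null(x <- env$b)) {
    paste("b is", x)
  } else {
    "b is not set"
  }
}

e <- new.env()
describe_b(e)   # "b is not set"
e$b <- 42
describe_b(e)   # "b is 42"
```

If "$" raised an error for a missing entry, the first call would fail instead of taking the else branch.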

Duncan Murdoch
#
On Nov 13, 2009, at 2:47 PM, Duncan Murdoch wrote:

            
Ah, I see. I would claim that the same argument for default safety should apply here too.
I am sorry to say that I cannot call such code a good idea, for reasons of safety and readability.
Unfortunately, it does make it harder to detect programming errors.
I understand if the hands of R are tied by backwards compatibility; bad habits are hard to break.
Thanks for your time.

-Trishank
#
If you develop your own code you can add your own behavior by
"extending" the environment class.  I put "extending" in quotation
marks, because 'environment' is one of the few classes you should *not*
extend from in the regular S3 (and S4?) sense, at least that was the
case a few years ago.  You can search the r-devel list about issues
when trying to do so.  One thing I remember is that it didn't work
well to save such objects.  Bla bla bla, there are workarounds for it
and the Object class in the R.oo package is one.  Here is how you can
add your protection for your own environment-like objects:

library("R.oo");
o <- Object();
o$foo
NULL

setConstructorS3("PickyObject", function(...) {
  extend(Object(), "PickyObject");
});
setMethodS3("$", "PickyObject", function(this, name) {
  hasField(this, name) || throw("No such field: ", name);
  NextMethod("$");
});

po <- PickyObject();
po$foo

Error in list(`po$foo` = <environment>, ``$.PickyObject`(po, foo)` = <environment>,  :

[2009-11-13 21:39:51] Exception: No such field: foo
  at throw(Exception(...))
  at throw.default("No such field: ", name)
  at throw("No such field: ", name)
  at `$.PickyObject`(po, foo)
  at po$foo

po$foo <- TRUE;
po$foo
[1] TRUE

If of any use.

/Henrik

On Fri, Nov 13, 2009 at 9:03 PM, Trishank Karthik Kuppusamy
<tk47 at nyu.edu> wrote:
#
Hello Henrik,
On Nov 13, 2009, at 3:42 PM, Henrik Bengtsson wrote:

            
I like this solution! (As well as the name of the picky object.)
If the environment class can be properly subclassed,
then everything should work in principle. Thanks for the tip.

-Trishank
#
On 11/13/2009 3:03 PM, Trishank Karthik Kuppusamy wrote:
I have mixed feelings about this.  If you follow the rule in your 
programs that setting x to NULL acts the same as not having x at all, 
then things are fine.  (Sometimes that's impossible, but it is what 
happens when you do the list assignment x$b <- NULL).  Use NA or some 
other special value to signal missing, and NULL will usually cause a 
visible error soon after if you mess up.
I think the list behaviour is a bad design, but it's been in the 
language forever, so we're stuck with it.  It's related to the bad 
design of function calls, where arguments can similarly be abbreviated.
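
For illustration, a sketch of the two kinds of abbreviation being compared. The function `f` and the name `longname` are hypothetical.

```r
f <- function(longname) longname
f(long = 1)    # 1: the abbreviated argument name "long" matches "longname"

x <- list(longname = 1)
x$long         # 1: $ on a list also matches partially
x[["long"]]    # NULL: [[ requires an exact match by default
```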

Duncan Murdoch
#
Note that one should pass the inherits = FALSE argument to get() and
exists() to avoid returning objects from the parent, the parent of the
parent, etc.
On Fri, Nov 13, 2009 at 2:27 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
#
On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
I disagree.  Normally you would want to receive those objects.  If you 
didn't, why didn't you set the parent of the environment to emptyenv() 
when you created it?

Duncan Murdoch
#
On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
$ does not look into the parent so if you are trying to get those
semantics you must use inherits = FALSE.
[1] FALSE
[1] 3
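
The difference can be sketched as follows. The variable name `x` and its value are assumptions chosen for illustration, not necessarily the statements that produced the output above.

```r
x <- 3                                 # defined in the current environment
e <- new.env(parent = environment())   # e is empty; its parent holds x

e$x                                        # NULL: $ looks only in e
exists("x", envir = e, inherits = FALSE)   # FALSE: same semantics as $
exists("x", envir = e)                     # TRUE: found via the parent
get("x", envir = e)                        # 3: found via the parent
```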
#
On 13/11/2009 7:26 PM, Gabor Grothendieck wrote:
Whoops, yes.  That's another complaint about $ on environments.

Duncan Murdoch
2 days later
#
Hi,
On Fri, Nov 13, 2009 at 4:55 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
That was an intentional choice. AFAIR neither $ nor [[ on
environments was meant to mimic get, but rather to work on the
current environment as if it were a hash-like object. One can always
get the inherits semantics by simple programming, but under the model
you seem to be suggesting, preventing such behavior when you don't own
the environments in question is problematic.
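
As a rough sketch of what that "simple programming" might look like: `lookup()` below is a hypothetical helper that behaves like $ but also searches the enclosing environments, returning NULL when nothing is found.

```r
# lookup() is a hypothetical helper, not part of base R
lookup <- function(env, name) {
  if (exists(name, envir = env, inherits = TRUE)) {
    get(name, envir = env, inherits = TRUE)
  } else {
    NULL
  }
}

parent_e <- new.env(parent = emptyenv())
child_e  <- new.env(parent = parent_e)
parent_e$y <- "from parent"

child_e$y              # NULL: $ sees only child_e
lookup(child_e, "y")   # "from parent"
lookup(child_e, "z")   # NULL: not found anywhere in the chain
```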

  Robert

  
    
#
On 11/16/2009 12:07 PM, Robert Gentleman wrote:
Sure, I agree with how you did that; I'm not sure you had any choice at 
the time (didn't all environments have base as a parent then?).  Even 
now, you do want both inherits=TRUE and inherits=FALSE behaviour in 
different circumstances, and $ has to pick just one.  Probably my 
wording should have been "That's another gotcha about $ on environments."

Duncan Murdoch
#
Robert Gentleman wrote:
Yes. Also, AFAIR, emptyenv() came later. At the time you couldn't go 
deeper than baseenv().

And at any rate, some of the intended applications are dataframe-like, 
and I don't think you want to preclude use of with() and other forms of 
evaluation in the environment, as in

 > e <- evalq(environment(),airquality)
 > ls(e)
[1] "Day"     "Month"   "Ozone"   "Solar.R" "Temp"    "Wind"

 > evalq(logO3 <- log(Ozone), e) # works fine
 > lm(Ozone~Wind, data=e) # ditto
...

 > parent.env(e) <- emptyenv()
 > evalq(logO3 <- log(Ozone), e)
Error in eval(substitute(expr), envir, enclos) :
   could not find function "<-"
 > lm(Ozone~Wind, data=e)
Error in eval(expr, envir, enclos) : could not find function "list"