The default behaviour of a missing entry in an environment

Greetings everyone,

I have a question about the default behaviour of a missing entry in an environment.
Let us look at the following sequence of R statements:
 > e <- new.env()
 > e$a <- 1
 > e$a
[1] 1
 > e$b
NULL

I think I understand the logic for returning NULL for a missing entry in an environment,
but I do not think that it is fully justified.
I am sure that the R developers must have seen this argument before,
but I wish to call attention to this problem again,
because I think that it is important to the default safety of the R programming language.

I suppose that one could argue that a good R programmer must be careful
not to use NULL in any of his environment entries,
but I think it is better to remove this burden from the programmer altogether
and simply raise a good, old-fashioned exception when the "$" operator
encounters a missing entry in an environment.
The biggest advantage is that it will easily eliminate a whole class of programming error.
The biggest disadvantage is that it is not backwards-compatible with old R programs.

I suppose a personal solution would be to simply redefine the "$" operator in my programs.
However, I really do think that the default safety of an R environment matters very much.
At the very least, it would be nice to be able to configure the safety of a new environment,
perhaps through a parameter.
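A minimal sketch of the "redefine $" idea in base R, assuming that an S3 class attribute on an environment makes `$` dispatch the way it does for other S3 objects. `new_strict_env()` and the `strict_env` class are hypothetical names, not part of R; the `strict` argument stands in for the configuration parameter suggested above. Whether this interacts well with saving such objects (a concern raised later in the thread) is not checked here.

```r
# Hypothetical constructor (not part of R): an ordinary environment, tagged
# with an S3 class when strict = TRUE so that `$` dispatches to the method below.
new_strict_env <- function(strict = TRUE, parent = parent.frame()) {
  e <- new.env(parent = parent)
  if (strict) class(e) <- "strict_env"
  e
}

# `$` method for the hypothetical class. Like the default `$`, it looks only in
# the environment itself, but it signals an error instead of returning NULL.
`$.strict_env` <- function(x, name) {
  if (!exists(name, envir = x, inherits = FALSE)) {
    stop("no binding for '", name, "' in this environment", call. = FALSE)
  }
  get(name, envir = x, inherits = FALSE)
}

e <- new_strict_env()
e$a <- 1      # assignment is untouched: the default `$<-` is used
print(e$a)    # [1] 1
try(e$b)      # signals an error instead of returning NULL

lax <- new_strict_env(strict = FALSE)
print(is.null(lax$b))  # [1] TRUE: the usual behaviour
```

Only reads become strict here; since the class lives on the environment object itself, every reference to `e` sees the same behaviour.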

-Trishank
Greetings everyone,

I have a question about the default behaviour of a missing entry in an environment.
Let us look at the following sequence of R statements:

 > e <- new.env()
 > e$a <- 1
 > e$a
[1] 1
 > e$b
NULL

I think I understand the logic for returning NULL for a missing entry in an environment,
but I do not think that it is fully justified.
I am sure that the R developers must have seen this argument before,
but I wish to call attention to this problem again,
because I think that it is important to the default safety of the R programming language.
You get the same behaviour when asking for a nonexistent element of a 
list, or a nonexistent attribute.   If you want stricter checking, don't 
use $, use get():

 > get("b", e)
Error in get("b", e) : object 'b' not found

or check first with exists():

 > exists("b", e)
[1] FALSE
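A sketch of both suggestions. Passing `inherits = FALSE` is an addition here: by default `get()` and `exists()` also search enclosing environments, whereas `$` looks only in `e` itself.

```r
e <- new.env()
e$a <- 1

# get() signals an error for a missing name; tryCatch() captures the condition.
cond <- tryCatch(get("b", envir = e, inherits = FALSE),
                 error = function(err) err)
print(conditionMessage(cond))  # [1] "object 'b' not found"

# exists() lets the caller branch on presence and choose an explicit fallback,
# instead of treating NULL as "missing".
value <- if (exists("b", envir = e, inherits = FALSE)) {
  get("b", envir = e, inherits = FALSE)
} else {
  NA
}
print(value)  # [1] NA
```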
I suppose that one could argue that a good R programmer must be careful
not to use NULL in any of his environment entries,
but I think it is better to remove this burden from the programmer altogether
and simply raise a good, old-fashioned exception when the "$" operator
encounters a missing entry in an environment.
But then it would be inconsistent with what it does in other situations.

Duncan Murdoch
The biggest advantage is that it will easily eliminate a whole class of programming error.
The biggest disadvantage is that it is not backwards-compatible with old R programs.

I suppose a personal solution would be to simply redefine the "$" operator in my programs.
However, I really do think that the default safety of an R environment matters very much.
At the very least, it would be nice to be able to configure the safety of a new environment,
perhaps through a parameter.

-Trishank
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Hello Duncan,

Thanks for your reply.

You get the same behaviour when asking for a nonexistent element of a list, or a nonexistent attribute.   If you want stricter checking, don't use $, use get():

 > get("b", e)
Error in get("b", e) : object 'b' not found
Yes, this is a solution. However, if we agree that "$" is (as it should be) syntactic sugar for get(), then why do we have different behaviour
for what should essentially be the same operations, albeit the former being easier to read and write than the latter?
Or is my premise mistaken, and is the whole point that "$" and get() are not identical?
But then it would be inconsistent with what it does in other situations.
I am afraid that I did not fully understand this point. What would the inconsistencies be in other situations?

-Trishank
Hello Duncan,

Thanks for your reply.

On Nov 13, 2009, at 2:27 PM, Duncan Murdoch wrote:

You get the same behaviour when asking for a nonexistent element of a list, or a nonexistent attribute.   If you want stricter checking, don't use $, use get():

 > get("b", e)
Error in get("b", e) : object 'b' not found
Yes, this is a solution. However, if we agree that "$" is (as it should be) syntactic sugar for get(), then why do we have different behaviour
for what should essentially be the same operations, albeit the former being easier to read and write than the latter?
Or is my premise mistaken, and is the whole point that "$" and get() are not identical?

But then it would be inconsistent with what it does in other situations.
I am afraid that I did not fully understand this point. What would the inconsistencies be in other situations?
Inconsistent with what happens for lists:

 > x <- list()
 > x$b
NULL

and attributes:

 > attr(x, "b")
NULL

It is already a little stricter than $ on a list:

 > x$longname <- 1
 > x$long
[1] 1
 > e$longname <- 1
 > e$long
NULL

so I suppose we could make it even more strict, but there is an awful
lot of code out there that uses tests like

if (!is.null(x <- e$b)) { do something with x }

and all of that would break.

Duncan Murdoch

Inconsistent with what happens for lists:

 > x <- list()
 > x$b
NULL

and attributes:

 > attr(x, "b")
NULL
Ah, I see. I would claim that the same argument for default safety should apply here too.
It is already a little stricter than $ on a list:

 > x$longname <- 1
 > x$long
[1] 1
 > e$longname <- 1
 > e$long
NULL
I am sorry, but I cannot say that this is a good idea, for reasons of safety and readability.
so I suppose we could make it even more strict, but there is an awful lot of code out there that uses tests like

if (!is.null(x <- e$b)) { do something with x }

and all of that would break.
Unfortunately, such code does make it harder to detect programming errors.
I understand if the hands of R are tied by backwards compatibility; bad habits are hard to break.
Thanks for your time.

-Trishank
If you develop your own code you can add your own behavior by
"extending" the environment class.  I put "extending" in quotation
marks, because 'environment' is one of few classes you should *not*
extend from in the regular S3 (and S4?) sense, at least that was the
case a few years ago.  You can search the r-devel list about issues
when trying to do so.  One thing I remember is that it didn't work
well to save such objects.  Bla bla bla, there are workarounds for it
and the Object class in the R.oo package is one.  Here is how you can
add your protection for your own environment-like objects:

library("R.oo");
o <- Object();
o$foo
[1] NULL

setConstructorS3("PickyObject", function(...) {
  extend(Object(), "PickyObject");
});
setMethodS3("$", "PickyObject", function(this, name) {
  hasField(this, name) || throw("No such field: ", name);
  NextMethod("$");
});

po <- PickyObject();
po$foo

Error in list(`po$foo` = <environment>, ``$.PickyObject`(po, foo)` = <environment>,  :

[2009-11-13 21:39:51] Exception: No such field: foo
  at throw(Exception(...))
  at throw.default("No such field: ", name)
  at throw("No such field: ", name)
  at `$.PickyObject`(po, foo)
  at po$foo

po$foo <- TRUE;
po$foo
[1] TRUE

If of any use.

/Henrik

On Fri, Nov 13, 2009 at 9:03 PM, Trishank Karthik Kuppusamy

Hello Henrik,

If you develop your own code you can add your own behavior by
"extending" the environment class.  I put "extending" in quotation
marks, because 'environment' is one of few classes you should *not*
extend from in the regular S3 (and S4?) sense, at least that was the
case a few years ago.  You can search the r-devel list about issues
when trying to do so.  One thing I remember is that it didn't work
well to save such objects.  Bla bla bla, there are workarounds for it
and the Object class in the R.oo package is one.  Here is how you can
add your protection for your own environment-like objects:
I like this solution! (As well as the name of the picky object.)
If the environment class can be properly subclassed,
then everything should work in principle. Thanks for the tip.

-Trishank
On Nov 13, 2009, at 2:47 PM, Duncan Murdoch wrote:

Inconsistent with what happens for lists:

 > x <- list()
 > x$b
NULL

and attributes:

 > attr(x, "b")
NULL
Ah, I see. I would claim that the same argument for default safety should apply here too.
I have mixed feelings about this.  If you follow the rule in your 
programs that setting x to NULL acts the same as not having x at all, 
then things are fine.  (Sometimes that's impossible, but it is what 
happens when you do the list assignment x$b <- NULL).  Use NA or some 
other special value to signal missing, and NULL will usually cause a 
visible error soon after if you mess up.
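The asymmetry mentioned here can be seen directly: on a list, assigning NULL with `$<-` deletes the element, while on an environment the same assignment creates a binding whose value is NULL, which `is.null()` cannot tell apart from a name that was never assigned. The names `e` and `never_assigned` are illustrative.

```r
# On a list, assigning NULL with `$<-` deletes the element.
x <- list(b = 1)
x$b <- NULL
print("b" %in% names(x))                          # [1] FALSE

# On an environment, the same assignment stores NULL; the binding remains.
e <- new.env()
e$b <- NULL
print(exists("b", envir = e, inherits = FALSE))   # [1] TRUE
print(is.null(e$b))                               # [1] TRUE
print(is.null(e$never_assigned))                  # [1] TRUE: indistinguishable

# Removing a binding from an environment needs rm().
rm("b", envir = e)
print(exists("b", envir = e, inherits = FALSE))   # [1] FALSE
```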

It is already a little stricter than $ on a list:

 > x$longname <- 1
 > x$long
[1] 1
 > e$longname <- 1
 > e$long
NULL
I am sorry, but I cannot say that this is a good idea, for reasons of safety and readability.
I think the list behaviour is a bad design, but it's been in the 
language forever, so we're stuck with it.  It's related to the bad 
design of function calls, where arguments can similarly be abbreviated.
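To illustrate the partial matching under discussion: on lists `$` partially matches names, `[[` matches exactly unless asked otherwise, and `$` on an environment never partially matches. The `warnPartialMatchDollar` option makes R warn when `$` on a list relies on a partial match; this is a sketch, not a recommendation from the thread.

```r
x <- list(longname = 1)
print(x$long)                      # [1] 1 : `$` partially matches on a list
print(x[["long"]])                 # NULL  : `[[` matches exactly by default
print(x[["long", exact = FALSE]])  # [1] 1 : partial matching on request

e <- new.env()
e$longname <- 1
print(e$long)                      # NULL  : environments never partially match

# Ask R to warn whenever `$` relies on a partial match, then restore options.
old <- options(warnPartialMatchDollar = TRUE)
val <- tryCatch(x$long, warning = function(w) "warned")
options(old)
print(val)                         # [1] "warned"
```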

Duncan Murdoch

so I suppose we could make it even more strict, but there is an awful lot of code out there that uses tests like

if (!is.null(x <- e$b)) { do something with x }

and all of that would break.
Unfortunately, such code does make it harder to detect programming errors.
I understand if the hands of R are tied by backwards compatibility; bad habits are hard to break.
Thanks for your time.

-Trishank
Note that one should use inherits = FALSE argument on get and exists
to avoid returning objects from the parent, the parent of the parent,
etc.
On 11/13/2009 2:03 PM, Trishank Karthik Kuppusamy wrote:

Note that one should use inherits = FALSE argument on get and exists
to avoid returning objects from the parent, the parent of the parent,
etc.
I disagree.  Normally you would want to receive those objects.  If you 
didn't, why didn't you set the parent of the environment to emptyenv() 
when you created it?
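A sketch of this alternative: create the environment with `emptyenv()` as its parent, so `get()` and `exists()` cannot fall through to outer variables even with their default `inherits = TRUE`. The variable names are illustrative.

```r
x <- 3                                    # a variable outside the environment

e_default <- new.env()                    # parent: the calling environment
print(exists("x", envir = e_default))     # [1] TRUE: found by searching parents

e_empty <- new.env(parent = emptyenv())   # nothing to inherit from
print(exists("x", envir = e_empty))       # [1] FALSE
e_empty$a <- 1
print(get("a", envir = e_empty))          # [1] 1
```

The trade-off, raised later in the thread, is that code evaluated inside such an environment cannot find even base functions such as `<-`.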

Duncan Murdoch
On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
Note that one should use inherits = FALSE argument on get and exists
to avoid returning objects from the parent, the parent of the parent,
etc.
I disagree.  Normally you would want to receive those objects.  If you
didn't, why didn't you set the parent of the environment to emptyenv() when
you created it?

$ does not look into the parent so if you are trying to get those
semantics you must use inherits = FALSE.
 > x <- 3
 > e <- new.env()
 > "x" %in% names(e)
[1] FALSE
 > get("x", e) # oops
[1] 3
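The example above, extended as a sketch: with `inherits = FALSE`, `get()` and `exists()` agree with `$` about what is in `e`, which matters when you do not control the environment's parent.

```r
x <- 3
e <- new.env()

print(exists("x", envir = e))                     # [1] TRUE : found in a parent
print(exists("x", envir = e, inherits = FALSE))   # [1] FALSE: not in e itself
print(is.null(e$x))                               # [1] TRUE : `$` looks only in e

# With inherits = FALSE, get() fails instead of returning the outer x.
found <- tryCatch(get("x", envir = e, inherits = FALSE),
                  error = function(err) NULL)
print(is.null(found))                             # [1] TRUE
```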
On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
Note that one should use inherits = FALSE argument on get and exists
to avoid returning objects from the parent, the parent of the parent,
etc.
I disagree.  Normally you would want to receive those objects.  If you
didn't, why didn't you set the parent of the environment to emptyenv() when
you created it?

$ does not look into the parent so if you are trying to get those
semantics you must use inherits = FALSE.
Whoops, yes.  That's another complaint about $ on environments.

Duncan Murdoch

 > x <- 3
 > e <- new.env()
 > "x" %in% names(e)
[1] FALSE
 > get("x", e) # oops
[1] 3
Hi,
On 13/11/2009 7:26 PM, Gabor Grothendieck wrote:
On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch <murdoch at stats.uwo.ca>
wrote:
On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
Note that one should use inherits = FALSE argument on get and exists
to avoid returning objects from the parent, the parent of the parent,
etc.
I disagree.  Normally you would want to receive those objects.  If you
didn't, why didn't you set the parent of the environment to emptyenv()
when
you created it?

$ does not look into the parent so if you are trying to get those
semantics you must use inherits = FALSE.
Whoops, yes.  That's another complaint about $ on environments.
That was an intentional choice. AFAIR neither $ nor [[ on
environments was meant to mimic get, but rather to work on the
current environment as if it were a hash-like object. One can always
get the inherits semantics by simple programming, but under the model
you seem to be suggesting, preventing such behavior when you don't own
the environments in question is problematic.
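One possible reading of "get the inherits semantics by simple programming", as a sketch: a hypothetical helper `dollar_inherits()` (not part of R) tries each environment in the parent chain in turn and, like `$`, returns NULL if the name is found nowhere. In current versions of R, `get0()` with its default `inherits = TRUE` gives the same result in one call.

```r
# Hypothetical helper (not part of R): a `$`-style lookup that returns NULL
# when the name is absent, but also searches enclosing environments.
dollar_inherits <- function(env, name) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(get(name, envir = env, inherits = FALSE))
    }
    env <- parent.env(env)
  }
  NULL
}

parent_env <- new.env(parent = emptyenv())
parent_env$v <- 5
child_env <- new.env(parent = parent_env)

print(child_env$v)                        # NULL : `$` ignores parents
print(dollar_inherits(child_env, "v"))    # [1] 5
print(dollar_inherits(child_env, "w"))    # NULL
print(get0("v", envir = child_env))       # [1] 5: built-in equivalent
```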

  Robert
Duncan Murdoch

 > x <- 3
 > e <- new.env()
 > "x" %in% names(e)
[1] FALSE
 > get("x", e) # oops
[1] 3


Robert Gentleman
rgentlem at gmail.com
Hi,

On Fri, Nov 13, 2009 at 4:55 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
On 13/11/2009 7:26 PM, Gabor Grothendieck wrote:
On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch <murdoch at stats.uwo.ca>
wrote:
On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
Note that one should use inherits = FALSE argument on get and exists
to avoid returning objects from the parent, the parent of the parent,
etc.
I disagree.  Normally you would want to receive those objects.  If you
didn't, why didn't you set the parent of the environment to emptyenv()
when
you created it?

$ does not look into the parent so if you are trying to get those
semantics you must use inherits = FALSE.
Whoops, yes.  That's another complaint about $ on environments.
 That was an intentional choice. AFAIR neither $ nor [[ on
environments was meant to mimic get, but rather to work on the
current environment as if it were a hash-like object. One can always
get the inherits semantics by simple programming, but under the model
you seem to be suggesting, preventing such behavior when you don't own
the environments in question is problematic.
Sure, I agree with how you did that; I'm not sure you had any choice at 
the time (didn't all environments have base as a parent then?).  Even 
now, you do want both inherits=TRUE and inherits=FALSE behaviour in 
different circumstances, and $ has to pick just one.  Probably my 
wording should have been "That's another gotcha about $ on environments."

Duncan Murdoch
Hi,

On Fri, Nov 13, 2009 at 4:55 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
On 13/11/2009 7:26 PM, Gabor Grothendieck wrote:
On Fri, Nov 13, 2009 at 7:21 PM, Duncan Murdoch <murdoch at stats.uwo.ca>
wrote:
On 13/11/2009 6:39 PM, Gabor Grothendieck wrote:
Note that one should use inherits = FALSE argument on get and exists
to avoid returning objects from the parent, the parent of the parent,
etc.
I disagree.  Normally you would want to receive those objects.  If you
didn't, why didn't you set the parent of the environment to emptyenv()
when
you created it?

$ does not look into the parent so if you are trying to get those
semantics you must use inherits = FALSE.
Whoops, yes.  That's another complaint about $ on environments.
 That was an intentional choice. AFAIR neither $ nor [[ on
environments was meant to mimic get, but rather to work on the
current environment as if it were a hash-like object. One can always
get the inherits semantics by simple programming, but under the model
you seem to be suggesting, preventing such behavior when you don't own
the environments in question is problematic.

  Robert
Yes. Also, AFAIR, emptyenv() came later. At the time you couldn't go 
deeper than baseenv().

And at any rate, some of the intended applications are dataframe-like, 
and I don't think you want to preclude use of with() and other forms of 
evaluation in the environment, as in

 > e <- evalq(environment(),airquality)
 > ls(e)
[1] "Day"     "Month"   "Ozone"   "Solar.R" "Temp"    "Wind"

 > evalq(logO3 <- log(Ozone), e) # works fine
 > lm(Ozone~Wind, data=e) # ditto
...

 > parent.env(e) <- emptyenv()
 > evalq(logO3 <- log(Ozone), e)
Error in eval(substitute(expr), envir, enclos) :
   could not find function "<-"
 > lm(Ozone~Wind, data=e)
Error in eval(expr, envir, enclos) : could not find function "list"
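A middle ground, sketched here as an assumption rather than something proposed in the thread: give the data environment `baseenv()` as its parent. Base functions such as `<-`, `log` and `mean` are then still found, but variables from the global environment are not. `list2env()` is used instead of the `evalq(environment(), ...)` trick; `airquality` comes from the datasets package, and `global_only` is an illustrative name.

```r
# Copy the columns of airquality into a new environment whose only ancestors
# are the base environment and the empty environment.
e <- list2env(as.list(datasets::airquality), parent = baseenv())
print(ls(e))

evalq(logO3 <- log(Ozone), e)          # `<-` and log() are found in base
print(exists("logO3", envir = e, inherits = FALSE))   # [1] TRUE
print(with(e, mean(Temp)))             # with() works as well

global_only <- 1                       # defined outside e's chain of parents
print(exists("global_only", envir = e))               # [1] FALSE
```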
O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907