Peculiar behavior of attached objects
Prof. Ripley,

Thanks for your quick reply. It was nice to hear an answer from one of
the experts in the field.

I agree that this behavior of R is as documented, and found a good
summary on p. 29 of "An Introduction to R" at
http://cran.r-project.org/doc/manuals/R-intro.pdf

When I had read the shorter online documentation, I had assumed that
the attach() command was like the "import" command of Python or the
"use" statement of Fortran90, which are very useful for avoiding name
clashes or allowing the usage of short names to refer to variables with
longer names in a package or module.

It still seems to me that requiring an R user to detach() and attach()
after each change of a dataframe variable in order to be able to access
it by a short name is somewhat awkward. Perhaps this is such an
intrinsic part of the language that it would be hard to extend, in
which case the developers might consider adding a new command which had
more of the flexibility of the Python "import" command or Fortran90
"use" statement. (Of course it's easy for me to suggest such a change,
and much harder for someone to actually implement it...)

For example, in Python, one can refer to variables by their fully
qualified module names:

import math
math.pi
3.1415926535897931

or by a shorter name defined inside a module:

from math import *
pi
3.1415926535897931

or, if there is a name clash with another variable of the same name, it
can be renamed:

from math import pi as pim
pim
3.1415926535897931

Similar things can be done with the Fortran90 "use" statement, which I
know does not create a new copy of a variable with separate memory
storage (that would be inefficient and awkward to use); it is just a
new name that points to the same location in memory.

More info at:
http://www.python.org/doc/current/tut/tut.html
http://www.python.org/doc/current/ref/import.html
http://w3.pppl.gov/~hammett/comp/f90tut/f90.tut7.html

As for my "explanation", I had assumed that it was only the action of
assigning to d$y while attach(d) was in effect that created a new
variable called y that was a copy of the original value of the
dataframe variable. I guess you are telling me that a copy of the whole
dataframe is made every time an attach() command is issued. I had
assumed that only new pointers with short names were created by an
attach() command (which is what the documentation seemed to imply to
me), as it seems inefficient to make a copy of a whole dataframe at
every attach() command, particularly for a very large database.

Thanks,
Greg

----------------------------------------------------------------------------
ripley at stats.ox.ac.uk wrote:
Sorry, but you mis-read the help page for `attach', and your explanation is poppycock.
d <- data.frame(y=10)
attach(d)
d$y <- 20
y
[1] 10
find("y")
[1] "d"

There is no `new variable': you are still seeing the one in the
database which is attached. As the help page clearly says, that is not
changed but a copy in the global environment is. You can change the
attached copy by direct use of assign():

assign("y", 30, pos=2)
y
[1] 30

but that does not change d. You can also detach and attach.

If you find the R documentation terse (it can be) do cross-check the S
documentation (and this point is in both Venables & Ripley books, too).

On Sat, 17 Aug 2002, Greg Hammett wrote:
I've just discovered R and think it is terrific. I quickly reproduced
results with a few lines of R commands that 7 years ago I had to do
with a larger Fortran code and many calls to NAG routines. (I'm mostly
a computational plasma physicist, but occasionally delve into
statistical analysis of data.)

But I've come across a very peculiar behavior of attached objects that
cost me hours of searching for a bug, and it would be nice if the R
developers could implement a small change to make the language easier
to use. The problem was originally buried in a much larger code, but
I've boiled it down to a 6 line example:
-----------
d <- data.frame(y=10)
attach(d)
d$y <- 20
-----------
The online help for attach() warns not to assign to the short variable
name "y", as that creates a new variable named "y" and the original
variable "d$y" remains unchanged. So I assumed that I could assign to
the fully qualified name "d$y", and indeed that successfully changed
the value of d$y:
-----------
d$y
[1] 20
y
[1] 10
ls()
[1] "d"
------------
However, unbeknownst to me at first, it also created a new variable "y"
that keeps the original value of "d$y" and no longer points to the
present value of "d$y". Furthermore, this new variable "y" doesn't
show up in the list of objects reported by ls()! (This is unlike the
example given in help(attach), where the new variable "height" created
by the assignment shows up in the ls() object list.) If a user assumes
that "y" points to the present value of "d$y", as the attach() command
usually does, he will have bugs that will be very hard to track down.
Although the new variable "y" is hidden from the ls() list of objects,
it will be removed by doing a detach("d") command:
-----------
detach("d")
y
Error: Object "y" not found
-----------
I can't think of any good reason why R should behave like this. I've
tried this same example in Splus, and was surprised to see that it has
the same behavior, so I suppose R at least has compatible
peculiarities.

I understand that assigning to a short variable name when attach is
operational is supposed to create a new variable instead of modifying
the original:

d <- data.frame(y=10)
attach(d)
y <- 20
d$y
[1] 10

and that a lot of R code might have been written assuming this behavior
so it probably shouldn't be changed at this point. But if one makes an
assignment to a fully qualified long variable name, I can't think of
any good reason for a new semi-hidden variable to be created. Thus I
think that R should instead do the following:

d <- data.frame(y=10)
attach(d)
d$y <- 20
y
[1] 20
This seems to me to be a much more natural and intuitive behavior that
the user should expect. Compatibility issues may require adding a
switch to allow users to get the old behavior if they really wanted, but
I can't think of how any users could have relied on this undocumented
"feature"...
--------------------------------------------------------------
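Something close to the behavior proposed above can be had without
attach() at all, by looking the short name up in the data frame at the
moment of use. The following is only a minimal sketch, assuming base
R's with() as available in current versions of R; it is not a change to
attach() itself, and the names d and y are simply those of the example
above.

```r
d <- data.frame(y = 10)
d$y <- 20

# with() evaluates its expression using the columns of d as they are
# right now, so there is no attached copy that can go stale.
current <- with(d, y)
print(current)        # 20

d$y <- 30
print(with(d, y))     # 30
```

The cost is that the short name is only available inside the with()
call rather than everywhere on the search path.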
I'm new to R, so perhaps I'm missing something that could be explained
to me. If it is decided not to change R's behavior, then at the least
I suggest that the example given by help(attach) be extended by
appending the following:
attach(women)
women$height <- height*2.54 ## Don't try to do this either, as it
## will still create a new variable "height" with the original
## values of women$height. I.e., height no longer points to the
## present value of women$height:
sd(women$height-height) # shows 6.88709
## furthermore, this new variable is not listed by ls() and
## disappears after doing detach("women")
ls()
detach("women")
height # gives an error message
------------
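The "semi-hidden" variable can also be located directly. The sketch
below assumes that attach() puts its copy at position 2 of the search
path (its default) and uses find() and ls(pos = ...) from base R; the
helper names where_y, in_global, in_attached and vals are hypothetical,
introduced only for this illustration.

```r
d <- data.frame(y = 10)
attach(d)     # a copy of d's columns goes on the search path at position 2
d$y <- 20     # changes d in the workspace, not the attached copy

where_y     <- find("y")              # which search-path entry holds y
in_global   <- "y" %in% ls()          # ls() lists only the workspace
in_attached <- "y" %in% ls(pos = 2)   # list the attached copy instead
vals        <- c(y, d$y)              # short name vs. qualified name

print(where_y)                        # "d"
print(c(in_global, in_attached))      # FALSE TRUE
print(vals)                           # 10 20

detach(d)     # removes the attached copy, and with it the short name y
```

So the short name is not in the workspace at all: it lives in the
attached copy, which is why ls() without arguments does not report it.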
Greg Hammett hammett at princeton.edu
Lecturer with rank of Professor,
Astrophysical Sciences, Princeton University
Principal Research Physicist,
Princeton Plasma Physics Laboratory
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595