
pros/cons of teaching attach()

18 messages · Joshua Wiley, Graham Smith, Gavin Simpson +8 more

#
Hi all,

I am wondering if anyone teaches the attach function to new students
and if so, why?


I have never taught a class using R, but in small tutorials and
providing direct help, I have run across this issue several times.  I
have always avoided it, but a couple of times, students had learned to
use it in a class (though without much detail) and were rather grumpy
with me for using full variable names or with().  New converts,
particularly from GUIs like SPSS, already tend to be leery of R, and I
hate to make working at the command line more onerous since any typing
is more than they are used to, but at the same time a discussion of
environments, which I think is necessary to avoid trouble with attach,
does not seem like beginner material either.

I have wondered about this many times, but was just reminded again by
an R-help post.

Thanks,

Josh
#
Hi Joshua,

I do teach R to my students, and each time I face the question of what
to do about attach().  I can't really add to your clear statement of the
main reasons for and against teaching attach(); I tend to go sometimes
one way and sometimes the other.

Regards,  Murray
On 22/09/2010 4:34 p.m., Joshua Wiley wrote:

  
    
#
Josh
I'm not sure whether my approach is the best or worst of both worlds.

I use attach() at the beginning, with warnings that it is bad practice,
and explain why. But I also explain why, on occasion, it can make life
easier, such as in your first few lessons in R.

Later we subset the data and then use the data$variable approach and I
remind them of my earlier explanation of attach() and why we are now
using a different approach.

Graham
#
On Tue, 2010-09-21 at 21:34 -0700, Joshua Wiley wrote:
I don't use attach() at all; too easy to make mistakes that are
difficult to track down. I tend not to use $ either as that encourages
people to use it to go fiddling in the bowels of other lists (objects);
model$residuals might be OK for lm() models (though not if you've used
na.exclude for example) but students might be surprised with what they
get with more complex modelling functions with different residuals.
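
A small sketch of the na.exclude point, using a made-up data frame `d` that is not from the thread. With `na.action = na.exclude`, the stored component holds only the complete cases, while the extractor pads the result back to one value per row of the data:

```r
# Hypothetical data with one missing response
d <- data.frame(x = 1:5, y = c(1.1, NA, 2.9, 4.2, 5.1))
fit <- lm(y ~ x, data = d, na.action = na.exclude)

n_component <- length(fit$residuals)    # only the complete cases
n_extractor <- length(residuals(fit))   # one value per row of d, NA where y is missing
```

The two lengths differ, so code that lines up `fit$residuals` against the original rows can quietly go wrong.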

If I was really teaching R, I would mention attach() and then show why
it is bad so I've dealt with the issue on my terms.
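
One of the hard-to-track mistakes can be sketched as follows. The data frame `dat` and its column `score` are hypothetical, chosen only for illustration:

```r
# Hypothetical data frame
dat <- data.frame(score = c(10, 20, 30))

attach(dat)
score <- score / 10        # creates a new 'score' in the workspace ...
unchanged <- dat$score     # ... while the column in dat is untouched
masking <- mean(score)     # now finds the workspace copy, not the attached one
detach(dat)
rm(score)
```

After the assignment there are two different things called `score`, and which one a later line sees depends on the search path rather than on anything visible in the code.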

G

  
    
#
Thanks to everyone for your thoughtful replies.  I think I am inclined
not to use attach, but to bring it up and briefly mention some of the
mistakes that are easy to make.  If students already know/use attach,
it will at least be clear why I do not use it, and for students
unfamiliar with it, they should not be tempted to start.

Thanks again for all the responses!

Josh
#
On Wed, Sep 22, 2010 at 4:04 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
I avoid teaching attach() and discourage its use if the students have
already seen it.  I do teach both with and within (with provides
read-only access, within provides read-write access to the variable
names in a data frame or a list).  As mentioned in an earlier reply in
this thread, the use of extractor functions, like residuals() or
fitted() or coef(), should be preferred to reaching inside a data
object and grabbing a component that may or may not continue to be
defined in future versions.
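
A minimal sketch of the with/within distinction, assuming a made-up data frame `d` with columns `height` and `weight` (not taken from the course notes):

```r
# Hypothetical data frame
d <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# with(): evaluate an expression using the columns; d itself is not changed
bmi <- with(d, weight / (height / 100)^2)

# within(): evaluate assignments and return a modified copy of the data frame
d2 <- within(d, bmi <- weight / (height / 100)^2)
```

Note that within() returns the modified copy and leaves `d` alone, so the result has to be assigned to keep the new column.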

The supplementary material on R that I use in a first-year grad course
on applied statistics is available, for this semester, at
www.stat.wisc.edu/~st849-1/Rnotes.  My approach to data organization is
shown in the "Introduction to R".
1 day later
#
This thread is interesting. For those who use the
subset command, when you extract a variable from a
data frame, do you give it the same name as in the data set?

For instance,

my.data with variables trout, whale

trout <- subset(my.data, select=trout, drop=TRUE)

I guess if you never attach, then you don't have to
worry about masking in the future.

Thanks

-Laura
On 9/21/2010 11:34 PM, Joshua Wiley wrote:

  
    
#
On Thu, Sep 23, 2010 at 5:53 AM, Laura Chihara <lchihara at carleton.edu> wrote:
For me this would depend on whether I would be merging the data back
in and/or whether I would be using the data set (and original
variable) again.  For example,

conc <- subset(DNase, select = "conc")
# now forget about DNase
# various operations with conc
--but--
conc.tmp <- subset(DNase, select = "conc") # I often add .tmp or 2 or something
# do some stuff (data cleaning, transformations, whatever)
DNase[ , "conc"] <- conc.tmp
# on to model fitting or whatever with full data

That said, unless there is reason to be doing a lot of work with only
one variable from the data set or to assign changes you may want to
reverse, my preference would always be to leave it in the data set (it
was in there for a reason, after all).
If I was only selecting entire variables, I would just use one of the
extraction operators, e.g., my.data[, "trout"]
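
For comparison, a short sketch of the ways of pulling out a single variable mentioned so far. The contents of `my.data` are invented here; only the variable names come from the question above:

```r
# Hypothetical contents for my.data
my.data <- data.frame(trout = c(3, 5, 2), whale = c(1, 0, 4))

v1 <- subset(my.data, select = trout, drop = TRUE)  # plain vector
v2 <- my.data[, "trout"]                            # plain vector
v3 <- my.data[["trout"]]                            # plain vector
df1 <- subset(my.data, select = trout)              # one-column data frame
```

Without `drop = TRUE`, subset() keeps the data frame structure, which matters if the result is later passed to something expecting a vector.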

  
    
#
> On Wed, Sep 22, 2010 at 4:04 AM, Joshua Wiley
> <jwiley.psych at gmail.com> wrote:
>> Thanks to everyone for your thoughtful replies.  I think
>> I am inclined not to use attach, but to bring it up and
>> briefly mention some of the mistakes that are easy to
>> make.  If students already know/use attach, it will at
>> least be clear why I do not use it, and for students
>> unfamiliar with it, they should not be tempted to start.
>>
>> Thanks again for all the responses!

> I avoid teaching attach() and discourage its use if the
> students have already seen it.  I do teach both with and
> within (with provides read-only access, within provides
> read-write access to the variable names in a data frame or
> a list).  As mentioned in an earlier reply in this thread,
> the use of extractor functions, like residuals() or
> fitted() or coef(), should be preferred to reaching inside
> a data object and grabbing a component that may or may not
> continue to be defined in future versions,

> The supplementary material on R that I use in a first-year
> grad course on applied statistics is available, for this
> semester, at www.stat.wisc.edu/~st849-1/Rnotes My approach
> to data organization is shown in the "Introduction to R".

I agree entirely with Doug and Gavin (mostly) and Jonathan.

However,  there's one remark about attach() that "everyone" seems to
forget (AFAIR even Doug ;-):

Do consider (and teach!) using  attach("foobar.rda") for *.rda files
which I often find quite preferable to load().
I forgot who (from R-core) introduced this idea,
but I do like it:  It rehabilitates attach() into a "decent R
function" :-)
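
A rough sketch of the idea, assuming a throwaway file written to a temporary directory. The objects `x` and `y` and the search-path name "foobar_rda" are made up for the example:

```r
# Hypothetical objects saved to a temporary .rda file
x <- 1:5
y <- letters[1:3]
f <- file.path(tempdir(), "foobar.rda")
save(x, y, file = f)
rm(x, y)

# load(f) would put x and y straight into the workspace, overwriting any
# objects of the same name; attach() gives them their own search-path entry
attach(f, name = "foobar_rda")
attached <- ls("foobar_rda")     # objects contributed by the file
in_workspace <- exists("x", envir = globalenv(), inherits = FALSE)
total <- sum(x)                  # still found through the search path
detach("foobar_rda", character.only = TRUE)
```

The workspace stays clean, the file's contents can be listed separately, and a single detach() removes them all again.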

Martin
#
On Thu, Sep 23, 2010 at 12:50 PM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
[snip]
This is amazing.  In two sentences, you made working with *.rda files
containing multiple objects easier (which makes save.image suddenly
seem much more useful) and provided a good answer to, "why was it
written if it should be avoided?"

--Josh
#
On 09/22/2010 04:36 AM, Gavin Simpson wrote:
This is going to be a dumb question, but what do you use instead of $ 
for accessing variables in a list? [[]] allows for the same fiddling.

Thanks,

Tyler
#
On Thu, Sep 23, 2010 at 3:48 PM, Tyler Smith <tyler.smith at eku.edu> wrote:
Most people prefer the $ operator because it is easier to type and you
don't need to quote the name.  The expression

fr$x

is simpler to type and probably to understand than is

fr[["x"]]

However, it is good to know that "[[]]" can be used because it is the
answer to a frequent question from beginners, which is "How do I pass
the name of the variable into a function?".  They will often try

meanMed <- function(frm, nm) c(mean=mean(frm$nm), median=median(frm$nm))

and are frustrated that it doesn't work as intended.  If you want to
use the value of the argument called nm as the name of the variable to
extract you must use frm[[nm]]
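
Written out with that change, the function might look like this (the data frame `fr` is a made-up example):

```r
# frm$nm looks for a column literally called "nm";
# frm[[nm]] uses the value of the argument nm as the column name
meanMed <- function(frm, nm) c(mean = mean(frm[[nm]]), median = median(frm[[nm]]))

fr <- data.frame(x = c(1, 2, 6))   # hypothetical data
res <- meanMed(fr, "x")
```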
#
I think that the reference to not using $ or [[ was meant for cases where there is a proper extraction function, residuals being the example used.

If "fit" is an lm object then I could plot the fitted vs. residuals plot like this:
Or like this:
The second one is preferred as it properly extracts the information without needing to know the exact contents of fit (and the axis limits look a little nicer). 
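
The two plotting calls referred to above are missing from this copy of the message. A plausible reconstruction, and only that, using the built-in `cars` data as a stand-in for whatever `fit` was:

```r
fit <- lm(dist ~ speed, data = cars)   # cars ships with R; an assumed example

pdf(NULL)   # null device so this also runs non-interactively; omit to see the plots
plot(fit$fitted.values, fit$residuals)   # reaching into the object
plot(fitted(fit), resid(fit))            # using the extractor functions
invisible(dev.off())

# For a plain lm fit the two routes give the same numbers
same <- isTRUE(all.equal(fit$residuals, resid(fit)))
```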

With an lm object the 2 plots will be essentially the same, but what if fit is a glm object?  Then fit$residuals does give something that fits the definition of residuals, but of the different types of residuals available for glm fits, this gives the one that is least interesting/interpretable to humans (they were useful to the program for getting the fit).  Here the resid (or residuals) function defaults to a more meaningful set of residuals and gives options for other types.
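
To make the glm case concrete, here is a small sketch with invented count data. The `residuals` component holds the working residuals, while residuals() returns deviance residuals by default:

```r
# Hypothetical count data
d <- data.frame(x = 1:8, y = c(1, 3, 2, 5, 6, 9, 13, 15))
fit <- glm(y ~ x, family = poisson, data = d)

# The stored component matches the "working" type ...
is_working <- isTRUE(all.equal(fit$residuals, residuals(fit, type = "working")))

# ... but not what residuals() returns by default (deviance residuals)
is_default <- isTRUE(all.equal(fit$residuals, residuals(fit)))

pearson <- residuals(fit, type = "pearson")   # other types are one argument away
```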

If we have arbitrary objects without extractor functions then we need to use $ or [[ to extract the individual elements, but when working with fitted objects it is much better to teach students to use the proper extractor functions rather than directly working with elements of the object itself.
#
I think what Greg say here is the "official wisdom" and it leads to more 
future-proof code as the structure of various objects can change in new 
versions of R.

OTOH the structure of a kind of object can always be explored with str() 
but it may not be easy to find out what extractor functions are 
available for the object.
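
Both kinds of exploration can be sketched on an lm fit. The `cars` data set is used purely as an example:

```r
fit <- lm(dist ~ speed, data = cars)

str(fit, max.level = 1)               # what the object contains
lm_methods <- methods(class = "lm")   # which generics have methods for this class
has_residuals <- any(grepl("^residuals", lm_methods))
```

The methods() listing is a useful start but not the whole story, since some extractors work through default methods and so may not appear under the class.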

Murray
On 25/09/2010 8:16 a.m., Greg Snow wrote:

  
    
#
When will I ever learn to proof-read my emails?!
=============
I think what Greg sayS here is the "official wisdom" and it leads to 
more future-proofED code as the structure of various objects can change 
in new versions of R.

OTOH the structure of a kind of object can always be explored with 
str(), but it may not be easy to find out what extractor functions are 
available for the object.

Murray
On 25/09/2010 8:16 a.m., Greg Snow wrote:

  
    
#
This may not always be useful when we venture too far from the cosy 
world of lm().

For example I have just been looking at the function flexmixedruns()
from package fpc, and the object  fr  that is its value in the examples 
for that function.

 > methods(class=class(fr))

just produces the standard list methods because fr is defined as a list. 
To get stuff out you need $.
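
The same situation in miniature, with a made-up plain list standing in for the real return value (the component names `best_k` and `loglik` are hypothetical and not those of flexmixedruns()):

```r
# Hypothetical stand-in for a result returned as a plain list
fr <- list(best_k = 3, loglik = c(-120.4, -98.7, -97.9))

cls <- class(fr)       # "list": no class-specific extractors to look up
parts <- names(fr)     # so inspect the components ...
k <- fr$best_k         # ... and pull them out with $ or [[
ll <- fr[["loglik"]]
```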

Still I like Gabor's suggestion and I will use it when a function 
returns an object of an unusual class.

Murray
On 25/09/2010 11:07 a.m., Gabor Grothendieck wrote: