
variable (column) in a data frame

On 2012-07-15 10:01, Paulo Barata wrote:
Paulo,

I understand your concerns and I do think that the best
thing would be to remove the $ shortcut from the language
or, at least, to make y$x equivalent to
y[["x", exact = TRUE]], so that it no longer matches
partial names. But, as has been pointed out before, that
might not be easy. Nevertheless, even y[["x"]] is not a
panacea. Consider your own example:

df <- data.frame(a = 1:3, b=11:13)
sum(df[["aaa"]] == 2)
#[1] 0

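which results from df[["aaa"]] being NULL for a nonexistent
column, so that the comparison is empty and its sum is zero: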

df[["aaa"]] == 2
#logical(0)
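
For reference, the difference between $ and [[ that is at issue
here can be sketched as follows. This is only a minimal
illustration: the data frame d and its column name alpha are
made up for the purpose and are not from your example.

```r
# Hypothetical data frame, for illustration only
d <- data.frame(alpha = 1:3, b = 11:13)

d$al                      # `$` partially matches "al" to "alpha"
d[["al"]]                 # `[[` matches exactly by default, so this is NULL
d[["al", exact = FALSE]]  # partial matching has to be requested explicitly

# A mistyped name passed to `[[` therefore fails silently:
sum(d[["al"]] == 2)       # NULL == 2 is logical(0), and sum(logical(0)) is 0
```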

The safest extraction is y[ , "x"]:

sum(df[ , "aaa"] == 2)
#Error in `[.data.frame`(df, , "aaa") : undefined columns selected

But then, this comes down to whether one thinks that
addressing a nonexistent variable should result in an
error or should return NULL.
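
If one prefers the error, a small wrapper is one way to get it
while still using [[. The sketch below is only a suggestion:
get_col is a hypothetical helper name, not a function in base R,
and the error message is my own wording.

```r
# Hypothetical helper (not part of base R): exact match or an error
get_col <- function(d, name) {
  if (!name %in% names(d)) {
    stop("no column named '", name, "' in data frame")
  }
  d[[name]]
}

df <- data.frame(a = 1:3, b = 11:13)
sum(get_col(df, "a") == 2)                     # counts the rows where a equals 2
res <- try(get_col(df, "aaa"), silent = TRUE)  # stops instead of returning NULL
inherits(res, "try-error")                     # TRUE: the misspelling was caught
```

Someone who would rather get NULL back can simply keep using
y[["x"]] directly.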

The bottom line probably is that the $ behaviour will not change
in the near future and one would simply be well advised to be
aware of its behaviour. Every language has its quirks. Just be
thankful that the R language isn't as big a mess as the English
language (which I do love dearly).
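
One aid to staying aware of it is the warnPartialMatchDollar
option. The sketch below assumes a reasonably recent version of R,
in which the option also applies to data frames; I have not
checked older releases. The data frame d is again made up.

```r
# Assumption: an R version in which `$` on a data frame honours this option
old <- options(warnPartialMatchDollar = TRUE)

d <- data.frame(alpha = 1:3)

# Capture the warning so that its message can be inspected
res <- tryCatch(d$al, warning = function(w) conditionMessage(w))
res

options(old)  # restore the previous setting
```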

Peter Ehlers