Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name, e.g. df[df$colname==value,], I wonder: why isn't there an R scoping rule under which the name search starts with the dataframe's column names, as if we'd said with(df, df[colname==value,])? Wouldn't that be a reasonable default to prepend to the name search path? Cheers, Alexy
name scoping within dataframe index
7 messages · Duncan Murdoch, Gabor Grothendieck, Alexy Khrabrov
On 1/26/2009 1:46 PM, Alexy Khrabrov wrote:
Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name, e.g. df[df$colname==value,], I wonder: why isn't there an R scoping rule under which the name search starts with the dataframe's column names, as if we'd said with(df, df[colname==value,])? Wouldn't that be a reasonable default to prepend to the name search path?
If you did that, it would be quite difficult to get at a "colname"
variable that *isn't* the column of df. It would be something like
df[get("colname", parent.frame()) == value,]
So just use subset(), or with(), or type the extra 3 chars.
Duncan
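For readers following along, here is a minimal sketch of the spellings mentioned so far. The data frame `df`, its columns `colname` and `other`, and the variable `value` are made up for illustration; only `subset()`, `with()` and ordinary `[` indexing come from the thread.

```r
# Toy data; the names df, colname, other and value are illustrative only.
df <- data.frame(colname = c(1, 2, 3, 2), other = c("a", "b", "c", "d"))
value <- 2

# Three ways to select the rows where colname equals value.
by_prefix <- df[df$colname == value, ]         # explicit df$ prefix
by_with   <- with(df, df[colname == value, ])  # columns visible inside with()
by_subset <- subset(df, colname == value)      # columns visible inside subset()

# All three pick the same rows (rows 2 and 4 of the toy data).
stopifnot(isTRUE(all.equal(by_prefix, by_with)),
          isTRUE(all.equal(by_prefix, by_subset)))
```

In both `with()` and `subset()` the columns of `df` are searched before the calling scope, which is roughly the lookup order the original question asks for, but only where the caller opts in.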
Try: subset(df, colname == value)
On 1/26/2009 1:46 PM, Alexy Khrabrov wrote:
Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name, e.g. df[df$colname==value,], I wonder: why isn't there an R scoping rule under which the name search starts with the dataframe's column names, as if we'd said with(df, df[colname==value,])? Wouldn't that be a reasonable default to prepend to the name search path?
If you did that, it would be quite difficult to get at a "colname"
variable that *isn't* the column of df. It would be something like
df[get("colname", parent.frame()) == value,]
Actually, what I propose is a special search rule which simply looks at the enclosing dataframe.name[...] outside the brackets and looks up the columns first. It would break legacy code that used column names identical to variables in this context, but there are probably other ideas to enhance R readability which would break legacy code. Perhaps when the next major overhaul occurs, this is something folks can voice opinions about. I find the need for inner prefixing quite unnatural, FWIW. Cheers, Alexy
On 1/26/2009 2:01 PM, Alexy Khrabrov wrote:
Actually, what I propose is a special search rule which simply looks at the enclosing dataframe.name[...] outside the brackets and looks up the columns first.
Yes, I understood that, and I explained why it would be a bad idea. Duncan Murdoch
It would break legacy code that used column names identical to variables in this context, but there are probably other ideas to enhance R readability which would break legacy code. Perhaps when the next major overhaul occurs, this is something folks can voice opinions about. I find the need for inner prefixing quite unnatural, FWIW. Cheers, Alexy
On Jan 26, 2009, at 2:12 PM, Duncan Murdoch wrote:
df[get("colname", parent.frame()) == value,]
Actually, what I propose is a special search rule which simply looks at the enclosing dataframe.name[...] outside the brackets and looks up the columns first.
Yes, I understood that, and I explained why it would be a bad idea.
Well, this is the case in all programming languages with scoping where inner-scope variables override the outer ones. Usually it's solved by prefixing with the outer scope, outerscope.name or outerscope::name or so. So it only underscores the need to improve scoping access in R. Dataframe column names belong to the dataframe object, and the natural thing would be to enable easy access to those names; you'd need to make an extra effort to access an overridden, unrelated external variable. Again, just an analogy from other programming languages. Cheers, Alexy
On 1/26/2009 2:20 PM, Alexy Khrabrov wrote:
Well, this is the case in all programming languages with scoping where inner-scope variables override the outer ones. Usually it's solved by prefixing with the outer scope, outerscope.name or outerscope::name or so. So it only underscores the need to improve scoping access in R. Dataframe column names belong to the dataframe object, and the natural thing would be to enable easy access to those names; you'd need to make an extra effort to access an overridden, unrelated external variable. Again, just an analogy from other programming languages.
The issue is that in most cases the outer scope would be unnamed: it's the one that currently doesn't need a prefix. So if we have a prefix meaning "this scope", why wouldn't that evaluate to "df" in that context?

I guess we need a prefix meaning "the caller's scope", but that's just going to lead to confusion: is it the caller of the function that is trying to index df, or the function trying to do the indexing? So we'd need a prefix specific to indexing: and that's just too ugly for words.

As I said, use subset() or with(). For subset selection, subset() works very nicely. (I don't like the way it does column selection, but that's a different argument.)

Duncan Murdoch
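The name clash at the centre of this exchange can be reproduced today with `subset()`, which already searches the columns first. The sketch below is illustrative only: `df`, its columns, and the outer variable `colname` are invented, and the explicit `envir` argument is one possible way to reach the outer variable, not necessarily the form anyone in the thread would use.

```r
# Toy data; all names here are illustrative.
df <- data.frame(colname = c(1, 2, 3), x = c(10, 20, 30))
colname <- 3  # unrelated variable in the calling scope, same name as a column

# With ordinary indexing the two names stay distinct: df$colname is the
# column, the bare colname is the outer variable.
prefix_row <- df[df$colname == colname, ]

# Inside subset() the column masks the outer variable, so this compares the
# column with itself and keeps every row.
all_rows <- subset(df, colname == colname)

# Reaching the outer variable from inside subset() takes extra work, for
# example an explicit environment lookup (or simply renaming the variable).
outer_env <- environment()
one_row <- subset(df, colname == get("colname", envir = outer_env))
```

Under the current rules the masking only happens where the caller asks for it via `subset()` or `with()`; making it the default for `[` would move the extra work onto every use of an outer variable whose name happens to match a column.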