variable (column) in a data frame
On 2012-07-15 10:01, Paulo Barata wrote:
Dear Peter,

Thank you. I will try to modify my programming habits. But it seems there is a flaw in R, when it accepts a reference to a non-existent variable inside a data frame with the df$var notation. This should be corrected somehow.

Paulo Barata
Paulo,

I understand your concerns and I do think that the "best" thing would be to excise the $ shortcut from the language or, at least, make y$x equivalent to y[["x", exact = TRUE]]. But, as has been pointed out before, that might not be easy.

Nevertheless, even y[["x"]] may not be the ultimate panacea. Consider your own example:

  df <- data.frame(a = 1:3, b = 11:13)
  sum(df[["aaa"]] == 2)
  #[1] 0

which results from

  df[["aaa"]] == 2
  #logical(0)

The safest extraction is y[ , "x"]:

  sum(df[ , "aaa"] == 2)
  #Error in `[.data.frame`(df, , "aaa") : undefined columns selected

But then, this comes down to whether one thinks that addressing a nonexistent variable should result in an error or should return NULL.

The bottom line probably is that the $ behaviour will not change in the near future and one would simply be well advised to be aware of its behaviour. Every language has its quirks. Just be thankful that the R language isn't as big a mess as the English language (which I do love dearly).

Peter Ehlers
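One way to get an error on a misspelled name while still using list-style extraction is to wrap the lookup in a small function. The sketch below is only an illustration: `get_col` is a hypothetical helper, not part of base R, and it assumes that a missing column should always be treated as an error.

```r
# Hypothetical helper (not part of base R): extract one column by its exact
# name and stop with an error if no such column exists.
get_col <- function(data, name) {
  stopifnot(is.data.frame(data), is.character(name), length(name) == 1L)
  if (!name %in% names(data)) {
    stop("no column named '", name, "' in data frame", call. = FALSE)
  }
  data[[name, exact = TRUE]]
}

df <- data.frame(a = 1:3, b = 11:13)

# Existing column: behaves like df[["a"]].
sum(get_col(df, "a") == 2)
#[1] 1

# Misspelled column: an error is signalled instead of NULL being returned.
# tryCatch() is used here only to capture the condition for inspection.
err <- tryCatch(get_col(df, "aaa"), error = function(e) e)
inherits(err, "error")
#[1] TRUE
```

The check uses `%in% names(data)`, so it compares full names and does not depend on the partial matching that `$` performs.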
---------------------------------------------------------------------
---------- Original Message -----------
From: Peter Ehlers <ehlers at ucalgary.ca>
To: Paulo Barata <paulo.barata at ensp.fiocruz.br>
Cc: "r-help at r-project.org" <r-help at r-project.org>, peter dalgaard <pdalgd at gmail.com>
Sent: Sun, 15 Jul 2012 09:29:11 -0700
Subject: Re: [R] variable (column) in a data frame
On 2012-07-15 08:41, Paulo Barata wrote:
Dr. Dalgaard,

Thank you. But pre-checking with is.null() or using with() doesn't solve the problem of catching spelling mistakes in the name of a variable inside a data frame, when using the df$var notation often in a program.

Is there some way for R to behave, in relation to a variable inside a data frame, the same way it behaves for a variable not in a data frame? For example:

  ##----------------------------------------
  a <- c(1,2,3)
  ## the variable exists, we get a correct answer
  a==1
  ## the variable does not exist, R rightly points this out
  aaa==1
  ##----------------------------------------

My point is, if we make a spelling mistake in a program when referring to a variable inside a data frame, using the df$var notation, there seems to be no way of getting warned about that.
You could wean yourself from the $-habit. It's convenient but can lead to the problems you're experiencing (and this has been discussed before). For programming, if you're prone to make spelling errors, you should prefer df[, "aaa"]. See ?Extract.

Peter Ehlers
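As a minimal sketch of the difference, the lines below run the same sum with both notations on the data frame from this thread. The object names `msg` and `df` are just for illustration, and tryCatch() is there only so the script keeps running after the error.

```r
df <- data.frame(a = 1:3, b = 11:13)

# Existing column: both notations give the same answer.
sum(df$a == 2)
#[1] 1
sum(df[, "a"] == 2)
#[1] 1

# Misspelled column with `$`: the result is NULL, the comparison is
# logical(0), and the sum is silently 0.
is.null(df$aaa)
#[1] TRUE
sum(df$aaa == 2)
#[1] 0

# Misspelled column with matrix-style indexing: an error is signalled.
msg <- tryCatch(sum(df[, "aaa"] == 2), error = function(e) conditionMessage(e))
msg
#[1] "undefined columns selected"
```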
Thank you once again.

Paulo Barata

---------------------------------------------------------------------
---------- Original Message -----------
From: peter dalgaard <pdalgd at gmail.com>
To: "Paulo Barata" <paulo.barata at ensp.fiocruz.br>
Sent: Sun, 15 Jul 2012 16:47:35 +0200
Subject: Re: [R] variable (column) in a data frame
On Jul 15, 2012, at 16:30 , Paulo Barata wrote:
To the R help list,

When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example:

  ##----------------------------------------------
  a <- c(1,2,3)
  b <- c(11,22,33)
  df <- data.frame(a,b)
  df
  ## correct: there is a column in df named 'a'
  ## the sum is correctly performed
  sum(df$a==2)
  ## incorrect: there is no column in df named 'aaa',
  ## but the sum is performed anyway without either warning or error
  sum(df$aaa==2)
  ##----------------------------------------------

Is there some way to make R issue either a warning or an error message in such a situation?
You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)).

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
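The two suggestions above might look like this in practice. This is a sketch under one assumption: no object named `aaa` exists in the calling environment, because with() falls back to that environment when the name is not a column of the data frame.

```r
df <- data.frame(a = c(1, 2, 3), b = c(11, 22, 33))

# Suggestion 1: pre-check before using the column.
if (is.null(df$aaa)) {
  message("column 'aaa' not found in df")
}

# Suggestion 2: with() evaluates the expression using df's columns as
# variables, so a misspelled name fails like any other unknown object
# (as long as no variable called `aaa` exists outside the data frame).
res <- tryCatch(with(df, sum(aaa == 2)), error = function(e) conditionMessage(e))
res

# The same call with the correct name works as expected.
with(df, sum(a == 2))
#[1] 1
```

Because `$` does partial matching on data frames, `"aaa" %in% names(df)` is a stricter pre-check than `is.null(df$aaa)` when column names share a prefix.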
------- End of Original Message -------
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
------- End of Original Message -------