To the R help list,

When using a data frame, there is no warning or error message when I refer to a non-existent variable inside the data frame. Example:

##----------------------------------------------
a <- c(1,2,3)
b <- c(11,22,33)
df <- data.frame(a,b)
df
## correct: there is a column in df named 'a'
## the sum is correctly performed
sum(df$a==2)
## incorrect: there is no column in df named 'aaa',
## but the sum is performed anyway without either warning or error
sum(df$aaa==2)
##----------------------------------------------

Is there some way to make R issue either a warning or an error message in such a situation?

I am using R version 2.15.1 64-bit on Windows 7 Professional.

Thank you very much.

Paulo Barata

---------------------------------------------------------------------
Paulo Barata
ENSP - Fundação Oswaldo Cruz
Rua Leopoldo Bulhões 1480 - 8A
21041-210 Rio de Janeiro - RJ
Brazil
E-mail: paulo.barata at ensp.fiocruz.br
variable (column) in a data frame
8 messages · Paulo Barata, John Kane, arun +3 more
This seems more or less correct to me.

1> sum(df$a==1)
[1] 1
1> sum(df$a==2)
[1] 1
1> sum(df$aaa==2)
[1] 0

There is no df$aaa, so the comparison has length 0 and its sum is 0, which I think is what you are asking about. What am I missing?

John Kane
Kingston ON Canada
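John's point can be followed one step at a time. The minimal sketch below reuses the data frame from the original post and prints each intermediate value. It shows that the missing column turns into NULL, and that an empty comparison quietly sums to 0.

```r
# Same data frame as in the original post
a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- data.frame(a, b)

x <- df$aaa      # no column 'aaa': `$` returns NULL instead of stopping
is.null(x)       # TRUE
cmp <- x == 2    # comparing NULL with 2 gives logical(0), a zero-length vector
length(cmp)      # 0
sum(cmp)         # the sum of an empty vector is 0, hence the silent result
```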
-----Original Message-----
From: paulo.barata at ensp.fiocruz.br
Sent: Sun, 15 Jul 2012 11:30:37 -0300
To: r-help at r-project.org
Subject: [R] variable (column) in a data frame
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi,

I guess you can try this:

#You will get the same result here:
df$aaa==2
logical(0)
!df$aaa==2
logical(0)
#But it is different for the variable present in the dataframe
df$a==4
[1] FALSE FALSE FALSE
!df$a==4
[1] TRUE TRUE TRUE
identical(df$aaa==2,!df$aaa==2)
[1] TRUE
identical(df$a==4,!df$a==4)
[1] FALSE

A.K.

----- Original Message -----
From: Paulo Barata <paulo.barata at ensp.fiocruz.br>
To: r-help at r-project.org
Sent: Sunday, July 15, 2012 10:30 AM
Subject: [R] variable (column) in a data frame
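The identical(x, !x) trick above works because negation leaves a zero-length vector unchanged. It also leaves NA unchanged, though, so it can give a false alarm when a column exists but every comparison is NA. The minimal sketch below shows that pitfall. It then checks length() or the column name directly, which is a plainer way to express the same test.

```r
a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- data.frame(a, b)

# identical(x, !x) holds when x has length 0, but also when every element
# is NA, because !NA is still NA
x_na <- c(NA, NA) == 4
identical(x_na, !x_na)       # TRUE, even though length(x_na) is 2

# Testing length() directly, or the column name itself, avoids that pitfall
length(df$aaa == 2) == 0     # TRUE: 'aaa' is not a column
length(df$a == 4) == 0       # FALSE: 'a' exists
"aaa" %in% names(df)         # FALSE
```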
Hi Paulo, there is a difference between two ways of accessing columns in a data frame:
df$aaa
NULL
df["aaa"]
Error in `[.data.frame`(df, "aaa") : undefined columns selected

So df["aaa"] or df[,"aaa"] gives the error message you expect.
-------------------
Frans

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Paulo Barata
Sent: Sunday 15 July 2012 16:31
To: r-help at r-project.org
Subject: [R] variable (column) in a data frame
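Frans's comparison can be checked for all three extraction operators at once. The minimal sketch below uses a small helper, try_msg, which is a hypothetical name used only for this illustration. The helper returns the error message if an expression stops and "no error" otherwise. It shows that `[[`, like `$`, returns NULL for an absent name. Only the `[` forms stop.

```r
a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- data.frame(a, b)

# Hypothetical helper: evaluate expr and report its error message, if any
try_msg <- function(expr) {
  tryCatch({ expr; "no error" }, error = function(e) conditionMessage(e))
}

try_msg(df$aaa)        # "no error": `$` returns NULL
try_msg(df[["aaa"]])   # "no error": `[[` also returns NULL
try_msg(df["aaa"])     # "undefined columns selected"
try_msg(df[, "aaa"])   # "undefined columns selected"
```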
1 day later
Dear Frans and Peter,

Yes, the notation df[,'var'] is able to catch a non-existent variable var inside a data frame df, but the notation df$var isn't. So we have a situation where two different notations, which (as far as I understand) perform the same action, respond differently.

Couldn't this situation be fixed? Isn't it possible to make the df$var notation issue an error when referring to a non-existent variable inside the data frame?

Thank you very much.

Paulo Barata

---------------------------------------------------------------------
---------- Original Message -----------
From: "Frans Marcelissen" <frans.marcelissen at digipsy.nl>
To: "'Paulo Barata'" <paulo.barata at ensp.fiocruz.br>, <r-help at r-project.org>
Sent: Mon, 16 Jul 2012 14:25:21 +0200
Subject: RE: [R] variable (column) in a data frame
------- End of Original Message -------
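R's default `$` cannot be changed, but one workaround for Paulo's question is to give the data frame an extra S3 class whose `$` method is strict. The minimal sketch below does this. The class name strict_df and the helper as_strict are hypothetical and not part of R. The method stops on unknown names and, because it extracts with `[[` (exact matching by default), it also rejects partial matches.

```r
# Hypothetical class "strict_df": `$` stops on unknown names instead of
# returning NULL, and does not accept partial matches
`$.strict_df` <- function(x, name) {
  if (!name %in% names(x)) {
    stop(sprintf("no column named '%s' in data frame", name), call. = FALSE)
  }
  x[[name]]   # `[[` matches names exactly by default
}

# Hypothetical helper: tag an ordinary data frame with the extra class
as_strict <- function(d) {
  class(d) <- c("strict_df", class(d))
  d
}

a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- as_strict(data.frame(a, b))

sum(df$a == 2)       # 1, as before
# sum(df$aaa == 2)   # now stops: no column named 'aaa' in data frame
```

The object is still a data frame, so everything else dispatches to the usual data.frame methods. One design caveat: code that expects `$` to return NULL for an absent column would stop if handed such an object, so this is probably best kept to your own scripts.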
Hi, On Tue, Jul 17, 2012 at 10:40 AM, Paulo Barata
<paulo.barata at ensp.fiocruz.br> wrote:
Dear Frans and Peter, Yes, the notation df[,'var'] is able to catch a non-existent variable var inside a data frame df. But the notation df$var isn't. So we have this situation, where two different notations, which (as far as I understand) perform the same action, have different kinds of response.
But they don't perform the same action: the defaults are different. This is documented, although verbosely and somewhat confusingly; see for instance ?"$" and pay particular attention to the sections on partial matching.
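The partial-matching default Sarah mentions can be seen with a column whose name is a prefix match. In the minimal sketch below, the data frame d and its column names are made up for illustration. Depending on the R version and on options(warnPartialMatchDollar = TRUE), d$ab may also emit a warning.

```r
d <- data.frame(abc = 1:3, xyz = 4:6)

d$ab                       # partial match: returns the 'abc' column
d[["ab"]]                  # NULL: `[[` matches names exactly by default
d[["ab", exact = FALSE]]   # partial matching requested explicitly: 1 2 3
# d["ab"] and d[, "ab"] both stop with "undefined columns selected"
```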
Couldn't this situation be fixed? Isn't it possible to make the df$var notation issue an error when referring to a non-existent variable inside the data frame?
Not without completely changing the way partial matching is handled. The answer has already been offered: don't use $ - it's really just there as a shortcut, and like all shortcuts has attendant risks not found on the longer, smoother main road. Sarah
Thank you very much. Paulo Barata
Sarah Goslee http://www.functionaldiversity.org
Dear Bert and Sarah,
Thank you very much for your clarifications on this matter. I will
have to study more closely the way extracting subsets of data
structures is performed, and I will change my programming habits
accordingly.
Best regards,
Paulo Barata
---------------------------------------------------------------------
---------- Original Message -----------
From: Bert Gunter <gunter.berton at gene.com>
To: Paulo Barata <paulo.barata at ensp.fiocruz.br>
Cc: Frans Marcelissen <frans.marcelissen at digipsy.nl>, r-help at r-project.org,
ehlers at ucalgary.ca
Sent: Tue, 17 Jul 2012 08:06:57 -0700
Subject: Re: [R] variable (column) in a data frame
Inline below.
-- Bert

On Tue, Jul 17, 2012 at 7:40 AM, Paulo Barata <paulo.barata at ensp.fiocruz.br> wrote:
Dear Frans and Peter, Yes, the notation df[,'var'] is able to catch a non-existent variable var inside a data frame df. But the notation df$var isn't. So we have this situation, where two different notations, which (as far as I understand) perform the same action, have different kinds of response.

You don't understand far enough. Your assumption is simply not true. For example, from ?"[" :

"The most important distinction between [, [[ and $ is that the [ can select more than one element whereas the other two select a single element. The default methods work somewhat differently for atomic vectors, matrices/arrays and for recursive (list-like, see is.recursive) objects. $ is only valid for recursive objects, and is only discussed in the section below on recursive objects."

So the Help page already notes that there are differences among them.

Nevertheless, your discomfort is, imo, understandable. Extraction/replacement for data structures is a complex business, and R's approach to these issues has "evolved" over time, with "inconsistencies," especially for edge cases, baked in. Because these issues are at the very core of R's behavior, I think it likely that, except for egregious inconsistencies and outright bugs (which at this point are most unlikely to exist), it is well nigh impossible to change them.

I see no recourse but to always check such edge cases carefully and to be as consistent as possible in your own programming usage (e.g. always using [,".."] for extracting columns). As Peter has pointed out several times, the $ extractor is convenient syntactic sugar that can get one into a lot of trouble, and is probably best avoided.

Cheers,
Bert
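The help-page distinction Bert quotes is between `[`, which can select several elements, and `[[` or `$`, which select one. The minimal sketch below shows that distinction and his suggested df[, ".."] style on the original data frame. It also shows the drop argument, because a single column taken with `[` is simplified to a vector by default.

```r
a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- data.frame(a, b)

df[, c("a", "b")]          # `[` can select several columns: a data frame
df[, "a"]                  # one column is simplified to a vector (drop = TRUE)
df[, "a", drop = FALSE]    # keep it as a one-column data frame instead
df[["a"]]                  # `[[` selects exactly one element: the vector
sum(df[, "a"] == 2)        # 1; df[, "aaa"] would stop with an error
```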
Couldn't this situation be fixed? Isn't it possible to make the df$var notation issue an error when referring to a non-existent variable inside the data frame? Thank you very much. Paulo Barata
---------------------------------------------------------------------
---------- Original Message -----------
From: "Frans Marcelissen" <frans.marcelissen at digipsy.nl>
To: "'Paulo Barata'" <paulo.barata at ensp.fiocruz.br>, <r-help at r-project.org>
Sent: Mon, 16 Jul 2012 14:25:21 +0200
Subject: RE: [R] variable (column) in a data frame
------- End of Original Message -------
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
------- End of Original Message -------