variable (column) in a data frame

8 messages · Paulo Barata, John Kane, arun +3 more

#
To the R help list,

When using a data frame, there is no warning or error message 
when I refer to a non-existent variable inside the data frame.

Example:

##----------------------------------------------

a <- c(1,2,3)
b <- c(11,22,33)
df <- data.frame(a,b)
df

## correct: there is a column in df named 'a'
## the sum is correctly performed
sum(df$a==2)

## incorrect: there is no column in df named 'aaa', 
## but the sum is performed anyway without either warning or error
sum(df$aaa==2)

##----------------------------------------------

Is there some way to make R issue either a warning or an error
message in such a situation?

I am using R version 2.15.1 64-bit on Windows 7 Professional.

Thank you very much.

Paulo Barata

---------------------------------------------------------------------
Paulo Barata

ENSP - Fundação Oswaldo Cruz
Rua Leopoldo Bulhões 1480 - 8A
21041-210  Rio de Janeiro - RJ
Brazil
E-mail: paulo.barata at ensp.fiocruz.br
#
This seems more or less correct to me.

1> sum(df$a==1)
[1] 1
1> sum(df$a==2)
[1] 1
1> sum(df$aaa==2)
[1] 0

There is no df$aaa, so df$aaa==2 has length 0 and its sum is 0, which is what I think you are asking about.
What am I missing?
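
The 0 seen above comes from a chain of three standard R behaviours. The following is a minimal sketch that traces them one step at a time, reusing the df from the original post; the intermediate variable names are only illustrative.

```r
# Rebuild the data frame from the original post.
a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- data.frame(a, b)

missing_col <- df$aaa            # no column 'aaa', so `$` returns NULL
comparison  <- missing_col == 2  # NULL == 2 is a zero-length logical vector
total       <- sum(comparison)   # the sum of an empty vector is 0

is.null(missing_col)   # TRUE
length(comparison)     # 0
total                  # 0
```

No step in the chain is an error on its own, which is why nothing is reported.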


John Kane
Kingston ON Canada
#
Hi,

I guess you can try this:

#You will get the same result here:

df$aaa==2
logical(0)
!df$aaa==2
logical(0)
#But it is different for a variable that is present in the data frame

df$a==4
[1] FALSE FALSE FALSE
!df$a==4
[1] TRUE TRUE TRUE
identical(df$aaa==2,!df$aaa==2)
[1] TRUE
identical(df$a==4,!df$a==4)
[1] FALSE
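
A more direct check than comparing a result with its negation might be to test the column name against names(df), or to test whether the extraction returned NULL. This is only a sketch using the df from the original post; the variable names are illustrative.

```r
# Rebuild the data frame from the original post.
a <- c(1, 2, 3)
b <- c(11, 22, 33)
df <- data.frame(a, b)

# Test the name against the column names of the data frame.
has_a   <- "a"   %in% names(df)   # TRUE
has_aaa <- "aaa" %in% names(df)   # FALSE

# Extracting a missing column with [[ returns NULL.
aaa_is_null <- is.null(df[["aaa"]])   # TRUE
```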


A.K.






----- Original Message -----
From: Paulo Barata <paulo.barata at ensp.fiocruz.br>
To: r-help at r-project.org
Cc: 
Sent: Sunday, July 15, 2012 10:30 AM
Subject: [R] variable (column) in a data frame


#
Hi Paulo,
There is a difference between two ways of accessing columns in a data frame:
> df$AAA
NULL
> df["AAA"]
Error in `[.data.frame`(df, "AAA") : undefined columns selected
So df["AAA"] or df[,"AAA"] gives the error message you expect.
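
The difference between the accessors can be checked side by side. This is a small sketch using the df from the original post; raises_error is a hypothetical helper written only for this illustration, not a base R function.

```r
df <- data.frame(a = c(1, 2, 3), b = c(11, 22, 33))

# Hypothetical helper: evaluate an expression and report whether it
# raised an error (TRUE) or completed normally (FALSE).
raises_error <- function(expr) {
  tryCatch({
    force(expr)
    FALSE
  }, error = function(e) TRUE)
}

dollar_result <- df$AAA       # NULL, no error
double_result <- df[["AAA"]]  # NULL, no error

single_errors <- raises_error(df["AAA"])     # TRUE: undefined columns selected
matrix_errors <- raises_error(df[, "AAA"])   # TRUE: undefined columns selected
```

Note that [[ behaves like $ here and returns NULL, so only the single-bracket forms stop on a missing column.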
-------------------
Frans


-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On behalf of Paulo Barata
Sent: Sunday 15 July 2012 16:31
To: r-help at r-project.org
Subject: [R] variable (column) in a data frame


1 day later
#
Dear Frans and Peter,

Yes, the notation df[,'var'] is able to catch a non-existent
variable var inside a data frame df. But the notation df$var
isn't. 

So we have this situation, where two different notations, which
(as far as I understand) perform the same action, have different
kinds of response.

Couldn't this situation be fixed? Isn't it possible to make the
df$var notation issue an error when referring to a non-existent
variable inside the data frame?

Thank you very much.

Paulo Barata

---------------------------------------------------------------------


---------- Original Message -----------
From: "Frans Marcelissen" <frans.marcelissen at digipsy.nl>
To: "'Paulo Barata'" <paulo.barata at ensp.fiocruz.br>, <r-help at r-project.org>
Sent: Mon, 16 Jul 2012 14:25:21 +0200
Subject: RE: [R] variable (column) in a data frame
------- End of Original Message -------
#
Hi,

On Tue, Jul 17, 2012 at 10:40 AM, Paulo Barata
<paulo.barata at ensp.fiocruz.br> wrote:

But they don't perform the same action: the defaults are different.
This is documented, although verbosely and somewhat confusingly, see
for instance ?"$" and pay particular attention to the sections on
partial matching.

Not without completely changing the way partial matching is handled.

The answer has already been offered: don't use $ - it's really just
there as a shortcut, and like all shortcuts has attendant risks not
found on the longer, smoother main road.
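
As an illustration of the partial matching mentioned above, and of one possible way to avoid the shortcut, here is a minimal sketch. The column name alpha and the helper get_col are assumptions made up for this example; get_col is not part of base R.

```r
df <- data.frame(alpha = c(1, 2, 3), b = c(11, 22, 33))

# `$` accepts an unambiguous prefix of a column name (partial matching).
# suppressWarnings() is only a precaution in case the session is
# configured to warn about partial matches.
by_prefix <- suppressWarnings(df$al)       # the 'alpha' column

# `[[` matches names exactly by default, so the prefix finds nothing.
exact_lookup <- df[["al"]]                 # NULL
loose_lookup <- df[["al", exact = FALSE]]  # the 'alpha' column

# Hypothetical strict accessor: stop if the column name is absent,
# otherwise return the column.
get_col <- function(data, name) {
  if (!name %in% names(data)) {
    stop("no column named '", name, "' in the data frame")
  }
  data[[name]]
}

full_name <- get_col(df, "alpha")   # the 'alpha' column
# get_col(df, "al") would stop with an error instead of returning NULL
```

A wrapper like this trades a little typing for an explicit error, in the spirit of the advice to take the longer road.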

Sarah

  
    
#
Dear Bert and Sarah,

Thank you very much for your clarifications on this matter. I will
have to study more closely the way extracting subsets of data
structures is performed, and I will change my programming habits 
accordingly.

Best regards,

Paulo Barata

---------------------------------------------------------------------


---------- Original Message -----------
From: Bert Gunter <gunter.berton at gene.com>
To: Paulo Barata <paulo.barata at ensp.fiocruz.br>
Cc: Frans Marcelissen <frans.marcelissen at digipsy.nl>, r-help at r-project.org,
ehlers at ucalgary.ca
Sent: Tue, 17 Jul 2012 08:06:57 -0700
Subject: Re: [R] variable (column) in a data frame
------- End of Original Message -------