Basic question for subset of dataframe

4 messages · Kapil Shukla, MacQueen, Don, Ivan Calandra +1 more

#
Try a simpler example:
  x a   c  y
1 1 a Jan 11
2 2 b Feb 12
3 3 c Mar 13
4 4 d Apr 14
5 5 e May 15

  a
1 a
2 b
3 c
4 d
5 e

    c
1 Jan
2 Feb
3 Mar
4 Apr
5 May

If you use [] without a comma, it returns the specified columns.

  ick[ c(FALSE,TRUE,TRUE,FALSE) ]

will return the second and third columns, those where the logical vector
is TRUE.

This is because data frames are actually lists in disguise: each column
is one list element, and [] with a single index selects list elements.
-Don
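
A minimal sketch of the example above. The commands that built and printed the data frame did not survive in the archive, so the construction of ick below is an assumption chosen to match the printed output; only the name ick and the logical index come from the message.

```r
# Assumed reconstruction of the example data frame; the original
# construction code is not shown in the thread.
ick <- data.frame(
  x = 1:5,
  a = letters[1:5],
  c = month.abb[1:5],
  y = 11:15
)

# A data frame is a list whose elements are the columns.
is_list <- is.list(ick)

# One index and no comma: list-style indexing, so columns are selected.
one_col  <- ick["a"]                            # one-column data frame
two_cols <- ick[c(FALSE, TRUE, TRUE, FALSE)]    # second and third columns

print(names(two_cols))
```

Here the logical vector has one value per column, so no recycling is involved.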
#
Hi,

Thanks for the example!

I cannot really tell you why you get what you get when you type 
leadership[leadership$country == "US"]

But what I know (or think I know) is that when you don't write the 
comma, R will take it as a condition for the columns.
It means that leadership[1:2] is identical to leadership[,1:2]
identical(leadership[1:2],leadership[,1:2])
[1] TRUE

If you want all rows where "country" is "US", then you did it 
correctly with leadership[leadership$country == "US", ]

HTH,
Ivan
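
The difference between the two forms can be sketched as follows. The small data frame is a stand-in with only three of the columns, and its fifth country value is an assumption (the thread only shows that it is not "US").

```r
# Stand-in for the poster's data frame; fewer columns than the original,
# and the fifth country value is assumed for illustration.
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  country = c("US", "US", "UK", "UK", "UK"),
  age     = c(32, 45, 25, 39, 99)
)

# No comma, or a comma with an empty row slot: both select columns.
same_cols <- identical(leadership[1:2], leadership[, 1:2])

# Condition before the comma: selects rows.
us_rows <- leadership[leadership$country == "US", ]

# Caveat: for a single column the two forms differ, because the comma
# form drops the result to a plain vector by default.
single_same <- identical(leadership[1], leadership[, 1])

print(us_rows)
```

So the equivalence holds for several columns, but leadership[, 1] returns a vector while leadership[1] stays a data frame.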

--
Ivan Calandra, ATER
Université de Franche-Comté
UFR STGI - UMR 6249 Chrono-Environnement
4 Place Tharradin - BP 71427
25211 Montbéliard Cedex, FRANCE
ivan.calandra at univ-fcomte.fr
http://biogeosciences.u-bourgogne.fr/calandra

On 27/02/14 16:00, Kapil Shukla wrote:
#
You have discovered two features of R with your example. Don
told you about the first. Data frames are considered to be lists
so if you provide only one index, you get the columns (the list
elements). Your data frame has 10 of them, as str() shows:
'data.frame':   5 obs. of  10 variables:
 $ manager: num  1 2 3 4 5
 $ date   : chr  "10/24/08" "10/28/08" "10/1/08" "10/12/08" ...
 $ country: chr  "US" "US" "UK" "UK" ...
 $ gender : chr  "M" "F" "F" "M" ...
 $ age    : num  32 45 25 39 99
 $ q1     : num  5 3 3 3 2
 $ q2     : num  4 5 5 3 2
 $ q3     : num  5 2 5 4 1
 $ q4     : num  5 5 5 NA 2
 $ q5     : num  5 5 2 NA 1

The second is that when you give R less than it is expecting, it
often recycles what you gave it. You gave it a logical vector of
five values:
[1]  TRUE  TRUE FALSE FALSE FALSE

But there are 10 list elements so R recycled your vector to make
it equal to the number of variables. As a result you got
variables 1 and 2, skipped the next three, then 6 and 7, and
skipped the last three.
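
The recycling described above can be checked with a short sketch. The data frame is rebuilt from the str() output and the fragments of the original post; the fifth values of country and gender are truncated there, so the ones used here are assumptions.

```r
# Rebuilt from the str() output in this thread. The fifth country and
# gender values are not visible in the thread and are assumed.
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),  # fifth value assumed
  gender  = c("M", "F", "F", "M", "F"),       # fifth value assumed
  age     = c(32, 45, 25, 39, 99),
  q1      = c(5, 3, 3, 3, 2),
  q2      = c(4, 5, 5, 3, 2),
  q3      = c(5, 2, 5, 4, 1),
  q4      = c(5, 5, 5, NA, 2),
  q5      = c(5, 5, 2, NA, 1)
)

# Five logical values, one per row.
idx <- leadership$country == "US"

# Spell out the recycling to the 10 columns.
recycled <- rep(idx, length.out = ncol(leadership))
selected <- which(recycled)

# Without a comma, the five values are recycled across the columns.
picked <- names(leadership[idx])

print(selected)
print(picked)
```

With the comma, as in leadership[idx, ], the same five values line up with the five rows and nothing is recycled.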

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Ivan Calandra
Sent: Thursday, February 27, 2014 9:46 AM
To: r-help at r-project.org
Subject: Re: [R] Basic question for subset of dataframe

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible
code.