Basic question for subset of dataframe
You have discovered two features of R with your example. Don told you about the first: data frames are lists, so if you provide only one index, you are selecting columns (the list elements). You can see those list elements when you type
str(leadership)
'data.frame':  5 obs. of  10 variables:
 $ manager: num  1 2 3 4 5
 $ date   : chr  "10/24/08" "10/28/08" "10/1/08" "10/12/08" ...
 $ country: chr  "US" "US" "UK" "UK" ...
 $ gender : chr  "M" "F" "F" "M" ...
 $ age    : num  32 45 25 39 99
 $ q1     : num  5 3 3 3 2
 $ q2     : num  4 5 5 3 2
 $ q3     : num  5 2 5 4 1
 $ q4     : num  5 5 5 NA 2
 $ q5     : num  5 5 2 NA 1

The second is that when you give R less than it is expecting, it often recycles what you gave it. You gave it a logical vector of five values:
leadership$country == "US"
[1]  TRUE  TRUE FALSE FALSE FALSE

But there are 10 list elements, so R recycled your vector to match the number of variables: TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE. As a result you got variables 1 and 2, skipped the next three, got 6 and 7, and skipped the last three.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ivan Calandra
Sent: Thursday, February 27, 2014 9:46 AM
To: r-help at r-project.org
Subject: Re: [R] Basic question for subset of dataframe

Hi,

Thanks for the example!

I cannot really tell you why you get what you get when you type
leadership[leadership$country == "US"]

But what I know (or think I know) is that when you don't write the comma, R treats the index as a selection of columns. It means that
leadership[1:2]
is identical to
leadership[,1:2]

identical(leadership[1:2], leadership[,1:2])
[1] TRUE

If you want all rows where "US" is present in "country", then you did it right using
leadership[leadership$country == "US", ]

HTH,
Ivan

--
Ivan Calandra, ATER
Université de Franche-Comté
UFR STGI - UMR 6249 Chrono-Environnement
4 Place Tharradin - BP 71427
25211 Montbéliard Cedex, FRANCE
ivan.calandra at univ-fcomte.fr
http://biogeosciences.u-bourgogne.fr/calandra

On 27/02/14 16:00, Kapil Shukla wrote:
All - first, apologies if this is a very basic question, but I tried myself and could not find a satisfactory answer. I know that I can subset a data frame using dataframe[row, column]; if I give dataframe[row, ], that specific row is returned, and similarly I can use dataframe[, column] to get the entire column. What I don't understand is what is returned if I use dataframe[<conditional expression>] without the comma.
For example, I have the code below:
manager <- c(1, 2, 3, 4, 5)
date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08",
"5/1/09")
country <- c("US", "US", "UK", "UK", "UK")
gender <- c("M", "F", "F", "M", "F")
age <- c(32, 45, 25, 39, 99)
q1 <- c(5, 3, 3, 3, 2)
q2 <- c(4, 5, 5, 3, 2)
q3 <- c(5, 2, 5, 4, 1)
q4 <- c(5, 5, 5, NA, 2)
q5 <- c(5, 5, 2, NA, 1)
leadership <- data.frame(manager, date, country, gender, age,
                         q1, q2, q3, q4, q5, stringsAsFactors=FALSE)

Now if I do
leadership[leadership$country == "US",]
two rows are returned:

  managerID JoinDate country gender age q1 q2 q3 q4 q5 agecat
1         1 10/24/08      US      M  32  5  4  5  5  5  Young
2         2 10/28/08      US      F  45  3  5  2  5  5  Young

but if I do
leadership[leadership$country == "US"]
to get the entire data frame where country is US, I get the following:

  managerID JoinDate q1 q2 agecat
1         1 10/24/08  5  4  Young
2         2 10/28/08  3  5  Young
3         3  10/1/08  3  5  Young
4         4 10/12/08  3  3  Young
5         5   5/1/09  2  2   <NA>

Please guide me on what I am doing wrong.

Thanks
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
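The two features David describes (a data frame is a list of columns, and a too-short logical index is recycled) can be checked directly. The following is a minimal R sketch, assuming the 10-column `leadership` data frame built from the code in the question. The names `is_us`, `recycled`, `no_comma`, `with_comma` and `leadership11` are illustrative only, and the `agecat` column added to `leadership11` is a hypothetical NA placeholder standing in for the extra column visible in the question's output (its real values are not part of the posted code).

```r
# Rebuild the 10-column, 5-row data frame from the question.
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2),
  q2 = c(4, 5, 5, 3, 2),
  q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2),
  q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE
)

# Feature 1: a data frame is a list whose elements are its columns,
# so a single index (no comma) selects columns, as Ivan showed.
stopifnot(is.list(leadership), length(leadership) == ncol(leadership))
stopifnot(identical(leadership[1:2], leadership[, 1:2]))

# The condition has one value per row (5), not one per column (10).
is_us <- leadership$country == "US"
print(is_us)  # TRUE TRUE FALSE FALSE FALSE

# Feature 2: used as a column index, the 5-value vector is recycled to
# length 10: TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE.
recycled <- rep_len(is_us, ncol(leadership))
no_comma <- leadership[is_us]
stopifnot(identical(no_comma, leadership[recycled]))
print(names(no_comma))  # "manager" "date" "q1" "q2"

# With an 11th column (hypothetical NA placeholder for agecat), position 11
# of the recycled vector is TRUE again, so that column is selected as well.
leadership11 <- leadership
leadership11$agecat <- NA_character_
print(names(leadership11[is_us]))  # "manager" "date" "q1" "q2" "agecat"

# With the comma, the same vector selects rows: the two US managers.
with_comma <- leadership[is_us, ]
print(nrow(with_comma))  # 2
```

The `leadership11` case mirrors the question's output, which shows `agecat` alongside columns 1, 2, 6 and 7. For the row selection itself, `subset(leadership, country == "US")` is another base R way to get the same two rows.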