
Selecting one row or multiple rows per ID

On Wed, Mar 4, 2009 at 12:09 AM, Vedula, Satyanarayana
<svedula at jhsph.edu> wrote:
I'd approach this problem in the following way:

df <- read.csv(textConnection("
Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22
"), header = TRUE)
closeAllConnections()


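# preferred_visit lists the visits from most to least preferred.
# With a single patient and clinic it's pretty easy to find the
# preferred visit: take the first preferred visit that is present.
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))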

one <- subset(df, Patient == "patient 3" & Clinic == "clinic 5")
best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
one[best_visit, ]
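
To see why this works, here is a minimal sketch of the match()/na.omit() step on a plain character vector. The vector `visits` is only an illustration: it holds the visits recorded for patient 3 at clinic 5 in the data above.

```r
# Visits available for patient 3 at clinic 5 (taken from the data above)
visits <- c("visit 1", "visit 3", "visit 4", "visit 5")

# Visits ordered from most to least preferred
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

# For each preferred visit, its position in `visits` (NA if it is absent)
positions <- match(preferred_visit, visits)
positions
# NA 4 3 2 1

# Dropping the NAs and taking the first element gives the position
# of the most preferred visit that is actually present
best <- na.omit(positions)[1]
visits[best]
# "visit 5"
```

"visit 2" is the first choice but is missing for this patient, so the second choice, "visit 5", wins.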

# We then turn this into a function
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Then apply it to every combination of patient and clinic with plyr
library(plyr)
ddply(df, .(Patient, Clinic), find_best_visit)
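
If plyr isn't installed, roughly the same result can be sketched in base R with split(), lapply() and rbind(). This is only an assumed equivalent for this example, not a description of what ddply does internally, and read.csv(text = ...) needs a reasonably recent version of R.

```r
# Same example data as above, read from a string
df <- read.csv(text = "Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22", header = TRUE)

preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

# Return the row for the most preferred visit present in one group
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Split into one data frame per patient/clinic combination,
# dropping combinations that never occur in the data
groups <- split(df, list(df$Patient, df$Clinic), drop = TRUE)

# Pick the best row in each group and stack the results
result <- do.call(rbind, lapply(groups, find_best_visit))
rownames(result) <- NULL
result
```

This gives one row per patient and clinic, just like the ddply call.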

# You can learn more about plyr at http://had.co.nz/plyr


Hadley