
Selecting rows by maximum value of one variable in a data frame nested by another variable

4 messages · Miriam, PIKAL Petr, Rui Barradas +1 more

#
How can I select the rows of a dataset that have the maximum value of one variable, with the maximum taken separately within each level of another variable? It is a data frame in long format with repeated measures per subject.
I was not successful using aggregate, because one of the columns has character values (or possibly for another reason).
I would like to transform something like this:

subject  time.ms  V3
1        1        stringA
1        12       stringB
1        22       stringC
2        1        stringB
2        14       stringC
2        25       stringA

into something like this:

subject  time.ms  V3
1        22       stringC
2        25       stringA

Thank you very much for your help!
Miriam
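A minimal sketch of what goes wrong when aggregate() is asked for the maximum of every column: it summarises each column independently within each subject, so the V3 value is not taken from the row that holds the maximum time.ms. It assumes the example data is rebuilt with data.frame() and that V3 is stored as character; with V3 stored as a factor (the read.table() default in R versions before 4.0.0), max() stops with an error instead.

```r
# Example data from the question (assumed rebuilt with data.frame());
# stringsAsFactors = FALSE keeps V3 as character
d <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time.ms = c(1, 12, 22, 1, 14, 25),
  V3      = c("stringA", "stringB", "stringC", "stringB", "stringC", "stringA"),
  stringsAsFactors = FALSE
)

# aggregate() applies max() to each column on its own within each subject,
# so V3 gets its alphabetical maximum, not the value from the row where
# time.ms is largest: subject 2 ends up with "stringC" instead of "stringA"
colwise <- aggregate(d[c("time.ms", "V3")], by = list(subject = d$subject), FUN = max)
print(colwise)
```

This is why the replies below first find the maximum time.ms per subject and then recover the matching rows.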
#
Hi
You could do it with aggregate and a subsequent selection of the matching
values from your data frame, but this is a perfect example for powerful list
operations:

do.call(rbind, lapply(split(test, test$subject), function(x)
x[which.max(x[,2]),]))
  subject time.ms      V3
1       1      22 stringC
2       2      25 stringA
split() splits the data frame test by the subject variable into a list of
sub data frames.
The anonymous function(x) finds the row with the maximum value in the second
column (time.ms) of each sub data frame and selects that row.
do.call() takes the resulting list and rbinds it into one final data frame.
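The same three steps, written out one at a time as a self-contained sketch. It assumes the example data is rebuilt with data.frame() under the name test used above, and refers to the column by name (time.ms) rather than by position 2, which gives the same result here.

```r
# Example data from the question, stored under the name 'test' used above
test <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time.ms = c(1, 12, 22, 1, 14, 25),
  V3      = c("stringA", "stringB", "stringC", "stringB", "stringC", "stringA"),
  stringsAsFactors = FALSE
)

# 1. split(): one sub data frame per subject
pieces <- split(test, test$subject)

# 2. lapply(): in each piece keep the row where time.ms is largest
#    (which.max() returns the position of the first maximum only)
maxrows <- lapply(pieces, function(x) x[which.max(x$time.ms), ])

# 3. do.call(rbind, ...): stack the one-row data frames into one data frame
result <- do.call(rbind, maxrows)
print(result)
```

Because which.max() picks only the first maximum, a subject with two rows sharing the largest time.ms contributes just one of them.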

Regards
Petr
#
Hello,

Here's a solution using aggregate and merge. I've kept it in two steps 
for clarity.



d <- read.table(text="
subject    time.ms  V3
1      1   stringA
1      12   stringB
1      22     stringC
2      1    stringB
2      14   stringC
2      25   stringA
", header=TRUE)

ag <- aggregate(time.ms~subject, data=d, max)
merge(ag, d)

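# It also works if the maximum is not unique.
# Bind a one-row data frame so that subject and time.ms stay numeric
# (binding the character vector c(1, 22, "stringA") turns them into character)
d2 <- rbind(d, data.frame(subject = 1, time.ms = 22, V3 = "stringA"))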
ag2 <- aggregate(time.ms~subject, data=d2, max)
merge(ag2, d2)


The split version would have to be slightly modified to use 'which' and
'max' separately, because which.max returns only the first maximum.
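A small illustration of the difference, using the time.ms values of subject 1 after the tied row (22, stringA) has been added:

```r
# time.ms values for subject 1 after the tied row is added
times <- c(1, 12, 22, 22)

first_only <- which.max(times)            # 3: position of the first maximum only
all_ties   <- which(times == max(times))  # 3 4: every position equal to the maximum
print(first_only)
print(all_ties)
```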


do.call("rbind",lapply(split(d2, d2$subject), function(x)
	x[which(x[, 2] == max(x[, 2])), ]))
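Another base-R option, not used in this thread, is ave(), which computes the per-subject maximum for every row so that a single logical comparison selects the rows. Like the merge and which() versions it keeps ties, and the selected rows stay in their original order rather than being grouped by subject. This is a sketch that assumes d2 is rebuilt directly with data.frame() as the example data plus the tied row.

```r
# d2: the example data plus a second row with the maximum time.ms for subject 1
d2 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2, 1),
  time.ms = c(1, 12, 22, 1, 14, 25, 22),
  V3      = c("stringA", "stringB", "stringC", "stringB", "stringC", "stringA", "stringA"),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum time.ms of that row's subject,
# so the comparison is TRUE exactly for rows that reach their subject's maximum
is_max <- d2$time.ms == ave(d2$time.ms, d2$subject, FUN = max)
best <- d2[is_max, ]
print(best)  # rows 3, 6 and 7, in their original order
```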

Hope this helps,

Rui Barradas


On 27-06-2012 09:30, Petr PIKAL wrote:
#
Hi,

Try this:
dat1 <- read.table(text="
subject time.ms V3
1  1  stringA
1  12 stringB
1  22 stringC
2  1  stringB
2  14 stringC
2  25 stringA
", sep="", header=TRUE)
dat2 <- aggregate(dat1$time.ms, list(dat1$subject), max)
colnames(dat2) <- c("subject", "time.ms")

merge(dat2, dat1)
  subject time.ms      V3
1       1      22 stringC
2       2      25 stringA
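The colnames() step is needed because aggregate() with unnamed arguments calls its result columns Group.1 and x. A sketch of an alternative, assuming the data is rebuilt with data.frame(): naming the list elements passed to aggregate() gives the columns their final names directly, after which merge() joins on the shared columns subject and time.ms.

```r
dat1 <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time.ms = c(1, 12, 22, 1, 14, 25),
  V3      = c("stringA", "stringB", "stringC", "stringB", "stringC", "stringA"),
  stringsAsFactors = FALSE
)

# Unnamed arguments: result columns get the default names "Group.1" and "x"
print(names(aggregate(dat1$time.ms, list(dat1$subject), max)))

# Named arguments: result columns are called "subject" and "time.ms" directly
dat2 <- aggregate(list(time.ms = dat1$time.ms), by = list(subject = dat1$subject), FUN = max)

# merge() joins on the columns both data frames share (subject and time.ms)
res <- merge(dat2, dat1)
print(res)
```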

A.K.




----- Original Message -----
From: Miriam <Miriam_Katharina at gmx.de>
To: r-help at r-project.org
Cc: 
Sent: Tuesday, June 26, 2012 5:21 PM
Subject: [R] selecting rows by maximum value of one variables in dataframe nested by another Variable