How to conditionally remove dataframe rows?
4 messages · Francisco Carvalho Diniz, David Winsemius, arun +1 more
On Mar 6, 2013, at 3:21 PM, Francisco Carvalho Diniz wrote:
Hi,
I have a data frame with two columns. I need to remove rows that are
duplicated in the first column, but which row is kept must depend on the
values in the second column.
Example:
  Point_counts Psi_Sp
1            A      0
2            A      1
3            B      1
4            B      2
5            B      0
6            C      1
7            D      1
8            D      2
I need to turn this data frame into one without duplicated rows in
Point_counts (one visit per point), keeping the rows with the maximum value
of Psi_Sp, e.g. remove row 1 and keep row 2, or remove rows 3 and 5 and
keep row 4. At the end I want a data frame like the one below:
  Point_counts Psi_Sp
1            A      1
2            B      2
3            C      0
4            D      2
How can I do it? I found several ways to edit data frames, but
unfortunately I could not use any of them.
I appreciate
Try this:
dfrm <- dfrm[ order(dfrm[[1]], -dfrm[[2]]), ]  # put desired rows at top of each Point_counts category
# then take the top item in each category
dfrm[ !duplicated(dfrm[[1]]), ]
David Winsemius Alameda, CA, USA
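The sort-then-deduplicate idea can be sketched as a self-contained script. This is only an illustration on the sample data from the question; the intermediate names `sorted` and `res` are assumptions introduced here, not part of the original reply.

```r
# Sample data from the original question.
dfrm <- data.frame(
  Point_counts = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Psi_Sp       = c(0, 1, 1, 2, 0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Sort by group, then by descending Psi_Sp, so that each group's
# maximum row comes first within its group.
sorted <- dfrm[order(dfrm$Point_counts, -dfrm$Psi_Sp), ]

# duplicated() flags every occurrence after the first one, so negating it
# keeps only the first (here: maximum) row of each group.
res <- sorted[!duplicated(sorted$Point_counts), ]
rownames(res) <- NULL  # optional: renumber the rows 1..n
res
```

On the sample data this keeps A/1, B/2, C/1 and D/2. If a group has tied maxima, only the first of them after sorting is kept.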
Just to add another option to what Arun has provided below. That approach generalizes well to data frames with more than two columns, where you want to filter based upon finding a maximum value (or perhaps some other, more complex criterion) within one or more grouping columns and return all of the columns of the original data frame. In this special case of a two-column data frame, you can easily use ?aggregate with a formula-based approach that might be easier to read. aggregate() essentially encapsulates what Arun has done below. Thus:
DF
  Point_counts Psi_Sp
1            A      0
2            A      1
3            B      1
4            B      2
5            B      0
6            C      1
7            D      1
8            D      2
aggregate(Psi_Sp ~ Point_counts, data = DF, max)
  Point_counts Psi_Sp
1            A      1
2            B      2
3            C      1
4            D      2
Regards,
Marc Schwartz
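To illustrate the distinction between the two-column case and data frames with more columns, here is a minimal sketch. The `Visit` column is hypothetical and not part of the original data, and the `ave()` variant is one possible way to keep whole rows, not something proposed in the thread.

```r
# Sample data plus a hypothetical third column (Visit), added only to
# show what happens when there are more than two columns.
DF <- data.frame(
  Point_counts = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Psi_Sp       = c(0, 1, 1, 2, 0, 1, 1, 2),
  Visit        = 1:8,
  stringsAsFactors = FALSE
)

# Formula-based aggregate(): one row per group with the maximum Psi_Sp.
# Only the columns named in the formula are returned.
agg <- aggregate(Psi_Sp ~ Point_counts, data = DF, FUN = max)

# To keep every column, compare each row against its group's maximum.
# ave() returns the group-wise maximum aligned with the original rows.
is_max <- DF$Psi_Sp == ave(DF$Psi_Sp, DF$Point_counts, FUN = max)
full   <- DF[is_max, ]

agg
full
```

Note that the `ave()` variant keeps all tied rows if a group has more than one row at its maximum, whereas `which.max()` keeps only the first.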
On Mar 6, 2013, at 8:42 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
dfrm<- read.table(text="
Point_counts Psi_Sp
1 A 0
2 A 1
3 B 1
4 B 2
5 B 0
6 C 1
7 D 1
8 D 2
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-do.call(rbind,lapply(split(dfrm,dfrm$Point_counts),function(x) x[which.max(x$Psi_Sp),]))
row.names(res)<-1:nrow(res)
# Point_counts Psi_Sp
#1 A 1
#2 B 2
#3 C 1 #your input data doesn't have 0
#4 D 2
A.K.
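Since the split/lapply pattern is said to extend to other selection criteria, it could be wrapped in a small helper along these lines. This is a sketch only: `keep_one_per_group` and its `pick` argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper: keep one row per group, where `pick` is a function
# that receives the rows of one group and returns the index of the row to keep.
keep_one_per_group <- function(df, group_col, pick) {
  pieces <- split(df, df[[group_col]])
  res <- do.call(rbind, lapply(pieces, function(x) x[pick(x), , drop = FALSE]))
  rownames(res) <- NULL  # drop the group-derived row names
  res
}

# Sample data from the original question.
dfrm <- data.frame(
  Point_counts = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Psi_Sp       = c(0, 1, 1, 2, 0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Maximum per group, as in the reply above.
res_max <- keep_one_per_group(dfrm, "Point_counts",
                              function(x) which.max(x$Psi_Sp))

# Swapping the criterion only requires a different `pick` function.
res_min <- keep_one_per_group(dfrm, "Point_counts",
                              function(x) which.min(x$Psi_Sp))
```

Both `which.max()` and `which.min()` return the index of the first extreme value, so ties within a group resolve to the earliest row.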