How to conditionally remove dataframe rows?

Hi,

I have a data frame with two columns. I need to remove rows with duplicated
values in the first column, but conditionally on the values of the second
column.

Example:

       Point_counts       Psi_Sp

1            A                       0
2            A                       1
3            B                       1
4            B                       2
5            B                       0
6            C                       1
7            D                       1
8            D                       2

I need to turn this data frame into one without duplicated rows at
Point_counts (one visit per point) while keeping the rows with the maximum
value of Psi_Sp, e.g. remove row 1 and keep row 2, or remove rows 3 and 5
and keep row 4. At the end I want a data frame like the one below:

        Point_counts           Psi_Sp

1              A                           1
2              B                           2
3              C                           0
4              D                           2

How can I do it? I found several ways to edit data frames, but
unfortunately I could not use any of them.

I appreciate any help.

Francisco

Try this:

# put the desired rows at the top of each Point_counts category
dfrm <- dfrm[ order(dfrm[[1]], -dfrm[[2]]) , ]

# then take the top item in each category
dfrm[ !duplicated(dfrm[[1]]) , ]

David Winsemius
Alameda, CA, USA
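
For readers who want to run the order-then-deduplicate idea from the reply above end to end, here is a minimal sketch. It assumes the sample data from the question, rebuilt with data.frame(); the names sorted and res are illustrative only and do not appear in the thread.

```r
# Sample data from the question (the name dfrm follows the reply above)
dfrm <- data.frame(
  Point_counts = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Psi_Sp       = c(0, 1, 1, 2, 0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Sort by group, then by descending Psi_Sp, so that each group's
# maximum row comes first within the group
sorted <- dfrm[order(dfrm$Point_counts, -dfrm$Psi_Sp), ]

# duplicated() flags every occurrence after the first one, so negating
# it keeps only the first (i.e. maximum) row of each group
res <- sorted[!duplicated(sorted$Point_counts), ]
rownames(res) <- NULL
print(res)
```

Sorting first is what makes "first row per group" the same thing as "maximum row per group"; without the order() step, duplicated() would simply keep whichever row happens to appear first in the input.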
Hi,

dfrm<- read.table(text="
        Point_counts      Psi_Sp

1            A                      0
2            A                      1
3            B                      1
4            B                      2
5            B                      0
6            C                      1
7            D                      1
8            D                      2
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-do.call(rbind,lapply(split(dfrm,dfrm$Point_counts),function(x) x[which.max(x$Psi_Sp),]))
row.names(res)<-1:nrow(res)
#  Point_counts Psi_Sp
#1            A      1
#2            B      2
#3            C      1 #your input data doesn't have 0
#4            D      2
A.K.
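
One point the split/which.max approach above leaves implicit: which.max() returns only the first position of the maximum, so a group with tied maxima still yields a single row. If all tied rows should be kept instead, something like the following sketch may help. The extra tied row and the names tied, one_per_group, group_max and all_max are hypothetical additions for illustration, not part of the thread.

```r
dfrm <- data.frame(
  Point_counts = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Psi_Sp       = c(0, 1, 1, 2, 0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical extra row so that group B has two rows sharing the maximum
tied <- rbind(dfrm,
              data.frame(Point_counts = "B", Psi_Sp = 2,
                         stringsAsFactors = FALSE))

# which.max() picks only the first maximum in each group: one row per group
one_per_group <- do.call(rbind,
                         lapply(split(tied, tied$Point_counts),
                                function(x) x[which.max(x$Psi_Sp), ]))

# ave() returns, for every row, the maximum of Psi_Sp within that row's group
group_max <- ave(tied$Psi_Sp, tied$Point_counts, FUN = max)

# Keep every row whose value equals its group maximum (ties included)
all_max <- tied[tied$Psi_Sp == group_max, ]
```

Which behaviour is wanted depends on the data: for "one visit per point" the single-row result is probably the right one, and the ave() variant is only relevant when ties carry information.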

----- Original Message -----
From: Francisco Carvalho Diniz <chicocdiniz at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Wednesday, March 6, 2013 6:21 PM
Subject: [R] Fwd: How to conditionally remove dataframe rows?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Just to add another option to what Arun has provided. Arun's approach generalizes well to data frames with more than two columns, where you want to filter on a maximum value (or perhaps some more complex criterion) within one or more grouping columns and return all of the columns of the original data frame.

In this special case of a two-column data frame, you can use ?aggregate with a formula-based approach that might be easier to read. aggregate() essentially encapsulates what Arun has done.

Thus:
DF
  Point_counts Psi_Sp
1            A      0
2            A      1
3            B      1
4            B      2
5            B      0
6            C      1
7            D      1
8            D      2
aggregate(Psi_Sp ~ Point_counts, data = DF, max)
  Point_counts Psi_Sp
1            A      1
2            B      2
3            C      1
4            D      2
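
To illustrate why aggregate() fits the two-column case best, here is a small sketch with a third column. The Visit column and the names maxima and full_rows are hypothetical, added only for illustration; with extra columns, aggregate() returns just the grouping and summarised columns, and one possible way to recover the full rows is to merge the result back onto the original data.

```r
DF <- data.frame(
  Point_counts = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Psi_Sp       = c(0, 1, 1, 2, 0, 1, 1, 2),
  Visit        = 1:8,   # hypothetical extra column, not in the original data
  stringsAsFactors = FALSE
)

# aggregate() returns only the grouping column and the summarised column,
# so Visit is not carried along
maxima <- aggregate(Psi_Sp ~ Point_counts, data = DF, FUN = max)

# merge() joins on the columns the two data frames share
# (Point_counts and Psi_Sp), which recovers the full matching rows
full_rows <- merge(DF, maxima)
print(full_rows)
```

Note that the merge keeps every row matching a group maximum, so tied maxima would produce more than one row per group.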

Regards,

Marc Schwartz
