
removing duplicated rows from a data.frame

3 messages · Liaw, Andy, Brian Ripley, Gary Collins

#
Should one of the suggestions be implemented as the unique method for
data.frame?  Or maybe uniquerows.data.frame?  Just a thought...  This is
probably nearly a FAQ.

Andy

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Wednesday, October 31, 2001 6:54 AM
To: Peter Dalgaard BSA
Cc: Gary Collins; r-help
Subject: Re: [R] removing duplicated rows from a data.frame
On 31 Oct 2001, Peter Dalgaard BSA wrote:

            
merge.data.frame does the equivalent of

mypaste <- function(...) paste(..., sep="\r")
do.call("mypaste", dfr)

which seems reliable enough.  Identical numerical data should
as.character identically, and embedded CRs are very rare in R character
strings.

As a test

data(iris)
duplicated(do.call("mypaste", iris))

(or duplicated(do.call("paste", c(iris, sep="\r"))) if you prefer a
one-liner).
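
The test above only flags the duplicated rows. To actually drop them, the logical vector can be used as a row index. A minimal sketch, assuming a small made-up data frame `dfr` (the data and the names `key`, `dup` and `dedup` are illustrative, not from the thread):

```r
# Hypothetical example data: rows 1 and 3 are identical.
dfr <- data.frame(id  = c(1, 2, 1, 3),
                  grp = c("a", "b", "a", "a"),
                  stringsAsFactors = FALSE)

# Build one string key per row by joining the columns with "\r".
key <- do.call("paste", c(dfr, sep = "\r"))

# duplicated() marks the second and later occurrences of each key.
dup <- duplicated(key)

# Keep only the first occurrence of each row.
dedup <- dfr[!dup, , drop = FALSE]
print(dedup)
```

Because the comparison is made on the character form of each value, two numbers that convert to the same string are treated as equal, which is the behaviour the message above relies on.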
#
On Wed, 31 Oct 2001, Liaw, Andy wrote:

            
Yes. I'd noted earlier that S4 has unique.data.frame and
duplicated.data.frame via a variant on the paste method.

Will add.

  
    
#
Dear All,
Thanks. Prof. Ripley's approach worked perfectly. I implemented a quick and
dirty version, following Andy Liaw's suggestion, as a unique.data.frame, called
it via unique(), and tried it on about 50 data frames with no problems.
Ta.
Gary.
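
The quick approach described above might look roughly like the following. This is only a sketch of the paste method wrapped in helper functions; the names `dup_rows` and `uniq_rows` are hypothetical and are used here so as not to mask the `unique.data.frame` and `duplicated.data.frame` methods that current versions of R already provide.

```r
# Hypothetical helpers; the names are illustrative, not from the thread.

# Flag rows that repeat an earlier row, using the paste-with-"\r" key.
dup_rows <- function(x) {
  stopifnot(is.data.frame(x))
  # unname() keeps a column called "sep" or "collapse" from being
  # mistaken for an argument of paste().
  duplicated(do.call("paste", c(unname(as.list(x)), sep = "\r")))
}

# Keep the first occurrence of each distinct row.
uniq_rows <- function(x) x[!dup_rows(x), , drop = FALSE]

data(iris)
nrow(iris)             # rows before
nrow(uniq_rows(iris))  # rows after dropping duplicates
```

On a data frame like iris this should give the same number of rows as calling unique() directly.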
function(x, y) ifelse(is.na(x), is.na(y), ifelse(is.na(y), FALSE, x == y))
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__________________________________________________
Dr. Gary S. Collins,
Statistics Research Fellow,
Quality of Life Unit,
European Organisation for Research and Treatment of Cancer,
EORTC Data Center,
Avenue E. Mounier 83, bte. 11,
B-1200 Brussels, Belgium.

Tel: +32 2 774 1 606
Fax: +32 2 779 4 568
http://www.eortc.be/home/qol/
__________________________________________________



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._