Proposal: Generalizing unique() and duplicated()

Prof. Ripley wrote on r-help:
Hmmm... couldn't one build on this in order to generalize the 
unique() function?

I'm asking because when I once tried to use unique() on a matrix (to collapse 
duplicate rows), I found that it and duplicated() work only on vectors. I 
think a generalization, at least for matrices and simple data.frames, would 
be useful.

I tried my hand at it and came up with this:

----------------------------------------------------

"unique.default" <- get("unique", pos="package:base")    # old version becomes
                                                         # default behaviour
"unique" <- function(object, ...)
{
   if (data.class(object)=="matrix")
       return(unique.matrix(object, ...))
   else
       UseMethod("unique")      # doesn't seem to work for matrices, hence 
}                               # the condition
                         


"duplicated.default" <- get("duplicated", pos="package:base")	

"duplicated" <- function(object, ...)
{
   if (data.class(object)=="matrix")
       return(duplicated.matrix(object, ...))
   else
       UseMethod("duplicated")  
}


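"duplicated.matrix" <-
  function(mat, MARGIN=1)    # defaulting to work on rows
{
  # collapse each row (MARGIN=1) or column (MARGIN=2) into one string,
  # with "\r" separating the elements, then look for repeated strings
  strvect <- drop(apply(mat, MARGIN, function(x) paste(x, collapse = "\r")))
  return(duplicated(strvect))
}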


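"unique.matrix" <-
  function(mat, MARGIN=1)    # defaulting to work on rows
{
  dup <- duplicated(mat, MARGIN)
  # drop=FALSE keeps the result a matrix even if only one row/column is left
  return(if (MARGIN==1) mat[!dup, , drop=FALSE] else mat[, !dup, drop=FALSE])
}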


"duplicated.data.frame" <-
  function(df, MARGIN=1)
{
  strvect <- drop(apply(as.matrix(df), MARGIN, function(x) paste(x, collapse = "\r")))
  duplicated(strvect)
}


"unique.data.frame" <-
  function(df, MARGIN=1)
{
  dup <- duplicated(df, MARGIN)
  # drop=FALSE keeps a one-column (or one-row) result a data.frame
  return(if (MARGIN==1) df[!dup, , drop=FALSE] else df[, !dup, drop=FALSE])
}

----------------------------------------------------
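The data.frame methods above go through as.matrix(), which turns a mixed-type frame into a character matrix (formatting the numeric columns on the way), and then paste row by row inside apply(). A possible alternative, sketched here for rows only (MARGIN=1) and with made-up helper names duplicated_rows_df() and unique_rows_df(), builds all the row keys with a single vectorized paste() over the columns:

```r
# Hypothetical helpers (illustrative names, not part of base R or the
# proposal above): one string key per row, built column-wise.

duplicated_rows_df <- function(df) {
  # paste() works element-wise across the columns, so this yields one key
  # per row; unname() stops columns called e.g. "sep" or "collapse" from
  # being taken as paste()'s own arguments. Factors are pasted as labels.
  keys <- do.call(paste, c(unname(as.list(df)), sep = "\r"))
  duplicated(keys)
}

unique_rows_df <- function(df) {
  # drop = FALSE keeps a one-column result a data.frame
  df[!duplicated_rows_df(df), , drop = FALSE]
}
```

As with the apply() version, rows are compared through their string keys, so values that print identically are treated as equal.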

I couldn't figure out how to generalize to more than two dimensions (more 
accurately, how to subset in the dimension given by the variable MARGIN). 
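One way around the subsetting problem, as a minimal sketch: apply() already accepts any single MARGIN of an array, so only the extraction step needs work. The index list for `[` can be built programmatically, with TRUE ("take everything") in every dimension except MARGIN, and passed via do.call(). The helper names duplicated_slices() and unique_slices() are made up for illustration:

```r
# Hypothetical helpers (illustrative names): duplicate detection and removal
# along one dimension of an array with any number of dimensions.

duplicated_slices <- function(a, MARGIN = 1) {
  # apply() hands each slice along MARGIN to the function, whatever the
  # rank of the array; each slice becomes one "\r"-separated string key
  keys <- apply(a, MARGIN, function(x) paste(x, collapse = "\r"))
  duplicated(keys)
}

unique_slices <- function(a, MARGIN = 1) {
  keep <- !duplicated_slices(a, MARGIN)
  # one index per dimension: TRUE selects everything, except in position
  # MARGIN, where only the first occurrence of each slice is kept
  idx <- rep(list(TRUE), length(dim(a)))
  idx[[MARGIN]] <- keep
  # for a 3-d array and MARGIN = 2 this amounts to a[, keep, , drop = FALSE]
  do.call(`[`, c(list(a), idx, list(drop = FALSE)))
}
```

For a matrix, unique_slices(m, 1) and unique_slices(m, 2) behave like the row and column cases above; for a 3-d array a, unique_slices(a, 3) keeps only the distinct layers a[, , k], and drop = FALSE keeps the result an array even when a single slice survives.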

Does anybody else consider this useful?


Cheers

Kaspar Pflugshaupt
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._