Skip to content

suggesting a new feature for unique()

1 message · Liaw, Andy

#
Dear R-devel,

May I suggest that a new feature be added to a couple of unique() methods?
Sometimes it's useful to have the indices of the original data that the
unique elements come from, so that the original data can be recreated from
the unique()ed data.  I suggest that an `index' argument be added for
unique.  Below is a suggested patch against
R/src/library/base/R/duplicated.R:

*** R-devel/src/library/base/R/duplicated.R	Tue Jul 20 12:45:52 2004
--- duplicated.R	Thu Aug 19 14:50:38 2004
***************
*** 34,50 ****
  
  ## NB unique.default is used by factor to avoid unique.matrix,
  ## so it needs to handle some other cases.
! unique.default <- function(x, incomparables = FALSE, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
      z <- .Internal(unique(x))
      if(is.factor(x))
! 	factor(z, levels = seq(len=nlevels(x)), labels = levels(x),
!                ordered = is.ordered(x))
      else if(inherits(x, "POSIXct") || inherits(x, "Date"))
!         structure(z, class=class(x))
!     else z
  }
  
  unique.data.frame <- function(x, incomparables = FALSE, ...)
--- 34,51 ----
  
  ## NB unique.default is used by factor to avoid unique.matrix,
  ## so it needs to handle some other cases.
! unique.default <- function(x, incomparables = FALSE, index=FALSE, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
      z <- .Internal(unique(x))
      if(is.factor(x))
! 	z <- factor(z, levels = seq(len=nlevels(x)), labels = levels(x),
!                     ordered = is.ordered(x))
      else if(inherits(x, "POSIXct") || inherits(x, "Date"))
!         z <- structure(z, class=class(x))
!     if (index) attr(z, "index") <- match(x, z)
!     z
  }
  
  unique.data.frame <- function(x, incomparables = FALSE, ...)
***************
*** 55,61 ****
  }
  
  unique.matrix <- unique.array <-
!     function(x, incomparables = FALSE , MARGIN = 1, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
--- 56,62 ----
  }
  
  unique.matrix <- unique.array <-
!     function(x, incomparables = FALSE , MARGIN = 1, index=FALSE, ...)
  {
      if(!is.logical(incomparables) || incomparables)
  	.NotYetUsed("incomparables != FALSE")
***************
*** 66,70 ****
      args <- rep(alist(a=), ndim)
      names(args) <- NULL
      args[[MARGIN]] <- !duplicated(as.vector(temp))
!     do.call("[", c(list(x=x), args, list(drop=FALSE)))
  }
--- 67,76 ----
      args <- rep(alist(a=), ndim)
      names(args) <- NULL
      args[[MARGIN]] <- !duplicated(as.vector(temp))
!     res <- do.call("[", c(list(x=x), args, list(drop=FALSE)))
!     if (index) {
!         resTemp <- apply(res, MARGIN, function(x) paste(x, collapse =
"\r"))
!         attr(res, "index") <- match(temp, resTemp)
!     }
!     res
  }

An example usage:
[1] 4 2 5 3 2 3 4 2 2 3
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] 4 2 5 3
Levels: 2 3 4 5
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE


[I have not tried adding the same thing for the unique.data.frame method,
but that shouldn't be too hard...]

Best,
Andy

Andy Liaw, PhD
Biometrics Research      PO Box 2000, RY33-300     
Merck Research Labs           Rahway, NJ 07065
mailto:andy_liaw@merck.com        732-594-0820