aggregate taking way too long to count. - R-help

Tue, Feb 10, 2009 3:17 PM

To quickly check whether any duplicates exist, you could use table() and
look for entries greater than 1.  Use na.omit() on the entire
data.frame before passing it to table().  E.g.,
   tmp <- with(na.omit(df1), table(parameter, station, site))
   sum(tmp>1) # number of parameter/station/site keys with >1 entry
That took 0.13 seconds on my machine, where your aggregate call took
18.42 seconds.
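A small self-contained sketch of the table() check, assuming a made-up toy df1 that has the key columns named in the post (the values, the extra `value` column, and the names n_dup_keys and dup_keys are hypothetical). It also shows one way to list which keys are duplicated:

```r
# Hypothetical toy data standing in for df1; only the column names
# parameter/station/site come from the post, the values are invented.
df1 <- data.frame(
  parameter = c("pH", "pH", "pH", "DO", "DO"),
  station   = c("A",  "A",  "B",  "A",  "A"),
  site      = c(1,    1,    1,    2,    NA),
  value     = c(7.1,  7.3,  6.9,  8.2,  8.0)
)

# Drop rows with an NA in any column, then cross-tabulate the key columns.
tmp <- with(na.omit(df1), table(parameter, station, site))

# Each cell counts the rows sharing one parameter/station/site key;
# cells above 1 are keys that occur more than once.
n_dup_keys <- sum(tmp > 1)
print(n_dup_keys)

# List the offending keys together with their counts.
tab <- as.data.frame(tmp)
dup_keys <- tab[tab$Freq > 1, ]
print(dup_keys)
```

One caveat on the design: table() builds a full array over every combination of factor levels, so its size is the product of the numbers of distinct parameters, stations and sites. That is cheap for modest numbers of levels but can grow large when every column has many distinct values.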

To keep only the first entry for a given key, try something like
   df1.nodups <- df1[with(df1,
                          !duplicated(paste(sep="\1", parameter, station, site))), ]
That is also very quick (0.06 seconds here).
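The same idea spelled out step by step, assuming a made-up toy df1 (the values and the names key and df1.nodups2 are hypothetical). The second form is an alternative not taken from the post: it relies on duplicated() having a data.frame method, which avoids building the key strings by hand:

```r
# Hypothetical toy data; only the key column names come from the post.
df1 <- data.frame(
  parameter = c("pH", "pH", "pH", "DO", "DO"),
  station   = c("A",  "A",  "B",  "A",  "A"),
  site      = c(1,    1,    1,    2,    NA),
  value     = c(7.1,  7.3,  6.9,  8.2,  8.0)
)

# Paste the key columns into one string per row.  sep = "\1" (the control
# character with code 1) is unlikely to appear in real data, so different
# keys should not collapse to the same string.
key <- with(df1, paste(sep = "\1", parameter, station, site))

# duplicated() is TRUE for the 2nd and later occurrences of a key, so
# negating it keeps the first row for each key.
df1.nodups <- df1[!duplicated(key), ]

# Alternative: duplicated() applied directly to the key columns.
df1.nodups2 <- df1[!duplicated(df1[c("parameter", "station", "site")]), ]

print(df1.nodups)
```

Note that paste() turns an NA into the string "NA", so rows with a missing key value are kept as a key of their own rather than dropped.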

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
