
aggregate taking way too long to count.

2 messages · William Dunlap, Seeliger.Curt at epamail.epa.gov

To quickly check whether any duplicates exist, you can use table() and
look for entries greater than 1.  Call na.omit() on the entire
data.frame before passing it to table().  E.g.,
   tmp <- with(na.omit(df1), table(parameter, station, site))
   sum(tmp > 1) # number of parameter/station/site keys with >1 entry
That took 0.13 seconds on my machine, where your aggregate call took
18.42 seconds.

To keep only the first entry for a given key, try something like
   df1.nodups <- df1[with(df1,
       !duplicated(paste(sep="\1", parameter, station, site))), ]
That is also very quick (0.06 seconds here).
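A minimal sketch of both techniques on a small, hypothetical data frame
(the original df1 is not shown in this thread, so the columns and values
below are invented for illustration):

   # Toy data: the pH/S1/A key appears twice
   df1 <- data.frame(
     parameter = c("pH", "pH", "DO", "DO"),
     station   = c("S1", "S1", "S1", "S2"),
     site      = c("A",  "A",  "A",  "A"),
     value     = c(7.1, 7.3, 8.0, 8.2)
   )

   # Count keys with more than one row
   tmp <- with(na.omit(df1), table(parameter, station, site))
   sum(tmp > 1)   # 1: only the pH/S1/A key is duplicated

   # Keep only the first row for each key
   df1.nodups <- df1[with(df1,
       !duplicated(paste(sep = "\1", parameter, station, site))), ]
   nrow(df1.nodups)  # 3

The "\1" separator is just an unlikely byte chosen so that pasted key
columns cannot collide (e.g. "a","bc" vs "ab","c" would collide with an
empty separator).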

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com