help: program efficiency
If the input vector t is known to be ordered
(or if you only care about runs of duplicated
values, not all duplicated values), the following
is pretty quick:
nodup3 <- function (t) {
    t + (sequence(rle(t)$lengths) - 1)/100
}
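To see what nodup3 computes, here is a small illustration on the example vector from the original question (the variable names x, r, y and z are only for illustration). rle() gives the lengths of runs of equal adjacent values, and sequence() numbers the elements within each run, so only the second and later elements of a run get an offset. Equal values that are not adjacent are left alone, which is why the input needs to be ordered.

```r
# Example vector from the original question; equal values are adjacent
x <- c(2, 1, 1, 3, 3, 3, 4)

r <- rle(x)
r$lengths            # run lengths: 1 2 3 1
sequence(r$lengths)  # position within each run: 1 1 2 1 2 3 1

# Same formula as nodup3(): 0 is added to the first element of each run,
# then 0.01, 0.02, ... to the later ones
y <- x + (sequence(r$lengths) - 1)/100
y                    # 2, 1, 1.01, 3, 3.01, 3.02, 4

# Non-adjacent duplicates form separate runs of length 1,
# so they are not adjusted
z <- c(1, 3, 1)
z + (sequence(rle(z)$lengths) - 1)/100   # 1, 3, 1
```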
If you don't know whether the input will be ordered,
then ave() will do it a bit faster than your
code:
nodup2 <- function (t) {
    ave(t, t, FUN = function(x) x + (seq_along(x) - 1)/100)
}
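A quick check of nodup2 on an unordered vector (u is just an illustrative name). ave() splits t by value, applies FUN to each group, and puts the results back in their original positions, so the later copies of a value get 0.01, 0.02, ... wherever they occur in the vector.

```r
nodup2 <- function(t) {
    ave(t, t, FUN = function(x) x + (seq_along(x) - 1)/100)
}

# Unordered input: the 3s are at positions 1, 3, 6 and the 1s at 2, 5
u <- c(3, 1, 3, 2, 1, 3)
nodup2(u)   # 3, 1, 3.01, 2, 1.01, 3.02
```

By contrast, nodup3(u) would return u unchanged, because no two equal values in u are adjacent.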
E.g., for a sorted sequence of 300,000 numbers drawn with
replacement from 1:100,000 I get:
> a2 <- sort(sample(1:1e5, size=3e5, replace=TRUE))
> system.time(v <- nodup(a2))
   user  system elapsed
   2.78    0.05    3.97
> system.time(v2 <- nodup2(a2))
   user  system elapsed
   1.83    0.02    2.66
> system.time(v3 <- nodup3(a2))
   user  system elapsed
   0.18    0.00    0.14
> identical(v,v2) && identical(v,v3)
[1] TRUE
If speed is truly an issue, the built-in sequence() may
be replaced by a faster function that does the same thing:
nodup3a <- function (t) {
    faster.sequence <- function(nvec) {
        seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])),
                                 nvec)
    }
    t + (faster.sequence(rle(t)$lengths) - 1)/100
}
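The idea behind faster.sequence() is that, for run lengths nvec, an element's overall position minus the number of elements in all earlier runs is its position within its own run. A minimal check in base R, written here as a top-level function with the offsets pulled out into a variable (the names offsets and lens are illustrative), that it matches sequence() on the example run lengths and on random ones:

```r
# Same computation as faster.sequence() inside nodup3a(), with the
# offsets given a name: offsets[i] = total length of runs 1..(i-1)
faster.sequence <- function(nvec) {
    offsets <- cumsum(c(0L, nvec[-length(nvec)]))
    seq_len(sum(nvec)) - rep(offsets, nvec)
}

# Run lengths of c(2, 1, 1, 3, 3, 3, 4) are 1 2 3 1:
# offsets = 0 1 3 6, repeated as 0 1 1 3 3 3 6, subtracted from 1:7
faster.sequence(c(1L, 2L, 3L, 1L))   # 1 1 2 1 2 3 1

# Compare with the built-in on many random run lengths
set.seed(1)
lens <- sample(1:5, 1000, replace = TRUE)
stopifnot(all(faster.sequence(lens) == sequence(lens)))
```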
That took 0.05 seconds on the a2 dataset and produced
identical results.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of randomcz
Sent: Thursday, November 25, 2010 6:49 AM
To: r-help at r-project.org
Subject: [R] help: program efficiency
hey guys,
I am working on a function to make duplicated values unique.
For example, the original vector would be like: a = c(2,1,1,3,3,3,4)
I'd like to transform it into:
a.nodup = 2, 1.01, 1.02, 3.01, 3.02, 3.03, 4
Basically, find the duplicates and assign a unique value by adding a
small amount, and keep it in order.
I came up with the following code, but it runs slowly if t is large.
Is there a better way to do it?
nodup = function(t)
{
    t.index = 0
    t.dup = duplicated(t)
    # seq_along(t)[-1] avoids iterating over 2:1 when length(t) == 1
    for (i in seq_along(t)[-1])
    {
        if (t.dup[i])
            t.index = t.index + 0.01
        else t.index = 0
        t[i] = t[i] + t.index
    }
    return(t)
}
--
View this message in context:
http://r.789695.n4.nabble.com/help-program-efficiency-tp3059079p3059079.html
Sent from the R help mailing list archive at Nabble.com.