
Identify duplicate numbers and increase their values

7 messages · Ortiz, John, Moritz Grenke, Joshua Wiley +4 more

#
Hi everybody.

I want to identify duplicated numbers and add 0.01 each time a number is repeated, so that the k-th occurrence of a duplicated value is increased by k * 0.01.

Example:
x=c(1,2,3,5,6,2,8,9,2,2)

I want to do this:

1
2 + 0.01
3
5 
6
2 + 0.02
8
9
2 + 0.03
2 + 0.04

I am trying to get something like this:

1
2.01
3
5
6
2.02
8
9
2.03
2.04

So far I only know how to identify the duplicated numbers:

rbind(x, duplicated(x) | duplicated(x, fromLast=TRUE))

  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    5    6    2    8    9    2     2
     0    1    0    0    0    1    0    0    1     1

Any advice?

Thanks and regards
John Ortiz
#
If you don't have too much data, a loop should do:

while (sum(duplicated(x)) > 0)  # if this condition is TRUE, there are still duplicates in x
{	
	x[duplicated(x)] <- x[duplicated(x)] + 0.01  # use duplicated(x) to index the x vector
}
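Wrapped in a small function (the name `bump_loop` is only for illustration), the loop can be checked on John's vector. Because `duplicated()` never flags the first occurrence of a value, the first 2 stays at 2 and its copies become 2.01, 2.02 and 2.03, rather than 2.01 to 2.04 as in the question.

```r
# Moritz's loop wrapped in a function; bump_loop is a hypothetical name
bump_loop <- function(x, step = 0.01) {
  while (any(duplicated(x))) {   # any() is equivalent to sum(...) > 0 here
    d <- duplicated(x)           # TRUE for the 2nd, 3rd, ... copy of a value
    x[d] <- x[d] + step          # bump every element that is still a repeat
  }
  x
}

bump_loop(c(1, 2, 3, 5, 6, 2, 8, 9, 2, 2))
# [1] 1.00 2.00 3.00 5.00 6.00 2.01 8.00 9.00 2.02 2.03
```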

Hope this helps, 
Regards
Moritz

____________________
Moritz Grenke
http://www.360mix.de

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
behalf of Ortiz, John
Sent: Thursday, 20 January 2011 16:13
To: r-help at r-project.org
Subject: [R] Identify duplicate numbers and to increase a value

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi John,

If you only have one duplicated number (e.g., just 2), then this will work:

x <- c(1,2,3,5,6,2,8,9,2,2)
xd <- duplicated(x)
x[xd] <- x[xd] + seq(sum(xd))/100
x

Otherwise, I think you will need a different approach than duplicated(),
because it matters not just whether a number is duplicated but which
number it is, how many times it occurs, and where.
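To make that caveat concrete: the snippet uses one running counter for all repeats, so with two duplicated values the offsets keep counting across values. A per-value counter avoids this. The second half of the sketch builds one with base R's `split()` and `unsplit()` (our choice, not something proposed in the thread) and keeps the `duplicated()` convention of leaving first occurrences unchanged.

```r
# Josh's snippet with two duplicated values: one counter is shared by all repeats
x <- c(1, 1, 2, 2)
xd <- duplicated(x)
x[xd] <- x[xd] + seq(sum(xd))/100
x
# [1] 1.00 1.01 2.00 2.02   (the repeated 2 gets 0.02, not 0.01)

# Per-value counter: number each element within its own value group
x <- c(1, 1, 2, 2)
k <- unsplit(lapply(split(x, x), seq_along), x)   # occurrence number: 1 2 1 2
xd <- duplicated(x)
x[xd] <- x[xd] + (k[xd] - 1)/100                  # 2nd copy +0.01, 3rd +0.02, ...
x
# [1] 1.00 1.01 2.00 2.01
```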

Cheers,

Josh
On Thu, Jan 20, 2011 at 7:12 AM, Ortiz, John <OrtizJ at si.edu> wrote:

  
    
#
Hello John,

If many numbers are duplicated, then one way is to coerce to a factor
and use the levels() function. For instance:

x <- c(1,1,2,2,2,3,3,4,1,1,2,4)
X <- factor(x)
for (i in levels(X))
{
	loc <- (X == i); len <- sum(loc)  # number of occurrences; length(loc) would always be length(x)
	x[loc] <- x[loc] + 0.01 * (1:len)
}
x

 [1] 1.01 1.02 2.01 2.02 2.03 3.01 3.02 4.01 1.03 1.04 2.04 4.02
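The loop above also bumps values that occur only once, so on John's vector 1 would become 1.01. A variant that counts occurrences with `sum()` and leaves singletons alone, assuming that is the intended behaviour (the function name `bump_levels` is hypothetical):

```r
# Variant of the factor/levels loop; bump_levels is a hypothetical name
bump_levels <- function(x, step = 0.01) {
  X <- factor(x)
  for (i in levels(X)) {
    loc <- (X == i)          # logical index of the elements equal to this level
    n <- sum(loc)            # number of occurrences of this value
    if (n > 1)               # leave values that occur only once unchanged
      x[loc] <- x[loc] + step * seq_len(n)
  }
  x
}

bump_levels(c(1, 2, 3, 5, 6, 2, 8, 9, 2, 2))
# [1] 1.00 2.01 3.00 5.00 6.00 2.02 8.00 9.00 2.03 2.04
```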

Hope that helps

James Lawrence
On Thu, 2011-01-20 at 08:00 -0800, Joshua Wiley wrote:
#
Your words made it sound like you wanted the following:
  > x + (ave(x, x, FUN=seq_along)-1)/100
   [1] 1.00 2.00 3.00 5.00 6.00 2.01 8.00 9.00 2.02 2.03
but your example indicates that you want to
alter any value that has a duplicate (including
the first), so it gets a bit more complicated.
E.g.,
  > x + ave(x, x, FUN=function(xi)if(length(xi)==1) 0.0 else
  +         seq_along(xi))/100
   [1] 1.00 2.01 3.00 5.00 6.00 2.02 8.00 9.00 2.03 2.04
You could also use subscripting to apply ave() only to
those elements of x that have duplicates.
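One reading of that suggestion, as a sketch: build the mask of values that occur more than once (the same mask John built with `duplicated()` and `fromLast = TRUE`), then run `ave()` on just those elements.

```r
x <- c(1, 2, 3, 5, 6, 2, 8, 9, 2, 2)
d <- duplicated(x) | duplicated(x, fromLast = TRUE)   # value occurs more than once
x[d] <- x[d] + ave(x[d], x[d], FUN = seq_along)/100   # k-th copy gets k/100
x
# [1] 1.00 2.01 3.00 5.00 6.00 2.02 8.00 9.00 2.03 2.04
```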

There are trickier but faster ways (based on runs) of
doing this if you have very long vectors with lots of
unique values.
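The runs-based version is not shown in the thread. One possible way to do it, which is our own sketch and not necessarily what Bill had in mind: sort once with `order()`, find runs of equal values with `rle()`, number the elements within each run with `sequence()`, and scatter the offsets back to the original positions. `order()` leaves tied elements in their original order, so the numbering follows the order of appearance. The name `bump_runs` is hypothetical, and the sketch assumes `x` has no `NA`s.

```r
# Runs-based sketch; bump_runs is a hypothetical name; assumes no NAs in x
bump_runs <- function(x, step = 0.01) {
  o <- order(x)                          # sorted positions; ties keep original order
  r <- rle(x[o])                         # runs of equal values in sorted x
  k <- sequence(r$lengths)               # 1, 2, ..., len within each run
  dup <- rep(r$lengths > 1, r$lengths)   # TRUE where the run has more than one element
  bump <- numeric(length(x))
  bump[o] <- ifelse(dup, k * step, 0)    # scatter offsets back to original positions
  x + bump
}

bump_runs(c(1, 2, 3, 5, 6, 2, 8, 9, 2, 2))
# [1] 1.00 2.01 3.00 5.00 6.00 2.02 8.00 9.00 2.03 2.04
```

The single sort replaces the per-group work done by `ave()`; whether that is actually faster depends on the data.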

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Thu, Jan 20, 2011 at 10:12 AM, Ortiz, John <OrtizJ at si.edu> wrote:
There is a function in the unreleased zooExtra package that will
uniquify numbers via linear interpolation:
[1] 1.00 2.00 3.00 5.00 6.00 2.25 8.00 9.00 2.50 2.75
[1] 1.0000 2.0000 3.0000 5.0000 6.0000 2.0025 8.0000 9.0000 2.0050 2.0075
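The zooExtra call that produced these lines is not shown, so the following is not that function and does not mimic its interface. It is a hypothetical sketch of the idea, assuming the n copies of a value v are spaced evenly over the gap from v to the next larger distinct value (or over a fixed `delta`). Under that assumption it reproduces both printed results; the fallback gap of 1 for the largest value is an arbitrary choice.

```r
# Hypothetical sketch of tie-spreading by linear interpolation; NOT the zooExtra API
spread_ties <- function(x, delta = NULL) {
  k <- ave(seq_along(x), x, FUN = seq_along) - 1   # 0 for the first copy, then 1, 2, ...
  n <- ave(seq_along(x), x, FUN = length)          # number of copies of each value
  if (is.null(delta)) {
    u <- sort(unique(x))
    nxt <- u[match(x, u) + 1]                      # next larger distinct value (NA at the top)
    gap <- ifelse(is.na(nxt), 1, nxt - x)          # assumption: gap of 1 for the maximum
  } else {
    gap <- delta
  }
  x + gap * k / n
}

x <- c(1, 2, 3, 5, 6, 2, 8, 9, 2, 2)
spread_ties(x)                # the four 2s become 2, 2.25, 2.50, 2.75
spread_ties(x, delta = 0.01)  # the four 2s become 2, 2.0025, 2.0050, 2.0075
```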