Finding (swapped) repetitions of numbers pairs across two columns

Hi,

You could also use:
unique(t(apply(cbind(v1,v2),1,function(x) x[order(x)])))
#or
unique(t(apply(cbind(v1,v2),1,sort.int,method="quick")))
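
As a minimal sketch of what these one-liners do, here is the same idea applied to the five-element vectors from the original question (the name `res` is only for illustration):

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)

# Sort each row so that (3,1) and (1,3) become the same pair (1,3);
# apply() returns the sorted rows as columns, hence the t()
sorted_pairs <- t(apply(cbind(v1, v2), 1, sort))

# Drop the repeated rows, keeping the first occurrence of each pair
res <- unique(sorted_pairs)
res
```

The five input pairs reduce to four rows: (0,5), (1,3), (2,2) and (0,4).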

Comparing the different methods:
set.seed(51)
v1<-sample(0:9,1e5,replace=TRUE)
set.seed(49)
v2<-sample(0:9,1e5,replace=TRUE)
system.time(res1<-unique(t(apply(cbind(v1, v2), 1, sort))))
#    user  system elapsed 
#  11.373   0.188  11.918 

system.time(res2<-unique(t(apply(cbind(v1,v2),1,sort.int,method="quick"))))
#    user  system elapsed 
#   7.088   0.120   7.446 

identical(res1,res2)
#[1] TRUE
system.time(res3 <- unique(t(apply(cbind(v1,v2),1,function(x) x[order(x)])))) #found to be faster
#    user  system elapsed 
#   2.693   0.072   2.857 

identical(res1,res3)
#[1] TRUE
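
A vectorized variant that was not part of the comparison above may also be worth trying. This is only a sketch: it assumes the same `v1` and `v2` as in the timings, and `res4` is a name introduced here for illustration. It uses `pmin()` and `pmax()` to order each pair without calling a function once per row. I have not timed it, so no speed figures are claimed.

```r
set.seed(51)
v1 <- sample(0:9, 1e5, replace = TRUE)
set.seed(49)
v2 <- sample(0:9, 1e5, replace = TRUE)

# pmin()/pmax() return the element-wise smaller/larger value, so each row of
# the cbind() is an ordered pair; unique() then drops the repeated rows
res4 <- unique(cbind(pmin(v1, v2), pmax(v1, v2)))

# Reference result from the row-wise approach, with dimnames dropped so that
# only the values are compared
res1 <- unname(unique(t(apply(cbind(v1, v2), 1, sort))))
all.equal(res4, res1, check.attributes = FALSE)
```

Both approaches keep the first occurrence of each ordered pair, so the rows should come out in the same order.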

A.K.

----- Original Message -----
From: Emmanuel Levy <emmanuel.levy at gmail.com>
To: R-help Mailing List <r-help at r-project.org>
Cc: 
Sent: Thursday, December 27, 2012 3:30 PM
Subject: [R] Finding (swapped) repetitions of numbers pairs across two columns

Hi,

I've had this problem for a while and tackled it in a quite dirty way,
so I'm wondering if a better solution exists:

If we have two vectors:

v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)

How can I remove one instance of the "3,1" / "1,3" duplicate?

At the moment I'm using the following solution, which is quite horrible:

v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
ft <- cbind(v1, v2)
direction = apply( ft, 1, function(x) return(x[1]>x[2]))
ft.tmp = ft
ft[which(direction),1] = ft.tmp[which(direction),2]
ft[which(direction),2] = ft.tmp[which(direction),1]
uniques   = apply( ft, 1, function(x) paste(x, collapse="%") )
uniques   = unique(uniques)
ft.unique = matrix(unlist(strsplit(uniques,"%")), ncol=2, byrow=TRUE)

Any better solution would be very welcome!

All the best,

Emmanuel

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
