Intersecting two matrices
In that case, you should be looking at a relational inner join, perhaps with SQLite (see package sqldf).
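A minimal sketch of the inner-join idea, assuming both matrices fit in memory as data frames. It uses base R's merge() in place of sqldf/SQLite so that nothing needs to be installed. merge() joins on every column name the two data frames share and, by default, keeps only rows present in both, which is a relational inner join. The toy matrices m1 and m2 are made up for illustration.

```r
# Toy inputs (made up): two deduplicated two-column integer matrices
m1 <- cbind(c(1L, 2L, 3L), c(10L, 20L, 30L))
m2 <- cbind(c(3L, 1L, 4L), c(30L, 10L, 40L))

# Convert to data frames; unnamed matrix columns become V1 and V2
d1 <- as.data.frame(m1)
d2 <- as.data.frame(m2)

# With no 'by' argument, merge() joins on all common column names
# (V1 and V2 here); all = FALSE (the default) keeps only rows found
# in both inputs, i.e. an inner join. Output is sorted by the keys.
common <- merge(d1, d2)
as.matrix(common)  # expected rows: (1, 10) and (3, 30)
```

With sqldf the same join would be written in SQL, roughly `sqldf("SELECT d1.* FROM d1 INNER JOIN d2 ON d1.V1 = d2.V1 AND d1.V2 = d2.V2")`. Whether either form copes with billions of rows is something to measure rather than assume.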
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
c char <charlie.hsia.us at gmail.com> wrote:
Thanks a lot. I am still looking for a super fast and memory-efficient solution, as the matrix I have in the real world has billions of rows.

On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdunlap at tibco.com> wrote:
I haven't looked at the size-time relationship, but im2 (below) is faster than your function on at least one example:
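intersectMat <- function(mat1, mat2)
{
    # mat1 and mat2 are both deduplicated
    nr1 <- nrow(mat1)
    nr2 <- nrow(mat2)
    # a row in the mat2 part of the stacked matrix is flagged by
    # duplicated() only if it already occurred in mat1
    mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ,
         drop=FALSE]
}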
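im2 <- function(mat1, mat2)
{
    stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
    # collapse each row into a single string key; "\1" is a separator
    # that cannot appear in a formatted number
    toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1],
                                        twoColMat[,2])
    # position in mat1 of each row of mat2 (0 = not found); zero indices
    # are dropped, so only the common rows are returned
    mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
}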
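The two functions find common rows in different ways: intersectMat() stacks the matrices and lets duplicated() flag repeated rows, while im2() turns each row into one string key and looks keys up with match(). The toy example below runs both ideas side by side; the matrices a and b and the helper key() are made up for illustration, and this is a sketch of the techniques, not a benchmark.

```r
# Toy inputs (made up): two deduplicated two-column integer matrices
# that share the rows (1, 2) and (3, 4)
a <- rbind(c(1L, 2L), c(3L, 4L), c(5L, 6L))
b <- rbind(c(3L, 4L), c(7L, 8L), c(1L, 2L))

# intersectMat's idea: stack a on top of b; a row of b is flagged by
# duplicated() only if the same row already appeared in a
dup <- duplicated(rbind(a, b))[(nrow(a) + 1):(nrow(a) + nrow(b))]
viaDuplicated <- b[dup, , drop = FALSE]

# im2's idea: collapse each row to one string key ("\1" as separator),
# then find b's keys among a's keys with match()
key <- function(m) paste(m[, 1], m[, 2], sep = "\1")
idx <- match(key(b), key(a), nomatch = 0)  # 0 marks "no match"
viaMatch <- a[idx, , drop = FALSE]         # zero indices are dropped

print(idx)  # [1] 2 0 1
print(viaMatch)
identical(viaDuplicated, viaMatch)
```

Both approaches return the rows (3, 4) and (1, 2), in the order they appear in b.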
m1 <- cbind(1:1e7, rep(1:10, len=1e7))
m2 <- cbind(1:1e7, rep(1:20, len=1e7))
system.time(r1 <- intersectMat(m1,m2))
   user  system elapsed 
 430.37    1.96  433.98 
system.time(r2 <- im2(m1,m2))
   user  system elapsed 
  27.89    0.20   28.13 
identical(r1, r2)
[1] TRUE
dim(r1)
[1] 5000000       2

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of c char
Sent: Monday, July 29, 2013 4:04 PM
To: r-help at r-project.org
Subject: [R] Intersecting two matrices

Dear all,

I would like to know of a faster matrix intersection package for R that handles the intersection of two integer matrices with ncol=2. Currently I am using my homemade code adapted from a previous thread:
intersectMat <- function(mat1, mat2){
  #mat1 and mat2 are both deduplicated
  nr1 <- nrow(mat1)
  nr2 <- nrow(mat2)
  mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
}
which handles:
size A= 10578373
size B= 9519807
expected intersecting time= 251.2272
intersecting for crossing MPRs took 409.602 seconds.
It scales a little worse than linearly, but the basic operation itself is slow.
I wonder whether a super fast C/C++ extension exists for this task. Your ideas are appreciated.
Thanks!
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.