Intersecting two matrices
I would appreciate it if you would follow the Posting Guide and give a reproducible example and post all messages using plain text.
Try
m1 <- matrix(sample(0:999,2*1057837,TRUE),ncol=2)
m2 <- matrix(sample(0:999,2*951980,TRUE),ncol=2)
df1 <- as.data.frame(m1)
df2 <- as.data.frame(m2)
library(sqldf)
system.time(df3 <- sqldf("SELECT DISTINCT df1.V1, df1.V2 FROM df1 INNER JOIN df2 ON df1.V1=df2.V1 AND df1.V2=df2.V2") )
The speed seems heavily dependent on how many rows are duplicated within the input data frames... so if the range of values is small then the query runs slower. Note also that moving the data from R to the database and back takes time... you may be able to import the data directly from your source data to the database and save some time. Read ?sqldf and ?read.csv.sql examples for more info.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
c char <charlie.hsia.us at gmail.com> wrote:
I am not familiar with R's sort and sql libs. appreciate if you can post a code snippet when you got time. Thanks a lot! On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>wrote:
In that case, you should be looking at a relational inner join,
perhaps
with SQLite (see package sqldf).
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go
Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live
Go...
Live: OO#.. Dead: OO#..
Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#.
rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity. c char <charlie.hsia.us at gmail.com> wrote:
Thanks a lot. Still looking for some super fast and memory efficient solution, as
the
matrix I have in real world has billions of rows. On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap <wdunlap at tibco.com> wrote:
I haven't looked at the size-time relationship, but im2 (below) is
faster
than your
function on at least one example:
intersectMat <- function(mat1, mat2)
{
#mat1 and mat2 are both deduplicated
nr1 <- nrow(mat1)
nr2 <- nrow(mat2)
mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ,
drop=FALSE]
}
im2 <- function(mat1, mat2)
{
stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1],
twoColMat[,2])
mat1[match(toChar(mat2), toChar(mat1), nomatch=0), ,
drop=FALSE]
}
m1 <- cbind(1:1e7, rep(1:10, len=1e7)) m2 <- cbind(1:1e7, rep(1:20, len=1e7)) system.time(r1 <- intersectMat(m1,m2))
user system elapsed 430.37 1.96 433.98
system.time(r2 <- im2(m1,m2))
user system elapsed 27.89 0.20 28.13
identical(r1, r2)
[1] TRUE
dim(r1)
[1] 5000000 2 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
-----Original Message----- From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org]
On Behalf
Of c char Sent: Monday, July 29, 2013 4:04 PM To: r-help at r-project.org Subject: [R] Intersecting two matrices Dear all, I am interested to know a faster matrix intersection package for
R
handles
intersection of two integer matrices with ncol=2. Currently I am
using my
homemade code adapted from a previous thread:
intersectMat <- function(mat1, mat2){#mat1 and mat2 are both
deduplicated nr1 <- nrow(mat1) nr2 <- nrow(mat2)
mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]}
which handles:
size A= 10578373
size B= 9519807
expected intersecting time= 251.2272
intersecting for corssing MPRs took 409.602 seconds.
scale a little bit worse than linearly but atomic operation is
not
good.
Wonder if a super fast C/C++ extension exists for this task.
Your
ideas
are
appreciated.
Thanks!
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible
code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.