Intersection of 2 matrices
6 messages · oluwole oyebamiji, David Winsemius, Michael Kao +2 more
On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:
Hi all,
I have matrix A of 67420 by 2 and another matrix B of 59199 by
2. I would like to find the number of rows of matrix B that I can
find in matrix A (rows that are common to both matrices with or
without sorting).
I have tried the "intersection" and "is.element" functions in R, but
they only work for vectors, not matrices,
i.e., intersection(A,B) and is.element(A,B).
Have you considered the 'duplicated' function?
David Winsemius, MD
West Hartford, CT
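For readers unfamiliar with the suggestion: on a matrix, duplicated() compares whole rows, so stacking A on top of B and looking at the flags for the B part shows which rows of B already occurred earlier in the stack. A minimal sketch on made-up toy matrices (the values and the names stacked, dup and in_A are illustrative assumptions, not from the thread):

```r
# Toy matrices; names and values are illustrative only.
A <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              7, 8), ncol = 2, byrow = TRUE)

# duplicated() on a matrix works row-wise (MARGIN = 1 by default):
# a row is flagged TRUE if an identical row appears earlier in the stack.
stacked <- rbind(A, B)
dup <- duplicated(stacked)

# Keep only the flags belonging to the rows that came from B
in_A <- dup[(nrow(A) + 1):nrow(stacked)]
sum(in_A)   # number of flagged rows of B, here 1
```

Note that a row of B is also flagged when it merely repeats an earlier row of B, which matters if B itself contains duplicates.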
On 2/12/2011 2:48 p.m., David Winsemius wrote:
Have you considered the 'duplicated' function?
Here is an example based on the duplicated function
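test.mat1 <- matrix(1:20, ncol = 5)
# test.mat1 has 4 rows, so sample the two row indices from 1:4
test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))
compMat <- function(mat1, mat2){
    nr1 <- nrow(mat1)
    nr2 <- nrow(mat2)
    # rows of mat2 that duplicate an earlier row of the stacked matrix
    mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
}
compMat(test.mat1, test.mat2)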
Michael Kao <mkao006rmail <at> gmail.com> writes:
Your solution is fast, but not completely correct, because you are also
counting possible duplicates within the second matrix. The 'refitted'
function could look as follows:
compMat2 <- function(A, B) {  # rows of B present in A
    B0 <- B[!duplicated(B), ]
    na <- nrow(A); nb <- nrow(B0)
    AB <- rbind(A, B0)
    ab <- duplicated(AB)[(na+1):(na+nb)]
    return(sum(ab))
}
and testing an example of the size the OP was asking for:
set.seed(8237)
A <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
B <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)
system.time(n <- compMat2(A, B)) # n = 3790
whereas compMat() will return 5522 rows, of which 1732 are duplicates within B!
A 3.06 GHz iMac needs about 2 to 2.5 seconds.
Hans Werner
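The effect of duplicates within B can be seen on a tiny made-up example. This is only a sketch: the matrices and the names flag_raw and flag_unique are assumptions for illustration, and drop = FALSE is added so the subset stays a matrix even if a single row remains.

```r
A <- matrix(c(1, 2,
              3, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              9, 9,
              9, 9,
              3, 4), ncol = 2, byrow = TRUE)

na <- nrow(A)

# Without de-duplicating B first: the second (9, 9) row is flagged
# because it repeats an earlier row of B, although it is not in A.
flag_raw <- duplicated(rbind(A, B))[(na + 1):(na + nrow(B))]
sum(flag_raw)      # 3

# De-duplicating B first, as compMat2 does
B0 <- B[!duplicated(B), , drop = FALSE]
flag_unique <- duplicated(rbind(A, B0))[(na + 1):(na + nrow(B0))]
sum(flag_unique)   # 1
```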
Well, taking a second look, I'd say it depends on the exact formulation. In the applications I have in mind, I would like to count each occurrence in B only once. Perhaps the OP never thought about duplicates in B.
Hans Werner
Here is one way of doing it:
compMat2 <- function(A, B) {  # rows of B present in A
    B0 <- B[!duplicated(B), ]
    na <- nrow(A); nb <- nrow(B0)
    AB <- rbind(A, B0)
    ab <- duplicated(AB)[(na+1):(na+nb)]
    return(sum(ab))
}
set.seed(8237)
A <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
B <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)
system.time({
    # convert for comparison
    A.1 <- apply(A, 1, function(x) paste(x, collapse = ' '))
    B.1 <- apply(B, 1, function(x) paste(x, collapse = ' '))
    count <- sum(B.1 %in% A.1)
})
   user  system elapsed
   1.77    0.00    1.79
count
[1] 3905
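The string-key count above includes every row of B that occurs in A, repeats within B included, which would explain why 3905 differs from the 3790 reported for compMat2. A minimal sketch of both formulations on toy data follows. It assumes a hypothetical helper rowKeys() (not part of the thread) that builds the keys column-wise with paste() instead of row-wise with apply(); the matrices are made up.

```r
# rowKeys() is a hypothetical helper, not from the thread: it pastes the
# columns together element-wise, giving one string key per row.
rowKeys <- function(M) do.call(paste, c(as.data.frame(M), sep = " "))

A <- matrix(c(1, 2,
              3, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              9, 9,
              3, 4), ncol = 2, byrow = TRUE)

keyA <- rowKeys(A)
keyB <- rowKeys(B)

# Every row of B that occurs in A, repeats included
n_all <- sum(keyB %in% keyA)

# Each distinct row of B counted once
n_distinct <- sum(unique(keyB) %in% keyA)

c(all = n_all, distinct = n_distinct)
```

Which of the two counts is wanted depends on the formulation of the problem, as discussed earlier in the thread.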
On Fri, Dec 2, 2011 at 2:46 PM, Hans W Borchers
<hwborchers at googlemail.com> wrote:
Well, taking a second look, I'd say it depends on the exact formulation. In the applications I have in mind, I would like to count each occurrence in B only once. Perhaps the OP never thought about duplicates in B.
Hans Werner
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.