Intersection of 2 matrices


On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:

Hi all,
    I have a matrix A of 67420 by 2 and another matrix B of 59199 by 2.
I would like to find the number of rows of matrix B that also occur in
matrix A (rows that are common to both matrices, with or without
sorting).

I have tried the "intersection" and "is.element" functions in R, but
they only work for vectors and not for matrices,
i.e., intersection(A,B) and is.element(A,B).

On 2/12/2011 2:48 p.m., David Winsemius wrote:

Have you considered the 'duplicated' function?

David Winsemius, MD
West Hartford, CT
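
To make the difference concrete, here is a minimal sketch on two made-up 3 by 2 matrices (the data are an assumption for illustration, not the poster's). It shows that is.element() answers per element rather than per row, whereas duplicated() on a matrix compares whole rows. The base R set function is spelled intersect(); it also works on the flattened values.

```r
# Two tiny matrices (made up for illustration); rows c(1, 2) and c(5, 6) are shared.
A <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
B <- matrix(c(5, 6,
              7, 8,
              1, 2), ncol = 2, byrow = TRUE)

# is.element() (like %in%) treats its arguments as plain vectors, so this
# tests individual elements of A against the elements of B, not whole rows.
elementwise <- is.element(A, B)
length(elementwise)   # one answer per element of A, not one per row

# duplicated() on a matrix works row by row (MARGIN = 1 is the default):
# a row is flagged when an identical row occurs earlier in the matrix.
stacked <- rbind(A, B)
row_flags <- duplicated(stacked)
row_flags
```

The flags for the rows that came from B are the last nrow(B) entries of row_flags, which is the idea the replies build on.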

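Here is an example based on the duplicated function:

test.mat1 <- matrix(1:20, ncol = 5)

# test.mat1 has only 4 rows, so sample row indices from nrow(test.mat1)
test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))

compMat <- function(mat1, mat2){
     nr1 <- nrow(mat1)
     nr2 <- nrow(mat2)
     # keep rows of mat2 that repeat an earlier row of rbind(mat1, mat2)
     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop = FALSE]
}

compMat(test.mat1, test.mat2)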
Michael Kao <mkao006rmail <at> gmail.com> writes:

Your solution is fast, but not completely correct, because you are also 
counting possible duplicates within the second matrix. The 'refitted'
function could look as follows:

    compMat2 <- function(A, B) {  # rows of B present in A
        B0 <- B[!duplicated(B), ]
        na <- nrow(A); nb <- nrow(B0)
        AB <- rbind(A, B0)
        ab <- duplicated(AB)[(na+1):(na+nb)]
        return(sum(ab))
    }

and testing an example of the size the OP was asking for:

    set.seed(8237)
    A  <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
    B  <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)

    system.time(n <- compMat2(A, B))  # n = 3790

while compMat() returns 5522 rows, including 1732 duplicates within B.
A 3.06 GHz iMac needs about 2 to 2.5 seconds.

Hans Werner
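
The overcounting described above can be reproduced on a small scale. The following is only a sketch with hypothetical toy matrices: B repeats a row that never occurs in A, and stacking A on top of B unchanged flags that repeat as a "match".

```r
# Hypothetical toy data: B repeats the row c(9, 9), which is NOT in A,
# and contains c(1, 2), which is in A.
A <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
B <- matrix(c(9, 9,
              1, 2,
              9, 9), ncol = 2, byrow = TRUE)

na <- nrow(A)

# Stacking A on B as-is: the second c(9, 9) is flagged because it repeats
# an earlier row of B, even though it never occurs in A.
flags_raw <- duplicated(rbind(A, B))[(na + 1):(na + nrow(B))]
sum(flags_raw)     # 2

# Removing duplicates within B first leaves only genuine matches against A.
B0 <- B[!duplicated(B), , drop = FALSE]
flags_dedup <- duplicated(rbind(A, B0))[(na + 1):(na + nrow(B0))]
sum(flags_dedup)   # 1
```

The drop = FALSE is an addition in this sketch; it keeps B0 a matrix even when only one distinct row remains.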
Michael Kao <mkao006rmail <at> gmail.com> writes:

Well, taking a second look, I'd say it depends on the exact formulation.

In the applications I have in mind, I would like to count each occurrence
in B only once. Perhaps the OP never thought about duplicates in B.

Hans Werner

Here is one way of doing it:

   set.seed(8237)
   A  <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
   B  <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)

   system.time({
       # convert each row to a single string for comparison
       A.1 <- apply(A, 1, function(x) paste(x, collapse = ' '))
       B.1 <- apply(B, 1, function(x) paste(x, collapse = ' '))
       count <- sum(B.1 %in% A.1)
   })
   user  system elapsed
   1.77    0.00    1.79

   count
   [1] 3905
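
A possible variation on the string-key idea, offered as a sketch rather than a benchmarked recommendation: for a two-column matrix, paste() can build all keys in one vectorised call instead of going through apply() row by row, which is likely to be faster. The small random matrices and the names key_A, key_B, count_all and count_distinct are assumptions made for this illustration. The sketch also computes both counting conventions discussed in the thread; the difference between them would explain why 3905 is larger than the 3790 reported earlier.

```r
set.seed(1)  # arbitrary seed, chosen only for this sketch
A <- matrix(sample(1:20, 2 * 200, replace = TRUE), ncol = 2)
B <- matrix(sample(1:20, 2 * 150, replace = TRUE), ncol = 2)

# paste() is vectorised over its arguments, so one call builds all row keys.
key_A <- paste(A[, 1], A[, 2])
key_B <- paste(B[, 1], B[, 2])

# The keys are the same as those from the row-wise apply() version.
key_B_apply <- apply(B, 1, function(x) paste(x, collapse = " "))
stopifnot(identical(key_B, key_B_apply))

# Count every row of B that occurs in A (repeats in B count each time) ...
count_all <- sum(key_B %in% key_A)
# ... versus counting each distinct row of B at most once.
count_distinct <- sum(unique(key_B) %in% key_A)

c(all = count_all, distinct = count_distinct)
```

Which of the two counts is wanted depends on how duplicates within B should be treated, as noted earlier in the thread.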

On Fri, Dec 2, 2011 at 2:46 PM, Hans W Borchers

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.