Intersection of 2 matrices

6 messages · oluwole oyebamiji, David Winsemius, Michael Kao +2 more

#
On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:

            
Have you considered the 'duplicated' function?
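As a minimal illustration (the small matrix `m` is made up for this example), `duplicated()` applied to a matrix compares whole rows and flags every row that repeats an earlier one:

```r
# A made-up 4 x 2 matrix in which row 3 repeats row 1
m <- matrix(c(1, 2,
              3, 4,
              1, 2,
              5, 6), ncol = 2, byrow = TRUE)

# On a matrix, duplicated() compares whole rows (its MARGIN argument defaults to 1)
dup <- duplicated(m)
print(dup)
# [1] FALSE FALSE  TRUE FALSE

# Keep only rows that do not repeat an earlier row (unique(m) gives the same matrix)
m[!dup, , drop = FALSE]
```

Stacking a second matrix under the first with `rbind()` and reading off the flags for the second matrix's rows is the idea the replies below build on.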
#
On 2/12/2011 2:48 p.m., David Winsemius wrote:
Here is an example based on the 'duplicated' function:

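# test.mat1 is a 4 x 5 matrix
test.mat1 <- matrix(1:20, ncol = 5)

# two randomly chosen rows of test.mat1 stacked on top of four new rows
test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))

# rows of mat2 that repeat any earlier row of rbind(mat1, mat2)
compMat <- function(mat1, mat2){
     nr1 <- nrow(mat1)
     nr2 <- nrow(mat2)
     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop = FALSE]
}

compMat(test.mat1, test.mat2)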
#
Michael Kao <mkao006rmail <at> gmail.com> writes:
Your solution is fast but not completely correct, because you are also
counting rows that are duplicated within the second matrix. The 'refitted'
function could look as follows:

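    compMat2 <- function(A, B) {  # number of distinct rows of B present in A
        B0 <- B[!duplicated(B), , drop = FALSE]  # drop repeated rows within B
        na <- nrow(A); nb <- nrow(B0)
        AB <- rbind(A, B0)
        ab <- duplicated(AB)[(na+1):(na+nb)]     # TRUE where a row of B0 occurs in A
        return(sum(ab))
    }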

and testing it on an example of the size the OP was asking for:

    set.seed(8237)
    A  <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
    B  <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)

    system.time(n <- compMat2(A, B))  # n = 3790

whereas compMat() returns 5522 rows, 1732 of which are duplicates within B!
On a 3.06 GHz iMac this takes about 2 to 2.5 seconds.

Hans Werner
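Hans Werner's point can be checked on a tiny hand-made case. The matrices `A` and `B` below are invented for illustration: `B` holds the row (9, 9) twice, that row does not occur in `A` at all, and yet `compMat()` still returns its second copy. Both functions are restated from the posts above so the snippet runs on its own; `drop = FALSE` is an addition that keeps one-row results as matrices.

```r
# compMat() as posted above: rows of mat2 that repeat any earlier row of rbind(mat1, mat2)
compMat <- function(mat1, mat2) {
  nr1 <- nrow(mat1)
  nr2 <- nrow(mat2)
  mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop = FALSE]
}

# compMat2() as posted above: number of distinct rows of B that occur in A
compMat2 <- function(A, B) {
  B0 <- B[!duplicated(B), , drop = FALSE]
  na <- nrow(A); nb <- nrow(B0)
  ab <- duplicated(rbind(A, B0))[(na + 1):(na + nb)]
  sum(ab)
}

A <- matrix(c(1, 2,
              3, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(1, 2,    # also in A
              9, 9,    # not in A
              9, 9),   # repeat of the previous row, still not in A
            ncol = 2, byrow = TRUE)

nrow(compMat(A, B))  # 2: row (1, 2) plus the second copy of (9, 9)
compMat2(A, B)       # 1: only (1, 2) actually occurs in A
```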
#
Michael Kao <mkao006rmail <at> gmail.com> writes:
Well, taking a second look, I'd say it depends on the exact formulation.

In the applications I have in mind, I would like to count each row of B
only once, however often it occurs. Perhaps the OP never thought about
duplicates in B.

Hans Werner
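The two readings can be compared on a toy example. This sketch uses a hypothetical helper `row_keys()` that pastes each row into one string, the same idea as the `paste()`-based reply further down; the matrices are invented for illustration.

```r
# Hypothetical helper: collapse each row of a matrix into a single string key
row_keys <- function(m) apply(m, 1, function(x) paste(x, collapse = " "))

A <- matrix(c(1, 2,
              3, 4), ncol = 2, byrow = TRUE)
B <- matrix(c(1, 2,
              1, 2,    # repeat of a row that is in A
              5, 6),   # not in A
            ncol = 2, byrow = TRUE)

a <- row_keys(A)
b <- row_keys(B)

n_all      <- sum(b %in% a)          # every occurrence in B counts: 2
n_distinct <- sum(unique(b) %in% a)  # each distinct row of B counts once: 1
```

The distinct count is what `compMat2()` reports, while a plain `%in%` over all rows of `B`, as in the reply below, counts repeats, so the two approaches need not agree.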
#
Here is one way of doing it:

    compMat2 <- function(A, B) {  # rows of B present in A
        B0 <- B[!duplicated(B), ]
        na <- nrow(A); nb <- nrow(B0)
        AB <- rbind(A, B0)
        ab <- duplicated(AB)[(na+1):(na+nb)]
        return(sum(ab))
    }
    system.time({
        # convert each row to a single string for comparison
        A.1 <- apply(A, 1, function(x) paste(x, collapse = ' '))
        B.1 <- apply(B, 1, function(x) paste(x, collapse = ' '))
        count <- sum(B.1 %in% A.1)
    })
    #    user  system elapsed
    #    1.77    0.00    1.79
    count
    # [1] 3905
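The `paste()` keys used above can also return the intersection itself rather than a count. A minimal sketch with made-up matrices `A` and `B` (not the thread's data): index `B` with the logical vector from `%in%`, then apply `unique()` if each common row should appear only once.

```r
A <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
B <- matrix(c(5, 6,
              7, 8,
              5, 6,
              1, 2), ncol = 2, byrow = TRUE)

# one string per row, as in the reply above
A.1 <- apply(A, 1, function(x) paste(x, collapse = " "))
B.1 <- apply(B, 1, function(x) paste(x, collapse = " "))

# rows of B that also occur in A; drop = FALSE keeps a matrix even for a single match
common <- B[B.1 %in% A.1, , drop = FALSE]

# unique() on a matrix removes repeated rows, keeping the first occurrence of each
common_unique <- unique(common)
```

Here `common` keeps both copies of (5, 6), while `common_unique` lists (5, 6) and (1, 2) once each, matching the "count each row of B once" reading.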
On Fri, Dec 2, 2011 at 2:46 PM, Hans W Borchers
<hwborchers at googlemail.com> wrote: