pairs

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091115/c6a4700e/attachment-0001.pl>
Hi, All,

I have an n by m matrix with each entry between 1 and 15000. I want to know
the frequency with which each pair of values from 1:15000 occurs together in a
row. So for example, if the matrix is
2 5 1 6
1 7 8 2
3 7 6 2
9 8 5 7
the pair (2,6) (unordered) occurs together in rows 1 and 3, so I want to
return the value 2 for this pair, and the same count for all other pairs. Is
there a fast way to do this that avoids loops? Loops take too long.

Thank you,

Cindy

Use %in% to check for the presence of the numbers in a row and apply() to
efficiently execute the test for each row:

  tstMatrix <- matrix( c(2,5,1,6,
                         1,7,8,2,
                         3,7,6,2,
                         9,8,5,7), nrow=4, byrow=TRUE )

  # Returns 2 (a marker, not a count) for rows that contain both 2 and 6,
  # and 0 otherwise
  matches <- apply( tstMatrix, 1, function( row ){
    if( 2 %in% row && 6 %in% row ){
      return( 2 )
    } else {
      return( 0 )
    }
  })

  matches
  [1] 2 0 2 0

If you have more than one pair, it gets a little tricky.  Say you are also
looking for the pair (7,8).  Store them as a list:

  pairList <- list( c(2,6), c(7,8) )

Then use sapply() to efficiently iterate over the pair list and execute the
apply() test:

  matchMatrix <- sapply( pairList, function( pair ){
    # For each row, return the first element of the pair if both elements
    # are present, otherwise 0
    matches <- apply( tstMatrix, 1, function( row ){
      if( pair[1] %in% row && pair[2] %in% row ){
        return( pair[1] )
      } else {
        return( 0 )
      }
    })
    return( matches )
  })

  matchMatrix

       [,1] [,2]
  [1,]    2    0
  [2,]    0    7
  [3,]    2    0
  [4,]    0    7
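If what you want is the number of rows per pair rather than a column of markers, one small variation (a sketch, not the only way) is to test each row with all(pair %in% row) and sum the logical results. The matrix and pair list are redefined so the snippet runs on its own; pairCounts is just an illustrative name.

```r
# Example matrix from the original question
tstMatrix <- matrix(c(2, 5, 1, 6,
                      1, 7, 8, 2,
                      3, 7, 6, 2,
                      9, 8, 5, 7), nrow = 4, byrow = TRUE)
pairList <- list(c(2, 6), c(7, 8))

# For each pair, count the rows that contain both of its values
pairCounts <- sapply(pairList, function(pair) {
  sum(apply(tstMatrix, 1, function(row) all(pair %in% row)))
})
pairCounts
# [1] 2 2
```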

If you're looking to apply the above method to every possible pair of numbers
drawn from the range 1:15000... that's 15000^2 = 225,000,000 ordered pairs (or
choose(15000, 2) = 112,492,500 unordered pairs of distinct numbers).
expand.grid() can generate the required pair list, but that step alone causes
a memory allocation of ~6 GB on my machine.

If you don't have a pile of CPU cores and RAM at your disposal, you can
probably:

  1. Restrict the upper end of your range to the maximal entry present in
your matrix since all other combinations have zero occurrences.

  2. Break the list of pairs up into several sublists, run the tests, and
aggregate the results.
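A rough sketch of both suggestions on the 4 x 4 example, assuming the matrix is called m: only values that actually occur are enumerated (suggestion 1), and the pairs are processed in sublists, one per first element a paired with every larger value b (suggestion 2). countForA is a hypothetical helper. It still loops, so this bounds the memory used per sublist rather than reducing the total work.

```r
# 4 x 4 example matrix from the question
m <- matrix(c(2, 5, 1, 6,
              1, 7, 8, 2,
              3, 7, 6, 2,
              9, 8, 5, 7), nrow = 4, byrow = TRUE)

# Suggestion 1: only values present in the matrix can have a nonzero count
vals <- sort(unique(as.vector(m)))

# Suggestion 2: one sublist per first element 'a', paired with every larger 'b'
countForA <- function(a) {                 # hypothetical helper
  hasA <- rowSums(m == a) > 0              # rows that contain a
  bs <- vals[vals > a]
  data.frame(a = rep(a, length(bs)), b = bs,
             count = vapply(bs, function(b) sum(hasA & rowSums(m == b) > 0),
                            integer(1)))
}

# Run each sublist, then aggregate the results into one data frame
result <- do.call(rbind, lapply(vals, countForA))
result[result$count > 1, ]
```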

Either way, the analysis will take some time due to the sheer size of the
problem; the apply family of functions is convenient, but it still calls an R
function once per row or pair, so it is not much faster than an explicit loop.
If you have more than one CPU, I would recommend taking a look at parallelized
apply functions, perhaps using a package like snowfall, as testing the pairs is
an "embarrassingly parallel" problem.
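A minimal sketch of the parallel idea. It swaps in the parallel package (shipped with R since version 2.14, so newer than this thread) instead of snowfall, and mclapply() relies on forking, so it assumes a Unix-like OS and falls back to one core on Windows. Each task counts the pairs whose smaller element is vals[j]; present and countChunk are illustrative names.

```r
library(parallel)

m <- matrix(c(2, 5, 1, 6,
              1, 7, 8, 2,
              3, 7, 6, 2,
              9, 8, 5, 7), nrow = 4, byrow = TRUE)
vals <- sort(unique(as.vector(m)))

# present[i, j] is TRUE when vals[j] occurs somewhere in row i
present <- sapply(vals, function(v) rowSums(m == v) > 0)

# One independent task: pair vals[j] with every later (larger) value
countChunk <- function(j) {
  later <- which(seq_along(vals) > j)
  if (length(later) == 0) return(NULL)    # the largest value has no partner
  data.frame(a = vals[j], b = vals[later],
             count = colSums(present[, j] & present[, later, drop = FALSE]))
}

# Forking is not available on Windows, so use one core there
nCores <- if (.Platform$OS.type == "windows") 1L else 2L
chunks <- mclapply(seq_along(vals), countChunk, mc.cores = nCores)
result <- do.call(rbind, Filter(Negate(is.null), chunks))
result[result$count > 1, ]
```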

Hopefully I'm misunderstanding the scope of your problem.

Good luck!

-Charlie

-----
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
View this message in context: http://old.nabble.com/pairs-tp26364801p26365206.html
Sent from the R help mailing list archive at Nabble.com.
Hope this helps:
m <- matrix(c(2,1,3,9,5,7,7,8,1,8,6,5,6,2,2,7),4,4)
p <- c(2, 6)
apply(m == p[1], 1, any) & apply(m == p[2], 1, any)
[1]  TRUE FALSE  TRUE FALSE

If you want the number of rows which contain the pair, sum() could be used:
sum(apply(m == p[1], 1, any) & apply(m == p[2], 1, any))
[1] 2
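Henrique's membership test generalizes to all pairs at once with a different technique, not one proposed in the thread: build a 0/1 row-by-value incidence matrix and take its cross-product, whose [a, b] entry is the number of rows containing both a and b. A sketch on the 4 x 4 example:

```r
m <- matrix(c(2,1,3,9,5,7,7,8,1,8,6,5,6,2,2,7), 4, 4)

# Distinct values that actually occur in the matrix
vals <- sort(unique(as.vector(m)))

# 0/1 incidence matrix: inc[i, j] is 1 when vals[j] occurs in row i
# (a value repeated within a row is still recorded once)
inc <- matrix(0, nrow(m), length(vals),
              dimnames = list(NULL, as.character(vals)))
inc[cbind(as.vector(row(m)), match(m, vals))] <- 1

# co[a, b] = number of rows containing both a and b;
# the diagonal holds the number of rows containing each value
co <- crossprod(inc)

# Each unordered pair appears once in the upper triangle
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
pairs <- data.frame(a = vals[idx[, 1]], b = vals[idx[, 2]], count = co[idx])
pairs[pairs$count > 1, ]
```

At Cindy's scale (5000 rows, up to 15000 distinct values) the dense version would need roughly 600 MB for inc and 1.8 GB for co in double precision, so a sparse matrix (for example from the Matrix package) is probably the practical route; that variant is not shown here.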
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091115/bc920463/attachment-0001.pl>
I could of course be wrong but have you yet specified the number of  
columns for this pairing exercise?

and provide commented, minimal, self-contained, reproducible code.
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Assuming that the number of columns is 4, then consider this approach:

 > prs <-scan()
1: 2 5 1 6
5: 1 7 8 2
9: 3 7 6 2
13: 9 8 5 7
17:
Read 16 items
prmtx <- matrix(prs, 4, 4, byrow=TRUE)

# Now make copies of x.y and y.x

pair.str <- sapply(1:nrow(prmtx), function(z)
  c(apply(combn(prmtx[z,], 2), 2, function(x) paste(x[1], x[2], sep=".")),
    apply(combn(prmtx[z,], 2), 2, function(x) paste(x[2], x[1], sep="."))) )
tpair <-table(pair.str)

# This then gives you a duplicated list
 > tpair[tpair>1]
pair.str
1.2 2.1 2.6 2.7 6.2 7.2 7.8 8.7
   2   2   2   2   2   2   2   2

# So only take the first half of the pairs:
 > head(tpair[tpair>1], sum(tpair>1)/2)

pair.str
1.2 2.1 2.6 2.7
   2   2   2   2
David.

On Nov 15, 2009, at 8:06 PM, David Winsemius wrote:

> I could of course be wrong but have you yet specified the number of
> columns for this pairing exercise?
>
> On Nov 15, 2009, at 5:26 PM, cindy Guo wrote:
>
>> [...]

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091115/990f54f8/attachment-0001.pl>
I'm not convinced it's right. In fact, I'm pretty sure the last step  
taking only the first half of the list is wrong. I also do not know if  
you have considered how you want to count situations like:

3 2 7 4 5 7 ...
7 3 8 6 1 2 9 2 ......

How many "pairs" of 2-7/7-2 would that represent?
David
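To make the question concrete, here is a small sketch using only the visible values of the two example rows (the "..." tails are left out). Counting every positional combination, which is what running combn() on a raw row does, gives a different total from counting each row once. countPositional and countRows are hypothetical helpers, and countPositional assumes the two values differ.

```r
# Visible parts of the two example rows
r1 <- c(3, 2, 7, 4, 5, 7)
r2 <- c(7, 3, 8, 6, 1, 2, 9, 2)

# Convention A: every positional combination counts (a != b assumed)
countPositional <- function(r, a, b) sum(r == a) * sum(r == b)
# Convention B: a row counts once if it contains both values
countRows <- function(r, a, b) as.integer(a %in% r && b %in% r)

byPosition <- sum(sapply(list(r1, r2), countPositional, a = 2, b = 7))
byRow <- sum(sapply(list(r1, r2), countRows, a = 2, b = 7))
c(byPosition = byPosition, byRow = byRow)
# byPosition = 4, byRow = 2
```

Cindy's original description (pair (2,6) occurs in rows 1 and 3, so return 2) appears to match the per-row count.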
On Nov 15, 2009, at 11:06 PM, cindy Guo wrote:

> Hi, David,
>
> The matrix has 20 columns.
> Thank you very much for your help. I think it's right, but it seems
> I need some time to figure it out. I am a green hand. There are so
> many functions here I never used before. :)
>
> Cindy
>
> On Sun, Nov 15, 2009 at 5:19 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> [...]

I stuck in another "7" in one of the lines with a 2 and reasoned that
we could deal with the desire for unordered "pair counting" by
pasting min(x,y) to max(x,y):

 > dput(prmtx)
structure(c(2, 1, 3, 9, 5, 7, 7, 8, 1, 7, 6, 5, 6, 2, 2, 7), .Dim = c(4L, 4L))
 > prmtx
      [,1] [,2] [,3] [,4]
[1,]    2    5    1    6
[2,]    1    7    7    2
[3,]    3    7    6    2
[4,]    9    8    5    7

 > pair.str <- sapply(1:nrow(prmtx), function(z)
 +   apply(combn(prmtx[z,], 2), 2,
 +         function(x) paste(min(x[2],x[1]), max(x[2],x[1]), sep=".")))

The logic:
sapply(1:nrow(prmtx), ... just loops over the rows of the matrix.
combn(prmtx[z,], 2) ... returns a two-row matrix whose columns are all the
two-element combinations of the values in a single row.
apply(combn(prmtx[z,], 2), 2 ... since combn( , 2) returns a matrix that has
two _rows_, I needed to loop over the columns.
paste(min(x[2],x[1]), max(x[2],x[1]), sep=".") ... puts the minimum of a pair
in front of the maximum and separates them with a period, so that multi-digit
values cannot run together into the same string (e.g. 1 and 23 vs. 12 and 3).
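A quick illustration of those steps on a single row, row 2 of the modified matrix above (r2 is just a local name): combn() gives one pair per column, the min/max key makes 7-2 and 2-7 the same string, and the repeated 7 produces duplicate keys. The last two lines show why the separator matters.

```r
r2 <- c(1, 7, 7, 2)              # row 2 of the modified prmtx

cm <- combn(r2, 2)               # 2 x choose(4, 2) = 2 x 6 matrix, one pair per column
keys <- apply(cm, 2, function(x) paste(min(x), max(x), sep = "."))
keys
# [1] "1.7" "1.7" "1.2" "7.7" "2.7" "2.7"

# Without a separator, (1, 23) and (12, 3) would produce the same key
paste0(1, 23) == paste0(12, 3)                       # TRUE
paste(1, 23, sep = ".") == paste(12, 3, sep = ".")   # FALSE
```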

Then use table() and a logical test as an index to pick out the pairs
that occur more than once:

 > tpair <-table(pair.str)
 > tpair
pair.str
1.2 1.5 1.6 1.7 2.3 2.5 2.6 2.7 3.6 3.7 5.6 5.7 5.8 5.9 6.7 7.7 7.8 7.9 8.9
  2   1   1   2   1   1   2   3   1   1   1   1   1   1   1   1   1   1   1
 > tpair[tpair>1]
pair.str
1.2 1.7 2.6 2.7
   2   2   2   3
David.
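If the goal is to count rows, as in the original question, one possible adjustment (a sketch checked only against this small example) is to drop repeated values within a row with unique() before calling combn(). On the modified matrix, 2.7 then counts 2 rather than 3 and 7.7 disappears. The guard is there because combn() treats a single positive number n as 1:n.

```r
prmtx <- matrix(c(2, 1, 3, 9, 5, 7, 7, 8, 1, 7, 6, 5, 6, 2, 2, 7), 4, 4)

pair.str <- unlist(lapply(seq_len(nrow(prmtx)), function(z) {
  u <- unique(prmtx[z, ])                   # drop repeats within the row
  if (length(u) < 2) return(character(0))   # combn(5, 2) would mean 1:5
  combn(u, 2, FUN = function(x) paste(min(x), max(x), sep = "."))
}))
tpair <- table(pair.str)
tpair[tpair > 1]
```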

On Nov 16, 2009, at 7:02 AM, David Winsemius wrote:

> I'm not convinced it's right. In fact, I'm pretty sure the last step
> taking only the first half of the list is wrong. [...]

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091116/326fdd48/attachment-0001.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091116/9b71e81c/attachment-0001.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091116/60baddf3/attachment-0001.pl>
?order

Do you mean if the numbers in each row are ordered? They are not, but if
it's needed, we can order them. The matrix only has 5000 rows.

No, he's suggesting you check out the order() function by calling its help
page:

  ?order

order() returns the indices that would put your results in ascending or
descending order, so indexing the counts with it sorts them. You could then
pick off the top 50 by using head().
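A sketch of that on the small example, assuming the counts are in a table called tpair: sorting each row's unique values before combn() makes every key come out as min.max, and indexing the table with order(..., decreasing = TRUE) followed by head() keeps the most frequent pairs. With 5000 rows of 20 columns there are at most 5000 * choose(20, 2) = 950,000 keys, which table() should be able to handle. nTop is set to 3 here only because the toy matrix is tiny.

```r
# Toy data: the modified 4 x 4 matrix from earlier in the thread
prmtx <- matrix(c(2, 1, 3, 9, 5, 7, 7, 8, 1, 7, 6, 5, 6, 2, 2, 7), 4, 4)

pair.str <- unlist(lapply(seq_len(nrow(prmtx)), function(z) {
  u <- sort(unique(prmtx[z, ]))      # sorted, so each combination is (min, max)
  if (length(u) < 2) return(character(0))
  k <- combn(u, 2)
  paste(k[1, ], k[2, ], sep = ".")
}))
tpair <- table(pair.str)

# order() gives the positions that sort the counts; index with them, keep the top N
nTop <- 3                            # 50 in Cindy's case
top <- tpair[order(tpair, decreasing = TRUE)]
head(top, nTop)
```

sort(tpair, decreasing = TRUE) should do the same reordering in one step.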

Hope that helps!

-Charlie

View this message in context: http://old.nabble.com/pairs-tp26364801p26378236.html
Sent from the R help mailing list archive at Nabble.com.
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091116/89b64f16/attachment-0001.pl>
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20091116/a0f03e24/attachment-0001.pl>