Identify row indices corresponding to each distinct row of a matrix

8 messages · Li Li, Bert Gunter, William Dunlap +1 more

#
Hi all,
   I use the following example to illustrate my question. As you can see,
in matrix C some rows are repeated and I would like to find the indices of
the rows corresponding to each of the distinct rows.
  For example, for the row c(1,9), I have used the "which" function to
identify the row indices corresponding to c(1,9). Using this approach, in
order to cover all distinct rows, I need to use a for loop.
   I am wondering whether there is an easier way where a for loop can be
avoided?
   Thanks very much!
      Hanna
   V1 V2
1   1  9
2   2 10
3   3 11
4   5 13
5   7 15
6   6 14
7   4 12
8   3 11
9   8 16
10  5 13
11  7 15
12  2 10
13  1  9
14  8 16
15  1  9
16  3 11
17  7 15
18  4 12
19  2 10
20  6 14
21  4 12
22  8 16
23  5 13
24  6 14
> T <- unique(C)
> T
  V1 V2
1  1  9
2  2 10
3  3 11
4  5 13
5  7 15
6  6 14
7  4 12
9  8 16
> i <- 1
> which(C[,1]==T[i,1] & C[,2]==T[i,2])
[1]  1 13 15
#
A mess -- due to your continued use of html formatting.

But something like this may do what you want (hard to tell with the mess):
      [,1] [,2]
 [1,]    1    9
 [2,]    2   10
 [3,]    3   11
 [4,]    4   12
 [5,]    5   13
 [6,]    6   14
 [7,]    7   15
 [8,]    8   16
 [9,]    1    9
[10,]    2   10
[11,]    3   11
[12,]    4   12
[13,]    5   13
[14,]    6   14
[15,]    7   15
[16,]    8   16
vector
 [1] "1-9"  "2-10" "3-11" "4-12" "5-13" "6-14" "7-15" "8-16" "1-9"  "2-10" "3-11" "4-12" "5-13" "6-14"
[15] "7-15" "8-16"
$`1-9`
[1] 1 9

$`2-10`
[1]  2 10

$`3-11`
[1]  3 11

$`4-12`
[1]  4 12

$`5-13`
[1]  5 13

$`6-14`
[1]  6 14

$`7-15`
[1]  7 15

$`8-16`
[1]  8 16
There may well be slicker ways to do this -- if this is actually what you
want to do.
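
The code that produced the output above did not survive in the archive. A minimal sketch that gives the same kind of result, assuming the matrix is called m and each row is collapsed into a "-"-separated key (both are guesses based on the printed output, not the original code):

```r
# Assumed reconstruction of the 16 x 2 example matrix shown above.
m <- cbind(rep(1:8, 2), rep(9:16, 2))

# Collapse each row into one character key, e.g. "1-9".
key <- paste(m[, 1], m[, 2], sep = "-")

# Group the row indices by key: one list element per distinct row.
idx <- split(seq_len(nrow(m)), key)
print(idx[["1-9"]])
```

Note that split() orders the groups by the sorted keys, not by first appearance in the matrix.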

-- Bert
On Wed, Nov 7, 2018 at 7:56 PM li li <hannah.hlx at gmail.com> wrote:

            

  
  
#
Perhaps

which( ! duplicated( m, MARGIN=1 ) )

? (untested)
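
A quick check of this suggestion on the data from the first message, as a sketch. The data frame is rebuilt by hand here, using the fact that V2 is always V1 + 8 in the posted rows:

```r
# Rebuild the 24-row example; in the posted data V2 equals V1 + 8 throughout.
V1 <- c(1, 2, 3, 5, 7, 6, 4, 3, 8, 5, 7, 2, 1, 8, 1, 3, 7, 4, 2, 6, 4, 8, 5, 6)
C <- data.frame(V1 = V1, V2 = V1 + 8)

# duplicated() with MARGIN = 1 flags every row that repeats an earlier row,
# so negating it keeps the first occurrence of each distinct row.
Cm <- as.matrix(C)
first_idx <- which(!duplicated(Cm, MARGIN = 1))
print(first_idx)
```

The result has one index per distinct row (the first occurrence), which matches the row names that unique(C) printed in the first message.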
On November 7, 2018 9:20:57 PM PST, Bert Gunter <bgunter.4567 at gmail.com> wrote:

  
    
#
Yes -- much better than mine. I didn't know about the MARGIN argument of
duplicated().

-- Bert


On Wed, Nov 7, 2018 at 10:32 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote:

  
  
#
Thanks to all for the replies. I will try to use plain text in the future.
One question regarding using "which( ! duplicated( m, MARGIN=1 ) )".
This seems to return the first row index for each of the distinct
rows, but it does not give all the row indices
corresponding to each of the distinct rows. For example, in my example
below, rows 1, 13 and 15 are all (1,9).
Thanks.
  Hanna
   V1 V2
1   1  9
2   2 10
3   3 11
4   5 13
5   7 15
6   6 14
7   4 12
8   3 11
9   8 16
10  5 13
11  7 15
12  2 10
13  1  9
14  8 16
15  1  9
16  3 11
17  7 15
18  4 12
19  2 10
20  6 14
21  4 12
22  8 16
23  5 13
24  6 14
  V1 V2
1  1  9
2  2 10
3  3 11
4  5 13
5  7 15
6  6 14
7  4 12
9  8 16
[1]  1 13 15


Bert Gunter <bgunter.4567 at gmail.com> wrote on Nov 8, 2018 at 10:43:

  
  
#
One way, rather clumsy, is to convert your data.frame into a character vector
or list, via an invertible transformation, and use match() on it.  E.g.,
character
[1] 1 2 3 4 5 6 7 3 8 4 5 2 1 8 1 3 5 7 2 6 7 8 4 6
$`1`
[1]  1 13 15

$`2`
[1]  2 12 19
...
$`8`
[1]  9 14 22

Both ways work for nice enough inputs, but converting to text can cause
problems if the 'sep' occurs in any of the input text, and match() on lists of lists
used to have problems when the inner lists were big.
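
The code for this message was also lost in the archive. A sketch of the character-vector variant that reproduces the printed output, assuming the data frame C from the first message and a separator ("\r") that is an arbitrary choice for illustration:

```r
# Rebuild the 24-row example; in the posted data V2 equals V1 + 8 throughout.
V1 <- c(1, 2, 3, 5, 7, 6, 4, 3, 8, 5, 7, 2, 1, 8, 1, 3, 7, 4, 2, 6, 4, 8, 5, 6)
C <- data.frame(V1 = V1, V2 = V1 + 8)

# One text key per row; the separator must not occur in the data itself.
key <- paste(C$V1, C$V2, sep = "\r")

# Number the distinct rows in order of first appearance.
id <- match(key, unique(key))

# Collect all row indices belonging to each distinct row.
groups <- split(seq_along(id), id)
print(groups[["1"]])
```

Because the groups are numbered by first appearance, element k of the list corresponds to row k of unique(C).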


Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Nov 8, 2018 at 1:42 PM, li li <hannah.hlx at gmail.com> wrote:

            

  
  
#
The duplicated function returns TRUE for rows that have already appeared... so within each set of identical rows, exactly one row (the first) is not flagged in the output of duplicated. For the intended purpose of removing duplicates this behavior is ideal. I have no idea what your intended purpose is, since every row has duplicates elsewhere in the matrix. If you really want every set identified this way then a loop/apply seems inevitable (most opportunities for optimization come about by not visiting every combination).

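Cm <- as.matrix( C )
# indices of the first occurrence of each distinct row
D <- which( !duplicated( Cm, MARGIN=1 ) )
nCm <- nrow( Cm )
# for each distinct row d, collect every row index whose entries all match row d
F <- lapply( D, function(d) {
   idxrep <- rep( d, nCm )   # repeat row d once per row of Cm
   # a row matches when it has zero mismatching entries
   which( 0 == unname( rowSums( Cm[idxrep,] != Cm ) ) )
  } )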
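
A self-contained variant of the same idea, as a sketch. It compares against the transposed matrix so that row d is recycled instead of being replicated into a full-size matrix; the variable names first and groups are illustrative only:

```r
# Rebuild the 24-row example; in the posted data V2 equals V1 + 8 throughout.
V1 <- c(1, 2, 3, 5, 7, 6, 4, 3, 8, 5, 7, 2, 1, 8, 1, 3, 7, 4, 2, 6, 4, 8, 5, 6)
C <- data.frame(V1 = V1, V2 = V1 + 8)
Cm <- unname(as.matrix(C))

# First occurrence of each distinct row.
first <- which(!duplicated(Cm, MARGIN = 1))

groups <- lapply(first, function(d) {
  # t(Cm) has one column per original row; Cm[d, ] is recycled down each
  # column, so colSums() counts the mismatching entries for every row.
  which(colSums(t(Cm) != Cm[d, ]) == 0)
})
print(groups[[1]])
```

This still loops over the distinct rows, as described above; it only avoids building the replicated matrix inside each iteration.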
On November 8, 2018 1:42:40 PM PST, li li <hannah.hlx at gmail.com> wrote:

  
    
#
Thanks. It makes sense.

Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote on Nov 8, 2018 at 8:05: