matching vectors against vectors

3 messages · Piet van Remortel, Pierre Kleiber, Adaikalavan Ramasamy

#
Hi all.

I have a recurring problem that I don't know how to solve 
efficiently.

The situation is the following:   I have a number of data-sets 
(A,B,C,...) , consisting of an identifier (e.g. 11,12,13,...,20) and a 
measurement (e.g. in the range 100-120).   I want to compile a large 
table, with all available identifiers in all data-sets in the rows, and 
a column for every dataset.

Now, not all datasets have a measurement for every identifier, so I 
want NA if the set does not contain the identifier.

an example for a single dataset:

#all identifiers
 > rep <- c(10:20)

#Identifiers in my dataset (a subset of rep)
 > rep1 <- c(12,13,15,16,17,18)

#measurements in this dataset
 > rep1.r <- c(112,113,115,116,117,118)

#a vector which should become a column in the final table, now containing all NAs
 > res <- rep(NA, length(rep))

#the IDs and values of my dataset together
 > data <- cbind(rep1, rep1.r)

data looks like this:
      rep1 rep1.r
[1,]   12    112
[2,]   13    113
[3,]   15    115
[4,]   16    116
[5,]   17    117
[6,]   18    118

Now, I want to put the values 112, 113, 115,... in the correct rows of 
the final table, using the identifiers as an indicator of which row to 
put it in, so that I finally obtain:

rep     res
10    NA
11    NA
12    112
13    113
14    NA
15    115
16    116
17    117
18    118
19    NA
20    NA

I would like to avoid calling 'which' repeatedly and filling in each 
identifier's observation one at a time, since I will be doing this for 
thousands of rows at once.  There must be an efficient way using factors, 
tapply etc., but I have trouble finding it.  Ideally this could be done 
in one go, instead of looping.

Any suggestions ?

Thanks,

Piet
#
merge() may be just what you want.
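
A minimal sketch of how that might look with the example data from the
question, assuming both sides are first put into data frames that share an
identifier column. The names ids, all_ids, dataset and the column name "id"
are my own choices for illustration, not from the original post:

```r
# Example data from the question
ids    <- 10:20                           # all identifiers
rep1   <- c(12, 13, 15, 16, 17, 18)       # identifiers present in this data set
rep1.r <- c(112, 113, 115, 116, 117, 118) # measurements for those identifiers

all_ids <- data.frame(id = ids)
dataset <- data.frame(id = rep1, res = rep1.r)

# all.x = TRUE keeps every row of all_ids; identifiers that have no
# match in dataset get NA in the 'res' column
result <- merge(all_ids, dataset, by = "id", all.x = TRUE)
print(result)
```

The result has one row per identifier in ids, with NA wherever the data set
has no measurement.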

    Cheers, Pierre
1 day later
#
You can use merge but to do so you will need to define the common key
first. This can be a rowname in the case of a matrix or names in the
case of a vector.

v1 <- 1:10
names(v1) <- LETTERS[1:10]

v2 <- 101:105
names(v2) <- sample( LETTERS[1:10], 5 )
merge(v1, v2, by=0, all=TRUE)
   Row.names  x   y
1          A  1  NA
2          B  2 102
3          C  3 104
4          D  4 103
5          E  5 105
6          F  6  NA
7          G  7  NA
8          H  8 101
9          I  9  NA
10         J 10  NA
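
For several data sets at once, one possible alternative to repeated merge()
calls is to look up each data set against the full identifier list with
match(). This is a different technique from the merge() call above, shown
only as a sketch. The data sets A and B, their column names "id" and "value",
and the numbers in B are hypothetical:

```r
# Hypothetical data sets: each has an 'id' and a 'value' column
A <- data.frame(id    = c(12, 13, 15, 16, 17, 18),
                value = c(112, 113, 115, 116, 117, 118))
B <- data.frame(id    = c(10, 12, 20),
                value = c(100, 105, 120))
datasets <- list(A = A, B = B)

# Rows of the final table: every identifier seen in any data set
all_ids <- sort(unique(unlist(lapply(datasets, function(d) d$id))))

# match() gives, for each entry of all_ids, its position in d$id
# (NA if absent); indexing with NA yields NA, so identifiers that a
# data set lacks are filled with NA automatically
tab <- sapply(datasets, function(d) d$value[match(all_ids, d$id)])
rownames(tab) <- all_ids
print(tab)
```

The result is a matrix with one row per identifier and one column per data
set, built without an explicit loop over identifiers.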


Regards, Adai
On Tue, 2005-03-29 at 22:47 +0200, Piet van Remortel wrote: