populating matrix with binary variable after matching data from data frame

Wed, Aug 13, 2014 3:02 PM

Another solution is to use table to generate your x matrix, instead of
trying to make one and adding to it.  If you want the table to have
the same dimnames on both sides, make factors out of the columns of x1
with the same factor levels in both.  E.g., using a *small* example:

   V2
V1  A B C
  A 0 0 2
  B 1 0 0
  C 0 0 0

If you don't want counts, but just a TRUE for presence and FALSE for
absence, use X>0.  If you want 1 for presence and 0 for absence you
can use pmin(X, 1).
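
The code that produced the table above did not survive in this copy of the message. The following is a minimal sketch of the approach described, not necessarily the code originally posted. The contents of x1 are assumed, chosen so that the counts match the table shown, and `lev`, `present` and `binary` are illustrative names.

```r
# Assumed toy data, chosen to reproduce the counts in the table above
x1 <- data.frame(V1 = c("A", "A", "B"), V2 = c("C", "C", "A"),
                 stringsAsFactors = FALSE)

# Use the same factor levels for both columns so the table is square
# and has identical dimnames on both sides
lev <- sort(union(x1$V1, x1$V2))
X <- table(V1 = factor(x1$V1, levels = lev),
           V2 = factor(x1$V2, levels = lev))
X

present <- X > 0      # TRUE for presence, FALSE for absence
binary <- pmin(X, 1)  # 1 for presence, 0 for absence
```

Names that appear in only one of the two columns still get a row and a column, because both factors share the same levels.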

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Aug 13, 2014 at 2:51 PM, William Dunlap <wdunlap at tibco.com> wrote:

I may have missed something, but I didn't see the result you want for
your example.  Also, none of the entries in the x1 you showed are row or
column names in x, making it hard to show what you want to happen.

Here is a function that gives you the choice of
    *error: stop if any row of x1 is 'bad'
    *omitRows: ignore the rows of x1 that are 'bad'
    *expandX: expand the x matrix to include all rows or columns named in x1
(Row i of x1 is 'bad' if x1[i,1] is not a row name of x or x1[i,2]
is not a column name of x.)

f
function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
{
    badEntryAction <- match.arg(badEntryAction)
    i <- as.matrix(x1[, c("V1", "V2")])
    if (badEntryAction == "omitRows") {
        i <- i[is.element(i[, 1], dimnames(x)[[1]]) & is.element(i[,
            2], dimnames(x)[[2]]), , drop = FALSE]
    }
    else if (badEntryAction == "expandX") {
        extraDimnames <- lapply(1:2, function(k) setdiff(i[,
            k], dimnames(x)[[k]]))
        # if you want the same dimnames on both axes, take the union
        # of the 2 extraDimnames
        if ((n <- length(extraDimnames[[1]])) > 0) {
            x <- rbind(x, array(0, c(n, ncol(x)),
                dimnames = list(extraDimnames[[1]], NULL)))
        }
        }
        if ((n <- length(extraDimnames[[2]])) > 0) {
            x <- cbind(x, array(0, c(nrow(x), n), dimnames = list(NULL,
                extraDimnames[[2]])))
        }
    }
    x[i] <- 1
    x
}
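
The comment in the expandX branch mentions taking the union of the extra names when the row and column names should stay identical. A minimal sketch of that variant follows. The function name `expand_square` and the toy data are hypothetical and not part of the function above. It assumes x has both row and column names and rebuilds the matrix instead of using rbind/cbind.

```r
# Hypothetical helper: grow x into a square matrix whose row and column
# names are the union of the existing names and every name found in x1
expand_square <- function(x, x1) {
    i <- as.matrix(x1[, c("V1", "V2")])   # character subscript matrix
    all_names <- union(union(rownames(x), colnames(x)), as.vector(i))
    out <- array(0, c(length(all_names), length(all_names)),
                 dimnames = list(all_names, all_names))
    out[rownames(x), colnames(x)] <- x    # keep the existing entries
    out[i] <- 1                           # mark the pairs listed in x1
    out
}

# Toy data (assumed): "C" is not yet a row or column name of x
x <- array(0, c(2, 2), dimnames = list(c("A", "B"), c("A", "B")))
x["B", "A"] <- 1
x1 <- data.frame(V1 = "A", V2 = "C", stringsAsFactors = FALSE)
out <- expand_square(x, x1)
out
```

Here the result is a 3 by 3 matrix with names A, B, C on both axes, the old entry at ["B", "A"] preserved, and a new 1 at ["A", "C"].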

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Aug 13, 2014 at 2:33 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:

Hello again. Sorry to ask again.

Maybe I was not clear in my earlier question.

I don't want to remove rows from the matrix, since the row names and
column names of the matrix are identical.


I tried your suggestion and here is what I get:

fx <- function(x,x1){
  i <- as.matrix(x1[,c("V1","V2")])
  x[i]<-1
  x
}

fx(x, x1)

Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds

x[1:4,1:4]

       ABCA10 ABCA12 ABCA13 ABCA4
ABCA10      0      0      0     0
ABCA12      0      0      0     0
ABCA13      0      0      0     0
ABCA4       0      0      0     0

x1[1:10,]

      V1       V2
1   AKT3    TCL1A
2  AKTIP    VPS41
3  AKTIP    PDPK1
4  AKTIP   GTF3C1
5  AKTIP    HOOK2
6  AKTIP    POLA2
7  AKTIP KIAA1377
8  AKTIP FAM160A2
9  AKTIP    VPS16
10 AKTIP    VPS18


For instance, I loop over x1. For the first row, I take V1 and check
whether x has a row with that name, then check whether V2 exists in the
column names. If both match, I assign 1. If not, I go on to row 2.

In some rows, it is possible that only the element in V2 exists in the
row names. Since the element in V1 does not exist in matrix x, I leave
the entry at 0. (Since matrix x has identical row and column names, I
feel it does not matter whether we check an element against the column
names after checking the row names.)

For instance, if I see ABCA10 in x1$V1 and ABCA10 in x1$V2, then
row 1, column 1 of matrix x should get 1.
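
The row-by-row check described above can be written without a loop. This is a minimal sketch under assumed data: the x1 rows below are made up for illustration (the ABCA4/ABCA12 pair does not come from the posted data), and `genes` and `keep` are illustrative names.

```r
genes <- c("ABCA10", "ABCA12", "ABCA13", "ABCA4")
x <- array(0, c(4, 4), dimnames = list(genes, genes))

# Assumed example pairs: rows 2 and 4 contain names that are not in x
x1 <- data.frame(V1 = c("ABCA10", "AKT3", "ABCA4", "AKTIP"),
                 V2 = c("ABCA10", "TCL1A", "ABCA12", "ABCA13"),
                 stringsAsFactors = FALSE)

# Keep only the rows where V1 is a row name AND V2 is a column name of x
keep <- x1$V1 %in% rownames(x) & x1$V2 %in% colnames(x)
i <- as.matrix(x1[keep, c("V1", "V2")])

x[i] <- 1   # entries for unmatched rows stay 0
x
```

Only the first and third pairs pass the check, so x gets a 1 at ["ABCA10", "ABCA10"] and at ["ABCA4", "ABCA12"], and every other entry stays 0.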

dput - follows..

x <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L,
4L), .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"
), c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))


x1 <- structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
"AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
"VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
"VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
10L), class = "data.frame")



Thanks for your time.




On Wed, Aug 13, 2014 at 12:51 PM, William Dunlap <wdunlap at tibco.com> wrote:

You can replace the loop

for (i in seq_len(nrow(x1))) {
   x[x1$V1[i], x1$V2[i]] <- 1;
}

by
f <- function(x, x1) {
  i <- as.matrix(x1[, c("V1","V2")]) # 2-column matrix to use as a subscript
  x[ i ] <- 1
  x
}
f(x, x1)

You will get an error if not all the strings in the subscript matrix
are in the row or column names of x.  What do you want to happen in this
case?  You can choose to first omit the bad rows in the subscript matrix
    goodRows <- is.element(i[,1], dimnames(x)[[1]]) &
                is.element(i[,2], dimnames(x)[[2]])
    i <- i[goodRows, , drop=FALSE]
    x[ i ] <- 1
or you can choose to expand x to include all the names found in x1.

It would be good if you included some toy data to better illustrate
what you want to do.
E.g., with
  x <- array(0, c(3,3), list(Row=paste0("R",1:3),Col=paste0("C",1:3)))
  x1 <- data.frame(V1=c("R1","R3"), V2=c("C2","C1"))
the above f() gives

f(x, x1)

    Col
Row  C1 C2 C3
  R1  0  1  0
  R2  0  0  0
  R3  1  0  0
Is that what you are looking for?
