
populating matrix with binary variable after matching data from data frame

10 messages · Adrian Johnson, arun, John McKown +1 more

#
Hi,
Sorry, I have a basic question.

I have a data frame with two columns:
      V1       V2
1   AKT3    TCL1A
2  AKTIP    VPS41
3  AKTIP    PDPK1
4  AKTIP   GTF3C1
5  AKTIP    HOOK2
6  AKTIP    POLA2
7  AKTIP KIAA1377
8  AKTIP FAM160A2
9  AKTIP    VPS16
10 AKTIP    VPS18


I have a 1211x1211 matrix x whose row and column names come from some of
the elements in x1$V1 and x1$V2. For every pair listed in x1 I want the
matching cell set to 1: for example, the AKT3-TCL1A cell should be 1,
whereas AKT3-VPS41 stays 0.
How can I map these binary relations into x?
       TCLA1 VPS41 ABCA13 ABCA4
AKT3       0     0      0     0
AKTIP      0     0      0     0
ABCA13     0     0      0     0
ABCA4      0     0      0     0


dput() output:

x = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  .Dim = c(4L, 4L),
  .Dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                   c("TCLA1", "VPS41", "ABCA13", "ABCA4")))

x1 = structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
"AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
"VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
"VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
10L), class = "data.frame")



Thanks
Adrian
#
You could try:
x1$V2[1] <- "TCLA1"   # make the spelling match colnames(x) ("TCL1A" vs "TCLA1")


x[outer(rownames(x), colnames(x), FUN=paste) %in% as.character(interaction(x1, sep=" "))] <- 1
x
       TCLA1 VPS41 ABCA13 ABCA4
AKT3       1     0      0     0
AKTIP      0     1      0     0
ABCA13     0     0      0     0
ABCA4      0     0      0     0
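The same paste-key idea can be wrapped in a small function. This is a sketch, not code from the thread, and the name `mark_pairs_by_key` is hypothetical. Because it only asks which cells of x have a matching "row, column" key, pairs in x1 that name rows or columns missing from x are skipped silently instead of raising "subscript out of bounds". A tab is used as the separator on the assumption that it never occurs inside a gene name.

```r
# Hypothetical helper (not from the thread): set x[r, c] to 1 for every
# (r, c) pair in the first two columns of x1, using pasted name keys.
mark_pairs_by_key <- function(x, x1, sep = "\t") {
  # "row<sep>col" key for every cell of x, laid out like x (column-major)
  cell_keys <- outer(rownames(x), colnames(x), paste, sep = sep)
  # the same kind of key for every pair listed in x1
  pair_keys <- paste(x1[[1]], x1[[2]], sep = sep)
  # cells whose key is not in pair_keys are left alone, so pairs naming
  # rows or columns that x lacks are ignored rather than causing an error
  x[cell_keys %in% pair_keys] <- 1
  x
}

# First three rows of x1 from the thread, with "TCL1A" respelled "TCLA1"
x <- matrix(0, 4, 4,
            dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                            c("TCLA1", "VPS41", "ABCA13", "ABCA4")))
x1 <- data.frame(V1 = c("AKT3", "AKTIP", "AKTIP"),
                 V2 = c("TCLA1", "VPS41", "PDPK1"),
                 stringsAsFactors = FALSE)
res <- mark_pairs_by_key(x, x1)
res
```

Here PDPK1 is not a column of x, so the AKTIP-PDPK1 pair simply has no effect.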
A.K.
On Tuesday, August 12, 2014 8:16 PM, Adrian Johnson <oriolebaltimore at gmail.com> wrote:
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
On Tue, Aug 12, 2014 at 7:14 PM, Adrian Johnson <oriolebaltimore at gmail.com>
wrote:
<snip>

I'm not totally sure that I understand your data structure, so I will
rephrase it a bit so that I can be corrected if necessary. You have a
1211x1211 matrix already. Every cell in the matrix is initialized to 0. It
has column names such as TCLA1, VPS41, ABCA13, ABCA4, ... and row names
such as AKT3, AKTIP, ABCA13, ABCA4. The data frame "x1" has columns named V1
and V2. V1 values are row names in the matrix; V2 values are column names
in the matrix. The following should do what you want. It is not a _good_
solution because it loops over the rows one at a time, but it is a start.

for (i in seq_len(nrow(x1))) {   # visit every row of x1, not only the last
   x[x1$V1[i], x1$V2[i]] <- 1
}
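A note on the loop header: `for (i in nrow(x1))` iterates over a single value, the row count itself, so only the last row of x1 is ever processed. `seq_len(nrow(x1))` visits every row (and runs zero times for an empty x1, where `1:nrow(x1)` would give `c(1, 0)`). A quick check on assumed toy data:

```r
# Toy data assumed for illustration
x1 <- data.frame(V1 = c("R1", "R3"), V2 = c("C2", "C1"),
                 stringsAsFactors = FALSE)

visited_nrow <- integer(0)
for (i in nrow(x1)) visited_nrow <- c(visited_nrow, i)          # runs once, i = 2

visited_seq <- integer(0)
for (i in seq_len(nrow(x1))) visited_seq <- c(visited_seq, i)   # i = 1, then 2

print(visited_nrow)  # [1] 2
print(visited_seq)   # [1] 1 2
```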
Please post in plain text, per the mailing list "rules".
#
Hi.
Thank you for your help.
Yes, that's exactly right, but the 1211x1211 matrix has some
row/column elements that may not be present in x1.
Is that the reason I get this error?

The row names and column names of my matrix are identical. In my dput
example I changed their order only so that some cells would get a 1,
to make the question easier to convey.

Thanks


    A  B C D E
A
B
C
D
+   x[x1$V1[i], x1$V2[i]] <- 1;
+ }
Error in `[<-`(`*tmp*`, x1[i, ]$V1, x1[i, ]$V2, value = 1) :
  subscript out of bounds







On Wed, Aug 13, 2014 at 8:28 AM, John McKown
<john.archie.mckown at gmail.com> wrote:
#
You can replace the loop by
f <- function(x, x1) {
  i <- as.matrix(x1[, c("V1","V2")]) # 2-column matrix to use as a subscript
  x[ i ] <- 1
  x
}
f(x, x1)

You will get an error if not all the strings in the subscript matrix
are in the row or column names of x. What do you want to happen in this
case? You can choose to first omit the bad rows in the subscript matrix:
    goodRows <- is.element(i[,1], dimnames(x)[[1]]) &
                is.element(i[,2], dimnames(x)[[2]])
    i <- i[goodRows, , drop=FALSE]
    x[ i ] <- 1
or you can choose to expand x to include all the names found in x1.

It would be good if you included some toy data to better illustrate
what you want to do.
E.g., with
  x <- array(0, c(3,3), list(Row=paste0("R",1:3),Col=paste0("C",1:3)))
  x1 <- data.frame(V1=c("R1","R3"), V2=c("C2","C1"))
the above f() gives
    Col
Row  C1 C2 C3
  R1  0  1  0
  R2  0  0  0
  R3  1  0  0
Is that what you are looking for?
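A runnable sketch of the omit-bad-rows option, reusing the toy x and x1 from this message and adding one hypothetical pair ("R9", "C1") whose row name is not in x. Note the double brackets: `dimnames(x)[[1]]` is the character vector of row names, whereas `dimnames(x)[1]` is a one-element list and would not match anything.

```r
# Toy data from this message, plus one hypothetical pair ("R9", "C1")
# whose row name does not exist in x
x  <- array(0, c(3, 3), list(Row = paste0("R", 1:3), Col = paste0("C", 1:3)))
x1 <- data.frame(V1 = c("R1", "R3", "R9"), V2 = c("C2", "C1", "C1"),
                 stringsAsFactors = FALSE)

i <- as.matrix(x1[, c("V1", "V2")])   # 2-column character subscript matrix
# keep only pairs whose row name AND column name both exist in x
goodRows <- is.element(i[, 1], dimnames(x)[[1]]) &
            is.element(i[, 2], dimnames(x)[[2]])
i <- i[goodRows, , drop = FALSE]
x[i] <- 1                             # no "subscript out of bounds" now
x
```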
#
Hello again, and sorry for asking again.

Maybe I was not clear in my earlier question.

I don't want to remove rows from the matrix, since the row names and
column names of the matrix are identical.


I tried your suggestion and here is what I get:
+ i <- as.matrix(x1[,c("V1","V2")])
+ x[i]<-1
+ x
+ }
Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
       ABCA10 ABCA12 ABCA13 ABCA4
ABCA10      0      0      0     0
ABCA12      0      0      0     0
ABCA13      0      0      0     0
ABCA4       0      0      0     0
      V1       V2
1   AKT3    TCL1A
2  AKTIP    VPS41
3  AKTIP    PDPK1
4  AKTIP   GTF3C1
5  AKTIP    HOOK2
6  AKTIP    POLA2
7  AKTIP KIAA1377
8  AKTIP FAM160A2
9  AKTIP    VPS16
10 AKTIP    VPS18


For instance, I would loop over x1: take the first row, get V1, and
check whether x has a row with that name; then check whether V2 exists
in the column names. If both match, I assign 1; if not, I go on to row 2.

In some rows I may only see a V2 element that exists in the row names;
since the V1 element does not exist in matrix x, that pair gets 0.
(Since matrix x has identical row and column names, I feel there is no
need to check an element against the column names once we have checked
it against the row names.)
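The row-by-row procedure described above can be written as a loop that skips any pair whose names are not both in x. This is a minimal sketch using the small A-D example that appears later in this thread. Although the row and column names are identical there, the sketch checks V1 against `rownames(x)` and V2 against `colnames(x)`; that costs nothing and stays correct if the names ever differ.

```r
# Small example from later in the thread: x has A-D, x1 also mentions K, L, M
x  <- matrix(0, 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
x1 <- data.frame(V1 = c("K", "D", "K", "M"), V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)

for (k in seq_len(nrow(x1))) {
  r        <- x1$V1[k]
  col_name <- x1$V2[k]
  # assign 1 only when both names exist in x; otherwise go to the next row
  if (r %in% rownames(x) && col_name %in% colnames(x)) {
    x[r, col_name] <- 1
  }
}
x
```

Only the D-A pair has both names in x, so that is the only cell set to 1.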



For instance, if a row of x1 has ABCA10 in V1 and ABCA10 in V2, then
row 1, column 1 of matrix x should get 1.

dput() output follows:

x <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L,
4L), .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"
), c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))


x1 <- structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
"AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
"VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
"VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
10L), class = "data.frame")



Thanks for your time.
On Wed, Aug 13, 2014 at 12:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
#
I may have missed something, but I didn't see the result you want for
your example. Also, none of the entries in the x1 you showed are row or
column names in x, making it hard to show what you want to happen.

Here is a function that gives you the choice of
    * error: stop if any row of x1 is 'bad'
    * omitRows: ignore rows of x1 that are 'bad'
    * expandX: expand the x matrix to include all rows or columns named in x1
(Row i of x1 is 'bad' if x1[i,1] is not a row name of x or x1[i,2]
is not a column name of x.)

f <- function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
{
    badEntryAction <- match.arg(badEntryAction)
    i <- as.matrix(x1[, c("V1", "V2")])
    if (badEntryAction == "omitRows") {
        i <- i[is.element(i[, 1], dimnames(x)[[1]]) & is.element(i[,
            2], dimnames(x)[[2]]), , drop = FALSE]
    }
    else if (badEntryAction == "expandX") {
        extraDimnames <- lapply(1:2, function(k) setdiff(i[, k], dimnames(x)[[k]]))
        # if you want the same dimnames on both axes,
        # take union of the 2 extraDimnames
        if ((n <- length(extraDimnames[[1]])) > 0) {
            x <- rbind(x, array(0, c(n, ncol(x)),
                                dimnames = list(extraDimnames[[1]], NULL)))
        }
        if ((n <- length(extraDimnames[[2]])) > 0) {
            x <- cbind(x, array(0, c(nrow(x), n),
                                dimnames = list(NULL, extraDimnames[[2]])))
        }
    }
    x[i] <- 1
    x
}
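Following the comment inside the expandX branch, here is a minimal sketch of a variant that keeps x square, with identical row and column names, by taking the union of all names first. The function name `expand_square` is hypothetical and not part of the thread's code. It assumes the first two columns of x1 hold the (row, column) pairs.

```r
# Hypothetical variant: grow x so that rows and columns both cover every
# name found in x or x1 (their union), then mark each listed pair with 1
expand_square <- function(x, x1) {
  i <- as.matrix(x1[, 1:2])                    # 2-column subscript matrix of names
  all_names <- unique(c(rownames(x), colnames(x), as.vector(i)))
  y <- matrix(0, length(all_names), length(all_names),
              dimnames = list(all_names, all_names))
  y[rownames(x), colnames(x)] <- x             # copy the existing values across
  y[i] <- 1
  y
}

x  <- matrix(0, 2, 2, dimnames = list(c("A", "B"), c("A", "B")))
x1 <- data.frame(V1 = c("A", "C"), V2 = c("B", "A"), stringsAsFactors = FALSE)
res <- expand_square(x, x1)
res
```

Here "C" is new, so the result is 3x3 with the same names on both axes.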

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Aug 13, 2014 at 2:33 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
#
Another solution is to use table to generate your x matrix, instead of
trying to make one and adding to it.  If you want the table to have
the same dimnames on both sides, make factors out of the columns of x1
with the same factor levels in both.  E.g., using a *small* example:
   V2
V1  A B C
  A 0 0 2
  B 1 0 0
  C 0 0 0

If you don't want counts, but just a TRUE for presence and FALSE for
absence, use X>0.  If you want 1 for presence and 0 for absence you
can use pmin(X, 1).
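The code for the small example above did not survive in this archive. The following is an independent sketch of the same idea on made-up pairs, not the original code. Giving both factors the same levels makes the table square, even for names such as "D" that occur on only one side.

```r
# Made-up pairs for illustration; "D" appears only in V1, "C" only in V2
x1 <- data.frame(V1 = c("A", "B", "D", "A"), V2 = c("B", "C", "A", "B"),
                 stringsAsFactors = FALSE)

lev <- sort(unique(c(x1$V1, x1$V2)))      # one shared set of levels
X <- table(V1 = factor(x1$V1, levels = lev),
           V2 = factor(x1$V2, levels = lev))
X              # counts; A-B appears twice
X > 0          # TRUE for presence, FALSE for absence
P <- pmin(X, 1)
P              # 1 for presence, 0 for absence
```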

Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Aug 13, 2014 at 2:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
#
Hi Bill,
Sorry for the trouble. Neither solution worked:
Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds


My x matrix may not have all the items that x1 has.

Say x only has A, B, C and D, whereas x1 has K, L, M, A and D. However,
x1 does not have any relationship between B and C, so B-C will be
zero anyway.

x1:

K  L
D  A
K  M
M  A
Although M is associated with A, M is not present in x, so we will not
map this association to 1. Since A and D are both present in x, we will
assign 1 to D-A.



  A B C D
A 0 0 0 0
B 0 0 0 0
C 0 0 0 0
D 1 0 0 0


I tried this simple for loop, but I get the same subscript error:


for(k in nrow(x1)){
x[x1[k,]$V1,x1[k,]$V2] <- 1
x[x1[,k]$V1,x1[,k]$V2] <- 1
x[x1[,k]$V2,x1[,k]$V1] <- 1
}

Error in `[<-`(`*tmp*`, hprd[x, ]$V1, hprd[x, ]$V2, value = 1) :
  subscript out of bounds
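The loop above appears to try marking both directions (V1 to V2 and V2 to V1). If a symmetric matrix is what is wanted, one vectorized sketch under that assumption is to drop the pairs whose names are not in x, then index with the pair matrix and with its column-swapped copy. The expected output shown above marks only D-A, so the reverse step is optional. The sketch uses the A-D example from this message.

```r
x  <- matrix(0, 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
x1 <- data.frame(V1 = c("K", "D", "K", "M"), V2 = c("L", "A", "M", "A"),
                 stringsAsFactors = FALSE)

i <- as.matrix(x1)                  # 2-column character matrix of pairs
# keep only pairs where both names are in x (K, L, M are dropped)
i <- i[i[, 1] %in% rownames(x) & i[, 2] %in% colnames(x), , drop = FALSE]
x[i] <- 1                           # D-A, as in the expected output
x[i[, 2:1, drop = FALSE]] <- 1      # A-D, the reverse direction (optional)
x
```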

Thanks again.
On Wed, Aug 13, 2014 at 6:02 PM, William Dunlap <wdunlap at tibco.com> wrote:
#
This is what I got:
  A B C D
A 0 0 0 0
B 0 0 0 0
C 0 0 0 0
D 1 0 0 0
   V2
V1  A B C D
  A 0 0 0 0
  B 0 0 0 0
  C 0 0 0 0
  D 1 0 0 0

I think you should sort out how your attempts went wrong.

My original 'f' assumed, perhaps foolishly, that x1 had column names
"V1" and "V2". Perhaps it should have just said i <- as.matrix(x1) and
checked that the result was a 2-column matrix of character data. E.g.,
f <- function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
{
    badEntryAction <- match.arg(badEntryAction)
    i <- as.matrix(x1)
    stopifnot(is.character(i), ncol(i)==2)
    if (badEntryAction == "omitRows") {
        i <- i[is.element(i[, 1], dimnames(x)[[1]]) &
               is.element(i[, 2], dimnames(x)[[2]]), , drop = FALSE]
    }
    else if (badEntryAction == "expandX") {
        extraDimnames <- lapply(1:2, function(k) setdiff(i[,
            k], dimnames(x)[[k]]))
        # if you want the same dimnames on both axes,
        # take union of the 2 extraDimnames
        if ((n <- length(extraDimnames[[1]])) > 0) {
            x <- rbind(x, array(0, c(n, ncol(x)),
                       dimnames = list(extraDimnames[[1]], NULL)))
        }
        if ((n <- length(extraDimnames[[2]])) > 0) {
            x <- cbind(x, array(0, c(nrow(x), n), dimnames = list(NULL,
                extraDimnames[[2]])))
        }
    }
    x[i] <- 1
    x
}
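Two different failures show up in this thread, and a small check makes them easy to tell apart. The column names `from` and `to` below are hypothetical. Selecting `x1[, c("V1", "V2")]` fails with "undefined columns selected" when the columns are named differently. Indexing x with pairs whose names are not in x fails with "subscript out of bounds", which is what the omitRows and expandX options handle.

```r
x  <- matrix(0, 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
x1 <- data.frame(from = c("D", "M"), to = c("A", "A"),
                 stringsAsFactors = FALSE)

# 1. selecting by name fails: there are no columns called V1 and V2
msg1 <- tryCatch({ x1[, c("V1", "V2")]; "no error" },
                 error = function(e) conditionMessage(e))
print(msg1)

# 2. as.matrix(x1) works whatever the column names are ...
i <- as.matrix(x1)
stopifnot(is.character(i), ncol(i) == 2)

# ... but M is not a row name of x, so plain indexing fails
msg2 <- tryCatch({ x[i] <- 1; "no error" },
                 error = function(e) conditionMessage(e))
print(msg2)
```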

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 14, 2014 at 8:15 AM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote: