populating matrix with binary variable after matching data from data frame
Another solution is to use table to generate your x matrix, instead of trying to make one and add to it. If you want the table to have the same dimnames on both sides, make factors out of the columns of x1 with the same factor levels in both. E.g., using a *small* example:
X1 <- data.frame(V1=c("A","A","B"), V2=c("C","C","A"))
# common levels for both columns (works whether V1/V2 are character or factor)
X <- table(lapply(X1, factor, levels=sort(union(X1[[1]], X1[[2]]))))
X
   V2
V1  A B C
  A 0 0 2
  B 1 0 0
  C 0 0 0
If you don't want counts, but just TRUE for presence and FALSE for absence, use X>0. If you want 1 for presence and 0 for absence, you can use pmin(X, 1).
Bill Dunlap
TIBCO Software
wdunlap tibco.com
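The table() approach above can be packaged as a small function. This is a minimal sketch, assuming the pairs live in columns named V1 and V2 of a data frame; the name pairs_to_incidence is hypothetical. It converts both columns to character first, so it behaves the same whether or not they were read in as factors, and it uses pmin() to turn counts into 1/0.

```r
# Hypothetical helper: build a square 0/1 incidence matrix from a data
# frame whose columns V1 and V2 hold pairs of names.
pairs_to_incidence <- function(df) {
  # Work with labels, so factor and character columns behave the same
  v1 <- as.character(df$V1)
  v2 <- as.character(df$V2)
  # Same levels on both axes gives identical row and column names
  lev <- sort(union(v1, v2))
  counts <- table(V1 = factor(v1, levels = lev),
                  V2 = factor(v2, levels = lev))
  # Drop the "table" class, then cap counts at 1 (1 = present, 0 = absent);
  # pmin() keeps dim and dimnames from its first argument
  pmin(unclass(counts), 1)
}

X1 <- data.frame(V1 = c("A", "A", "B"), V2 = c("C", "C", "A"))
pairs_to_incidence(X1)
```

Because the levels are the sorted union of both columns, the result is always square, which matches the situation where row and column names should be identical.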
On Wed, Aug 13, 2014 at 2:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
I may have missed something, but I didn't see the result you want for
your example. Also,
none of the entries in the x1 you showed are row or column names in x,
making it hard to show what you want to happen.
Here is a function that gives you the choice of
*error: stop if any row of x1 is 'bad'
*omitRows: ignore rows of x1 that are 'bad'
*expandX: expand the x matrix to include all rows or columns named in x1
(Row i of x1 is 'bad' if x1[i,1] is not a row name of x or x1[i,2]
is not a column name of x.)
f <- function (x, x1, badEntryAction = c("error", "omitRows", "expandX"))
{
    badEntryAction <- match.arg(badEntryAction)
    i <- as.matrix(x1[, c("V1", "V2")])  # 2-column character subscript
    if (badEntryAction == "omitRows") {
        # keep only rows whose V1 is a row name and V2 is a column name
        i <- i[is.element(i[, 1], dimnames(x)[[1]]) &
               is.element(i[, 2], dimnames(x)[[2]]), , drop = FALSE]
    }
    else if (badEntryAction == "expandX") {
        extraDimnames <- lapply(1:2, function(k) setdiff(i[, k], dimnames(x)[[k]]))
        # if you want the same dimnames on both axes, take union of
        # the 2 extraDimnames
        if ((n <- length(extraDimnames[[1]])) > 0) {
            x <- rbind(x, array(0, c(n, ncol(x)),
                                dimnames = list(extraDimnames[[1]], NULL)))
        }
        if ((n <- length(extraDimnames[[2]])) > 0) {
            x <- cbind(x, array(0, c(nrow(x), n),
                                dimnames = list(NULL, extraDimnames[[2]])))
        }
    }
    x[i] <- 1
    x
}
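The comment inside f() suggests taking the union of the two sets of extra names when both axes should share the same dimnames, which is the case in the original question (x has identical row and column names). A minimal sketch of that variant follows; expand_square() is a hypothetical name, and it assumes x already has identical row and column names and that x1 has columns V1 and V2.

```r
# Hypothetical variant of the "expandX" option that keeps x square:
# every name seen in x1 is added to BOTH the rows and the columns.
expand_square <- function(x, x1) {
  i <- as.matrix(x1[, c("V1", "V2")])  # 2-column character subscript
  # Existing names first, then any new names from either column of x1
  all_names <- union(rownames(x), as.vector(i))
  out <- matrix(0, length(all_names), length(all_names),
                dimnames = list(all_names, all_names))
  # Copy the old entries into their positions, matched by name
  out[rownames(x), colnames(x)] <- x
  out[i] <- 1
  out
}

x <- array(0, c(2, 2), list(c("R1", "R2"), c("R1", "R2")))
x1 <- data.frame(V1 = c("R1", "R3"), V2 = c("R2", "R1"))
expand_square(x, x1)
```

Building a fresh zero matrix and copying x into it by name avoids the two separate rbind()/cbind() steps and guarantees the row and column names stay in the same order.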
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Aug 13, 2014 at 2:33 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
Hello again, and sorry for asking again; maybe I was not clear before. I don't want to remove rows from the matrix, since its row names and column names are identical. I tried your suggestion and here is what I get:
fx <- function(x,x1){
  i <- as.matrix(x1[,c("V1","V2")])
  x[i]<-1
  x
}
fx(x, x1)
Error in `[<-`(`*tmp*`, i, value = 1) : subscript out of bounds
x[1:4,1:4]
       ABCA10 ABCA12 ABCA13 ABCA4
ABCA10      0      0      0     0
ABCA12      0      0      0     0
ABCA13      0      0      0     0
ABCA4       0      0      0     0
x1[1:10,]
V1 V2
1 AKT3 TCL1A
2 AKTIP VPS41
3 AKTIP PDPK1
4 AKTIP GTF3C1
5 AKTIP HOOK2
6 AKTIP POLA2
7 AKTIP KIAA1377
8 AKTIP FAM160A2
9 AKTIP VPS16
10 AKTIP VPS18
For instance, I will loop over x1: I go to the first row, take V1 and
check whether x has a row with that name, then check whether V2
exists in the column names; if both match, I assign 1. If not, I go to row 2.
In some rows, it is possible that only the element in V2 exists
in the row names; since the element in V1 does not exist in matrix
x, I give it 0. (Since matrix x has identical row and column
names, I feel there is no need to check an element against the column names
after checking it against the row names.)
Now, for instance, if I see ABCA10 in x1$V1 and ABCA10 in
x1$V2, then row 1, column 1 of matrix x should get 1.
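The row-by-row procedure described above (assign 1 only when V1 is a row name and V2 is a column name, otherwise skip the row) is what the "omitRows" option does in vectorized form. For comparison, here is a minimal sketch of the explicit loop on toy names; fill_by_loop() is a hypothetical name. Note seq_len(nrow(x1)), which visits every row, and as.character(): a factor used as a subscript indexes by its integer codes, not by its labels.

```r
# Hypothetical explicit-loop version of the procedure described above.
fill_by_loop <- function(x, x1) {
  for (k in seq_len(nrow(x1))) {        # visit every row of x1
    r  <- as.character(x1$V1[k])        # use labels, not factor codes
    cl <- as.character(x1$V2[k])
    # Only assign when both names exist; otherwise the cell stays 0
    if (r %in% rownames(x) && cl %in% colnames(x)) {
      x[r, cl] <- 1
    }
  }
  x
}

nm <- c("ABCA10", "ABCA12", "ABCA13", "ABCA4")
x <- matrix(0, 4, 4, dimnames = list(nm, nm))
x1 <- data.frame(V1 = c("ABCA10", "AKT3", "ABCA4"),
                 V2 = c("ABCA10", "ABCA12", "TCL1A"))
fill_by_loop(x, x1)
```

Here only the first pair matches, so only row 1, column 1 becomes 1; the vectorized omitRows version gives the same result without the loop.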
dput output follows:
x <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L,
4L), .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"
), c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))
x1 <- structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
"AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
"VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
"VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
10L), class = "data.frame")
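With the dput data above, a quick check, assuming x and x1 exactly as given, shows why x[i] <- 1 fails with "subscript out of bounds": none of the V1 values are row names of x and none of the V2 values are column names, so every row of x1 would be dropped by an omit-bad-rows step and the matrix would stay all zeros.

```r
# The same data as the dput output above, written more compactly
x <- structure(rep(0, 16), .Dim = c(4L, 4L),
  .Dimnames = list(c("ABCA10", "ABCA12", "ABCA13", "ABCA4"),
                   c("ABCA10", "ABCA12", "ABCA13", "ABCA4")))
x1 <- data.frame(
  V1 = c("AKT3", rep("AKTIP", 9)),
  V2 = c("TCL1A", "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2",
         "KIAA1377", "FAM160A2", "VPS16", "VPS18"))

# Which rows of x1 name an existing cell of x?
ok <- x1$V1 %in% rownames(x) & x1$V2 %in% colnames(x)
sum(ok)  # 0 usable rows
```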
Thanks for your time.
On Wed, Aug 13, 2014 at 12:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
You can replace the loop
for (i in seq_len(nrow(x1))) {
  x[as.character(x1$V1[i]), as.character(x1$V2[i])] <- 1
}
by
f <- function(x, x1) {
i <- as.matrix(x1[, c("V1","V2")]) # 2-column matrix to use as a subscript
x[ i ] <- 1
x
}
f(x, x1)
You will get an error if any string in the first column of the subscript
matrix is not a row name of x, or any string in the second column is not a
column name of x. What do you want to happen in this case? You can choose
to first omit the bad rows in the subscript matrix:
goodRows <- is.element(i[,1], dimnames(x)[[1]]) & is.element(i[,2],
dimnames(x)[[2]])
i <- i[goodRows, , drop=FALSE]
x[ i ] <- 1
or you can choose to expand x to include all the names found in x1.
It would be good if you included some toy data to better illustrate
what you want to do.
E.g., with
x <- array(0, c(3,3), list(Row=paste0("R",1:3),Col=paste0("C",1:3)))
x1 <- data.frame(V1=c("R1","R3"), V2=c("C2","C1"))
the above f() gives
f(x, x1)
    Col
Row  C1 C2 C3
  R1  0  1  0
  R2  0  0  0
  R3  1  0  0
Is that what you are looking for?