fast way to compare two matrices of combinations
Thanks to all for their suggestions. I apologize for not supplying a self-contained example; I should not post questions when I'm on the way out the door. Martin's suggestion should work, but I will need to run it on our high-performance system next week: on my local 64-bit Linux box with 4 GB of RAM it blew up when a vector reached 2.6 GB. I may also get something to work using Charles' suggestion to use R's built-in table functions. At first I could not see how to do this with a vector of 3 elements, but I believe I can if I sort each vector, to remove the effect of order, and paste the elements together into one unique string. Once I get something that works and is as optimized as I can make it, I'll post it for future reference and for suggestions on further optimization.

Mark

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)
mwkimpel<at>gmail<dot>com
Charles C. Berry wrote:
On Thu, 13 Mar 2008, Mark W Kimpel wrote:
I have a list (length 750), each element containing a vector of unique strings (unique gene ids), with length up to ~40 (median 15). I want to compile a matrix of all possible triplets and their frequency across the list elements. I can do this with combn and a lot of looping, but it is VERY slow. I've tried to figure out a way to vectorize this, using "match" and "%in%", but can't get my mind around it. Below is my code; sig.tf.pairs is the list. Suggestions?
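The thread never includes a self-contained example, so here is a minimal sketch of hypothetical input with the shape described (750 elements, up to 40 unique ids, median around 15). The id format `g0001`, the pool of 2000 ids and the Poisson length distribution are assumptions, not Mark's data. `choose()` also gives the total number of triplets up front, which is a cheap way to estimate how large any all-at-once approach will get before running it.

```r
## Hypothetical stand-in for sig.tf.pairs: 750 elements, each a vector of
## unique made-up gene ids ("g0001", ...). Sizes are assumed, not real data.
set.seed(1)
gene.pool <- sprintf("g%04d", 1:2000)
sizes <- pmin(40L, pmax(1L, rpois(750, lambda = 15)))  # median about 15, capped at 40
sig.tf.pairs <- lapply(sizes, function(n) sample(gene.pool, n))  # no repeats within an element

M <- 3
## Number of M-subsets each element contributes (0 when it has fewer than M ids)
n.combns <- choose(lengths(sig.tf.pairs), M)
sum(n.combns)   # total number of triplets that must be tabulated
```

An element of the maximum size contributes `choose(40, 3)`, i.e. 9880 triplets, so the total is dominated by the longest vectors.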
First, be sure that your code does what you really intend for it to do.
Does this really do what you wanted?
if (length(intersect(triplets[,m], all.triplets[,k] == M))){
If so, then why does the first line below never produce an error?
count.vec <- count.vec[,-redundant.vec]
is.null(dim(count.vec)) ## TRUE
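Chuck's two questions are linked. In the test, `== M` sits inside the `intersect()` call, so a character column is compared with the number 3; every comparison is `FALSE`, the intersection is empty, and `if (length(...))` is never true. `redundant.vec` therefore stays `NULL`, and the `count.vec[,-redundant.vec]` line, which would fail on a plain vector, is never reached. A minimal sketch with made-up ids; the "intended" form is an assumption about what the test was meant to do:

```r
a <- c("g1", "g2", "g3")   # a candidate triplet (hypothetical ids)
b <- c("g1", "g2", "g3")   # an already stored triplet with the same ids

## As posted: the comparison happens inside intersect()
b == 3                                   # FALSE FALSE FALSE
length(intersect(a, b == 3))             # 0, so the if() body never runs
## Presumably intended: close intersect() before comparing with M
length(intersect(a, b)) == 3             # TRUE

## The unreached line would fail, because count.vec is a vector, not a matrix
count.vec <- rep(1, 4)
res <- try(count.vec[, -2], silent = TRUE)
inherits(res, "try-error")               # TRUE: incorrect number of dimensions
count.vec[-2]                            # plain vector indexing works
```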
You are basically tabulating. Use the functions that are built for that.
It looks like what you want is along these lines:
tab.combns <- function(x) apply(combn(sort(x), M), 2,
                                function(x) paste(x, collapse = ''))
tab.all <- table(unlist(lapply(sig.tf.pairs, tab.combns)))
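The same tabulation idea can be made a little more defensive. The sketch below is a variant under stated assumptions, not Chuck's code: it skips elements with fewer than `M` ids (for a character vector, `combn()` stops with an error when there are fewer elements than `m`), lets `combn()` apply `paste` directly through its `FUN` argument, and joins ids with a separator, because with `collapse = ''` the triplets `"ab","c","d"` and `"a","bc","d"` would produce the same key. The name `tab_combns` and the space separator are hypothetical choices; any string that cannot occur inside a gene id will do.

```r
## Hypothetical helper: tabulate every M-subset of ids across a list.
## Sorting before combn() makes each key independent of the input order,
## because combn() keeps the order of its input.
tab_combns <- function(lst, M = 3, sep = " ") {
  keys <- lapply(lst, function(ids) {
    ids <- sort(unique(ids))
    if (length(ids) < M) return(character(0))  # too few ids: nothing to count
    combn(ids, M, FUN = paste, collapse = sep)  # one key string per combination
  })
  table(unlist(keys, use.names = FALSE))
}

## Toy input: the first two elements share the triplet a/b/c
toy <- list(c("a", "b", "c"), c("c", "b", "a", "d"), c("x", "y"))
tab <- tab_combns(toy, M = 3)
tab
## Keys with counts: "a b c" = 2, "a b d" = 1, "a c d" = 1, "b c d" = 1
```

On the real list this would be called as `tab_combns(sig.tf.pairs, M = 3)`; `sort(tab, decreasing = TRUE)` then lists the most frequent triplets first.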
Chuck
Mark
############################################################
M <- 3 # 3 for triplets, etc.
##########################################################
# count all triplets
all.triplets <- NULL
all.count.vec <- NULL
for (i in 1:length(sig.tf.pairs)){
  if (length(sig.tf.pairs[[i]] >= M)){
    triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
    for (j in 1:ncol(triplets)){
      o <- order(triplets[,j])
      triplets[,j] <- triplets[o,j]
      count.vec <- rep(1, ncol(triplets))
    }
    if (is.null(all.count.vec)){
      all.count.vec <- count.vec
      all.triplets <- triplets
    } else {
      redundant.vec <- NULL
      for (k in 1:ncol(all.triplets)){
        for (m in 1:ncol(triplets)){
          if (length(intersect(triplets[,m], all.triplets[,k] == M))){
            all.count.vec[k] <- all.count.vec[k] + 1
            redundant.vec <- c(redundant.vec, m)
          }
        }
      }
      if(!is.null(redundant.vec)){
        triplets <- triplets[,-redundant.vec]
        count.vec <- count.vec[,-redundant.vec]
      }
      all.triplets <- cbind(all.triplets, triplets)
      all.count.vec <- c(all.count.vec, count.vec)
    }
  }
}
###################################
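The guard at the top of the loop has the same kind of misplaced parenthesis as the `intersect()` test Chuck quotes: `length(sig.tf.pairs[[i]] >= M)` is the length of a logical vector, so it is nonzero for any non-empty element, and an element with fewer than `M` ids reaches `combn()`, which then stops with an error. Separately, the inner `j` loop that sorts each column is unnecessary if the ids are sorted once before `combn()`, since `combn()` keeps its input order. A short sketch with hypothetical ids; the corrected guard is an assumption about the intent:

```r
M <- 3
ids <- c("g2", "g1")                        # hypothetical element with too few ids

length(ids >= M)                            # 2: always nonzero, so the guard passes
length(ids) >= M                            # FALSE: presumably the intended test
res <- try(combn(ids, M), silent = TRUE)
inherits(res, "try-error")                  # TRUE: combn() needs at least M ids

## Sorting once up front gives column-wise sorted triplets without a loop
ids <- c("g4", "g2", "g3", "g1")
triplets <- combn(sort(ids), M)
all(apply(triplets, 2, function(col) !is.unsorted(col)))   # TRUE
```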
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901