how to replace my double for loop which is little efficient! - R-help

bbslover

Sun, Dec 26, 2010 4:18 AM

Dear all,

My double for loop is shown below, but it is not very efficient. I hope
someone can give me a "vectorized" program to replace my code. Thanks.

x is a 202*263 matrix, that is, 202 samples and 263 independent variables.

num.compd<-nrow(x); # number of compounds
diss.all<-0
for( i in 1:num.compd)
   for (j in 1:num.compd)
      if (i!=j) {
        S1<-sum(x[i,]*x[j,])
        S2<-sum(x[i,]^2)
        S3<-sum(x[j,]^2)
        sim2<-S1/(S2+S3-S1)
        diss2<-1-sim2
        diss.all<-diss.all+diss2}

This computation takes a long time to finish! I really need "rapid"
code to replace mine.

thanks

kevin

View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
Sent from the R help mailing list archive at Nabble.com.

Berend Hasselman

Sun, Dec 26, 2010 6:13 AM

bbslover wrote:

Alternative 1: the j-loop only needs to start at i+1, so

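diss2.all <- 0   # initialise the accumulator before the loops
for( i in 1:num.compd) {
    # j runs over i+1..num.compd; length.out=0 gives an empty loop when i is the last row
    for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
            S1<-sum(x[i,]*x[j,])
            S2<-sum(x[i,]^2)
            S3<-sum(x[j,]^2)
            sim2<-S1/(S2+S3-S1)
            diss2<-1-sim2
            diss2.all<-diss2.all+diss2
    }
}
# each unordered pair was visited once, the original loop visits it twice
diss2.all <- 2 * diss2.all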

On my pc this is about twice as fast as your version (with 202 samples and
263 variables)

Alternative 2: the sum() calls are not necessary. Use some matrix algebra:

xtx <- x %*% t(x)
diss3.all <- 0
for( i in 1:num.compd) {
    for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
            S1 <- xtx[i,j]
            S2 <- xtx[i,i]
            S3 <- xtx[j,j]
            sim2<-S1/(S2+S3-S1)
            diss2<-1-sim2
            diss3.all<-diss3.all+diss2
    }
}
diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1.
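
As a quick check of why alternative 2 works, here is a minimal sketch on a small made-up matrix (not the original data): entry [i,j] of x %*% t(x) is the dot product of rows i and j, so S1 sits off the diagonal and S2, S3 sit on it.

```r
# Small hypothetical matrix standing in for the 202 x 263 data
set.seed(1)
x <- matrix(runif(5 * 4), nrow = 5)

xtx <- x %*% t(x)   # 5 x 5 matrix of row-by-row dot products

# An off-diagonal entry equals S1 for that pair of rows
stopifnot(isTRUE(all.equal(xtx[2, 3], sum(x[2, ] * x[3, ]))))
# The diagonal holds S2/S3, the row sums of squares
stopifnot(isTRUE(all.equal(diag(xtx), rowSums(x^2))))
# The matrix is symmetric, which is why looping over j > i and doubling suffices
stopifnot(isSymmetric(xtx))
```

The saving comes from computing every dot product once in a single matrix product instead of three sum() calls per pair inside the loop.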

I'm quite sure that more expert R gurus can get some more speed up.

Note: I generated the x matrix with:
set.seed(1);x<-matrix(runif(202*263),nrow=202)
(Timings on a 2.16 GHz iMac using 64-bit R)

Berend


Dennis Murphy

Sun, Dec 26, 2010 10:06 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101226/afaf4828/attachment.pl>

bbslover

Sun, Dec 26, 2010 10:33 PM

Thanks for your help, it is great. In addition: at the beginning x was a
data frame, and my code ran very slowly. After your help I converted x to a
matrix, and now it is very quick. I am very grateful for your kind help, and
your code is so good!
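
A likely reason for the difference, offered as an assumption rather than a measured result: x[i, ] on a data frame builds a one-row data frame on every access, while on a numeric matrix it returns a plain vector. A minimal sketch of the conversion, where x.df is a made-up stand-in for the original data and all columns are assumed to be numeric:

```r
# Hypothetical data frame standing in for the original data
set.seed(1)
x.df <- as.data.frame(matrix(runif(20 * 6), nrow = 20))

# Convert once, before any looping; as.matrix() gives a numeric matrix
# when every column is numeric
x <- as.matrix(x.df)
stopifnot(is.matrix(x), is.numeric(x))

# Row extraction: a one-row data frame versus a plain numeric vector
class(x.df[1, ])   # "data.frame"
class(x[1, ])      # "numeric"
```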

kevin


bbslover

Sun, Dec 26, 2010 11:10 PM

Thanks for your help. I am sorry, I do not fully understand your code, so I
cannot apply it correctly to my data. My data are in the attachment, and
what I want to compute is the equation in the Word document in the
attachment:

The code from Berend gives the answer I want.

http://r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar


Berend Hasselman

Sun, Dec 26, 2010 11:39 PM

djmuseR wrote:

I did some more work along Dennis' lines:

xtx <- tcrossprod(x)
xtd <- diag(xtx)
xzz <- outer(xtd,xtd,'+')
zz  <- 1 - xtx/(xzz-xtx)
diss.all <- sum(zz)

This appears to give the desired result, and it is quite a bit faster than my
alternative 2.
It would indeed be nice to know what is being computed.
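
One way to check the vectorised version against the original double loop is sketched below on a small random matrix. The function names tanimoto.diss.total and loop.diss.total and the matrix x.small are made up for this illustration.

```r
# Vectorised total dissimilarity over all ordered pairs i != j
tanimoto.diss.total <- function(x) {
  xtx <- tcrossprod(x)            # xtx[i, j] = sum(x[i, ] * x[j, ])
  xtd <- diag(xtx)                # xtd[i]    = sum(x[i, ]^2)
  xzz <- outer(xtd, xtd, "+")     # xzz[i, j] = xtd[i] + xtd[j]
  zz  <- 1 - xtx / (xzz - xtx)    # diagonal is 1 - 1 = 0 for non-zero rows
  sum(zz)
}

# Reference implementation: the original double loop
loop.diss.total <- function(x) {
  n <- nrow(x)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        S1 <- sum(x[i, ] * x[j, ])
        total <- total + 1 - S1 / (sum(x[i, ]^2) + sum(x[j, ]^2) - S1)
      }
    }
  }
  total
}

set.seed(1)
x.small <- matrix(runif(30 * 8), nrow = 30)
stopifnot(isTRUE(all.equal(tanimoto.diss.total(x.small),
                           loop.diss.total(x.small))))
```

The diagonal needs no special handling because each row has similarity 1 with itself, so it contributes 0 to the sum. One caveat: a row consisting entirely of zeros gives 0/0 = NaN on the diagonal, so the vectorised sum would be NaN where the loop would not be.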

Berend


Berend Hasselman

Mon, Dec 27, 2010 12:05 AM

bbslover wrote:

I've seen what you want to do.
OpenOffice mangled the equations so I used an online conversion to pdf.
Thanks.

See my previous post for the fastest version. It's just converting your
formulas with some matrix algebra.

Berend


Berend Hasselman

Mon, Dec 27, 2010 12:33 AM

Found this:
http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29

and an R site search with "tanimoto" yielded some more interesting stuff.
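
For reference, the quantity computed by the code in this thread can be written as follows. The symbols are introduced here for illustration: x_i is row i of x, T is the Tanimoto similarity (sim2 in the code), and D is the total (diss.all in the code).

```latex
T(x_i, x_j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert^2 + \lVert x_j \rVert^2 - x_i \cdot x_j},
\qquad
D = \sum_{i \ne j} \bigl( 1 - T(x_i, x_j) \bigr)
```

Here x_i . x_j corresponds to S1, and the two squared norms correspond to S2 and S3.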

The rest is up to you.

Berend


bbslover

Mon, Dec 27, 2010 4:16 AM

Thanks, Berend,

It seems it is better to attach a PDF file to avoid garbled text.

Yes, what I want to obtain is the Tanimoto coefficient, and the Wikipedia
page you linked is about this coefficient. I also searched the R site for
the Tanimoto coefficient and learned more about it.

I have saved your code and studied it.

Thanks again


Kevin
