self-defined distance function to be computed on matrix
On Thu, Aug 30, 2012 at 10:48 AM, zz <czhang at uams.edu> wrote:
Hello,

I have a self-defined function to be computed on each column in a matrix. The basic idea is to ignore the elements that have a value of 0 during the computation. I should be able to write my own function, but it could be computationally expensive, so I'd love to ask if anyone has suggestions on how to implement it more efficiently. Thanks in advance.

For example, there are three vectors in the matrix, which are

   A   B   C
   1   0   1
  -1   1   1
  -1  -1   1
   1   0  -1

Distance(AB) = (-1*1 + (-1)*(-1)) / de(AB), and
de(AB) = sqrt(square(-1) + square(-1)) * sqrt(square(1) + square(-1))

Distance(BC) = (1*1 + (-1)*1) / de(BC), and
de(BC) = sqrt(square(1) + square(-1)) * sqrt(square(1) + square(1))

Distance(AC) = (1*1 + (-1)*1 + (-1)*1 + 1*(-1)) / de(AC), and
de(AC) = sqrt(square(1) + square(-1) + square(-1) + square(1)) * sqrt(square(1) + square(1) + square(1) + square(-1))

As you may see, the numerator is basically the dot product of the two vectors, restricted to the rows where both are nonzero; this function is actually much like the cosine function in R, but with some variations.
If I understand it correctly, you are trying to calculate the "cosine correlation" while excluding all rows where one of the two columns has a zero?

There may be other ways to do it, but (shameless plug) my package WGCNA defines a replacement for the usual correlation function cor() that lets you specify the argument cosine = TRUE to calculate cosine correlation (i.e., Pearson correlation without centering). To ignore the zeroes, turn them into NA and specify the argument use = "pairwise.complete.obs" (or just use = "p") to the function cor.

So define a matrix (say ABC), set all zero values to NA

  ABC[ABC==0] = NA

then issue

  library(WGCNA)
  sim = cor(ABC, cosine = TRUE, use = 'p')

Note that the correlation gives you a similarity; to turn it into a dissimilarity or distance you have to subtract it from 1

  dissim = 1-sim

HTH,
Peter
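
For readers who want to check the result without installing a package, the same quantity can probably be sketched in base R with two matrix products. This is only an illustration under stated assumptions, not the WGCNA implementation: the helper name zero_skip_cosine is made up here, the input is assumed to be a numeric matrix with no NA values, and zeros are assumed to be the only entries to skip. The dot products need no masking because a zero in either column contributes nothing; only the norms have to be restricted to the rows where the other column is nonzero.

```r
# Cosine similarity between columns, skipping rows where either column is 0.
# zero_skip_cosine is a hypothetical helper name, not part of any package.
zero_skip_cosine <- function(X) {
  X <- as.matrix(X)
  M <- (X != 0) * 1         # indicator matrix: 1 where an entry is nonzero
  num <- crossprod(X)       # t(X) %*% X; zero entries add nothing to the sums
  S <- crossprod(X^2, M)    # S[i, j] = sum of X[, i]^2 over rows where X[, j] != 0
  num / sqrt(S * t(S))      # NaN if two columns share no nonzero rows
}

# The example matrix from the question
ABC <- cbind(A = c(1, -1, -1,  1),
             B = c(0,  1, -1,  0),
             C = c(1,  1,  1, -1))

sim <- zero_skip_cosine(ABC)
dissim <- 1 - sim           # similarity to dissimilarity, as in the reply
```

On the example matrix this gives a similarity of 0 for A-B and B-C and -0.5 for A-C, which matches the hand formulas in the question.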