
Pearson correlation and p-value for matrix

6 messages · Dren Scott, John Fox, Marc Schwartz

#
Dear Dren,

How about the following?

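 cor.pvalues <- function(X){
    # X: numeric matrix with variables in columns (at least two columns)
    nc <- ncol(X)
    res <- matrix(0, nc, nc)  # diagonal entries are left at 0
    for (i in 2:nc){
        for (j in 1:(i - 1)){
            # test each pair of columns once and fill both triangles
            res[i, j] <- res[j, i] <- cor.test(X[,i], X[,j])$p.value
            }
        }
    res
    }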

What one then does with all of those non-independent tests is another
question, I guess.
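
One common, if partial, answer to that question is to adjust the p-values for
multiplicity. The sketch below uses Holm's method via p.adjust() from the
stats package; Holm's procedure does not require the tests to be independent.
The simulated data, the seed, and the names P, p.raw, p.holm and P.adj are
assumptions made only for illustration.

```r
# Simulated data: 50 observations on 6 unrelated variables
# (an assumption for illustration, not data from this thread).
set.seed(1)
X <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)

# Same pairwise loop as cor.pvalues() above.
cor.pvalues <- function(X) {
    nc <- ncol(X)
    res <- matrix(0, nc, nc)
    for (i in 2:nc) {
        for (j in 1:(i - 1)) {
            res[i, j] <- res[j, i] <- cor.test(X[, i], X[, j])$p.value
        }
    }
    res
}

P <- cor.pvalues(X)

# Each pair appears twice in P, so adjust the lower triangle only.
lower  <- lower.tri(P)
p.raw  <- P[lower]
p.holm <- p.adjust(p.raw, method = "holm")

# Put the adjusted values back into a symmetric matrix (diagonal stays 0).
P.adj <- matrix(0, ncol(X), ncol(X))
P.adj[lower] <- p.holm
P.adj <- P.adj + t(P.adj)
```

Adjusting only the lower triangle matters here: passing the whole matrix to
p.adjust() would count every pair twice, plus the diagonal.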

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
#
Here is what might be a slightly more efficient way to get to John's
question:

cor.pvals <- function(mat)
{
  rows <- expand.grid(1:nrow(mat), 1:nrow(mat))
  matrix(apply(rows, 1,
               function(x) cor.test(mat[x[1], ], mat[x[2], ])$p.value),
         ncol = nrow(mat))
}

HTH,

Marc Schwartz
On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote:
#
Dear Marc,

I think that the reflex of trying to avoid loops in R is often mistaken, and
so I decided to try to time the two approaches (on a 3GHz Windows XP
system).

I discovered, first, that there is a bug in your function -- you appear to
have indexed rows instead of columns; fixing that:

cor.pvals <- function(mat)
{
  cols <- expand.grid(1:ncol(mat), 1:ncol(mat))
  matrix(apply(cols, 1,
               function(x) cor.test(mat[, x[1]], mat[, x[2]])$p.value),
         ncol = ncol(mat))
}


My function is cor.pvalues and yours cor.pvals. This is for a data matrix
with 1000 observations on 100 variables:

> dim(X)
[1] 1000  100
> system.time(cor.pvalues(X))
[1] 5.53 0.00 5.53   NA   NA
> system.time(cor.pvals(X))
[1] 12.66  0.00 12.66    NA    NA

I frankly didn't expect the advantage of my approach to be this large, but
there it is.
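
For anyone wanting to repeat the comparison, here is a minimal sketch. It
assumes a simulated matrix of independent normal variables (smaller than the
one timed above, so that it runs quickly); the data, the seed, and the names
t.loop, t.apply, P.loop and P.apply are illustrative. Part of the gap is
structural: expand.grid() enumerates all nc^2 ordered pairs, including the
diagonal, while the nested loop tests only the nc(nc - 1)/2 distinct pairs.

```r
# Simulated data (an assumption): 200 observations on 20 variables.
set.seed(123)
X <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)

# Nested-loop version: one test per distinct pair of columns.
cor.pvalues <- function(X) {
    nc <- ncol(X)
    res <- matrix(0, nc, nc)
    for (i in 2:nc) {
        for (j in 1:(i - 1)) {
            res[i, j] <- res[j, i] <- cor.test(X[, i], X[, j])$p.value
        }
    }
    res
}

# apply() version (column-indexed): one test per ordered pair.
cor.pvals <- function(mat) {
    cols <- expand.grid(1:ncol(mat), 1:ncol(mat))
    matrix(apply(cols, 1,
                 function(x) cor.test(mat[, x[1]], mat[, x[2]])$p.value),
           ncol = ncol(mat))
}

t.loop  <- system.time(P.loop  <- cor.pvalues(X))
t.apply <- system.time(P.apply <- cor.pvals(X))

# Elapsed seconds for each approach; the values depend on the machine.
print(c(loop = t.loop[["elapsed"]], apply = t.apply[["elapsed"]]))

# The two results should agree everywhere off the diagonal.
off <- row(P.loop) != col(P.loop)
print(all.equal(P.loop[off], P.apply[off]))
```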

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
#
John,

Interesting test. Thanks for pointing that out.

You are right, there is a knee-jerk reaction to avoid loops, especially
nested loops.
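
For what it is worth, a third option avoids per-pair calls altogether by
vectorizing the arithmetic. The sketch below relies on the fact that the
Pearson test in cor.test() is a t-test with n - 2 degrees of freedom on
t = r * sqrt((n - 2) / (1 - r^2)). The function name cor.pvalues.vec is
hypothetical, and the code assumes a complete numeric matrix with no missing
values (cor.test() drops incomplete cases pair by pair, which this does not).

```r
# Hypothetical vectorized variant: p-values computed from cor(X) in one pass.
# Assumes X is numeric, has variables in columns, and has no missing values.
cor.pvalues.vec <- function(X) {
    n  <- nrow(X)
    df <- n - 2
    r  <- cor(X)
    diag(r) <- 0                     # avoid dividing by zero on the diagonal
    tstat <- r * sqrt(df / (1 - r^2))
    p <- 2 * pt(-abs(tstat), df)     # two-sided p-value
    diag(p) <- 0                     # same diagonal convention as cor.pvalues
    p
}

# Simulated data (an assumption): 100 observations on 8 variables.
set.seed(7)
X <- matrix(rnorm(100 * 8), nrow = 100, ncol = 8)
P.vec <- cor.pvalues.vec(X)

# Spot check against cor.test() for one pair of columns.
print(all.equal(P.vec[1, 2], cor.test(X[, 1], X[, 2])$p.value))
```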

On the indexing of rows, I did that because Dren had indicated in his
initial post:

 "I was trying to evaluate the pearson correlation and the p-values 
  for an nxm matrix, where each row represents a vector.
  One way to do it would be to iterate through each row, and find its
  correlation value( and the p-value) with respect to the other rows."

So I ran the correlations by row, rather than by column.

Thanks again. Good lesson.

Marc
On Fri, 2005-04-15 at 21:36 -0400, John Fox wrote:
#
Dear Marc,

That's the second time yesterday that I responded to a posting without
reading it carefully enough -- a good lesson for me. I guess that Dren could
just apply my solution to the transpose of his matrix -- i.e.,
cor.pvalues(t(X)).
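
A small sketch of that suggestion, assuming a simulated matrix in which each
row is one vector as in Dren's description; the data, the seed, and the name
P.rows are illustrative only.

```r
# Simulated data (an assumption): 5 row vectors, each of length 30.
set.seed(42)
X <- matrix(rnorm(5 * 30), nrow = 5, ncol = 30)

# Column-wise p-value function from earlier in the thread.
cor.pvalues <- function(X) {
    nc <- ncol(X)
    res <- matrix(0, nc, nc)
    for (i in 2:nc) {
        for (j in 1:(i - 1)) {
            res[i, j] <- res[j, i] <- cor.test(X[, i], X[, j])$p.value
        }
    }
    res
}

# Transposing turns the rows of X into columns, so the result has one
# row and one column per original row of X.
P.rows <- cor.pvalues(t(X))
print(dim(P.rows))

# Spot check: entry [1, 2] is the test between rows 1 and 2 of X.
print(all.equal(P.rows[1, 2], cor.test(X[1, ], X[2, ])$p.value))
```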

Sorry,
 John