-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dren Scott
Sent: Monday, April 18, 2005 12:41 PM
To: 'R-Help'
Subject: RE: [R] Pearson corelation and p-value for matrix
Hi all,
Thanks Andy, Mark and John for all the help. I really
appreciate it. I'm new to both R and statistics, so please
excuse any gaffes on my part.
Essentially what I'm trying to do, is to evaluate for each
row, how many other rows would have a p-value < 0.05. So,
after I get my N x N p-value matrix, I'll just filter out
values that are > 0.05.
Each of my datasets (6000 rows x 100 columns) would consist
of some NA's. The iterative procedure (cor.pvalues) proposed
by John would yield the values, but it would take an
inordinately long time (I have 50 of these datasets to
process). The solution proposed by Andy, although fast, would
not be able to incorporate the NA's.
Is there any workaround for the NA's? Or possibly do you
think I could try something else?
Thanks very much.
Dren
"Liaw, Andy" <andy_liaw at merck.com> wrote:
From: John Fox
Dear Andy,
That's clearly much better -- and illustrates an effective strategy
for vectorizing (or "matricizing") a computation. I think I'll add
this to my list of programming examples. It might be a little
dangerous to pass ...
through to cor(), since someone could specify type="spearman", for
example.
Ah, yes, the "..." isn't likely to help here! Also, it will
only work correctly if there are no NA's, for example (or
else the degree of freedom would be wrong).
Best,
Andy
Thanks,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: Liaw, Andy [mailto:andy_liaw at merck.com]
Sent: Friday, April 15, 2005 9:51 PM
To: 'John Fox'; MSchwartz at medanalytics.com
Cc: 'R-Help'; 'Dren Scott'
Subject: RE: [R] Pearson corelation and p-value for matrix
We can be a bit sneaky and `borrow' code from cor.test.default:
cor.pval <- function(x, alternative="two-sided", ...) { corMat <-
cor(x, ...) n <- nrow(x) df <- n - 2 STATISTIC <-
/ sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if
"less") { p } else if (alternative == "greater") {
1 - p
} else 2 * pmin(p, 1 - p)
p
}
The test:
system.time(c1 <- cor.pvals(X), gcFirst=TRUE)
[1] 13.19 0.01 13.58 NA NA
system.time(c2 <- cor.pvalues(X), gcFirst=TRUE)
system.time(c3 <- cor.pval(X), gcFirst=TRUE)
[1] 0.07 0.00 0.07 NA NA
Cheers,
Andy
From: John Fox
Dear Mark,
I think that the reflex of trying to avoid loops in R is often
mistaken, and so I decided to try to time the two
3GHz Windows XP system).
I discovered, first, that there is a bug in your
appear to have indexed rows instead of columns; fixing that:
cor.pvals <- function(mat)
{
cols <- expand.grid(1:ncol(mat), 1:ncol(mat))
1,
function(x) cor.test(mat[, x[1]], mat[, x[2]])$p.value), ncol =
ncol(mat)) }
My function is cor.pvalues and yours cor.pvals. This is
matrix with 1000 observations on 100 variables:
R <- diag(100)
R[upper.tri(R)] <- R[lower.tri(R)] <- .5
library(mvtnorm)
X <- rmvnorm(1000, sigma=R)
dim(X)
system.time(cor.pvalues(X))
system.time(cor.pvals(X))
[1] 12.66 0.00 12.66 NA NA
I frankly didn't expect the advantage of my approach to be
but there it is.
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com]
Sent: Friday, April 15, 2005 7:08 PM
To: John Fox
Cc: 'Dren Scott'; R-Help
Subject: RE: [R] Pearson corelation and p-value for matrix
Here is what might be a slightly more efficient way to
question:
cor.pvals <- function(mat)
{
rows <- expand.grid(1:nrow(mat), 1:nrow(mat))
1,
function(x) cor.test(mat[x[1], ], mat[x[2],
nrow(mat)) }
HTH,
Marc Schwartz
On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote:
Dear Dren,
How about the following?
cor.pvalues <- function(X){
nc <- ncol(X)
res <- matrix(0, nc, nc)
for (i in 2:nc){
for (j in 1:(i - 1)){
res[i, j] <- res[j, i] <- cor.test(X[,i],
}
}
res
}
What one then does with all of those non-independent test
question, I guess.
I hope this helps,
John
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Sent: Friday, April 15, 2005 4:33 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Pearson corelation and p-value for matrix
Hi,
I was trying to evaluate the pearson correlation and
for an nxm matrix, where each row represents a vector.
it would be to iterate through each row, and find its
value( and the p-value) with respect to the other rows.
some function by which I can use the matrix as input?
output would be an nxn matrix, containing the p-values
respective vectors.
I have tried cor.test for the iterations, but
function that would take the matrix as input.
Thanks for the help.
Dren