Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: Liaw, Andy [mailto:andy_liaw at merck.com]
Sent: Monday, April 18, 2005 2:40 PM
To: 'John Fox'; 'Dren Scott'
Cc: 'R-Help'
Subject: RE: [R] Pearson corelation and p-value for matrix
I believe this will do:
cor.pval2 <- function(x, alternative="two-sided") {
corMat <- cor(x, use=if (any(is.na(x)))
"pairwise.complete.obs" else
"all")
df <- crossprod(!is.na(x)) - 2
STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2)
p <- pt(STATISTIC, df)
p <- if (alternative == "less") {
p
} else if (alternative == "greater") {
1 - p
} else 2 * pmin(p, 1 - p)
p
}
Some test:
set.seed(1)
x <- matrix(runif(2e3 * 1e2), 2e3)
system.time(res1 <- cor.pval(t(x)), gcFirst=TRUE)
[1] 17.28 0.77 18.16 NA NA
system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE)
[1] 19.51 1.05 20.70 NA NA
x[c(1, 3, 6), c(2, 4)] <- NA
x[30, 61] <- NA
system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE)
[1] 24.48 0.71 25.28 NA NA
This is a bit slower because of the extra computation for
"df". One can try to save some computation by only computing
with the lower (or upper) triangular part.
Cheers,
Andy
From: John Fox
Dear Dren,
Since cor(), on which Andy's solution is based, can compute
pairwise-present correlations, you could adapt his function
have to adjust the df for each pair. Alternatively, you
save some time (programming time + computer time) by just
R <- diag(100)
R[upper.tri(R)] <- R[lower.tri(R)] <- .5
library(mvtnorm)
X <- rmvnorm(6000, sigma=R)
system.time(for (i in 1:50) cor.pvalues(X), gc=TRUE)
[1] 518.19 1.11 520.23 NA NA
I know that time is money, but nine minutes (on my machine)
won't bankrupt anyone.
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
Sent: Monday, April 18, 2005 12:41 PM
To: 'R-Help'
Subject: RE: [R] Pearson corelation and p-value for matrix
Hi all,
Thanks Andy, Mark and John for all the help. I really
I'm new to both R and statistics, so please excuse any
part.
Essentially what I'm trying to do, is to evaluate for
many other rows would have a p-value < 0.05. So, after I
N p-value matrix, I'll just filter out values that are > 0.05.
Each of my datasets (6000 rows x 100 columns) would
NA's. The iterative procedure (cor.pvalues) proposed by
yield the values, but it would take an inordinately
have 50 of these datasets to process). The solution proposed by
Andy, although fast, would not be able to incorporate the NA's.
Is there any workaround for the NA's? Or possibly do
could try something else?
Thanks very much.
Dren
"Liaw, Andy" <andy_liaw at merck.com> wrote:
From: John Fox
Dear Andy,
That's clearly much better -- and illustrates an
for vectorizing (or "matricizing") a computation. I think
this to my list of programming examples. It might be a little
dangerous to pass ...
through to cor(), since someone could specify
Ah, yes, the "..." isn't likely to help here! Also, it
work correctly if there are no NA's, for example (or else
of freedom would be wrong).
Best,
Andy
Thanks,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: Liaw, Andy [mailto:andy_liaw at merck.com]
Sent: Friday, April 15, 2005 9:51 PM
To: 'John Fox'; MSchwartz at medanalytics.com
Cc: 'R-Help'; 'Dren Scott'
Subject: RE: [R] Pearson corelation and p-value for matrix
We can be a bit sneaky and `borrow' code from
cor.pval <- function(x, alternative="two-sided", ...) {
cor(x, ...) n <- nrow(x) df <- n - 2 STATISTIC <-
/ sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if
"less") { p } else if (alternative == "greater") {
1 - p
} else 2 * pmin(p, 1 - p)
p
}
The test:
system.time(c1 <- cor.pvals(X), gcFirst=TRUE)
[1] 13.19 0.01 13.58 NA NA
system.time(c2 <- cor.pvalues(X), gcFirst=TRUE)
system.time(c3 <- cor.pval(X), gcFirst=TRUE)
[1] 0.07 0.00 0.07 NA NA
Cheers,
Andy
From: John Fox
Dear Mark,
I think that the reflex of trying to avoid loops in R
mistaken, and so I decided to try to time the two
3GHz Windows XP system).
I discovered, first, that there is a bug in your
appear to have indexed rows instead of columns;
cor.pvals <- function(mat)
{
cols <- expand.grid(1:ncol(mat), 1:ncol(mat))
1,
function(x) cor.test(mat[, x[1]], mat[,
ncol(mat)) }
My function is cor.pvalues and yours cor.pvals. This is
matrix with 1000 observations on 100 variables:
R <- diag(100)
R[upper.tri(R)] <- R[lower.tri(R)] <- .5
library(mvtnorm)
X <- rmvnorm(1000, sigma=R)
dim(X)
system.time(cor.pvalues(X))
system.time(cor.pvals(X))
[1] 12.66 0.00 12.66 NA NA
I frankly didn't expect the advantage of my approach to be
but there it is.
Regards,
John
-------------------------------- John Fox Department of
Sociology McMaster University Hamilton, Ontario
-----Original Message-----
From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com]
Sent: Friday, April 15, 2005 7:08 PM
To: John Fox
Cc: 'Dren Scott'; R-Help
Subject: RE: [R] Pearson corelation and p-value
Here is what might be a slightly more efficient way to
question:
cor.pvals <- function(mat)
{
rows <- expand.grid(1:nrow(mat), 1:nrow(mat))
1,
function(x) cor.test(mat[x[1], ], mat[x[2],
nrow(mat)) }
HTH,
Marc Schwartz
On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote:
Dear Dren,
How about the following?
cor.pvalues <- function(X){
nc <- ncol(X)
res <- matrix(0, nc, nc)
for (i in 2:nc){
for (j in 1:(i - 1)){
res[i, j] <- res[j, i] <- cor.test(X[,i],
}
}
res
}
What one then does with all of those
question, I guess.
I hope this helps,
John
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]
Sent: Friday, April 15, 2005 4:33 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Pearson corelation and p-value
Hi,
I was trying to evaluate the pearson correlation and
for an nxm matrix, where each row represents
it would be to iterate through each row,
value( and the p-value) with respect to the
some function by which I can use the matrix
output would be an nxn matrix, containing
respective vectors.
I have tried cor.test for the iterations, but
function that would take the matrix as input.
Thanks for the help.
Dren