
Correlate

9 messages · Val, John Fox, Bert Gunter

#
Dear Val,
On 2022-08-22 1:33 p.m., Val wrote:
This seems backwards to me, but I'll refrain from commenting further on 
whether what you want to do makes sense and instead address how to do it 
(not, BTW, because I disagree with Bert's and Tim's remarks).

Please see below:
Because of missing data, this computes the correlations on different 
subsets of the data. A simple solution is to filter the data for NAs:

D <- na.omit(dat)

More comments below:
This data manipulation is unnecessary. Just specify the argument 
na.strings="." to read.table().
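As a sketch of that suggestion (the file contents below are illustrative only), writing a small file that codes missing values as "." and reading it back:

```r
# A tiny file that codes missing values as "." (illustrative data)
tf <- tempfile(fileext = ".txt")
writeLines(c("x1 x2", "1.2 .", ". 3.4"), tf)

# na.strings="." turns the "." entries into NA at read time,
# so the columns come in as numeric rather than character
dat <- read.table(tf, header = TRUE, na.strings = ".")
```

This avoids any post-hoc recoding of "." entries after the data are read.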
Taking a somewhat different approach from cor.test(), you can apply 
Fisher's z-transformation (recall that D is the data filtered for NAs):

 > 2*pnorm(abs(atanh(data_cor)), sd=1/sqrt(nrow(D) - 3), lower.tail=FALSE)
         [,1]
x2 0.2462807
x3 0.3812854
x4 0.1156939
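For readers following along, the z-transform p-value can be sanity-checked against cor.test() on simulated complete data (a sketch; the seed and simulated data are illustrative, and the two p-values agree only approximately, since cor.test() uses an exact t reference distribution):

```r
set.seed(123)          # illustrative simulated data
x <- rnorm(50)
y <- x + rnorm(50)
n <- length(x)
r <- cor(x, y)

# Fisher's z: atanh(r) is approximately normal with sd 1/sqrt(n - 3)
p_z <- 2 * pnorm(abs(atanh(r)), sd = 1 / sqrt(n - 3), lower.tail = FALSE)

# Exact t-based p-value for comparison
p_t <- cor.test(x, y)$p.value
```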

I hope this helps,
  John
3 days later
Val
#
Hi John and Timothy

Thank you for your suggestion and help. Using the sample data, I
carried out a test run and found a difference in the correlation results.

Option 1.
data_cor <- cor(dat[ , colnames(dat) != "x1"],  # Calculate correlations
                    dat$x1, method = "pearson", use = "complete.obs")
which resulted in:
                 [,1]
    x2 -0.5845835
    x3 -0.4664220
    x4  0.7202837

Option 2.
 for(i in colnames(dat)){
      print(cor.test(dat[,i], dat$x1, method = "pearson", use =
"complete.obs")$estimate)
    }
           [,1]
x2  -0.7362030
x3  -0.04935132
x4   0.85766290

This was cross-checked using Excel and other software, and everything
matches option 2.
One of the factors that contributed to this difference is the loss of
information from casewise deletion: if x2 has a missing value but x3 and
x4 don't, the entire row is removed, including the x3 and x4 values.
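The discrepancy Val describes can be reproduced on a toy data set (values are illustrative): when the first argument to cor() is a matrix or data frame, use="complete.obs" deletes every row with an NA in any column, while separate two-column calls keep each pair's own complete rows:

```r
dat <- data.frame(x1 = c(1, 2, 3, 4, 5),
                  x2 = c(2, NA, 4, 5, 7),
                  x3 = c(5, 4, NA, 2, 1))

# Casewise deletion: rows 2 and 3 are dropped for *every* correlation
r_all <- cor(dat[, c("x2", "x3")], dat$x1, use = "complete.obs")

# Pairwise: the x2-x1 correlation keeps row 3, which has no NA in x2 or x1
r_x2 <- cor(dat$x2, dat$x1, use = "complete.obs")
```

The two x2-x1 correlations differ, which is exactly the option 1 vs. option 2 discrepancy.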

My question: is there a way to extract the number of rows (N) used in
the correlation analysis?
Thank you,
On Mon, Aug 22, 2022 at 1:00 PM John Fox <jfox at mcmaster.ca> wrote:
#
Dear Val,
On 2022-08-26 10:41 a.m., Val wrote:
Yes, I already explained that in my previous message.

As well, cor() is capable of computing pairwise-complete correlations -- 
see ?cor.

There's not an obvious right answer here, however. Using 
pairwise-complete correlations can produce inconsistent (i.e., 
non-positive semi-definite) correlation matrices because correlations 
are computed on different subsets of the data.

There are much better ways to deal with missing data.
I'm sure that there are many ways, but here is one that is very 
simple-minded and should be reasonably efficient for ~250 variables:

 > (nc <- ncol(dat))
[1] 4

 > R <- N <- matrix(NA, nc, nc)
 > diag(R) <- 1
 > for (i in 1:(nc - 1)){
+   for (j in (i + 1):nc){
+     R[i, j] <- R[j, i] <- cor(dat[, i], dat[, j], use="complete.obs")
+     N[i, j] <- N[j, i] <- nrow(na.omit(dat[, c(i, j)]))
+   }
+ }

 > round(R, 3)
        [,1]   [,2]   [,3]   [,4]
[1,]  1.000 -0.736 -0.049  0.858
[2,] -0.736  1.000  0.458 -0.428
[3,] -0.049  0.458  1.000  0.092
[4,]  0.858 -0.428  0.092  1.000

 > N
      [,1] [,2] [,3] [,4]
[1,]   NA    8    8    8
[2,]    8   NA    8    8
[3,]    8    8   NA    8
[4,]    8    8    8   NA

 > round(cor(dat, use="pairwise.complete.obs"), 3) # check
        x1     x2     x3     x4
x1  1.000 -0.736 -0.049  0.858
x2 -0.736  1.000  0.458 -0.428
x3 -0.049  0.458  1.000  0.092
x4  0.858 -0.428  0.092  1.000
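The pairwise counts in N (including the diagonal) can also be computed in one step, without loops, by a cross-product of the non-missingness indicator matrix (a sketch on toy data; values are illustrative):

```r
dat <- data.frame(x1 = c(1, NA, 3, 4),
                  x2 = c(1, 2, NA, 4),
                  x3 = c(NA, 2, 3, 4))

# For each pair (i, j), crossprod counts the rows where both columns are
# observed; the diagonal gives each column's own non-missing count
N <- crossprod(!is.na(dat))
```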

More generally, I think that it's a good idea to learn a little bit 
about R programming if you intend to use R in your work. You'll then be 
able to solve problems like this yourself.

I hope this helps,
  John
Val
#
Thank you John for your help and advice.
On Fri, Aug 26, 2022 at 11:04 AM John Fox <jfox at mcmaster.ca> wrote:
#
Dear Val,
On 2022-08-26 4:06 p.m., Val wrote:
You're welcome, and it occurred to me that if you want the number of 
non-missing cases for each individual variable, you could add

	diag(N) <- colSums(!is.na(dat))

to the code I sent earlier.
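A minimal check of that line on toy data (values illustrative; N here is just a placeholder matrix):

```r
dat <- data.frame(x1 = c(1, NA, 3), x2 = c(1, 2, NA), x3 = c(NA, NA, 3))
N <- matrix(NA, 3, 3)
diag(N) <- colSums(!is.na(dat))  # per-variable non-missing counts
```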

Best,
  John
#
And just for fun, here is a more "functional" version without the
explicit 'for()' loops that gives the lower triangular values of the
counts of nonmissing values for each correlation:

## (using John's notation)
w <- expand.grid(1:nc,1:nc)
f <- \(x) ifelse(x[1] <= x[2], NA, nrow(na.omit(dat[,x])))
ans <- matrix(apply(w,1,f), ncol = nc)
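Run on a toy data set (illustrative values), the apply() version fills the lower triangle with the same pairwise counts as the loop version (note the \(x) lambda needs R >= 4.1):

```r
dat <- data.frame(x1 = c(1, NA, 3, 4),
                  x2 = c(1, 2, NA, 4),
                  x3 = c(NA, 2, 3, 4))
nc <- ncol(dat)

w <- expand.grid(1:nc, 1:nc)               # all (i, j) index pairs
f <- \(x) ifelse(x[1] <= x[2], NA,         # skip diagonal and upper triangle
                 nrow(na.omit(dat[, x])))  # pairwise complete-row count
ans <- matrix(apply(w, 1, f), ncol = nc)
```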

Note: This is just a matter of taste. I would be amazed if it's any
faster than using the explicit loops (they are implicit here, but
still there -- and it might be slower!). But maybe it reinforces
John's comment about the value of spending some (more) effort to learn
how to program in R. It gives you the flexibility to program however
you like, which usually reduces bugs in one's code, I believe.

Cheers,
Bert
On Fri, Aug 26, 2022 at 9:04 AM John Fox <jfox at mcmaster.ca> wrote:
Val
#
Thank you again, John. I got my result in the form shown below:
      Variable     Corr   Pvalue     Nobs
            x1    1.000        0   165425
On Fri, Aug 26, 2022 at 3:59 PM John Fox <jfox at mcmaster.ca> wrote:
#
Dear Val,
On 2022-08-26 6:27 p.m., Val wrote:
This raises two questions: (1) Are p-values interesting when you have
n > 165K cases? (2) What's the point of testing a correlation between a
variable and itself?

Best,
  John
Val
#
Hi John,

Please see my response below your question,
There are some pairwise comparisons that have only a small number of
observations; I need that part of the information.

 > (2) What's the point of testing a correlation between a variable and itself?
  There is none; that part will be removed.

Thank you
On Fri, Aug 26, 2022 at 5:57 PM John Fox <jfox at mcmaster.ca> wrote: