Correlate

Val
Hi John and Timothy

Thank you for your suggestion and help. Using the sample data, I did
carry out a test run and found a difference in the correlation result.

Option 1.
data_cor <- cor(dat[ , colnames(dat) != "x1"],  # Calculate correlations
                    dat$x1, method = "pearson", use = "complete.obs")
This resulted in:
                 [,1]
    x2 -0.5845835
    x3 -0.4664220
    x4  0.7202837

Option 2.
for (i in colnames(dat)) {
  print(cor.test(dat[, i], dat$x1, method = "pearson",
                 use = "complete.obs")$estimate)
}
           [,1]
x2  -0.7362030
x3  -0.04935132
x4   0.85766290

This was cross-checked using Excel and other software, and all results
match Option 2.
One factor contributing to this difference is the loss of information
when rows with missing values are dropped (use = "complete.obs" in
cor()). If x2 has a missing value in a row but x3 and x4 don't, the
entire row is removed, including the x3 and x4 values.
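
To illustrate the point, here is a minimal sketch on made-up data (the `toy` data frame below is hypothetical, not the sample data from this thread). It contrasts listwise deletion (use = "complete.obs") with pairwise deletion (use = "pairwise.complete.obs"), which should agree with what cor.test() does for each pair.

```r
# Hypothetical toy data, invented for illustration only.
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, NA, 6, 8, 11, 12),
  x3 = c(6, 5, NA, 3, 2, 2),
  x4 = c(1, 3, 2, NA, 5, 7)
)
others <- toy[, colnames(toy) != "x1"]

# Listwise deletion: a row with an NA in any column is dropped for every pair.
listwise <- cor(others, toy$x1, method = "pearson", use = "complete.obs")

# Pairwise deletion: each column keeps every row where it and x1 are both present.
pairwise <- cor(others, toy$x1, method = "pearson",
                use = "pairwise.complete.obs")

print(listwise)
print(pairwise)
```

In this toy example only three rows are complete across all four columns, while each pair on its own has five usable rows, so the two matrices differ.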

My question: is there a way to extract the number of rows (N) used in
the correlation analysis?
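
To make concrete what I mean by N, here is a rough sketch on made-up data (again a hypothetical `toy` data frame, not my real data). It counts the rows that survive under each deletion rule; the last part assumes that the degrees of freedom reported by cor.test() for Pearson are N - 2.

```r
# Hypothetical toy data, invented for illustration only.
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, NA, 6, 8, 11, 12),
  x3 = c(6, 5, NA, 3, 2, 2),
  x4 = c(1, 3, 2, NA, 5, 7)
)
others <- setdiff(colnames(toy), "x1")

# N under listwise deletion: rows with no NA in any column.
n_listwise <- sum(complete.cases(toy))

# N under pairwise deletion: rows where x1 and the given column are both present.
n_pairwise <- sapply(others, function(v) sum(complete.cases(toy$x1, toy[[v]])))

# Same N recovered from cor.test(): Pearson degrees of freedom plus 2.
n_from_test <- sapply(others, function(v) {
  unname(cor.test(toy[[v]], toy$x1, method = "pearson")$parameter) + 2
})

print(n_listwise)
print(n_pairwise)
print(n_from_test)
```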
Thank you,
On Mon, Aug 22, 2022 at 1:00 PM John Fox <jfox at mcmaster.ca> wrote: