Partial correlations and p-values

3 messages · Petersen, Isaac T, Juliet Hannah, Charles C. Berry

#
I am trying to calculate a partial correlation and p-values.  Unfortunately,
the results in R are different than what SPSS gives.

Here is an example in R (calculating the partial correlation of x and y,
controlling for z1 and z2):

x <- c(1,20,14,30,9,4,8)
y <- c(5,6,7,9,NA,10,6)
z1 <- c(13,8,16,14,26,13,20)
z2 <- c(12,NA,2,5,8,16,13)
fmx <- lm(x ~ z1 + z2, na.action = na.exclude)
fmy <- lm(y ~ z1 + z2, na.action = na.exclude)
yres <- resid(fmy)
xres <- resid(fmx)
cor(xres, yres, use = "p")
ct <- cor.test(xres, yres)
ct$estimate
ct$p.value

R gives me:
r = .65, p = .23

However, SPSS calculates:
r = .46, p = .70

I think something may be different with R's handling of missing data, as
when I replace the NA's with values, R and SPSS give the same r-values,
albeit different p-values still.  I am doing pairwise case exclusion in both
R and SPSS.  Any ideas why I'm getting different values?  Is something wrong
with my formula in R?  Any help would be greatly appreciated.  Thanks!
    
3 days later
#
Your R code looks correct.

Because this is a straightforward calculation, I would be surprised if there
were any differences with SPSS. It may be worthwhile to check whether SPSS
gives partial correlations or semipartial correlations. For example,
if you take the correlation between

py <- resid(lm(y ~ z1 + z2,data=mydat2))

and

x

where mydat2 has missing values removed, you get 0.47.
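
As a sketch of the partial versus semipartial distinction above: the code below assumes that mydat2 is simply the listwise-complete subset of the original poster's four variables (the thread does not show how it was built), and that "x" means the x column of that subset. The names mydat, px, semipartial and partial are introduced here for illustration only.

```r
x  <- c(1, 20, 14, 30, 9, 4, 8)
y  <- c(5, 6, 7, 9, NA, 10, 6)
z1 <- c(13, 8, 16, 14, 26, 13, 20)
z2 <- c(12, NA, 2, 5, 8, 16, 13)

# Assumption: mydat2 keeps only the rows with no missing values
mydat  <- data.frame(x, y, z1, z2)
mydat2 <- na.omit(mydat)

# Semipartial: residualize y only, then correlate with the raw x
py <- resid(lm(y ~ z1 + z2, data = mydat2))
semipartial <- cor(mydat2$x, py)

# Partial: residualize both x and y on the same complete cases
px <- resid(lm(x ~ z1 + z2, data = mydat2))
partial <- cor(px, py)

print(c(semipartial = semipartial, partial = partial))
```

Under that assumption the semipartial value should come out near the 0.47 quoted above, and it can never exceed the partial value in absolute size, because only the denominator differs between the two.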
On Tue, Dec 1, 2009 at 8:24 PM, dadrivr <dadrivr at gmail.com> wrote:
#
On Sat, 5 Dec 2009, Juliet Hannah wrote:

            
There are a couple of hiccups.

First, the degrees of freedom for the partial correlation would be wrong
even if there were no missing data.
There are also differences with SPSS. SPSS seems to compute the correlation
matrix with a pairwise present method and then compute partial correlations
from that matrix.
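
To illustrate the degrees-of-freedom point: cor.test() applied to two residual vectors tests with n - 2 degrees of freedom, whereas a partial correlation controlling for k covariates has n - 2 - k. The following is a minimal sketch on the complete cases of the original poster's data. The helper p_from_r and the restriction to complete cases are assumptions made for illustration, not something taken from the thread.

```r
x  <- c(1, 20, 14, 30, 9, 4, 8)
y  <- c(5, 6, 7, 9, NA, 10, 6)
z1 <- c(13, 8, 16, 14, 26, 13, 20)
z2 <- c(12, NA, 2, 5, 8, 16, 13)

dat <- na.omit(data.frame(x, y, z1, z2))  # complete cases only
n <- nrow(dat)
k <- 2                                    # number of covariates (z1, z2)

xres <- resid(lm(x ~ z1 + z2, data = dat))
yres <- resid(lm(y ~ z1 + z2, data = dat))
r <- cor(xres, yres)

# Hypothetical helper: two-sided p-value for a correlation r on df degrees of freedom
p_from_r <- function(r, df) {
  t_stat <- r * sqrt(df / (1 - r^2))
  2 * pt(-abs(t_stat), df)
}

df_naive   <- n - 2       # what cor.test() assumes for the residuals
df_partial <- n - 2 - k   # also accounts for the two fitted covariates

p_naive   <- p_from_r(r, df_naive)
p_partial <- p_from_r(r, df_partial)

ct <- cor.test(xres, yres)  # its p-value matches p_naive, not p_partial
print(c(r = r, p_naive = p_naive, p_partial = p_partial))
```

With only five complete cases and two covariates, a single degree of freedom is left, so the corrected p-value is noticeably larger than the one cor.test() reports.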

Following

 	http://wiki.r-project.org/rwiki/doku.php?id=tips:data-matrices:part_corr

the pairwise present and complete cases values are, respectively:

[1] 0.4596122
[1] 0.657214

The pairwise present value seems to be what SPSS is reporting.

The complete cases value is nearly (but not exactly) what you got.
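
One way to compare the two missing-data conventions is to compute the partial correlation from the inverse of the correlation matrix, once with pairwise deletion and once with complete cases. This is only a sketch: the code actually used in the thread is not shown, the helper partial_from_cor is a name made up here, and it is an assumption that this reproduces the two values quoted above (0.4596122 and 0.657214).

```r
x  <- c(1, 20, 14, 30, 9, 4, 8)
y  <- c(5, 6, 7, 9, NA, 10, 6)
z1 <- c(13, 8, 16, 14, 26, 13, 20)
z2 <- c(12, NA, 2, 5, 8, 16, 13)
dat <- data.frame(x, y, z1, z2)

# Hypothetical helper: partial correlation of variables 1 and 2 given the rest,
# taken from the inverse (precision) of a correlation matrix R
partial_from_cor <- function(R) {
  P <- solve(R)
  -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
}

# Each pairwise correlation uses all rows available for that pair
R_pairwise <- cor(dat, use = "pairwise.complete.obs")
# Every correlation uses only the rows with no missing values at all
R_complete <- cor(dat, use = "complete.obs")

pr_pairwise <- partial_from_cor(R_pairwise)
pr_complete <- partial_from_cor(R_complete)
print(c(pairwise = pr_pairwise, complete = pr_complete))
```

Note that a pairwise correlation matrix mixes different subsets of rows, so it is not guaranteed to be positive definite, which is part of why testing such a partial correlation is not straightforward.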

A real issue here is how to usefully compute and test partial
correlations in the presence of missing data. If you want to pursue that,
I would suggest opening a new thread with a subject line like 'partial
correlations with missing observations'.

HTH,

Chuck


Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901