Skip to content

Simulate phi-coefficient (correlation between dichotomous vars)

2 messages · Bliese, Paul D LTC USAMH, David Duffy

#
Newsgroup members, 

I appreciate the help on this topic.

David Duffy provided a solution (below) that was quite helpful, and came
close to what I needed.  It did a great job creating two vectors of
dichotomous variables with a known correlation (what I referred to as a
phi-coefficient).

My situation is a bit more complicated and I'm not sure it is easily
solved.  The problem is that I must assume one of the vectors is
constant and generate one or more vectors that covary with the constant
vector.

In a continuous example I could use the following code that I got from
the S-PLUS newsgroup year ago:

sample.cor<-function (x, rho) 
{
    y <- (rho * (x - mean(x)))/sqrt(var(x)) + sqrt(1 - rho^2) * 
        rnorm(length(x))
    cat("Sample corr = ", cor(x, y), "\n")
    return(y)
}

X<-rnorm(100)  #a constant vector
Y1<-sample.cor(X,.30) # a new vector that correlates with X .30
Y2<-sample.cor(X,.45) # a second vector that correlates with X .45

I can, of course, have X be a vector of zeros and ones, and I can
dichotomize Y1 and Y2, but the program always returns a phi-coefficient
correlation lower than the continuous correlation.  Mathematically, I
guess this is expected because the phi-coefficient is partially a
function of the percentage of positive responses.  This, in turn,
explains Pearson's (1900) interest in the whole area of tetrachoric
correlations -- a tetrachoric correlation being the Pearson product
moment correlation that would have been observed had two dichotomously
scored variables been measured on a continuous scale (Pearson, 1900).

Appreciate any additional input or possible solutions.

Paul



-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David Duffy
Sent: Monday, September 12, 2005 1:34 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Simulate phi-coefficient
(for
with
One way is to sample from the 2x2 table with the specified means and
pearson
correlation (phi):

for a fourfold table, a b
                      c d
with marginal proportions p1 and p2
cov <- phi * sqrt(p1*(1-p1)*p2*(1-p2))
a <- p1*p2 + cov
b <- p1*(1-p2) - cov
c <- (1-p1)*p2 - cov
d <- (1-p1)*(1-p2) + cov
expand.grid(0:1,0:1)[sample(1:4, size=25, replace=TRUE,
prob=c(a,b,c,d)),]

David.

| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
#
On Tue, 27 Sep 2005, Bliese, Paul D LTC USAMH wrote:
Calculate the conditional probabilities from the above

P(X2=1|X1=1)= a/(a+b) = p2 + cov/p1
P(X2=1|X1=0)= c/(c+d) = p2 - cov/(1-p1)


condsim <- function(X, phi, p2, p1=NULL) {
  if (!all(X %in% c(0,1))) stop("expecting 1's and 0's")
  if (is.null(p1)) p1 <- mean(X)
  covar <- phi  * sqrt(p1*(1-p1)*p2*(1-p2))
  if (covar>0 && covar>(min(p1,p2)-p1*p2)) {
    warning("Specified correlation too large for given marginal proportions")
    covar <- min(p1,p2)-p1*p2
  }else if (covar<0 && covar < -min(p1*p2,(1-p1)*(1-p2))) {
    warning("Specified correlation too small for given marginal proportions")
    covar <- -min(p1*p2,(1-p1)*(1-p2))
  }
  Y <- X
  i1 <- X==1
  i0 <- X==0
  Y[i1] <- rbinom(sum(i1),1, p2 + covar/p1)
  Y[i0] <- rbinom(sum(i0),1, p2 - covar/(1-p1))
  data.frame(X,Y)
}

David Duffy.