Newsgroup members,
I appreciate the help on this topic.
David Duffy provided a solution (below) that was quite helpful, and came
close to what I needed. It did a great job creating two vectors of
dichotomous variables with a known correlation (what I referred to as a
phi-coefficient).
My situation is a bit more complicated and I'm not sure it is easily
solved. The problem is that I must assume one of the vectors is
constant and generate one or more vectors that covary with the constant
vector.
In a continuous example I could use the following code that I got from
the S-PLUS newsgroup year ago:
sample.cor<-function (x, rho)
{
y <- (rho * (x - mean(x)))/sqrt(var(x)) + sqrt(1 - rho^2) *
rnorm(length(x))
cat("Sample corr = ", cor(x, y), "\n")
return(y)
}
X<-rnorm(100) #a constant vector
Y1<-sample.cor(X,.30) # a new vector that correlates with X .30
Y2<-sample.cor(X,.45) # a second vector that correlates with X .45
I can, of course, have X be a vector of zeros and ones, and I can
dichotomize Y1 and Y2, but the program always returns a phi-coefficient
correlation lower than the continuous correlation. Mathematically, I
guess this is expected because the phi-coefficient is partially a
function of the percentage of positive responses. This, in turn,
explains Pearson's (1900) interest in the whole area of tetrachoric
correlations -- a tetrachoric correlation being the Pearson product
moment correlation that would have been observed had two dichotomously
scored variables been measured on a continuous scale (Pearson, 1900).
Appreciate any additional input or possible solutions.
Paul
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David Duffy
Sent: Monday, September 12, 2005 1:34 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Simulate phi-coefficient
From: "Bliese, Paul D LTC USAMH" <paul.bliese at us.army.mil>
Given a sample of zeros and ones, for example:
VECTOR1<-rep(c(1,0),c(15,10))
How would I create a new sample (VECTOR2) also containing zeros and
ones, in which the phi-coefficient between the two sample vectors was
drawn from a population with a known phi-coefficient value?
I know there are ways to do this with normally distributed numbers
(for
example the mvrnorm function in MASS), but am stumped when dealing
with
dichotomous variables.
Paul
One way is to sample from the 2x2 table with the specified means and
pearson
correlation (phi):
for a fourfold table, a b
c d
with marginal proportions p1 and p2
cov <- phi * sqrt(p1*(1-p1)*p2*(1-p2))
a <- p1*p2 + cov
b <- p1*(1-p2) - cov
c <- (1-p1)*p2 - cov
d <- (1-p1)*(1-p2) + cov
expand.grid(0:1,0:1)[sample(1:4, size=25, replace=TRUE,
prob=c(a,b,c,d)),]
David.
| David Duffy (MBBS PhD) ,-_|\
| email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / *
| Epidemiology Unit, Queensland Institute of Medical Research \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
On Tue, 27 Sep 2005, Bliese, Paul D LTC USAMH wrote:
My situation is a bit more complicated and I'm not sure it is easily
solved. The problem is that I must assume one of the vectors is
constant and generate one or more vectors that covary with the constant
vector.
One way is to sample from the 2x2 table with the specified means and
pearson correlation (phi):
for a fourfold table, a b
c d
with marginal proportions p1 and p2
cov <- phi * sqrt(p1*(1-p1)*p2*(1-p2))
a <- p1*p2 + cov
b <- p1*(1-p2) - cov
c <- (1-p1)*p2 - cov
d <- (1-p1)*(1-p2) + cov
Calculate the conditional probabilities from the above
P(X2=1|X1=1)= a/(a+b) = p2 + cov/p1
P(X2=1|X1=0)= c/(c+d) = p2 - cov/(1-p1)
condsim <- function(X, phi, p2, p1=NULL) {
if (!all(X %in% c(0,1))) stop("expecting 1's and 0's")
if (is.null(p1)) p1 <- mean(X)
covar <- phi * sqrt(p1*(1-p1)*p2*(1-p2))
if (covar>0 && covar>(min(p1,p2)-p1*p2)) {
warning("Specified correlation too large for given marginal proportions")
covar <- min(p1,p2)-p1*p2
}else if (covar<0 && covar < -min(p1*p2,(1-p1)*(1-p2))) {
warning("Specified correlation too small for given marginal proportions")
covar <- -min(p1*p2,(1-p1)*(1-p2))
}
Y <- X
i1 <- X==1
i0 <- X==0
Y[i1] <- rbinom(sum(i1),1, p2 + covar/p1)
Y[i0] <- rbinom(sum(i0),1, p2 - covar/(1-p1))
data.frame(X,Y)
}
David Duffy.