Generating a binomial random variable correlated with a
On 17-Apr-05 Ted Harding wrote:
[...] So I'd suggest experimenting on the following lines.

1. Let X1 be a sample of size N using rbinom(N,1,p) (where, in general, p need not be 0.5)

2. Let Y be a sample of size N using rnorm(N,mu,sigma) (and again, in general, mu need not be 0 nor sigma 1). This is as in your example.

So far, you have a true Binomial variate X1 and a true Normal variate Y.

3. X1 will have r 1s in it, and (N-r) 0s. Now consider setting up a correspondence between the 1s and 0s in X1 and the values in Y, in such a way that the 1s tend to get allocated to the higher values of Y and the 0s to lower values of Y. The resulting set of pairs (X1,Y) is then your correlated sample. In other words, permute the sample X1 and cbind the result to Y.

[...]

Here is an example of a possible approach:

X0 <- rbinom(100,1,0.5)
Y <- rnorm(100) ; Y<-sort(Y)
p0<-((1:100)-0.5)/100 ; p0<-p0/sum(p0); p<-cumsum(p0)
r<-sum(X0); ix<-sample((1:100),r,replace=FALSE,prob=p)
X1<-numeric(100); X1[ix]<-1;sum(X1)
## [1] 50
cor(X1,Y)
## [1] 0.6150937

Here is a follow-up (somewhat inspired by a remark in Bill's
suggestions). Consider the function
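test<-function(C,N){
  ## C: multiplier applied to the logit of the rank-based weights
  ## N: number of simulated samples (each of size 100)
  R<-numeric(N); corrs<-numeric(N)
  for(i in (1:N)){
    X0 <- rbinom(100,1,0.5) ; r<-sum(X0)        ## r = number of 1s
    Y <- rnorm(100) ; Y<-sort(Y)                ## Normal sample, ascending
    p0<-((1:100)-0.5)/100 ; L<-log(p0/(1-p0));  ## logit of rank weights
    p<-exp(C*L);p<-p/(1+p);                     ## back-transform C*L
    ix<-sample((1:100),r,replace=FALSE,prob=p)  ## positions of the 1s
    X1<-numeric(100); X1[ix]<-1;
    R[i]<-r ; corrs[i]<-cor(X1,Y)
  }
  list(R=R,corrs=corrs)
}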
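A side note on what C does here, as a minimal sketch rather than part of the function above. Since L is the logit of p0, the weight p works out to p0^C/(p0^C + (1-p0)^C): C=1 leaves p equal to p0, C near 0 flattens the weights towards 0.5, and large C piles the weight onto the upper ranks. The helper name wts is invented purely for illustration.

```r
## Hypothetical helper (not part of test()): the sampling weights
## that test() passes to sample(), as a function of C alone.
wts <- function(C, n = 100) {
  p0 <- ((1:n) - 0.5)/n          ## mid-rank positions in (0,1)
  L  <- log(p0/(1 - p0))         ## logit of p0
  p  <- exp(C*L)
  p/(1 + p)                      ## inverse logit of C*L
}

## Weights at the lowest, a middle, and the highest rank
round(rbind(C0.1 = wts(0.1)[c(1, 50, 100)],
            C1   = wts(1.0)[c(1, 50, 100)],
            C5.2 = wts(5.2)[c(1, 50, 100)]), 4)
```

The weights need not sum to 1, since sample() rescales its prob argument itself.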
By experimenting with different values of C, you can see what
you get. For instance:
mean(test(0.1,1000)$corrs)
## [1] 0.05958913
mean(test(0.5,1000)$corrs)
## [1] 0.2722242
mean(test(1.0,1000)$corrs)
## [1] 0.4347375
and finally (in my trials):
mean(test(5.2,1000)$corrs)
## [1] 0.702623
mean(test(5.22,10000)$corrs)
## [1] 0.6998803
and that might just be close enough to your desired 0.7 ...!
So now you have a true Binomial sample, an associated true
Normal sample, and a Pearson correlation between their values
which averages about 0.7 (any single sample will vary around that).
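For a single correlated sample rather than the average over many, one pass of the loop can be pulled out on its own. This is only a sketch under the same set-up as above; the name gen_pair and its arguments (with defaults n=100, prob=0.5, mu=0, sigma=1) are made up here, not something from the earlier code.

```r
## Hypothetical wrapper around one pass of the loop in test()
gen_pair <- function(C, n = 100, prob = 0.5, mu = 0, sigma = 1) {
  r  <- sum(rbinom(n, 1, prob))       ## number of 1s, Binomial(n, prob)
  Y  <- sort(rnorm(n, mu, sigma))     ## Normal sample, ascending
  p0 <- ((1:n) - 0.5)/n
  L  <- log(p0/(1 - p0))
  p  <- exp(C*L); p <- p/(1 + p)      ## weights favouring high ranks
  ix <- sample(1:n, r, replace = FALSE, prob = p)
  X1 <- numeric(n); X1[ix] <- 1       ## 1s placed at the sampled ranks
  data.frame(X1 = X1, Y = Y)
}

set.seed(1)                           ## arbitrary seed, for repeatability
d <- gen_pair(5.2)
table(d$X1)                           ## counts of 0s and 1s
cor(d$X1, d$Y)                        ## one draw; varies around the mean
```

The rows come out sorted by Y; if the order matters, d[sample(nrow(d)),] shuffles them without changing the correlation.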
Now of course comes the question which has been intriguing
us all: What is the purpose of achieving a given Pearson
correlation between a Normal variate and a binary variate?
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Apr-05 Time: 14:10:43
------------------------------ XFMail ------------------------------