Skip to content

How to simulate correlated data

5 messages · Lisa Wang, robin hankin, Berwin A Turlach +2 more

#
Hello there,

I would like to simulate X --Normal (20, 5)
                         Y-- Normal (40, 10)

and the correlation between X and Y is 0.6. How do I do it in R?

Thank you very much

Lisa Wang Msc.
Princess Margaret Hospital
Toronto, Ca
#
Hi

you need

library(mvtnorm)

then

a <- rmvnorm(n=10000,mean=c(20,40),sigma=matrix(c(5,0.6*sqrt(50), 
0.6*sqrt(50),10),2,2))



will do what you want


HTH

rksh
On 15 Dec 2005, at 15:33, Lisa Wang wrote:

            
--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743
#
G'day Lisa,
LW> I would like to simulate X --Normal (20, 5) Y-- Normal (40,
    LW> 10) and the correlation between X and Y is 0.6.

    LW> How do I do it in R?
That depends on what you want the joint distribution to be. :)

If you want the joint distribution to be normal, you could use the
function mvrnorm() from the MASS package.

HTH.

Cheers,

        Berwin

========================== Full address ============================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)   
School of Mathematics and Statistics        +61 (8) 6488 3383 (self)      
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway                   
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin
#
So what you actually wnat is a multivariate normal distribution!
with mean c(20,40) and covariance matrix 
cbind(c(5,0.6*sqrt(5,10)),c(0.6*sqrt(5,10),10))
[Since Corr(x,y) = Cov(x,y)/sqrt(Var(x)*Var(y))


Look at the mvtnorm package, for function rmvnorm


Trying RSiteSearch("Multivariate normal distribution")
should also bring you to the package

Best regrads,
Kristel
Lisa Wang wrote:

  
    
#
On 15-Dec-05 Lisa Wang wrote:
... and, as well as using mvrnorm (MASS) or rmvnorm (mvtnorm),
as have been suggested, you could simply do it "by hand":

If U, V are independent and N(0,1), then

  E(U + a*V)*(U - a*V) = 1 - a^2

  E(U+a*V)^2 = E(U - a*V) = 1 + a*2

so the correlation between (U + a*V) and U - a*V) is

  r = (1 - a^2)/(1 + a^2)

Hence, for -1 < r < 1, choose

  a = sqrt((1 - r)/(1 + r))

which, for r = 0.6, gives a = sqrt(0.4/1.6) = sqrt(1/4) = 1/2
(how nice! ... ).

Then Var(U + a*V) = 1 + a^2 = 1 + 1/4 = 5/4 (I smell more smooth
numbers coming ... ).

Then, since the correlation between two variables is unchanged
if you add a constant to either, or multiply either by a constant,
you can give (U + a*V) variance 5 by multiplying it by 2, and
give (U - a*V) variance 10 by multiplying by 2*sqrt(2), both still
having expectation 0. So finally add 10 and 20:

  X = 10 + 2*(U + V/2) ;  Y = 20 + 2*sqrt(2)*(U - V/2)

So you can get U and V by sampling from rnorm(), and then X and Y
as described.

(Which is how I used to do it before starting to use R, e.g. in
matlab/octave).

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Dec-05                                       Time: 17:04:18
------------------------------ XFMail ------------------------------