Mixture of Distributions
(Ted Harding) <Ted.Harding at manchester.ac.uk> wrote in news:XFMail.080221222007.Ted.Harding at manchester.ac.uk:
I just realised I made a bad mistaqke (see below) On 21-Feb-08 21:39:56, Ted Harding wrote:
On 21-Feb-08 20:58:25, Evgenia wrote:
Dear R users, I would like to sample from a mixture distribution p1*f1+p2*f2+p3*f3 with f1,f2,f3 three different forms of distributions. I know that in the case of two distributions I have to sample the mixture compoment membership. How can I weight my three distributions with their respective probabilities? Evgenia
There are several ways in which you could write code to do this, but all amount to the fact that each value sampled from such a mixture is obtained by a) choose between f1, f2, f3 at random according to the probabilities p1, p2, p3 (it is assumed p1+p2+p3=1). b) sample 1 value from whichever of f1, f2, f3 was chosen. Suggestion: Suppose the functions rf1(n), rf2(n), rf3(n) respectively sample n values from f1, f2 and f3.
S0 <- cbind(rf1(n),rf2(n),rf3(n)) ix <- sample(c(1,2,3),n,3,prob=c(p1,p2,p3),replace=TRUE) So far so good! S <- S0[,ix] But this will not do what I intended (i.e. select element ix[1] from the first row os S0, ix[2] from the second row, and so on). Instead, it will return an nxn matrix with n rows, and n columns which are copies of columns of S0 in the order selected by ix! The following will do the correct thing, though there must be a neater way of doing it! The example below has been corrected. S <- S0[,1]*(ix==1) + S0[,2]*(ix==2) + S0[,3]*(ix==3)
I discovered a shorter method (and I even think I might understand why
it works.) Try:
S0[col(S0)==ix]
R loops through the S0 matrix and finds only those entries where the
column numbers match the ix[] values. I would have thought that the
first row and ix[1]th column entry should be first but the machinery
seems to be working differently. The user should realize that all the
column==1 hits will occur first
#-----toy code-------
rf1 <- function(n){rnorm(n,0,1)} ## normal distribution with mean=0
rf2 <- function(n){rnorm(n,4,1)} ## normal distribution with mean=4
rf3 <- function(n){rnorm(n,9,1)} ## normal distribution with mean=9
S0 <- cbind(rf1(10),rf2(10),rf3(10))
p1 <- 0.2; p2 <- 0.3; p3 <-0.5
set.seed<-37
ix <- sample(c(1,2,3),10,prob=c(p1,p2,p3),replace=TRUE)
#> ix
# [1] 2 3 1 3 1 2 1 2 2 1
S0
#----------
S0
[,1] [,2] [,3]
[1,] 0.5143426 3.823013 9.666210
[2,] 0.7044110 6.095287 10.276792
[3,] -0.3597926 5.848588 8.801378
[4,] 0.1346979 5.518667 8.357181
[5,] -0.6856507 2.882090 9.893790
[6,] 0.5882977 4.864201 8.708929
[7,] -0.4241657 2.891672 8.056254
[8,] 1.6243569 2.981563 11.684422
[9,] -0.3072410 4.379916 10.679470
[10,] -0.6289604 3.119195 7.216593
#--------------------------
S0[col(S0)==ix]
# [1] 0.1346979 -0.3072410 3.8230128 5.8485880 4.8642010 2.9815631
# [7] 10.2767916 9.8937904 8.0562542 7.2165927
R seems to be looping through the columns "looking" for the TRUE values
in the col(S0)==ix matrix.
col(S0)==ix
[,1] [,2] [,3]
[1,] FALSE TRUE FALSE
[2,] FALSE FALSE TRUE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE FALSE
[5,] FALSE FALSE TRUE
[6,] FALSE TRUE FALSE
[7,] FALSE FALSE TRUE
[8,] FALSE TRUE FALSE
[9,] TRUE FALSE FALSE
[10,] FALSE FALSE TRUE
Regards; David Winsemius