Comparison of the amount of computation

Wed, Apr 13, 2011 11:58 PM

On Wed, Apr 13, 2011 at 04:12:39PM -0700, helin_susam wrote:

Hi.

Each number in 1:100 appears at least once in
sample(100, 100, replace=TRUE) with probability
1 - (1 - 1/100)^100 = 0.6339677. So, by linearity of expectation, the
expected length of data2 is 100 * 0.6339677 = 63.39677. If you want to
estimate the distribution of the length of data2 using a simulation,
then record length(data2) in each repetition. For example

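  n <- 10000                 # number of simulated samples
  s <- integer(n)            # s[i] = number of distinct values in sample i
  for (i in seq_len(n)) {
      s[i] <- length(unique(sample(100, 100, replace=TRUE)))
  }
  cbind(table(s))            # frequency of each observed length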

I obtained

     [,1]
  53    5
  54   16
  55   27
  56   82
  57  165
  58  294
  59  465
  60  672
  61  970
  62 1168
  63 1283
  64 1303
  65 1111
  66  882
  67  626
  68  435
  69  250
  70  143
  71   57
  72   27
  73   14
  74    5
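
As a rough check of the table against the exact value, the expected
length can be computed directly and compared with the average of the
simulated lengths. This is only a sketch: the seed and the names p,
expected_len and s are chosen here for illustration, and the simulated
counts will differ from run to run.

```r
# Probability that a given value from 1:100 appears at least once
p <- 1 - (1 - 1/100)^100

# Expected number of distinct values, by linearity of expectation
expected_len <- 100 * p

# Simulated number of distinct values (seed is arbitrary)
set.seed(1)
n <- 10000
s <- replicate(n, length(unique(sample(100, 100, replace = TRUE))))

# The two numbers should be close to each other
c(exact = expected_len, simulated = mean(s))
```

The simulated mean should agree with 63.39677 up to simulation error.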

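In this case, mean(sample(100, 100, replace=TRUE)) and
mean(unique(sample(100, 100, replace=TRUE))) have the same
expected value 50.5, because the sampling is uniform and hence
symmetric around 50.5. However, eliminating repeated values may,
in general, change the expected value of the sample mean, for
example when the values are drawn with unequal probabilities.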
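
A small simulation can illustrate both statements. The helper
sim_means and the unequal probabilities c(0.8, 0.1, 0.1) below are
assumptions made up for this illustration and are not part of the
original question. With these probabilities, the plain sample mean has
expected value 0.8*1 + 0.1*2 + 0.1*3 = 1.3, while the mean of the
distinct values gives the rare values 2 and 3 more weight.

```r
# Average, over n repetitions, of the mean of a sample and of the
# mean of its distinct values (hypothetical helper for illustration)
sim_means <- function(n, x, size, prob = NULL) {
  res <- replicate(n, {
    y <- sample(x, size, replace = TRUE, prob = prob)
    c(all = mean(y), unique = mean(unique(y)))
  })
  rowMeans(res)   # res is a 2 x n matrix with rows "all" and "unique"
}

set.seed(1)   # arbitrary seed

# Uniform sampling as in the thread: both averages should be near 50.5
uniform <- sim_means(10000, 1:100, 100)

# Unequal probabilities: removing duplicates shifts the average upwards
skewed <- sim_means(10000, 1:3, 3, prob = c(0.8, 0.1, 0.1))

rbind(uniform, skewed)
```

In the uniform row the two columns should be close to each other; in
the skewed row the "unique" column should be noticeably larger than
the "all" column.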

Hope this helps.

Petr Savicky.
