
Simulating underdispersed counts

1 message · Greg Snow

Yes, that is basically truncation, but the original poster said they wanted something fast and did not give much detail. If they just wanted some data to put into a glm model as a demonstration, then this would work.

Another idea that I had: instead of dropping all the extreme values, take each extreme value and replace it with a new draw. This would still allow for the possibility of some extreme values, but would reduce the number of them. This fits the idea "if my results don't match what I expected, something must have gone wrong and I will just do it again" that is sometimes seen in researchers who don't fully understand the idea of variation (Mendel's gardener/assistant as a possible example).
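A minimal sketch of the redraw idea, in Python with only the standard library (the language, the function names, and the cutoffs `lower` and `upper` are assumptions for illustration; the thread does not specify any of them). Each value that falls outside the cutoffs gets exactly one fresh draw, so extreme values remain possible but become rarer.

```python
import math
import random


def rpois(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def redraw_extremes(lam, n, lower, upper, seed=None):
    """Draw n Poisson counts; any value outside [lower, upper] is redrawn once.

    The cutoffs are hypothetical tuning knobs: a narrower interval gives
    stronger under-dispersion. The second draw is kept even if it is extreme.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rpois(lam, rng)
        if x < lower or x > upper:
            x = rpois(lam, rng)  # one second chance only
        out.append(x)
    return out


def dispersion_index(xs):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return var / mean


counts = redraw_extremes(lam=5, n=5000, lower=3, upper=7, seed=1)
print(round(dispersion_index(counts), 2))
```

A dispersion index below 1 indicates under-dispersion relative to the Poisson; allowing more than one redraw, or tightening the cutoffs, would push it lower.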

Another approach: One way to think of a regular Poisson process is to have several bins and place objects in the bins at random. If the probability of placing something into a bin is independent of how many objects are already in that bin (and the others), then the counts of objects per bin will approximately follow a Poisson distribution. Doing the same thing, but having the probability of which bin receives the next object depend on the number of objects already in the bins, would lead to over- or under-dispersion: over if the next object is more likely to go into bins already containing objects, under if it is more likely to go into bins containing no or fewer objects. It should not be too hard to write a function that puts m balls in n bins with selection probabilities that depend on the current counts. Some experimentation would probably be needed to get the probability model to match the amount of over- or under-dispersion desired.
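The balls-in-bins idea could be sketched as follows, again in Python with only the standard library. The weighting rule `(count + 1) ** power` and the parameter name `power` are assumptions made for this example, not something given in the thread: `power = 0` ignores the current counts (roughly Poisson), `power > 0` favours fuller bins (over-dispersion), and `power < 0` favours emptier bins (under-dispersion).

```python
import random


def balls_in_bins(m, n, power=0.0, seed=None):
    """Place m balls into n bins one at a time and return the bin counts.

    Each bin is chosen with probability proportional to
    (current count + 1) ** power. This rule is a hypothetical choice;
    the "+ 1" keeps empty bins eligible when power > 0.
    """
    rng = random.Random(seed)
    counts = [0] * n
    bins = range(n)
    for _ in range(m):
        weights = [(c + 1) ** power for c in counts]
        i = rng.choices(bins, weights=weights)[0]
        counts[i] += 1
    return counts


def dispersion_index(xs):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return var / mean


# Compare the three regimes at the same mean count per bin (m / n = 5).
for power in (-2.0, 0.0, 2.0):
    counts = balls_in_bins(m=2000, n=400, power=power, seed=42)
    print(power, round(dispersion_index(counts), 2))
```

As the post suggests, matching a target amount of dispersion would take some experimentation, here by trying different values of `power` and checking the resulting variance-to-mean ratio.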

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111