Generating Distributions with set skewness and kurtosis
On Tue, Aug 26, 2008 at 11:53 AM, <Matthias.Koberstein at hsbctrinkaus.de> wrote:
Hello, I am reaching out to you for help since I am struggling to find a function to generate distributions with set statistical properties such as kurtosis and skewness. Let's say I want to generate random variables following a "normal" distribution, but with skewness 2 and kurtosis 5. How would I do that in the most efficient way? Are there any packages for that? I had a quick look but was only able to find packages which calculate statistical distribution properties after having the data. Thank you very much, Matthias
The skewness and kurtosis of the normal distribution are fixed, but there are many continuous univariate distributions defined on the entire real line for which the skewness and kurtosis can be varied.

One possible choice is the Pearson Type IV distribution. This distribution has the nice feature that the skewness and kurtosis can be easily formulated in terms of the distributional parameters (and vice versa). The Wikipedia entry on the Pearson distributions is fairly informative:
http://en.wikipedia.org/wiki/Pearson_distribution
Joel Heinrich has written up a nice implementation guide:
http://www-cdf.fnal.gov/publications/cdf6820_pearson4.pdf
The translation into R is fairly straightforward.

There are many other options for distributions that allow for arbitrary skewness and kurtosis, but relating the parameters of the distribution to the skewness and kurtosis can be a challenge. If you are willing to resort to numerical methods to determine the skewness and kurtosis from the distributional parameters, here are a few choices.

One easy option is the skewed-t distribution of Fernández and Steel. See the "skewt" package by Robert King and Emily Anderson. The Fernández and Steel approach is elegant in that it provides a way to transform any symmetric continuous distribution into a skewed distribution. However, working out the exact skewness and kurtosis from the parameter values can be a challenge.

As mentioned by John Frain, the stable distribution is a good choice when the tails are especially heavy. See John Nolan's web site for a wealth of information:
http://academic2.american.edu/~jpnolan/stable/stable.html
For an R implementation, see Jim Lindsey's web page:
http://popgen.unimaas.nl/~jlindsey/rcode.html
(Also, although the stable distributions are skewed and heavy-tailed, the traditional definitions of skewness and kurtosis can't be applied to them, because the 2nd and higher moments are not defined.)
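To make the Fernández and Steel construction mentioned above concrete, here is a minimal base-R sketch. It assumes the usual form of their transform, in which a symmetric density f0 is stretched by gamma on the positive side and compressed by gamma on the negative side. The function names (dfs, rfs, fs_moment, fs_skewness) are invented for illustration and are not part of the skewt or fGarch packages.

```r
# Fernandez-Steel skewing of a symmetric density f0 (illustrative names).
# gamma > 1 skews right, gamma < 1 skews left, gamma = 1 returns f0 unchanged.
dfs <- function(x, gamma = 1, f0 = dnorm, ...) {
  2 / (gamma + 1 / gamma) *
    ifelse(x >= 0, f0(x / gamma, ...), f0(x * gamma, ...))
}

# Random draws: take |z| from the symmetric base sampler r0, put it on the
# positive side (scaled by gamma) with probability gamma^2 / (1 + gamma^2),
# otherwise on the negative side (scaled by 1 / gamma).
rfs <- function(n, gamma = 1, r0 = rnorm, ...) {
  z <- abs(r0(n, ...))
  pos <- runif(n) < gamma^2 / (1 + gamma^2)
  ifelse(pos, gamma * z, -z / gamma)
}

# k-th moment about `centre` by numerical integration, split at the kink at 0.
fs_moment <- function(k, gamma, f0 = dnorm, ..., centre = 0) {
  g <- function(x) (x - centre)^k * dfs(x, gamma, f0, ...)
  integrate(g, -Inf, 0)$value + integrate(g, 0, Inf)$value
}

# Skewness of the skewed density, computed numerically.
fs_skewness <- function(gamma, f0 = dnorm, ...) {
  m  <- fs_moment(1, gamma, f0, ...)
  m2 <- fs_moment(2, gamma, f0, ..., centre = m)
  m3 <- fs_moment(3, gamma, f0, ..., centre = m)
  m3 / m2^1.5
}

# Compare the numerical skewness with the skewness of a large sample.
set.seed(1)
x <- rfs(1e5, gamma = 1.5)
fs_skewness(1.5)
mean((x - mean(x))^3) / sd(x)^3
```

Passing f0 = dt and r0 = rt with df = 5 gives a skewed t instead of a skewed normal. The skewness still has to be found numerically for each gamma, which is the drawback noted above.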
On the other hand, if you are interested in a distribution that has thinner tails than the normal, you might want to consider the skew GED distribution. See Diethelm Wuertz's fGarch package:
http://www.rmetrics.org
This package also contains implementations of skew normal and skew student-t distributions, again using Fernández and Steel's approach.

The normal inverse Gaussian, and its cousin the generalized hyperbolic distribution, have received a fair amount of recent attention. I believe an implementation can be found in the "ghyp" package of Wolfgang Breymann and David Luethi.

This does not by any means exhaust the space of possibilities, but it should at least give you a start.

BTW, here are a few R functions that will help you to explore the skewness and kurtosis of arbitrary distributions:

# Calculate mean of an arbitrary density
Mean <- function(f, ...) {
  integrate(function(x) { f(x, ...) * x }, -Inf, Inf)$value
}

# Calculate k-th central moment of an arbitrary density
M <- function(f, ..., k = 1, xm = Mean(f, ...)) {
  integrate(function(x) { (x - xm)^k * f(x, ...) }, -Inf, Inf)$value
}

# Calculate skewness of an arbitrary density
SK <- function(f, ...) {
  M(f, ..., k = 3) / (M(f, ..., k = 2)^1.5)
}

# Calculate excess kurtosis of an arbitrary density
KU <- function(f, ...) {
  M(f, ..., k = 4) / (M(f, ..., k = 2)^2) - 3
}

SK(dnorm)
[1] 0 # normal distribution has skewness of 0
KU(dnorm)
[1] 1.625367e-13 # good enough for government work

# Test gamma distribution with shape=1
Mean(dgamma, 1)
[1] 1 # good
SK(dgamma, 1)
[1] 2 # good
KU(dgamma, 1)
[1] 6 # good
library(skewt)
KU(dskt, 5, 1)
[1] 6 # Agrees with theory ... excess kurtosis of t with 5 d.f. should be 6
SK(dskt, 5, 1.5)
[1] 1.516366
SK(dskt, 5, 1/1.5)
[1] -1.516366
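
These helpers go from parameters to skewness. To go the other way, from a target skewness back to a parameter, one option is to hand the same calculation to a root finder. The sketch below does this for the gamma shape parameter using uniroot. The helper name shape_for_skewness and the search interval are assumptions made for illustration, and the interval starts at 1 to keep the gamma density finite at zero.

```r
# Helpers restated from the post so that the sketch runs on its own.
Mean <- function(f, ...) {
  integrate(function(x) f(x, ...) * x, -Inf, Inf)$value
}
M <- function(f, ..., k = 1, xm = Mean(f, ...)) {
  integrate(function(x) (x - xm)^k * f(x, ...), -Inf, Inf)$value
}
SK <- function(f, ...) {
  M(f, ..., k = 3) / (M(f, ..., k = 2)^1.5)
}

# Find the gamma shape whose skewness equals `target` (hypothetical helper).
# The interval must bracket the answer: skewness falls as shape grows.
shape_for_skewness <- function(target, interval = c(1, 10)) {
  uniroot(function(s) SK(dgamma, s) - target, interval)$root
}

# Theory says the gamma skewness is 2 / sqrt(shape), so a target of 1
# should give a shape close to 4.
shape_for_skewness(1)
```

The same pattern should carry over to other one-parameter families, for example solving for the skew parameter of dskt at a fixed number of degrees of freedom. Matching skewness and kurtosis at the same time needs a two-parameter search, for instance with optim.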
Matt Clegg matthewcleggphd at gmail.com