Skip to content
Prev 70208 / 398506 Next

skewness and kurtosis in e1071 correct?

I wonder whether the functions for skewness and kurtosis in the e1071 
package are based on correct formulas.

The functions in the package e1071 are:

# --------------------------------------------
skewness <- function (x, na.rm = FALSE)
{
     if (na.rm)
         x <- x[!is.na(x)]
     sum((x - mean(x))^3)/(length(x) * sd(x)^3)
}
# --------------------------------------------

and

# --------------------------------------------
kurtosis <- function (x, na.rm = FALSE)
{
     if (na.rm)
         x <- x[!is.na(x)]
     sum((x - mean(x))^4)/(length(x) * var(x)^2) - 3
}
# --------------------------------------------

However, sd() and var() are the estimated population parameters. To 
calculate the sample statistics of skewness and kurtosis, shouldn't one 
use the sample statistics of the standard deviation (and variance), as 
well? For example:

# --------------------------------------------
# Function to calculate the sample statistic of skewness:
skew_s=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 3)
  {
    cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')
  }
  else
  {
  z = sqrt(n/(n-1))*scale(x)
  mean(z^3)
  }
}
# --------------------------------------------

and

# --------------------------------------------
# Function to calculate the sample statistic of kurtosis:
kurt_s=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 4)
  {
    cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')
  }
  else
  {
  z = sqrt(n/(n-1))*scale(x)
  mean(z^4)-3
  }
}
# --------------------------------------------

Whereas, to calculate the (unbiased) estimated population parameter of 
skewness and kurtosis, the correction should also include the number of 
cases in the following way:

# --------------------------------------------
# Function to calculate the unbiased populataion estimate of skewness:
skew=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 3)
  {
    cat('valid cases = ',n,'\nskewness is not defined for less than 3 
valid cases!\n')
  }
  else
  {
  z = scale(x)
  sum(z^3)*n/((n-1)*(n-2))
  }
}
# --------------------------------------------

and

# --------------------------------------------
# Function to calculate the unbiased population estimate of kurtosis:
kurt=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 4)
  {
    cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 
valid cases!\n')
  }
  else
  {
  z = scale(x)
  sum(z^4)*n*(n+1)/((n-1)*(n-2)*(n-3))-3*(n-1)^2/((n-2)*(n-3))
  }
}
# --------------------------------------------

Thus, it seems that the formulas used in the e1071 package are neither 
formulas for the sample statistics nor for the (unbiased) estimates of 
the population parameters. Is there another definition of kurtosis and 
skewness that I am not aware of?

Dirk