Skip to content
Prev 385604 / 398503 Next

Some code to run Hutcheson t-test using R

If you want a function automating Karl's code, here it is. It returns an 
object of S3 class "htest", R's standard for hypothesis tests functions. 
The returned object can then be subset in the usual ways, ht$statistic, 
ht$parameter, ht$p.value, etc.


library(QSutils)

hutcheson.test <- function(x1, x2){
   dataname1 <- deparse(substitute(x1))
   dataname2 <- deparse(substitute(x2))
   method <- "Hutcheson's t-test for Shannon diversity equality"
   alternative <- "the diversities of the two samples are not equal"
   h1 <- Shannon(x1)
   varh1 <- ShannonVar(x1)
   n1 <- sum(x1)
   h2 <- Shannon(x2)
   varh2 <- ShannonVar(x2)
   n2 <- sum(x2)
   degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
   tstat <- (h1 - h2)/sqrt(varh1 + varh2)
   p.value <- 2*pt(-abs(tstat), df = degfree)
   ht <- list(
     statistic = c(t = tstat),
     parameter = c(df = degfree),
     p.value = p.value,
     alternative = alternative,
     method = method,
     data.name = paste(dataname1, dataname2, sep = ", ")
   )
   class(ht) <- "htest"
   ht
}

earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)

hutcheson.test(earlier, later)



With the data you provided:


bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
bird_1959 <- c(0,0,14,59,26,68,0)
bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)

hutcheson.test(bird_1956, bird_1957)




Note that like David said earlier, there might be better ways to 
interpret Shannon's diversity index. If h is the sample's diversity, 
exp(h) gives the number of equally-common species with equivalent 
diversity.


s1 <- Shannon(earlier)
s2 <- Shannon(later)
c(earlier = s1, later = s2)
exp(c(earlier = s1, later = s2))   # Both round to 3
eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
Shannon(eq_common)                 # Slightly greater than the samples' 
diversity


round(exp(sapply(birds, Shannon))) # Your data


#-------------------------------------


Earlier Karl wrote [1] that


Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
that explains the minor numerical differences obtained with the code
above and the published variances.


I don't believe the published variances were computed with the published 
variance estimator. The code below computes the variances like QSutils 
and with formula (4) in Hutcheson's paper. The latter does not give the 
same results.

var_est <- function(n){
   s <- length(n)
   N <- sum(n)
   p <- n/N
   i <- p != 0
   inv.p <- numeric(s)
   inv.p[i] <- 1/p[i]
   log.p <- numeric(s)
   log.p[i] <- log(p[i])
   #
   term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
   term2 <- (s - 1)/(2*N^2)
   #
   numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * 
log.p)
   denom3 <- 6*N^3
   term3 <- numer3/denom3
   list(
     Bioc = term1 + term2,
     Hutch = term1 + term2 + term3
   )
}

Vh1 <- var_est(earlier)
Vh1
all.equal(ShannonVar(earlier), Vh1$Bioc)
ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31

Vh2 <- var_est(later)
Vh2
identical(ShannonVar(later), Vh2$Bioc)    # TRUE



[1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html


Hope this helps,

Rui Barradas


?s 09:38 de 10/09/20, Luigi Marongiu escreveu: