Some code to run Hutcheson t-test using R

Thu, Sep 10, 2020 3:44 AM

If you want a function automating Karl's code, here it is. It returns an 
object of S3 class "htest", R's standard for hypothesis tests functions. 
The returned object can then be subset in the usual ways, ht$statistic, 
ht$parameter, ht$p.value, etc.


library(QSutils)

hutcheson.test <- function(x1, x2){
   dataname1 <- deparse(substitute(x1))
   dataname2 <- deparse(substitute(x2))
   method <- "Hutcheson's t-test for Shannon diversity equality"
   alternative <- "the diversities of the two samples are not equal"
   h1 <- Shannon(x1)
   varh1 <- ShannonVar(x1)
   n1 <- sum(x1)
   h2 <- Shannon(x2)
   varh2 <- ShannonVar(x2)
   n2 <- sum(x2)
   degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
   tstat <- (h1 - h2)/sqrt(varh1 + varh2)
   p.value <- 2*pt(-abs(tstat), df = degfree)
   ht <- list(
     statistic = c(t = tstat),
     parameter = c(df = degfree),
     p.value = p.value,
     alternative = alternative,
     method = method,
     data.name = paste(dataname1, dataname2, sep = ", ")
   )
   class(ht) <- "htest"
   ht
}

earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)

hutcheson.test(earlier, later)



With the data you provided:


bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
bird_1959 <- c(0,0,14,59,26,68,0)
bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)

hutcheson.test(bird_1956, bird_1957)




Note that like David said earlier, there might be better ways to 
interpret Shannon's diversity index. If h is the sample's diversity, 
exp(h) gives the number of equally-common species with equivalent 
diversity.


s1 <- Shannon(earlier)
s2 <- Shannon(later)
c(earlier = s1, later = s2)
exp(c(earlier = s1, later = s2))   # Both round to 3
eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
Shannon(eq_common)                 # Slightly greater than the samples' 
diversity


round(exp(sapply(birds, Shannon))) # Your data


#-------------------------------------


Earlier Karl wrote [1] that


Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
that explains the minor numerical differences obtained with the code
above and the published variances.


I don't believe the published variances were computed with the published 
variance estimator. The code below computes the variances like QSutils 
and with formula (4) in Hutcheson's paper. The latter does not give the 
same results.

var_est <- function(n){
   s <- length(n)
   N <- sum(n)
   p <- n/N
   i <- p != 0
   inv.p <- numeric(s)
   inv.p[i] <- 1/p[i]
   log.p <- numeric(s)
   log.p[i] <- log(p[i])
   #
   term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
   term2 <- (s - 1)/(2*N^2)
   #
   numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * 
log.p)
   denom3 <- 6*N^3
   term3 <- numer3/denom3
   list(
     Bioc = term1 + term2,
     Hutch = term1 + term2 + term3
   )
}

Vh1 <- var_est(earlier)
Vh1
all.equal(ShannonVar(earlier), Vh1$Bioc)
ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31

Vh2 <- var_est(later)
Vh2
identical(ShannonVar(later), Vh2$Bioc)    # TRUE



[1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html


Hope this helps,

Rui Barradas


?s 09:38 de 10/09/20, Luigi Marongiu escreveu:

Update:
I can see that you used the function Shannon from the package QSutils.
This would supplement the iNext package I used and solve the problem.
Thank you.

On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
<marongiu.luigi at gmail.com> wrote:

Thank you very much for the code, that was very helpful.
I got the article by Hutcheson -- I don't know if I can distribute it
, given the possible copyrights, or if I can attach it here -- but it
does not report numbers directly: it refers to a previous article
counting bird death on a telegraph each year. The numbers
are:
bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
bird_1959 <- c(0,0,14,59,26,68,0)
bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)

This for sake of the argument.
As for my problem, I implemented the Shannon index with the package
iNext, which only gives me the index itself and the 95% CI. Even when
I implemented it with vegan, I only got the index. Essentially I don't
have a count of species I could feed into the Hutcheson's. Is there a
way to extract these data? Or to run a Hutcheson's on the final index?
Thank you

On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
<karl.schilling at uni-bonn.de> wrote:

Dear Luigi,

below some code I cobbled together based on the Hutcheson paper you
mentioned. I was lucky to find code to calculate h and, importantly, its
variance in the R-package QSutils - you may find it on the Bioconductor
website.

here is the code, along with an example. I also attach the code as an
R-file.

Hope that helps.
All my best

Karl
PS don't forget to adjust for multiple testing if you compare more than
two groups.
K


# load package needed
# QSutils is on Bioconductor
library(QSutils)

# here some exemplary data - these are the data from Pilou 1966 that are
used
# in the second example of Hutcheson, J theor Biol 129:151-154 (1970)

earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
# numbers of the first example used by Hutcheson were unfortunately not
# available to me

# here starts the code ; you may replace the variables "earlier" and "later"
# by your own numbers.

# calculate h, var(h) etc
h1 <- Shannon(earlier)
varh1 <- ShannonVar(earlier)
n1 <- sum (earlier)
h2 <- Shannon(later)
varh2 <- ShannonVar(later)
n2 <- sum (later)
degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)

# compare numbers with those in the paper
h1
h2
varh1
varh2
# I assume that minor numerical differences are due to differences in the
# numerical precision of computers in the early seventies and today / KS

# this is the actual t-test
t <- (h1-h2) /sqrt(varh1 + varh2)
p <- 2*pt(-abs(t),df= degfree)
p

# that's it
# Best
# Karl
--
Karl Schilling, MD
Professor of Anatomy and Cell Biology
Anatomisches Institut
Rheinische Friedrich-Wilhelms-Universit?t
Nussallee 10

D-53115 Bonn
Germany

phone ++49-228-73-2602


--
Best regards,
Luigi

Some code to run Hutcheson t-test using R

Thread (10 messages)