Skip to content

Some code to run Hutcheson t-test using R

10 messages · Rui Barradas, Karl Schilling, Luigi Marongiu

#
Thank you very much for the code, that was very helpful.
I got the article by Hutcheson -- I don't know if I can distribute it
, given the possible copyrights, or if I can attach it here -- but it
does not report numbers directly: it refers to a previous article
counting bird death on a telegraph each year. The numbers
are:
bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
bird_1959 <- c(0,0,14,59,26,68,0)
bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)

This for sake of the argument.
As for my problem, I implemented the Shannon index with the package
iNext, which only gives me the index itself and the 95% CI. Even when
I implemented it with vegan, I only got the index. Essentially I don't
have a count of species I could feed into the Hutcheson's. Is there a
way to extract these data? Or to run a Hutcheson's on the final index?
Thank you

On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
<karl.schilling at uni-bonn.de> wrote:

  
    
#
Update:
I can see that you used the function Shannon from the package QSutils.
This would supplement the iNext package I used and solve the problem.
Thank you.

On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
<marongiu.luigi at gmail.com> wrote:

  
    
#
If you want a function automating Karl's code, here it is. It returns an 
object of S3 class "htest", R's standard for hypothesis tests functions. 
The returned object can then be subset in the usual ways, ht$statistic, 
ht$parameter, ht$p.value, etc.


library(QSutils)

hutcheson.test <- function(x1, x2){
   dataname1 <- deparse(substitute(x1))
   dataname2 <- deparse(substitute(x2))
   method <- "Hutcheson's t-test for Shannon diversity equality"
   alternative <- "the diversities of the two samples are not equal"
   h1 <- Shannon(x1)
   varh1 <- ShannonVar(x1)
   n1 <- sum(x1)
   h2 <- Shannon(x2)
   varh2 <- ShannonVar(x2)
   n2 <- sum(x2)
   degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
   tstat <- (h1 - h2)/sqrt(varh1 + varh2)
   p.value <- 2*pt(-abs(tstat), df = degfree)
   ht <- list(
     statistic = c(t = tstat),
     parameter = c(df = degfree),
     p.value = p.value,
     alternative = alternative,
     method = method,
     data.name = paste(dataname1, dataname2, sep = ", ")
   )
   class(ht) <- "htest"
   ht
}

earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)

hutcheson.test(earlier, later)



With the data you provided:


bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
bird_1959 <- c(0,0,14,59,26,68,0)
bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)

hutcheson.test(bird_1956, bird_1957)




Note that like David said earlier, there might be better ways to 
interpret Shannon's diversity index. If h is the sample's diversity, 
exp(h) gives the number of equally-common species with equivalent 
diversity.


s1 <- Shannon(earlier)
s2 <- Shannon(later)
c(earlier = s1, later = s2)
exp(c(earlier = s1, later = s2))   # Both round to 3
eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
Shannon(eq_common)                 # Slightly greater than the samples' 
diversity


round(exp(sapply(birds, Shannon))) # Your data


#-------------------------------------


Earlier Karl wrote [1] that


Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
that explains the minor numerical differences obtained with the code
above and the published variances.


I don't believe the published variances were computed with the published 
variance estimator. The code below computes the variances like QSutils 
and with formula (4) in Hutcheson's paper. The latter does not give the 
same results.

var_est <- function(n){
   s <- length(n)
   N <- sum(n)
   p <- n/N
   i <- p != 0
   inv.p <- numeric(s)
   inv.p[i] <- 1/p[i]
   log.p <- numeric(s)
   log.p[i] <- log(p[i])
   #
   term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
   term2 <- (s - 1)/(2*N^2)
   #
   numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * 
log.p)
   denom3 <- 6*N^3
   term3 <- numer3/denom3
   list(
     Bioc = term1 + term2,
     Hutch = term1 + term2 + term3
   )
}

Vh1 <- var_est(earlier)
Vh1
all.equal(ShannonVar(earlier), Vh1$Bioc)
ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31

Vh2 <- var_est(later)
Vh2
identical(ShannonVar(later), Vh2$Bioc)    # TRUE



[1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html


Hope this helps,

Rui Barradas


?s 09:38 de 10/09/20, Luigi Marongiu escreveu:
#
Hello,

Sorry, there's an instruction missing. See inline.

?s 11:44 de 10/09/20, Rui Barradas escreveu:
# Create a list with all the data
birds <- mget(ls(pattern = "^bird"))
Hope this helps,

Rui Barradas
#
Hello,
thank you for the code. To explain better, when I used vegan, I did
not count the species directly but simply prepared a dataframe where,
for each species, I counted the number of samples bearing such
species:
```
'data.frame': 3 obs. of  46 variables:
 $ NC_001416 Enterobacteria phage lambda   : int  5 4 5
 $ NC_001623 Autographa californica nucl...: int  7 7 7
 $ NC_001895 Enterobacteria phage P2       : int  1 0 0
 $ NC_004745 Yersinia phage L-413C         : int  1 0 0
```
here the triplettes refer to healthy, tumor and metastasis. The outcome is:
```
# Shannon index
diversity(new_df)
#> Normal     Tumour     Metastasis
#> 2.520139   3.109512   1.890404
```
Using iNext, I provided a list of all the species counted in a samples
```
$Healthy
 [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0

$Tumour
 [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1
1 2 1 1 1 1 1 1 1 0 0 0 0

$Metastasis
 [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 1 1 1 1
```
```
mod = iNEXT(new_list, q=0, datatype="abundance")
mod$AsyEst
#Site         Diversity Observed Estimator   s.e.    LCL     UCL
#1     Normal  Species richness   18.000    41.368 19.683 23.563 116.155
#2     Normal Shannon diversity   12.430    21.343  5.183 12.430  31.501
#4     Tumour  Species richness   30.000    94.776 42.936 49.848 241.396
#5     Tumour Shannon diversity   22.410    53.135 14.486 24.743  81.526
#7 Metastasis  Species richness   10.000    27.379 22.821 12.443 133.640
#8 Metastasis Shannon diversity    6.622     9.980  3.102  6.622  16.059
```
So here the Shannon index is 12 instead of 2.5...
Using Karl's function, I get:
```
# compute Shannon
norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1]))))
canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2]))))
meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3]))))
norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1]))))
canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2]))))
meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3]))))
norm_sum <- sum(array(as.numeric(unlist(new_list[1]))))
canc_sum <- sum(array(as.numeric(unlist(new_list[2]))))
meta_sum <- sum(array(as.numeric(unlist(new_list[3]))))
# compute Hutcheson
degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum +
canc_var**2 /canc_sum)
test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var)
(p <- 2*pt(-abs(test),df= degree))
```
remarkably, the indices are the same as obtained by vegan:
```
[1] 2.520139
[1] 3.109512
[1] 1.890404
```

I tried Rui's function but I got an error, so I wrote it as
```
hutcheson = function(A, B){
  # compute Shannon index, variance and sum of elements
  A_index <- Shannon(A)
  B_index <- Shannon(B)
  A_var <- ShannonVar(A)
  B_var <- ShannonVar(B)
  A_sum <- sum(A)
  B_sum <- sum(B)
  # compute Hutcheson
  DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
  test <- (A_index-B_index) /sqrt(A_var + B_var)
  p <- 2*pt(-abs(test),df= DF)
  # closure
  cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
index first group: ",
      round(A_index, 4), "\n\tShannon index second group: ", round(B_index, 4),
      "\n\tp-value : ", round(p, 4), "\n", sep = "")
  return(p)
}
```
and I got:
```
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 2.5201
Shannon index second group: 3.1095
p-value : 0.0183
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 2.5201
Shannon index second group: 1.8904
p-value : 0.0371
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 3.1095
Shannon index second group: 1.8904
p-value : 0
```
new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to
the original Hutcheson data:
```
Hutcheson's t-test for Shannon diversity equality
Shannon index first group: 1.8429
Shannon index second group: 1.0689
p-value : 0

```
This is to compare two groups at the time. I'll probably have to
compensate for multiple testing...
But if this all OK, then the case is closed.
Thank you
On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:

  
    
#
Update:
I also added the confidence interval for the Shannon index:
```
#! Hutcheson's t-test for Shannon diversity equality
# thanks to Karl Schilling and Rui Barradas
hutcheson = function(A, B){
  # compute Shannon index, variance and sum of elements
  A_index <- Shannon(A)
  B_index <- Shannon(B)
  A_var <- ShannonVar(A)
  B_var <- ShannonVar(B)
  A_sum <- sum(A)
  B_sum <- sum(B)
  # compute Hutcheson
  DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
  test <- (A_index-B_index) /sqrt(A_var + B_var)
  p <- 2*pt(-abs(test),df= DF)
  if (p < 0.001) {
    P = "<0.001"
  } else {
    P = round(p, 3)
  }
  if (p < 0.001) {
    S = "***"
  } else if (p < 0.01) {
    S = "**"
  } else if (p < 0.05) {
    S = "*"
  } else {
    S = ""
  }
  # closure
  cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon
index first group: \t",
      round(A_index, 3), " (", round((A_index-2*A_var),3), "-",
round((A_index+2*A_var),3),
      ")\n\tShannon index second group: \t",
      round(B_index, 3), " (", round((B_index-2*B_var),3), "-",
round((B_index+2*B_var),3),
      ")\n\tp-value: ", P, " ", S, "\n", sep = "")
  return(p)
}
```
On Thu, Sep 10, 2020 at 2:10 PM Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

  
    
#
Hello,
I have just realized in the original paper, the t test is defined as:
`t = h1-h2 -(?1?2)/(var1-var2)^1/2`. is the term -(?1?2) missing in
your formula? How to calculate ?1?2?
Thank you
On Thu, Sep 10, 2020 at 2:41 PM Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

  
    
#
Actually,
in the working example, Hutcheson himself did not report the term
?1?2: `t0 = h1-h2/(var1-var2)^1/2`. so I think we can live without it.
Case closed.
Thank you

On Fri, Sep 11, 2020 at 11:11 AM Luigi Marongiu
<marongiu.luigi at gmail.com> wrote:

  
    
#
Dear Luigi:

no, ?1?2 is not "missing"

First, it should actually be "?1 - ?2".

And as your usual null-hypothesis when comparing h1 and h2 is that they 
are not different (i.e. ?1 = ?2), the latter term adds up to 0 and may 
be omitted.

Karl Schilling
#
Thank you for the clarification.

On Fri, Sep 11, 2020 at 2:36 PM Karl Schilling
<karl.schilling at uni-bonn.de> wrote: