
rcorr.cens Goodman-Kruskal gamma

5 messages · Kim Vanselow, David Winsemius, Frank E Harrell Jr

#
Dear r-helpers!
I want to classify my vegetation data with hierarchical cluster analysis.
My dataset consists of abundance values (Braun-Blanquet ordinal scale; ranked) for each plant species and relevé.
I found a lot of R packages dealing with cluster analysis, but none of them is able to calculate a distance measure for ranked data.
Podani recommends the use of Goodman and Kruskal's gamma for the distance. I found the function rcorr.cens (outx=TRUE) of the Hmisc package, which should do it.
What I don't understand is how to define the input vectors x, y with my vegetation dataset. The other thing is how I can use the output of rcorr.cens as a distance measure in the cluster analysis (e.g. in vegan or amap).
Any help would be greatly appreciated,
Thank you very much,
Kim



--
#
I looked at the help page for rcorr.cens and was surprised that a
function designed for censored data, and taking its input as a Surv
object, was being considered for that purpose. This posting to r-help
may be of interest. John Baron offers a simple implementation that
takes its input as (x,y):

http://finzi.psych.upenn.edu/R/Rhelp02/archive/19749.html

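goodman <- function(x,y){
   # signs of all pairwise differences within x and within y
   Rx <- outer(x,x,function(u,v) sign(u-v))
   Ry <- outer(y,y,function(u,v) sign(u-v))
   # +1 = concordant pair, -1 = discordant pair, 0 = tied in x or y
   S1 <- Rx*Ry
   # gamma = (concordant - discordant) / (concordant + discordant)
   return(sum(S1)/sum(abs(S1)))}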

I then read Frank's response to John and it's clear that my impression
regarding potential uses of rcorr.cens was too limited. It appears that
you could supply a "y" vector to the "S" argument and get more
efficient execution.
#
David Winsemius wrote:
Yes, rcorr.cens was designed to handle censored data but works fine with
uncensored Y. You may need to specify Surv(Y) but first try just Y. It
would be worth testing the execution speed of the two approaches.

Frank
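
Note: the comparison suggested above could be sketched roughly as follows. This is only an illustration, not code from the thread: the vectors x and y and the simulated bx, by are made-up ordinal data, and the sketch assumes that the "Dxy" element returned by rcorr.cens(..., outx=TRUE) is the gamma-type rank correlation described on its help page. The Hmisc part is skipped if the package is not installed.

```r
# John Baron's pairwise-sign implementation, as quoted earlier in the thread
goodman <- function(x, y) {
  Rx <- outer(x, x, function(u, v) sign(u - v))
  Ry <- outer(y, y, function(u, v) sign(u - v))
  S1 <- Rx * Ry
  sum(S1) / sum(abs(S1))
}

# Illustrative ordinal vectors with ties (hypothetical, not from the thread)
x <- c(1, 2, 2, 3, 4)
y <- c(2, 1, 3, 3, 5)

g_outer <- goodman(x, y)   # 7 concordant, 1 discordant pair: (7 - 1) / (7 + 1)
print(g_outer)

# Simulated larger input for a rough timing (assumed sizes, adjust as needed)
set.seed(1)
bx <- sample(0:5, 1000, replace = TRUE)
by <- sample(0:5, 1000, replace = TRUE)
print(system.time(goodman(bx, by)))

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Plain (uncensored) y passed as the S argument; outx = TRUE drops pairs tied on x
  g_hmisc <- Hmisc::rcorr.cens(x, y, outx = TRUE)["Dxy"]
  print(g_hmisc)
  print(system.time(Hmisc::rcorr.cens(bx, by, outx = TRUE)))
}
```

The outer() version builds two n-by-n matrices, so its memory use grows quadratically with the vector length; that is one reason a timing check on realistic data sizes seems worthwhile.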
#
Thanks to David and Frank for the suggestions. With two input vectors, rcorr.cens and John Baron's implementation both work well. But I am not able to calculate gamma for a multivariate matrix.

example: columns=species; rows=releves; the numbers are BB-values (ordinal scale; 1<3 but 3-1 is not necessarily 2)

   K. ceratoides S. caucasica A. tibeticum
A1    3               1            1
A2    0               3            2
A3    1               1            0
A4    2               2            0
A5    0               3            2
B1    1               1            1
B2    4               3            1

I want to calculate a distance matrix based on Goodman-Kruskal's gamma (instead of the classical Euclidean, Bray-Curtis, Manhattan etc.) which I can use for hierarchical cluster analysis (e.g. amap, vegan, cluster) in order to compare the different relevés.
  
Further suggestions would be greatly appreciated,
Thank you very much,
Kim
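
Note: one possible way to get such a distance matrix is to apply the two-vector gamma function to every pair of relevés (rows) and convert the result with as.dist(). The sketch below is an assumption-laden illustration, not an answer given in the thread: the helper gamma_dist() is hypothetical, the dissimilarity 1 - gamma (range 0 to 2) is just one of several possible conversions, and the matrix veg is typed in from the example table above.

```r
# Goodman-Kruskal gamma for two vectors (John Baron's function from this thread)
goodman <- function(x, y) {
  Rx <- outer(x, x, function(u, v) sign(u - v))
  Ry <- outer(y, y, function(u, v) sign(u - v))
  S1 <- Rx * Ry
  sum(S1) / sum(abs(S1))
}

# Example data from the message above: rows = releves, columns = species
veg <- matrix(c(3, 1, 1,
                0, 3, 2,
                1, 1, 0,
                2, 2, 0,
                0, 3, 2,
                1, 1, 1,
                4, 3, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("A1", "A2", "A3", "A4", "A5", "B1", "B2"),
                              c("K.ceratoides", "S.caucasica", "A.tibeticum")))

# Hypothetical helper: pairwise dissimilarity 1 - gamma between rows of m
gamma_dist <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- 1 - goodman(m[i, ], m[j, ])
    }
  }
  as.dist(d)
}

d <- gamma_dist(veg)
print(d)

# B1 has the same value for every species, so all its pairs are tied and
# gamma is undefined (NaN); it is dropped here before clustering.
ok <- rownames(veg) != "B1"
hc <- hclust(gamma_dist(veg[ok, ]), method = "average")
print(hc)
```

A relevé whose values are all tied has no concordant or discordant pairs, so gamma is 0/0 for it; hclust() does not accept missing distances, so such rows need to be removed or handled in some other way first. The resulting dist object should also be usable with other functions that accept a precomputed dissimilarity matrix.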



 
#
Kim Vanselow wrote:
A function related to that is Hmisc's varclus function which will use 
Spearman, Pearson, or Hoeffding indexes for similarity measures.
Frank