Dear r-helpers!

I want to classify my vegetation data with hierarchical cluster analysis. My dataset consists of abundance values (Braun-Blanquet ordinal scale; ranked) for each plant species and relevé. I found a lot of R packages dealing with cluster analysis, but none of them is able to calculate a distance measure for ranked data. Podani recommends using Goodman and Kruskal's gamma for the distance. I found the function rcorr.cens (outx=TRUE) in the Hmisc package, which should do it.

What I don't understand is how to define the input vectors x and y from my vegetation dataset. The other question is how I can use the output of rcorr.cens as a distance measure in the cluster analysis (e.g. in vegan or amap).

Any help would be greatly appreciated.
Thank you very much,
Kim
rcorr.cens Goodman-Kruskal gamma
5 messages · Kim Vanselow, David Winsemius, Frank E Harrell Jr
I looked at the help page for rcorr.cens and was surprised that this function, designed for censored data and taking its input as a Surv object, was being considered for that purpose. This posting to r-help may be of interest. John Baron offers a simple implementation that takes its input as (x, y):

http://finzi.psych.upenn.edu/R/Rhelp02/archive/19749.html

    goodman <- function(x,y){
      Rx <- outer(x,x,function(u,v) sign(u-v))
      Ry <- outer(y,y,function(u,v) sign(u-v))
      S1 <- Rx*Ry
      return(sum(S1)/sum(abs(S1)))}

I then read Frank's response to John, and it's clear that my impression regarding potential uses of rcorr.cens was too limited. It appears that you could supply a "y" vector to the "S" argument and get more efficient execution.
David Winsemius

On Mar 9, 2009, at 11:13 AM, Kim Vanselow wrote:
> Dear r-helpers!
> I want to classify my vegetation data with hierarchical cluster analysis.
> [...]
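To make the (x, y) input concrete, here is a small worked example of the goodman() function quoted above. The two vectors are hypothetical Braun-Blanquet scores invented for illustration, not data from this thread. Gamma is (C - D) / (C + D), where C and D count concordant and discordant pairs and tied pairs are ignored.

```r
# Goodman-Kruskal gamma, as in the function quoted above
goodman <- function(x, y) {
  Rx <- outer(x, x, function(u, v) sign(u - v))  # pairwise ordering in x
  Ry <- outer(y, y, function(u, v) sign(u - v))  # pairwise ordering in y
  S1 <- Rx * Ry                # +1 concordant, -1 discordant, 0 tied
  sum(S1) / sum(abs(S1))       # (C - D) / (C + D)
}

# Hypothetical ordinal scores (made up for this example)
x <- c(0, 1, 2, 3, 4)
y <- c(0, 2, 1, 3, 3)

# 8 concordant pairs, 1 discordant pair, 1 pair tied on y
g <- goodman(x, y)
print(g)   # 0.7777778, i.e. (8 - 1) / (8 + 1)
```

Only the ordering of the values enters the calculation, which is why the measure suits an ordinal scale where 3 - 1 is not necessarily 2.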
Yes, rcorr.cens was designed to handle censored data, but it works fine with uncensored Y. You may need to specify Surv(Y), but first try just Y. It would be worth testing the execution speed of the two approaches.

Frank
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
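A rough sketch of the comparison suggested above, under some assumptions: the data are simulated, the Hmisc part only runs if the package is installed, and the "Dxy" element of the rcorr.cens result is read as the gamma-type index because outx=TRUE drops pairs tied on x. Timings will vary by machine, so none are shown.

```r
# Outer-product implementation quoted earlier in the thread
goodman <- function(x, y) {
  Rx <- outer(x, x, function(u, v) sign(u - v))
  Ry <- outer(y, y, function(u, v) sign(u - v))
  S1 <- Rx * Ry
  sum(S1) / sum(abs(S1))
}

# Simulated ordinal data (assumption: not from the original post)
set.seed(1)
x <- sample(0:5, 200, replace = TRUE)
y <- sample(0:5, 200, replace = TRUE)

g_outer <- goodman(x, y)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Plain vector passed as S, i.e. uncensored Y
  g_hmisc <- unname(Hmisc::rcorr.cens(x, y, outx = TRUE)["Dxy"])
  print(c(outer = g_outer, rcorr.cens = g_hmisc))

  # Compare execution speed of the two approaches
  print(system.time(for (i in 1:50) goodman(x, y)))
  print(system.time(for (i in 1:50) Hmisc::rcorr.cens(x, y, outx = TRUE)))
}
```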
Thanks to David and Frank for the suggestions. With a two-dimensional input, rcorr.cens and John Baron's implementation work well. But I am not able to calculate gamma for a multivariate matrix.

Example: columns = species; rows = relevés; the numbers are BB values (ordinal scale; 1 < 3, but 3 - 1 is not necessarily 2).

      K. ceratoides   S. caucasica   A. tibeticum
A1    3               1              1
A2    0               3              2
A3    1               1              0
A4    2               2              0
A5    0               3              2
B1    1               1              1
B2    4               3              1

I want to calculate a distance matrix based on Goodman-Kruskal's gamma (instead of the classical Euclidean, Bray-Curtis, Manhattan, etc.) that I can use for hierarchical cluster analysis (e.g. amap, vegan, cluster) in order to compare the different relevés.

Further suggestions would be greatly appreciated.
Thank you very much,
Kim

-------- Original Message --------
Date: Mon, 09 Mar 2009 13:27:29 -0500
From: Frank E Harrell Jr <f.harrell at vanderbilt.edu>
To: David Winsemius <dwinsemius at comcast.net>
CC: Kim Vanselow <Vanselow at gmx.de>, r-help at r-project.org
Subject: Re: [R] rcorr.cens Goodman-Kruskal gamma
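One possible way to get from pairwise gamma to a clustering of relevés is sketched below, using the example table from the message above. Several choices here are assumptions rather than anything stated in the thread: the helper name gamma_dist is hypothetical, the dissimilarity is taken as 1 - gamma, and an undefined gamma (no untied pairs, as for the constant row B1) is treated as 0. This is not claimed to be Podani's procedure.

```r
goodman <- function(x, y) {
  Rx <- outer(x, x, function(u, v) sign(u - v))
  Ry <- outer(y, y, function(u, v) sign(u - v))
  S1 <- Rx * Ry
  sum(S1) / sum(abs(S1))
}

# Example table: rows = releves, columns = species
veg <- matrix(c(3, 1, 1,
                0, 3, 2,
                1, 1, 0,
                2, 2, 0,
                0, 3, 2,
                1, 1, 1,
                4, 3, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("A1", "A2", "A3", "A4", "A5", "B1", "B2"),
                              c("K.ceratoides", "S.caucasica", "A.tibeticum")))

# Hypothetical helper: gamma between every pair of rows, as a dist object
gamma_dist <- function(m) {
  n <- nrow(m)
  g <- matrix(1, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      gij <- goodman(m[i, ], m[j, ])
      if (is.nan(gij)) gij <- 0   # assumption: undefined gamma counts as 0
      g[i, j] <- g[j, i] <- gij
    }
  }
  as.dist(1 - g)                  # assumption: dissimilarity = 1 - gamma
}

d  <- gamma_dist(veg)
hc <- hclust(d, method = "average")
if (interactive()) plot(hc)
```

With only three species there are just three species pairs per comparison, so gamma takes very few distinct values here; a real relevé table with many species should give a finer-grained distance matrix. The resulting dist object can be passed to any function that accepts one.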
A related function is Hmisc's varclus, which can use Spearman, Pearson, or Hoeffding indexes as similarity measures.

Frank
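For orientation, here is a base-R sketch of roughly what a Spearman-based variable clustering does, as I understand it: squared rank correlations between columns serve as similarities, and 1 minus the similarity is clustered hierarchically. It is an approximation written without Hmisc, not a reproduction of varclus, and it reuses the example table from earlier in the thread.

```r
# Example table: rows = releves, columns = species
veg <- matrix(c(3, 1, 1,
                0, 3, 2,
                1, 1, 0,
                2, 2, 0,
                0, 3, 2,
                1, 1, 1,
                4, 3, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("A1", "A2", "A3", "A4", "A5", "B1", "B2"),
                              c("K.ceratoides", "S.caucasica", "A.tibeticum")))

# Spearman rank correlations between the columns (species)
rho <- cor(veg, method = "spearman")

# Squared correlation as similarity, 1 - similarity as dissimilarity
hc <- hclust(as.dist(1 - rho^2), method = "complete")
if (interactive()) plot(hc)
```

Note that this groups species (columns), not relevés. Clustering relevés this way would mean working on t(veg), where a constant row such as B1 has no defined rank correlation.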