library(fpc)  # provides cluster.stats()

# Case 1: add 1 to the Pearson correlations before converting with as.dist()
set.seed(1000)
t9 <- cor(t(x), method = "pearson") + 1
t8 <- as.dist(t9)
t7 <- cutree(hclust(t8), 4)
cluster.stats(t8, t7)$avg.silwidth

# Case 2: use the raw Pearson correlations (no 1 added)
set.seed(1000)
t9 <- cor(t(x), method = "pearson")
t8 <- as.dist(t9)
t7 <- cutree(hclust(t8), 4)
cluster.stats(t8, t7)$avg.silwidth
[1] -0.09543089
On 10/18/06, Weiwei Shi <helprhelp at gmail.com> wrote:
Dear Chris,
Thanks for the prompt reply!
You are right: the dist built from Pearson correlations contains
negative values, so I should use cor+1 in my case (since negatively
correlated genes should be considered farthest). Thanks.
As for ?cluster.stats, I double-checked and found that I needed to
restart JGR; only after that did the help function pick up newly
loaded packages, such as fpc in this case.
Sorry for the confusion,
Weiwei
On 10/18/06, Christian Hennig <chrish at stats.ucl.ac.uk> wrote:
btw, ?cluster.stats does not work on my Mac machine.
               _
platform       i386-apple-darwin8.6.1
arch           i386
os             darwin8.6.1
system         i386, darwin8.6.1
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)
Because I don't have access to a Mac, I can't tell you anything about
this, unfortunately.
I always thought that my package should work on all platforms if it
passes all the standard tests for packages?
(Is there anyone else who could comment on this please?)
I have a sample with these dimensions:
[1] 142 28
and I want to cluster the rows.
First of all, I followed the examples for cluster.stats():
d.dd <- dist(dd.df) # use Euclidean
d.4 <- cutree(hclust(d.dd), 4) # 4 clusters I tried
cluster.stats(d.dd, d.4) # gives me some results like this:
$cluster.size
[1] 133 5 2 2
$avg.silwidth
[1] 0.9857916
But when I tried a Pearson-based distance, for which visual inspection
suggests that 4 or 5 clusters are reasonable, it gave me a very bad
avg.silwidth:
d.dd <- as.dist(cor(t(x), method="pearson")) # is this correct?
$cluster.size
[1] 86 31 6 19
$avg.silwidth
[1] -0.09543089
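For reading these numbers: the average silhouette width lies between -1 and 1, and values near or below 0 indicate that many points sit closer to another cluster than to their own. A minimal sketch of how such a value can be computed with silhouette() from the cluster package follows. The data are simulated for illustration and are not the poster's data, and the names toy, d.toy, cl, sil and avg.sil are made up for this example.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

set.seed(1)
# Two simulated groups of 20 points each in 2 dimensions (illustrative only)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))
d.toy <- dist(toy)                    # Euclidean distances
cl <- cutree(hclust(d.toy), 2)        # 2 clusters from complete-linkage hclust

sil <- silhouette(cl, d.toy)          # one silhouette width per observation
avg.sil <- mean(sil[, "sil_width"])   # average silhouette width
avg.sil
```

The same pattern applies to any dist object and cutree() result, provided the dist object really holds dissimilarities.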
cor can give negative values, which doesn't fit the usual definition
of a distance. I don't know what as.dist does in this case, but I think
that, depending on your application, you should rather use the absolute
value of the correlation, or 1+cor.
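To make the point about negative values concrete, here is a small sketch on simulated data. as.dist() does not transform its input; it only keeps the lower triangle of the matrix, so negative correlations stay negative. Note also that a correlation is a similarity (large for similar rows), so a common way to turn it into a dissimilarity, which is an assumption here and not something stated in this thread, is 1 - cor (anti-correlated rows are farthest apart) or 1 - abs(cor) (anti-correlated rows count as close). The matrix x below is a toy stand-in for the real data.

```r
# Toy data standing in for the poster's matrix x (rows = genes); simulated
set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20)

r <- cor(t(x), method = "pearson")  # row-by-row correlations, values in [-1, 1]

d.raw <- as.dist(r)            # lower triangle only; negative values remain
d.1m  <- as.dist(1 - r)        # 0 for cor = 1, 2 for cor = -1
d.abs <- as.dist(1 - abs(r))   # 0 for |cor| = 1, 1 for uncorrelated rows

range(d.raw)   # includes negative values
range(d.1m)    # stays within [0, 2]
range(d.abs)   # stays within [0, 1]
```

Which conversion is appropriate depends on whether anti-correlated rows should count as far apart or as related.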
By the way, what's $separation? Where can I find a detailed explanation
of the output from cluster.stats?
This is documented on the cluster.stats help page:
separation: vector of clusterwise minimum distances of a point in the
cluster to a point of another cluster.
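As an illustration of this definition, the sketch below recomputes the clusterwise separation by hand in base R on simulated data. The helper separation_by_hand is hypothetical and is not part of fpc; it only follows the wording of the help page.

```r
# Hypothetical helper (not part of fpc): for each cluster, the minimum
# distance from a point in the cluster to a point of another cluster.
separation_by_hand <- function(d, clustering) {
  m <- as.matrix(d)
  sapply(sort(unique(clustering)), function(k) {
    inside <- clustering == k
    min(m[inside, !inside])   # smallest distance leaving cluster k
  })
}

set.seed(1)
toy <- matrix(rnorm(30 * 2), ncol = 2)  # simulated data, 30 points in 2 dimensions
d.toy <- dist(toy)                      # Euclidean distances
cl <- cutree(hclust(d.toy), 3)          # 3 clusters from complete-linkage hclust

sep <- separation_by_hand(d.toy, cl)
sep                                     # one value per cluster
```

A small separation value for a cluster means at least one of its points lies very close to a point of a different cluster.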
Best regards,
Christian
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III