agnes() in package cluster on R 2.14.1 and R 3.0.1
5 messages · Hugo Varet, Martin Maechler
1 day later
Hugo Varet <varethugo at gmail.com>
on Sun, 9 Jun 2013 11:43:32 +0200 writes:
> Dear R users,
> I discovered something strange using the function agnes() of the cluster
> package on R 3.0.1 and on R 2.14.1: the clusterings obtained are
> different even though I ran exactly the same code.
hard to believe... but ..
> I quickly looked at the source code of the function and discovered that
> there was an important change: agnes() in R 2.14.1 used Fortran code
> whereas agnes() in R 3.0.1 uses C code.
Well, it has done so for quite a bit longer, e.g., also in R 2.15.0.
> Here is one of the contingency tables between R 2.14.1 and R 3.0.1:
>                       classe.agnTani.2.14.1
> classe.agnTani.3.0.1    1   2   3
>                    1   74   0 229
>                    2    0 235   0
>                    3  120   0  15
> So, I was wondering if it was normal that the C and FORTRAN codes give
> different results?
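A cross-tabulation like the one above can be built by saving the cluster labels produced under each R version and tabulating them against each other. The sketch below is only illustrative: `labels_old` and `labels_new` are made-up stand-ins for the two `cutree()` results, not the poster's data. Since cluster numbers are arbitrary, two partitions agree when every row and every column of the table has a single non-zero cell, even if that cell is off the diagonal.

```r
# Hypothetical labels standing in for cutree() output from two R versions.
# In practice each vector would be written with saveRDS() in one R version
# and read back with readRDS() in the other.
labels_old <- c(1, 1, 2, 2, 3, 3, 3, 1)
labels_new <- c(2, 2, 1, 1, 3, 3, 1, 2)

# Cross-tabulate the two partitions (rows: new labels, columns: old labels).
tab <- table(new = labels_new, old = labels_old)
print(tab)

# The partitions are identical up to relabelling only if every row and
# every column contains exactly one non-zero cell.
same_partition <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
print(same_partition)
```

In the table posted above, row 1 and row 3 each have two non-zero cells, so the difference there is more than a relabelling.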
It's not normal, and I'm pretty sure I have had many many
examples which gave identical results.
Can you provide a reproducible example, please?
If the example is too large [for dput()], please send me the *.rda
file produced from
save(<your data>, file=<the file I need>)
*and* the exact call to agnes() for your data.
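The two ways of sharing data mentioned here can be sketched as follows. This is a minimal illustration under assumptions: `mydata` is a small made-up binary data frame and the file is written to a temporary directory; neither comes from the thread.

```r
# Hypothetical stand-in for the poster's binary data.
set.seed(1)
mydata <- as.data.frame(matrix(rbinom(20, 1, 0.5), nrow = 5))

# Small objects: dput() prints R code that recreates the object, which
# can be pasted straight into an e-mail.
dput(mydata)

# Larger objects: save() writes an .rda file that can be sent as an
# attachment.
rda_file <- file.path(tempdir(), "mydata.rda")
save(mydata, file = rda_file)

# The recipient restores the object under its original name with load().
# Loading into a fresh environment avoids overwriting existing objects.
restored <- new.env()
load(rda_file, envir = restored)
print(identical(restored$mydata, mydata))
```

Including the output of `sessionInfo()` from both R installations alongside the data makes it easier to see which package versions are being compared.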
Thank you in advance!
Martin Maechler,
the one you could have e-mailed directly
to using maintainer("cluster") ...
> Best regards,
> Hugo Varet
> [[alternative HTML version deleted]]
^^^^^^^^^^^^^ try to avoid, please ^^^^^^^^^^^^^^^^^
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
yes indeed, please.
1 day later
Hugo Varet <varethugo at gmail.com>
on Tue, 11 Jun 2013 15:15:36 +0200 writes:
> Dear Martin,
> Thank you for your answer. Here is the exact call to agnes():
> setwd("E:/Hugo")
> library(cluster)
> load("mydata.rda")
> tableauTani <- dist.binary(mydata, method = 4, diag = FALSE, upper = FALSE)
> resAgnes.Tani <- agnes(tableauTani, diss = inherits(tableauTani, "dist"),
>                        method = "ward")
> classe.agnTani.3 <- cutree(resAgnes.Tani, 3)
> I'm going to send you the data in a separate e-mail.
Thank you, Hugo, and I got that alright.
I can see that many of the distances are *identical*, because
your data is completely binary.
From experience, I know that this can lead (for some algorithms)
to "arbitrary" decisions in clustering: when two
*pairs* of observations / clusters have exactly the same
distance, it is somewhat random which of the two pairs is "merged" /
"fused" first in a bottom-up hierarchical algorithm such as agnes().
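How easily ties arise with binary data can be checked directly. The sketch below uses a small made-up matrix and base R's `dist(method = "binary")`, a Jaccard-type distance, as a stand-in; it is not the poster's data and not the `dist.binary()` call from the thread.

```r
# Hypothetical binary data: 6 observations on 4 binary variables.
x <- rbind(a = c(1, 1, 0, 0),
           b = c(1, 0, 0, 0),
           c = c(1, 1, 1, 0),
           d = c(0, 0, 1, 1),
           e = c(0, 0, 0, 1),
           f = c(0, 1, 1, 1))

# Jaccard-type distance from base R, used here only as a stand-in.
d  <- dist(x, method = "binary")
dv <- as.vector(d)

n_pairs    <- length(dv)          # number of pairwise distances
n_distinct <- length(unique(dv))  # number of distinct distance values
n_min_ties <- sum(dv == min(dv))  # pairs tied for the smallest distance

print(c(pairs = n_pairs, distinct = n_distinct, tied_at_min = n_min_ties))

# Show which pairs share the smallest distance.
m <- as.matrix(d)
tied <- which(m == min(dv) & upper.tri(m), arr.ind = TRUE)
print(data.frame(first  = rownames(m)[tied[, "row"]],
                 second = colnames(m)[tied[, "col"]]))
```

When more than one pair is tied at the minimum, a bottom-up algorithm has to pick one of them to merge first, and nothing in the data determines that choice; two implementations that scan the pairs in a different order can then build different trees from the same input.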
To reproduce your example (above), however, I need to know
*where* you got the dist.binary() function from.
It is not part of standard R nor of the cluster package.
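The origin of a function can be looked up from within the session where it is available. The sketch below defines a hypothetical helper, `where_from()`, and applies it to `cutree()`, which appears in the call above; in the poster's session the same lookup could be applied to `dist.binary`. A function of that name exists in the ade4 package, though the thread does not confirm that this is the one used.

```r
# Hypothetical helper: report the namespace in which a function is defined.
where_from <- function(f) {
  stopifnot(is.function(f))
  environmentName(environment(f))
}

# cutree() is used in the call above; it lives in the stats package.
print(where_from(cutree))

# find() lists the attached packages that define a given name.
print(find("cutree"))

# exists() tells whether a name is visible in the current session at all.
print(exists("dist.binary"))
```

`getAnywhere("dist.binary")` is another option; it also searches loaded namespaces that are not attached.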
Regards,
Martin
> Regards,
> Hugo