Hierarchical Cluster Analysis with large dataset
5 messages · Petar Milin, Ranjan Maitra, Bert Gunter +2 more
On Sun, 3 Nov 2013 10:42:06 +0100 Petar Milin
<petar.milin at uni-tuebingen.de> wrote:
Hello! Can anyone give me advice on running Hierarchical Cluster Analysis on large datasets? For example, 80000x10000. Calculating distances on such a dataframe seems impossible even on a very powerful computer. Also, any other advice that would lead to a reduction of dimensionality, i.e., clustering/grouping the variables, would be more than welcome.
You have two different issues here: the size of the dataset (the number of observations, which prevents storing the distance matrix in memory) and the number of variables (which does not, but probably hinders reading in the dataset). You need to provide more information here: why do you need or want to do hierarchical clustering and, if you do, do you need to use only R? What hardware do you have at your disposal? Depending on your answers to the above, this may well be a research problem in its own right. HTH! Best wishes, Ranjan
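To put rough numbers on the two issues above, here is a back-of-the-envelope sketch in R. It assumes 8-byte doubles, a dense numeric matrix for the data, and a `dist()`-style object that stores only the lower triangle; any working copies made during clustering would come on top of this.

```r
n <- 80000   # observations (rows), from the question
p <- 10000   # variables (columns), from the question
bytes_per_double <- 8

# A dist object keeps only the lower triangle: n * (n - 1) / 2 entries
dist_entries <- n * (n - 1) / 2
dist_gb <- dist_entries * bytes_per_double / 1e9

# The raw data held as a dense numeric matrix
data_gb <- n * p * bytes_per_double / 1e9

cat(sprintf("distance object: %.1f GB\n", dist_gb))
cat(sprintf("data matrix:     %.1f GB\n", data_gb))
```

Under these assumptions the distance object alone needs about 25.6 GB, while the data matrix is about 6.4 GB. The size of the distance object depends only on the number of observations, so reducing the number of variables does not shrink it.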
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
(Offlist, since this is just a personal comment.) I cannot help you, but it sounds like the sort of thing that you should ask about on the Bioconductor list. But I wonder how you could possibly interpret the results even if you could get them. I would think they would be more noise than signal, and making sense of such a mess would be hopeless. Maybe you need to rethink your approach. No need to respond to me, of course. Cheers, Bert
On Sun, Nov 3, 2013 at 1:42 AM, Petar Milin
<petar.milin at uni-tuebingen.de> wrote:
Hello!
Can anyone give me advice on running Hierarchical Cluster Analysis on large
datasets? For example, 80000x10000. Calculating distances on such a
dataframe seems impossible even on a very powerful computer.
Also, any other advice that would lead to a reduction of dimensionality,
i.e., clustering/grouping the variables, would be more than welcome.
Many thanks,
PM
Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374
Hi, I think your dataset is too large to be interpretable, but in general you should check out the cluster package, specifically clara(), which is intended for use with large data.
Sarah
On Sun, Nov 3, 2013 at 4:42 AM, Petar Milin
<petar.milin at uni-tuebingen.de> wrote:
Hello! Can anyone give me advice on running Hierarchical Cluster Analysis on large datasets? For example, 80000x10000. Calculating distances on such a dataframe seems impossible even on a very powerful computer. Also, any other advice that would lead to a reduction of dimensionality, i.e., clustering/grouping the variables, would be more than welcome. Many thanks, PM
Sarah Goslee http://www.functionaldiversity.org
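As a minimal sketch of the clara() suggestion above: the example below runs it on small simulated data, not on the poster's dataset. The data, the choice of `k = 3`, and the `samples`/`sampsize` values are illustrative assumptions only. Note that clara() is a partitioning (k-medoids) method that works on subsamples, so it avoids the full distance matrix but does not produce a dendrogram.

```r
library(cluster)

set.seed(1)

# Simulated stand-in data (an assumption for illustration):
# 3000 observations, 20 variables, three groups with shifted means
x <- rbind(
  matrix(rnorm(1000 * 20, mean = 0), ncol = 20),
  matrix(rnorm(1000 * 20, mean = 4), ncol = 20),
  matrix(rnorm(1000 * 20, mean = 8), ncol = 20)
)

# clara() draws `samples` subsamples of size `sampsize`, runs PAM on each,
# and keeps the best set of medoids; rngR = TRUE makes it use R's RNG,
# so set.seed() above controls the subsampling
fit <- clara(x, k = 3, samples = 50, sampsize = 200, rngR = TRUE)

table(fit$clustering)   # cluster sizes
dim(fit$medoids)        # one medoid (row of x) per cluster
```

Whether three clusters, or this sample size, is sensible for real data would need to be checked, for example by comparing several values of `k`.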