
Hierarchical Cluster Analysis with large dataset

5 messages · Petar Milin, Ranjan Maitra, Bert Gunter +2 more

#
On Sun, 3 Nov 2013 10:42:06 +0100 Petar Milin
<petar.milin at uni-tuebingen.de> wrote:

            
You have two different issues here: the size of the dataset (the number
of observations, which prevents storing the distance matrix in memory)
and the number of variables (which does not, but probably hinders reading
in the dataset).
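To put rough numbers on that distinction: a full distance matrix for n observations holds n(n-1)/2 pairwise distances, so its size grows with the square of n and does not depend on the number of variables p at all, whereas the raw data grow only as n × p. The Python sketch below assumes 8-byte double-precision storage (as in R's dist objects) and uses hypothetical values of n and p, since the original question is not quoted here; the helper names are made up for illustration, and real clustering code will need extra working memory on top of this.

```python
def dist_matrix_bytes(n_obs: int, bytes_per_value: int = 8) -> int:
    """Memory for the n*(n-1)/2 pairwise distances of a condensed
    (lower-triangle) distance matrix, stored as 8-byte doubles by default."""
    n_pairs = n_obs * (n_obs - 1) // 2
    return n_pairs * bytes_per_value


def data_matrix_bytes(n_obs: int, n_vars: int, bytes_per_value: int = 8) -> int:
    """Memory for the raw n x p data matrix itself."""
    return n_obs * n_vars * bytes_per_value


if __name__ == "__main__":
    n_vars = 1_000  # hypothetical number of variables
    for n_obs in (10_000, 100_000, 1_000_000):  # hypothetical dataset sizes
        data_gb = data_matrix_bytes(n_obs, n_vars) / 1e9
        dist_gb = dist_matrix_bytes(n_obs) / 1e9
        print(f"n = {n_obs:>9,}: data {data_gb:8.1f} GB, distances {dist_gb:10.1f} GB")
```

At 100,000 observations the distance matrix alone needs about 40 GB whatever the number of variables, which is why the number of observations, not the number of variables, is the binding constraint.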

You need to provide more information here: why do you need (or want)
hierarchical clustering, and if you do, must you use only R? What
hardware do you have at your disposal, and so on.

Depending on your answers to the above, this may well be a research
problem in its own right.

HTH!

Best wishes,
Ranjan

  
    
#
(Offlist, since this is just a personal comment).

I cannot help you, but it sounds like the sort of thing you should
ask about on the Bioconductor list.

But I wonder how you could possibly interpret the results even if you
could get them. I would think they would be more noise than signal,
and making sense of such a mess would be hopeless. Maybe you need to
rethink your approach.

No need to respond to me, of course.

Cheers,
Bert

On Sun, Nov 3, 2013 at 1:42 AM, Petar Milin
<petar.milin at uni-tuebingen.de> wrote:

  
    
#
Hi,

I think your dataset is too large to be interpretable, but in general
you should check out the cluster package, specifically clara(), which
is intended for use with large data.

Sarah

On Sun, Nov 3, 2013 at 4:42 AM, Petar Milin
<petar.milin at uni-tuebingen.de> wrote: