
Any functions to manipulate (merge, cut, remove) hclust objects? (maybe through phylo?)

6 messages · Martin Maechler, Tal Galili

1 day later
#
    TG> Hello all, I'm now working with hclust objects and was
    TG> hoping to perform some basic editing on them like:

    TG>    - Joining = the merging of two hclust objects (so
    TG>      they will share one root)
    TG>    - Splicing = to cut/extract a branch out of an hclust
    TG>      object, such that the branch is itself an hclust object.

    TG> I noticed I could extract one element of an hclust
    TG> object by turning it into a dendrogram, but that doesn't
    TG> enable me to turn it back into an hclust object.

Why should you "turn it back"?
What do you want to use them for?

The intent of the "dendrogram" class has been that it is more flexible
(and more general) than "hclust", and that dendrograms can be printed,
plotted, manipulated, ... better than hclust objects.
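
A minimal sketch of that flexibility, assuming the built-in USArrests data
as a stand-in for real data: a dendrogram is a nested list, so a sub-branch
can be extracted with `[[` and is itself a dendrogram.

```r
# Toy input: ten rows of the built-in USArrests data (an assumption
# made purely for illustration).
hc   <- hclust(dist(USArrests[1:10, ]), method = "average")
dend <- as.dendrogram(hc)

# Extract the second branch below the root; the result is again
# an object of class "dendrogram".
branch <- dend[[2]]
class(branch)
labels(branch)            # leaf labels contained in that branch
attr(branch, "members")   # number of leaves in that branch
```

Deeper branches can be reached the same way, e.g. `dend[[c(1, 2)]]`,
provided that node exists in the tree.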

Regards,
Martin Maechler, ETH Zurich
 

    TG> Are there any functions that can aid with this?  Maybe
    TG> through the ape package and the phylo objects?

    TG> Thanks, Tal
#
    > Hello Martin,
    > Thank you for replying.

    > I have two needs:

    > 1) To merge two dendrograms into one.

    > 2) To then run cutree on it (which works on hclust, but
    >    not on dendrogram).

Well, but cut() does, and it is prominently mentioned on the
dendrogram help page (and in its examples).
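
For reference, a small sketch of what cut() returns for a dendrogram. The
USArrests data and the cut height of 100 are arbitrary choices made for
illustration.

```r
hc   <- hclust(dist(USArrests), method = "complete")
dend <- as.dendrogram(hc)

# Cut the tree at height 100 (an arbitrary value for this data set).
pieces <- cut(dend, h = 100)

pieces$upper           # the tree above the cut; cut branches appear as leaves
length(pieces$lower)   # number of sub-dendrograms below the cut
pieces$lower[[1]]      # each element of $lower is itself a dendrogram
```

The result is a list with components `upper` and `lower`, not a membership
vector of the kind cutree() produces.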

    > I guess that if I knew how to perform both steps I would be able to do what
    > I'm trying to do on my data.
    > If nothing like this currently exists, I guess I'll simply implement a
    > method of cutree for a dendrogram, and see how to merge two dendrograms
    > together.

so you only need to program the merge / join part.

I did not take the time to understand what exactly you mean by
that, but as there is no function to do that with "hclust" either,
I'm convinced you should rather write one for "dendrogram"
indeed; as merge() is already an S3 generic, I'd call it
    merge.dendrogram()
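
A hedged sketch of what such a join looks like, assuming a reasonably recent
R in which the stats package provides a merge() method for dendrograms. The
two input trees and the root height are illustrative choices only.

```r
# Two separate trees built from disjoint subsets of USArrests.
d1 <- as.dendrogram(hclust(dist(USArrests[1:5, ])))
d2 <- as.dendrogram(hclust(dist(USArrests[6:10, ])))

# The new common root must sit at least as high as both input trees;
# the factor 1.5 is an arbitrary choice.
root_height <- 1.5 * max(attr(d1, "height"), attr(d2, "height"))
joined <- merge(d1, d2, height = root_height)

joined
attr(joined, "members")   # leaves of both inputs combined
labels(joined)
```

The height of the new root carries no information from the data; it only
needs to be consistent with the two sub-trees.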

If you end up finding it useful and are willing to write a help
page (including examples!) for it, you may consider donating it
back to the R-project ... ;-)

Regards, Martin
#
    > Hello Martin,
    > Thank you for the reference to the "cut" option in the dendrogram help page!
    > I guess I was too focused on looking for a solution for the hclust object
    > to think that such a method existed for dendrograms.

    > cut.dendrogram doesn't solve my problem yet, since what I'm looking for
    > is the output of something like:
    > cutree(hc.object, k = 3)

    > which is a vector indicating to which cluster each item belongs.

Indeed; and that's only indirectly the result of a cut(*, h = .)
call.
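
One way to get from the cut() result to such a vector, as a minimal sketch:
label every leaf with the index of the lower branch it falls in. The data
set, the height, and the variable names are assumptions for illustration.

```r
hc   <- hclust(dist(USArrests), method = "complete")
dend <- as.dendrogram(hc)

h     <- 100                      # arbitrary cut height
lower <- cut(dend, h = h)$lower   # list of sub-dendrograms below the cut

# Build a named vector: leaf label -> index of the branch containing it.
membership <- unlist(lapply(seq_along(lower), function(i) {
  labs <- labels(lower[[i]])
  setNames(rep(i, length(labs)), labs)
}))

# Reorder to the original observation order, as cutree() reports it.
membership <- membership[hc$labels]

# Cross-tabulate against cutree() at the same height.
table(membership, cutree(hc, h = h))
```

The cluster numbers need not agree with those of cutree(), since the
branches are numbered in plotting order here; the grouping itself should be
the same as long as `h` does not coincide with a merge height.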

BTW: cutree() internally translates an
     'h = *' specification into a 'k = *' one.....
  ...
  ...
  which is actually a bit peculiar, as a cut at a given height is well-defined,
  but a cut into a given number of clusters may *NOT* be well
  defined in the case where two sub-branches have the exact same
  height 'h', such that going from  h  to  'h - eps'  leads to
  the addition of *two* new clusters, i.e., a step  k --> k+2,
  so that cutree(*, k+1) is not really well defined.
  The cutree() internal algorithm will use the (somewhat)
  arbitrary order of the merges to define the grouping.
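
A tiny constructed example of that situation (the four points are made up
so that two merges happen at exactly the same height):

```r
# Two tight pairs, far apart: (a, b) and (c, d) both merge at height 1.
x  <- c(a = 0, b = 1, c = 10, d = 11)
hc <- hclust(dist(x), method = "single")
hc$height                 # the first two merge heights are tied

# Number of clusters obtained when cutting at a range of heights.
hs <- seq(0, 10, by = 0.25)
ks <- sapply(hs, function(h) length(unique(cutree(hc, h = h))))
sort(unique(ks))          # cluster counts reachable by a height cut

# Asking for k = 3 still returns a grouping; which of the two tied
# pairs is kept together depends on the order of the merges.
cutree(hc, k = 3)
```

No cut height yields three clusters here, yet `k = 3` is accepted, which is
the peculiarity described above.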

Given all the above, I now tend to think that yes, indeed,
it may be most fruitful to provide
an as.hclust.dendrogram() method, rather than just implementing
a cut()-based cutree method for dendrograms.
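
With such a method in place, the round trip would look roughly like the
sketch below. It assumes an R version in which as.hclust() has a method for
dendrograms; the data set and `k = 3` are illustrative.

```r
hc   <- hclust(dist(USArrests), method = "complete")
dend <- as.dendrogram(hc)

# Convert the dendrogram back to "hclust" so that cutree() applies.
hc2 <- as.hclust(dend)

groups_direct <- cutree(hc,  k = 3)
groups_round  <- cutree(hc2, k = 3)

# Compare the two groupings, matched by observation name.
table(groups_direct, groups_round[names(groups_direct)])
```

Note that this conversion is meant for complete trees; a sub-branch
extracted from a larger dendrogram may need its leaf order values
renumbered before it can be coerced.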

    > And for some reason I can't seem to understand the structure of the
    > dendrogram object using "str".

Yes; there's a str.dendrogram() method which very nicely
shows the structure of a dendrogram.
However, if you really want to see the internal structure, you need
  str(unclass( . ))
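
For example, on a small tree (six rows of USArrests, chosen only to keep the
output short):

```r
dend <- as.dendrogram(hclust(dist(USArrests[1:6, ])))

str(dend)                           # str.dendrogram(): compact tree view
str(unclass(dend), max.level = 1)   # the raw nested list underneath

# Each node is a list carrying attributes such as "members" and "height";
# leaves additionally carry "label" and "leaf".
names(attributes(dend))
attributes(dend[[1]])
```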

    > But I'll read some more and write back if I can't solve it.

    > P.S.: If I succeed in writing something useful, it will be
    > my pleasure and honor to contribute it back to the R project :)

Cool.
Actually, now I think merge() is a much easier part than
the cutree() / as.hclust.dendrogram() one.
But that, too, should not be very hard.

As I'm officially on vacation at the moment, I may have some fun
helping with these...

Martin





    > On Wed, Dec 29, 2010 at 1:49 PM, Martin Maechler
    > <maechler at stat.math.ethz.ch> wrote:
    >> >>>>> Tal Galili <tal.galili at gmail.com>
    >> >>>>>     on Wed, 29 Dec 2010 13:32:26 +0200 writes: