Once you understand how to make a basic dendrogram from a hierarchical clustering, you will probably want to customize it. dendextend is an awesome R library developed by Tal Galili that should suit your needs. This page is largely inspired by its very good vignette.

Before using this library, let’s recall how to build a basic dendrogram from a distance matrix.
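
As a reminder, here is a minimal sketch of that starting point using base R only. The choice of the three columns (mpg, cyl, disp) is an assumption made for illustration; any numeric columns of mtcars would do.

```r
# Distance matrix computed on three numeric columns of mtcars
# (the column choice is an assumption for this example)
d <- dist(mtcars[, c("mpg", "cyl", "disp")])

# Hierarchical clustering with the default options, converted to a dendrogram
dend <- as.dendrogram(hclust(d))

# Increase the bottom margin so that the car names are not truncated
par(mar = c(7, 3, 1, 1))
plot(dend)
```

The `dend` object built this way is the kind of input that the dendextend functions expect.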

The set function

The set function of dendextend lets you modify the attributes of a specific part of the tree. For example, you can customize ‘cex’, ‘lwd’, ‘col’ and ‘lty’ for ‘branches’ and ‘labels’, using option names that combine the part and the attribute, such as ‘branches_lwd’ or ‘labels_col’. You can also customize the nodes or the leaves.
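
The sketch below shows one possible way to chain a few calls to set. The colors, sizes and the three mtcars columns are arbitrary choices for the example, not recommended values.

```r
library(dendextend)

# Dendrogram built on three mtcars columns (column choice is an assumption)
dend <- as.dendrogram(hclust(dist(mtcars[, c("mpg", "cyl", "disp")])))

# Labels: color and size
dend2 <- set(dend, "labels_col", "orange")
dend2 <- set(dend2, "labels_cex", 0.8)

# Branches: color, width and line type
dend2 <- set(dend2, "branches_col", "skyblue")
dend2 <- set(dend2, "branches_lwd", 3)
dend2 <- set(dend2, "branches_lty", 2)

# Leaves: draw a filled point at the tip of each branch
dend2 <- set(dend2, "leaves_pch", 19)
dend2 <- set(dend2, "leaves_cex", 0.7)
dend2 <- set(dend2, "leaves_col", "grey40")

par(mar = c(7, 3, 1, 1))
plot(dend2)
```

Each call returns a modified copy of the dendrogram, so the original `dend` stays untouched and the calls can be applied in any order.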


Highlight clusters

The dendextend library has good features for highlighting the clusters of a tree. You can color branches and labels according to their cluster assignment by specifying the number of clusters you want. The rect.dendrogram function even lets you highlight one or several specific clusters with a rectangle.
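
Here is a sketch of both ideas, assuming we want 3 clusters. The palette, the number of clusters and the cluster picked for the rectangle are illustrative choices.

```r
library(dendextend)

# Dendrogram built on three mtcars columns (column choice is an assumption)
dend <- as.dendrogram(hclust(dist(mtcars[, c("mpg", "cyl", "disp")])))

# One color per cluster, applied to both the labels and the branches
my_palette <- c("skyblue", "orange", "grey")
dend_k <- set(dend, "labels_col", value = my_palette, k = 3)
dend_k <- set(dend_k, "branches_k_color", value = my_palette, k = 3)

par(mar = c(9, 1, 1, 1))
plot(dend_k, axes = FALSE)

# Draw a dashed rectangle around the first cluster (counted from the left)
rect.dendrogram(dend_k, k = 3, which = 1, border = "grey30", lty = 2)
```

Passing several values to `which` (for example `which = c(1, 3)`) should draw one rectangle per selected cluster.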


Comparing with an expected hierarchy





It is a common task to compare the clusters you get with an expected grouping. In the mtcars dataset we used to build our dendrogram, there is an ‘am’ column that is a binary variable. We can check whether this variable is consistent with the clusters we got using the colored_bars function.
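
A possible sketch of this check: a color vector is derived from ‘am’, in the original row order of mtcars, and drawn as a bar under the tree. The two colors and the clustering columns are assumptions made for the example.

```r
library(dendextend)

# Dendrogram built on three mtcars columns (column choice is an assumption)
dend <- as.dendrogram(hclust(dist(mtcars[, c("mpg", "cyl", "disp")])))

# One color per car, in the row order of mtcars: one shade per value of am
my_colors <- ifelse(mtcars$am == 0, "forestgreen", "green")

# Leave a large bottom margin so the bar fits between the tree and the labels
par(mar = c(10, 1, 1, 1))
plot(set(dend, "labels_cex", 0.9), axes = FALSE)

# Add the bar; by default colored_bars reorders the colors to match the leaves
colored_bars(colors = my_colors, dend = dend, rowLabels = "am")
```

If the bar shows long runs of the same color under each cluster, the variable agrees with the clustering; a bar that alternates colors everywhere suggests it does not.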







Comparing 2 dendrograms




It is possible to compare 2 dendrograms using the tanglegram function.

I use it here to illustrate a very important concept: when you calculate your distance matrix and run your hierarchical clustering algorithm, you cannot simply use the default options without thinking about what you’re doing. Have a look at the differences between 2 different clustering methods.
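
As an illustration, the sketch below clusters the same distance matrix with two linkage methods and draws the two trees face to face. The dataset (mtcars), the columns and the pair of methods (average and complete) are assumptions chosen for the example.

```r
library(dendextend)

# Same distance matrix for both trees (column choice is an assumption)
d <- dist(mtcars[, c("mpg", "cyl", "disp")])

# Two hierarchical clusterings that differ only by the linkage method
d1 <- as.dendrogram(hclust(d, method = "average"))
d2 <- as.dendrogram(hclust(d, method = "complete"))

# Face-to-face comparison: each line connects the same car in both trees
tanglegram(
  d1, d2,
  highlight_distinct_edges = TRUE,      # flag branches present in only one tree
  common_subtrees_color_lines = FALSE,  # keep the connecting lines in one color
  margin_inner = 9,                     # room for the labels between the trees
  lwd = 2
)
```

Crossing lines and highlighted branches show where the two methods disagree, even though the input data are identical.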






