Once you understand how to build a basic dendrogram from a hierarchical clustering, you will probably want to customize it. Dendextend is an awesome R library developed by Tal Galili that should suit your needs. This page is largely inspired by its very good vignette.

Before using this library, let’s recall how a basic dendrogram is built: compute a distance matrix with dist(), cluster it with hclust(), then convert the result with as.dendrogram() and plot it.
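As a reminder, here is a minimal sketch of that basic dendrogram in base R. The choice of the three columns (mpg, cyl, disp) is an assumption made for illustration, and the variables are left unscaled to keep the example short.

```r
# Assumption: cluster the cars on three numeric columns of the built-in mtcars data
vars <- c("mpg", "cyl", "disp")

d    <- dist(mtcars[, vars])   # Euclidean distance matrix (the default)
hc   <- hclust(d)              # hierarchical clustering (default: complete linkage)
dend <- as.dendrogram(hc)      # convert to a dendrogram object

par(mar = c(7, 3, 1, 1))       # larger bottom margin so the labels are not cut
plot(dend)
```

The `dend` object is the kind of starting point that the customizations on this page apply to.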

The set function


The set function of dendextend lets you modify the attributes of a specific part of the tree. For example, you can customize the ‘cex’, ‘lwd’, ‘col’ and ‘lty’ of ‘branches’ and ‘labels’; the part and the attribute are combined into a single name such as "branches_lwd" or "labels_cex". You can also customize the nodes or the leaves.
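A possible illustration of set is sketched below. The colors, sizes and the three clustering columns are arbitrary choices for the example, not values taken from the original page.

```r
library(dendextend)

# Assumption: same toy dendrogram as before, built on three mtcars columns
dend <- as.dendrogram(hclust(dist(mtcars[, c("mpg", "cyl", "disp")])))

# Each call to set() returns a modified copy of the dendrogram
dend2 <- set(dend,  "branches_col", "grey")   # branch color
dend2 <- set(dend2, "branches_lwd", 2)        # branch line width
dend2 <- set(dend2, "labels_col", "orange")   # label color
dend2 <- set(dend2, "labels_cex", 0.8)        # label size
dend2 <- set(dend2, "leaves_pch", 19)         # point symbol drawn at each leaf
dend2 <- set(dend2, "leaves_col", "skyblue")  # color of the leaf points

par(mar = c(7, 3, 1, 1))
plot(dend2)
```

Because set returns a new object, the calls can also be chained with a pipe if you prefer that style.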

Highlight clusters


The dendextend library has good functionality for highlighting the clusters of the tree. You can color branches and labels according to their cluster membership by specifying the number of clusters you want. The rect.dendrogram function even lets you highlight one or several specific clusters with a rectangle.
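The sketch below shows one way to do this with color_branches, color_labels and rect.dendrogram. The number of clusters (k = 3), the palette and the cluster picked for the rectangle are assumptions chosen for illustration.

```r
library(dendextend)

# Assumption: same toy dendrogram, built on three mtcars columns
dend <- as.dendrogram(hclust(dist(mtcars[, c("mpg", "cyl", "disp")])))

pal <- c("skyblue", "orange", "grey")           # one color per cluster

dend3 <- color_branches(dend, k = 3, col = pal) # color branches by cluster
dend3 <- color_labels(dend3, k = 3, col = pal)  # color labels by cluster

par(mar = c(7, 3, 1, 1))
plot(dend3)

# Draw a dashed rectangle around the first cluster (counted from the left)
rect.dendrogram(dend3, k = 3, which = 1, border = "red", lty = 2)
```

rect.dendrogram draws on top of an existing plot, so it has to be called after plot.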

Comparing with an expected hierarchy

It is a common task to compare the clusters you get with an expected classification. The mtcars dataset we used to build our dendrogram has an ‘am’ column, a binary variable (transmission type: 0 = automatic, 1 = manual). We can check whether this variable is consistent with the clusters we got using the colored_bars function.
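Here is a minimal sketch of that comparison. The two colors, the margins and the cut into three clusters are assumptions for the example; the colors are given in the original row order of mtcars, which is what colored_bars expects by default.

```r
library(dendextend)

# Assumption: same toy clustering, built on three mtcars columns
hc   <- hclust(dist(mtcars[, c("mpg", "cyl", "disp")]))
dend <- as.dendrogram(hc)

# One color per car, in the row order of mtcars: 0 = automatic, 1 = manual
am_cols <- ifelse(mtcars$am == 0, "forestgreen", "green")

par(mar = c(10, 2, 1, 1))   # leave room under the labels for the bar
plot(dend)
colored_bars(colors = am_cols, dend = dend, rowLabels = "am")

# Optional numeric check: cross-tabulate 3 clusters against 'am'
tab <- table(cluster = stats::cutree(hc, k = 3), am = mtcars$am)
print(tab)
```

If the bar shows long runs of a single color under each cluster, the expected variable agrees well with the clustering; a bar that alternates colors within clusters suggests it does not.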

Comparing 2 dendrograms

It is possible to compare two dendrograms using the tanglegram function.

I use it here to illustrate a very important concept: when you calculate your distance matrix and run your hierarchical clustering algorithm, you cannot simply use the default options without thinking about what you’re doing. Have a look at the differences between two different clustering methods.
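As a sketch, the example below clusters the same distance matrix with two linkage methods and draws them face to face. The data columns, the pair of methods ("average" and "complete") and the plotting options are assumptions for illustration.

```r
library(dendextend)

# Assumption: same distance matrix as before, on three mtcars columns
d <- dist(mtcars[, c("mpg", "cyl", "disp")])

# Same distances, two different linkage methods
dend_avg <- as.dendrogram(hclust(d, method = "average"))
dend_cmp <- as.dendrogram(hclust(d, method = "complete"))

# Put both trees in a dendlist and draw them side by side
dl <- dendlist(dend_avg, dend_cmp)
tanglegram(dl,
           common_subtrees_color_lines = FALSE, # same color for all connecting lines
           highlight_distinct_edges = TRUE,     # mark edges present in only one tree
           margin_inner = 7)                    # room for the labels in the middle
```

Lines that cross between the two trees show where the leaf ordering differs, and the highlighted edges show where the two methods split the data differently.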