A dendrogram, or tree diagram, illustrates the hierarchical organisation of several entities. For example, it is often used to draw family trees. It consists of a root node that gives rise to several nodes, which in turn end in leaf nodes (the bottom of the tree). A dendrogram can be built from 2 types of dataset: i/ a numeric matrix where several variables describe the features of individuals, so that we can calculate the distance between individuals and cluster them; ii/ a hierarchical dataset where the relationship between entities is provided directly. These 2 cases are described below. Note that for clustering, it is good practice to also provide the corresponding heat map, which illustrates the structure.

 

 


Dendrogram from clustering


This part is for you if you want to study the structure of your samples. If you have a numeric matrix, you can calculate a distance between each pair of samples using the dist or the cor function. The hclust function then clusters the samples. Finally, R's plot() function recognizes the hclust output and draws a basic tree.
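The steps above can be sketched as follows. This is a minimal example, not the gallery's original code: the choice of the built-in mtcars data, the selected columns, the scaling step and the Euclidean distance are all assumptions made for illustration.

```r
# Numeric matrix: rows are samples, columns are variables.
# Assumption: a subset of the built-in mtcars data, scaled so that
# no single variable dominates the distance.
data <- scale(mtcars[1:10, c("mpg", "hp", "wt", "qsec")])

# Distance between each pair of samples (Euclidean here)
d <- dist(data, method = "euclidean")

# Alternative: a correlation-based distance between samples
# d <- as.dist(1 - cor(t(data)))

# Hierarchical clustering ("complete" linkage is the hclust default)
hc <- hclust(d, method = "complete")

# Basic dendrogram
plot(hc, main = "Cluster dendrogram", xlab = "", sub = "")
```

Note that cor works on columns, which is why the matrix is transposed before computing a correlation between samples.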

For further customization, the dendextend library is probably what you want.
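As a rough sketch of what dendextend offers, the example below colours the branches by group and shrinks the labels. The dataset, the number of groups (3) and the label size are arbitrary choices for illustration, and the dendextend package is assumed to be installed.

```r
library(dendextend)

# Cluster a small numeric matrix (assumption: a subset of mtcars)
hc <- hclust(dist(scale(mtcars[1:10, c("mpg", "hp", "wt", "qsec")])))

# dendextend works on dendrogram objects rather than hclust objects
dend <- as.dendrogram(hc)

# Colour the branches according to a cut of the tree into 3 groups
dend <- set(dend, "branches_k_color", k = 3)

# Shrink the leaf labels
dend <- set(dend, "labels_cex", 0.7)

plot(dend, main = "Dendrogram customized with dendextend")
```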

Dendrogram from hierarchical data


Hierarchical data are usually stored as an edge list data frame or a nested data frame. In both cases, I strongly advise using the ggraph library to build your dendrogram. It provides all the customization you need, and makes it quick to try related visualizations such as circle packing, treemaps or networks.
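Here is a minimal sketch of the edge list case. The edge list itself is hypothetical (node names such as "origin" and "leaf1" are made up for the example), and the igraph, ggraph and ggplot2 packages are assumed to be installed.

```r
library(igraph)
library(ggraph)
library(ggplot2)

# Hypothetical edge list: each row links a parent ("from") to a child ("to")
edges <- data.frame(
  from = c("origin", "origin", "group1", "group1", "group2", "group2"),
  to   = c("group1", "group2", "leaf1", "leaf2", "leaf3", "leaf4")
)

# Turn the edge list into a graph object
graph <- graph_from_data_frame(edges)

# Draw the graph with a dendrogram layout
p <- ggraph(graph, layout = "dendrogram", circular = FALSE) +
  geom_edge_diagonal() +
  geom_node_point() +
  geom_node_text(aes(label = name), vjust = 1.5, size = 3) +
  theme_void()

print(p)
```

Switching to another layout (for instance "circlepack" or "treemap") reuses the same graph object, which is what makes it quick to try the related chart types.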

The collapsibleTree library is another option if you want to build an interactive tree (click on a node to unfold its children). It is really handy to insert in an R Markdown document or a Shiny application.
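A short sketch of an interactive tree, assuming the collapsibleTree package is installed. It uses the built-in warpbreaks data rather than the dataset of the original example, and the root label is an arbitrary choice.

```r
library(collapsibleTree)

# Each column listed in `hierarchy` becomes one level of the tree,
# from the first level under the root down to the leaves
tree <- collapsibleTree(
  warpbreaks,
  hierarchy = c("wool", "tension", "breaks"),
  root = "warpbreaks"
)

# Printing the widget renders it in the viewer, an R Markdown output
# or a Shiny application
tree
```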
