→ The input dataset is a matrix where each row is a sample and each column is a variable. Keep in mind that you can transpose a matrix using the t() function if needed.
→ Clustering is performed on a square matrix (sample x sample) that gives the distance between samples. It can be computed using the dist() or the cor() function, depending on the question you're asking.
→ The hclust() function is used to perform the hierarchical clustering.
→ Its output can be visualized directly with the plot() function. See possible customization.
# Dataset
data <- matrix(sample(seq(1, 2000), 200), ncol = 10)
rownames(data) <- paste0("sample_", seq(1, 20))
colnames(data) <- paste0("variable", seq(1, 10))

# Euclidean distance
dist <- dist(data[, c(4:8)], diag = TRUE)

# Hierarchical Clustering with hclust
hc <- hclust(dist)

# Plot the result
plot(hc)
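The example above uses Euclidean distance. Since cor() is mentioned as an alternative, here is a minimal sketch of a correlation-based version. It assumes that "1 - correlation" is an acceptable dissimilarity for your question, which is a common convention rather than the only option. The set.seed() call and the names sample_cor, cor_dist and hc_cor are illustrative additions, not part of the original example.

```r
# Rebuild the same kind of dataset (set.seed() added only for reproducibility)
set.seed(1)
data <- matrix(sample(seq(1, 2000), 200), ncol = 10)
rownames(data) <- paste0("sample_", seq(1, 20))
colnames(data) <- paste0("variable", seq(1, 10))

# cor() correlates columns, so transpose to get a sample x sample matrix
sample_cor <- cor(t(data[, 4:8]))

# Convert similarity to dissimilarity: 1 - r is 0 when r = 1 and 2 when r = -1
cor_dist <- as.dist(1 - sample_cor)

# Cluster and plot exactly as before
hc_cor <- hclust(cor_dist)
plot(hc_cor)
```

Note that cor() returns a similarity, not a distance, which is why the as.dist(1 - ...) step is needed before calling hclust().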
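There are several ways to calculate the distance between 2 clusters: using the maximum distance between 2 points of the clusters (complete linkage, the default in hclust()), the mean (average linkage), the minimum (single linkage), or Ward's criterion. The choice is made with the method argument of hclust().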
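To see how the linkage choice changes the tree, a sketch along these lines may help. It rebuilds a dataset like the one above (set.seed() and the object names are illustrative assumptions) and runs hclust() with four of its documented method values.

```r
# Same kind of dataset as in the main example (seed added for reproducibility)
set.seed(1)
data <- matrix(sample(seq(1, 2000), 200), ncol = 10)
rownames(data) <- paste0("sample_", seq(1, 20))
d <- dist(data[, 4:8])

# The linkage is selected with the `method` argument of hclust()
hc_complete <- hclust(d, method = "complete")  # maximum distance (default)
hc_average  <- hclust(d, method = "average")   # mean distance
hc_single   <- hclust(d, method = "single")    # minimum distance
hc_ward     <- hclust(d, method = "ward.D2")   # Ward's criterion

# Draw the four trees on one page to compare their shapes
op <- par(mfrow = c(2, 2))
plot(hc_complete, main = "complete")
plot(hc_average,  main = "average")
plot(hc_single,   main = "single")
plot(hc_ward,     main = "ward.D2")
par(op)
```

Single linkage tends to produce long "chained" trees, while complete and Ward linkage usually give more compact, balanced groups, so it is worth checking whether your conclusions hold across methods.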
It is possible to zoom on a specific part of the tree. Select the group of interest using the