This page explains how to build a dotplot histogram with R, ggplot2 and plotly. This visualization is a specific type of histogram: it shows the distribution of a numeric variable, but instead of using bars, each individual observation is represented as a dot.
Interactivity makes particular sense for dotplot histograms: hovering over a data point gives you more information about its identity.
The idea is to split the numeric variable into several bins, and to calculate a position on the Y axis for each observation within its bin. Once this new information is available, geom_point can be used as if the chart were a scatterplot.
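To make the two steps concrete, here is a minimal base R sketch on a small made-up vector. The values and the bin width of 0.5 are assumptions chosen for illustration (0.5 is exactly representable in binary floating point, so no value lands on the wrong side of a bin edge).

```r
# Toy data (made up for illustration), sorted so dots stack in value order
x <- sort(c(4.6, 5.1, 4.9, 5.4, 5.0, 4.7))
bin_width <- 0.5  # hypothetical bin size

# Step 1: snap each value down to the lower edge of its bin
bin <- floor(x / bin_width) * bin_width

# Step 2: number the observations within each bin: 1, 2, 3, ...
y <- ave(bin, bin, FUN = seq_along)

data.frame(x, bin, y)
```

The three values below 5 fall in the bin starting at 4.5 and get y = 1, 2, 3. The other three fall in the bin starting at 5.0 and are numbered 1, 2, 3 as well. Plotting bin against y then stacks the dots into columns.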
    # Libraries
    library(tidyverse)
    library(plotly)

    # A classic histogram for the iris data set (left)
    ggplot(iris, aes(x = Sepal.Length)) +
      geom_histogram()

    # Transform the dataset a little bit to make dots
    don <- iris %>%
      mutate(id = row_number()) %>%  # keep the original row number before sorting
      arrange(Sepal.Length) %>%      # sort by the numeric variable of interest
      # Assign a bin to each observation (its lower edge). Here 0.2 is the size of the bin.
      mutate(var_rounded = Sepal.Length - (Sepal.Length %% 0.2)) %>%
      # Calculate the position on the Y axis within each bin: 1, 2, 3, 4...
      mutate(y = ave(var_rounded, var_rounded, FUN = seq_along))

    # Make the plot (middle)
    ggplot(don, aes(x = var_rounded, y = y)) +
      geom_point(size = 6, color = "skyblue")

    # Improve the plot, and make it interactive (right)
    don <- don %>%
      mutate(text = paste(
        "ID: ", id, "\n",
        "Sepal Length: ", Sepal.Length, "\n",
        "Species: ", Species,
        sep = ""
      ))

    p <- ggplot(don, aes(x = var_rounded, y = y)) +
      geom_point(aes(text = text), size = 6, color = "skyblue") +
      xlab("Sepal Length") +
      ylab("# of individuals") +
      theme_classic() +
      theme(
        legend.position = "none",
        axis.line.y = element_blank(),
        axis.text = element_text(size = 15)
      )
    p

    # Use the magic of ggplotly to have an interactive version
    ggplotly(p, tooltip = "text")
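If you want to reuse the approach on another variable or with a different bin size, the sort, bin, and stack steps can be wrapped in a small function. The sketch below is one possible way to do it in base R. The name `add_dot_positions` and its arguments are hypothetical, not part of ggplot2 or plotly, and it computes the bin with `floor(x / bin_width) * bin_width` rather than the modulo expression used above. A bin width of 0.25 is assumed here because it is exactly representable in binary floating point.

```r
# Hypothetical helper (not part of ggplot2 or plotly): adds a bin column and a
# stack-position column to a data frame for a chosen numeric column.
add_dot_positions <- function(data, column, bin_width) {
  # Sort by the numeric variable so dots stack in value order
  data <- data[order(data[[column]]), , drop = FALSE]
  # Lower edge of the bin each observation falls into
  data$bin <- floor(data[[column]] / bin_width) * bin_width
  # Position within the bin: 1, 2, 3, ...
  data$y <- ave(data$bin, data$bin, FUN = seq_along)
  data
}

dots <- add_dot_positions(iris, "Petal.Length", bin_width = 0.25)
head(dots[, c("Petal.Length", "bin", "y")])
```

The returned data frame can then be passed to ggplot() with aes(x = bin, y = y) and geom_point(), in the same way as the Sepal.Length example.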