Error bars give a general idea of how precise a measurement is or, conversely, how far from the reported value the true (error-free) value might be.

If the value displayed on your barplot is the result of an aggregation (like the mean value of several data points), you should display error bars.

Error bars can be added to a ggplot2 barplot using the geom_errorbar() function.
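A minimal sketch of a barplot with error bars, assuming the means and standard deviations were already computed for each group. The summary_df table and its values are hypothetical, made up only for illustration:

```r
library(ggplot2)

# Hypothetical summary table: one row per group, with a mean and an SD
# computed beforehand (values made up for illustration only)
summary_df <- data.frame(
  group = c("A", "B", "C"),
  mean  = c(4.2, 6.1, 5.3),
  sd    = c(0.8, 1.1, 0.6)
)

p <- ggplot(summary_df, aes(x = group, y = mean)) +
  # stat = "identity": bar heights are read as-is from the mean column
  geom_bar(stat = "identity", fill = "skyblue", alpha = 0.7) +
  # ymin / ymax set the bottom and the top of each error bar
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                width = 0.4, colour = "orange", alpha = 0.9)

print(p)
```

Replacing sd by another pre-computed half-width (SE, CI) changes what the error bars represent without touching the rest of the plot.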


The geom_errorbar function


The geom_errorbar() function requires x (the same as for the barplot) plus ymin and ymax, which give the position of the bottom and the top of the error bar respectively; y can be inherited from the barplot but is not required. You can then customize it with color, width, size (linewidth since ggplot2 3.4.0), and alpha. A few functions are closely related to the error bar: geom_crossbar(), geom_linerange() and geom_pointrange(). In any case, note that the values you want to display in the error bar must be computed before making the plot.
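The related geoms can be compared by swapping the layer drawn on top of the same bars. This sketch reuses a hypothetical pre-computed table; the colours, widths and alpha values are arbitrary choices, not recommendations:

```r
library(ggplot2)

# Hypothetical pre-computed summary table standing in for your data
summary_df <- data.frame(
  group = c("A", "B", "C"),
  mean  = c(4.2, 6.1, 5.3),
  sd    = c(0.8, 1.1, 0.6)
)

# Mapping shared by all the error layers: bottom and top of the range
err_aes <- aes(ymin = mean - sd, ymax = mean + sd)

# Shared base plot: x and y are inherited by every layer added below
base <- ggplot(summary_df, aes(x = group, y = mean)) +
  geom_bar(stat = "identity", fill = "skyblue", alpha = 0.7)

# Classic error bar with caps, customised with colour, width and alpha
p_errorbar   <- base + geom_errorbar(err_aes, width = 0.3, colour = "orange", alpha = 0.9)
# Crossbar: a box from ymin to ymax with a horizontal line at y
p_crossbar   <- base + geom_crossbar(err_aes, width = 0.3, colour = "orange", alpha = 0.9)
# Line range: a plain vertical segment from ymin to ymax
p_linerange  <- base + geom_linerange(err_aes, colour = "orange", alpha = 0.9)
# Point range: a segment from ymin to ymax with a point at y
p_pointrange <- base + geom_pointrange(err_aes, colour = "orange", alpha = 0.9)

print(p_pointrange)
```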

 

Standard deviation, Standard error or Confidence Interval?


Three different types of values are commonly used for error bars, sometimes without even specifying which one is used. I believe it is really important to understand how they are calculated, since they give very different results. Let's see their definitions and how to calculate them on a simple vector:

  • Standard Deviation (SD). It represents the amount of dispersion of the variable. It is calculated as the square root of the variance.

  • Standard Error (SE). It is the standard deviation of the sampling distribution of the mean. It is calculated as the SD divided by the square root of the sample size. By construction, SE is smaller than SD. With a very large sample size, SE tends toward 0.

  • Confidence Interval (CI). This interval is built so that, over repeated samples, a specified proportion of such intervals (e.g. 95%) contains the true mean. It is calculated as t * SE (the error bar then spans mean ± t * SE), where t is the quantile of Student's t-distribution with n - 1 degrees of freedom for the chosen alpha. t is often rounded to 1.96, its value for alpha = 0.05 with a large sample size. If the distribution is not normal, it can be better to calculate the CI using the bootstrap, though.
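The three quantities can be computed on a simple vector in base R as sketched below. The vector x is hypothetical, alpha = 0.05 is assumed for a 95% interval, and the percentile bootstrap at the end is just one simple bootstrap variant, included for comparison:

```r
# Hypothetical example vector
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Standard deviation: square root of the (n - 1) sample variance
sd_x <- sd(x)            # identical to sqrt(var(x))

# Standard error of the mean: SD divided by the square root of n
se_x <- sd_x / sqrt(n)

# 95% confidence interval half-width: t quantile (n - 1 df) times SE
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - 1)
ci_x   <- t_crit * se_x

# The interval itself is mean +/- ci_x
ci_bounds <- mean(x) + c(-1, 1) * ci_x

# Percentile bootstrap CI for the mean: resample x with replacement,
# then take the alpha/2 and 1 - alpha/2 quantiles of the resampled means
set.seed(42)
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
boot_ci    <- quantile(boot_means, c(alpha / 2, 1 - alpha / 2))

print(c(sd = sd_x, se = se_x, ci = ci_x))
print(ci_bounds)
print(boot_ci)
```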

 

After this short introduction, here is how to compute these three values for each group of your dataset and use them as error bars on your barplot. The difference between them can be substantial and can greatly influence your conclusions.
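A sketch of the per-group computation and the three resulting barplots. It assumes the built-in iris dataset (Sepal.Length summarised by Species) as example data, and plot_with() is a hypothetical helper written for this example:

```r
library(ggplot2)

alpha  <- 0.05
groups <- split(iris$Sepal.Length, iris$Species)

# One row per group: sample size, mean and SD, then SE and CI half-width
summary_df <- data.frame(
  Species = names(groups),
  n    = sapply(groups, length),
  mean = sapply(groups, mean),
  sd   = sapply(groups, sd)
)
summary_df$se <- summary_df$sd / sqrt(summary_df$n)
summary_df$ci <- qt(1 - alpha / 2, df = summary_df$n - 1) * summary_df$se

# Hypothetical helper: same barplot, error bars use the given half-width column
plot_with <- function(half_width, title) {
  ggplot(summary_df, aes(x = Species, y = mean)) +
    geom_bar(stat = "identity", fill = "forestgreen", alpha = 0.5) +
    geom_errorbar(aes(ymin = mean - .data[[half_width]],
                      ymax = mean + .data[[half_width]]),
                  width = 0.4, colour = "orange", alpha = 0.9) +
    ggtitle(title)
}

p_sd <- plot_with("sd", "Standard deviation")
p_se <- plot_with("se", "Standard error")
p_ci <- plot_with("ci", "95% confidence interval")
print(p_sd); print(p_se); print(p_ci)
```

Only the half-width changes between the three plots: with 50 observations per group, SE is about seven times smaller than SD, and the 95% CI is roughly twice the SE.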

 

Don’t hide data



Using a barplot to summarise a group of points is bad practice, since it hides information: the number of data points per bar is no longer visible, and neither is the distribution of the variable.

If you have access to all the data points, I highly recommend using a boxplot with jitter, a violin plot or a ridgeline plot to compare your groups.
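A minimal sketch of two of these alternatives in ggplot2, again assuming iris as example data. Ridgeline plots are not shown here; they are usually drawn with an extension package such as ggridges:

```r
library(ggplot2)

# Boxplot with jittered raw points: distribution and sample size stay visible
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.shape = NA, fill = "grey90") +  # outliers hidden: the jitter already shows every point
  geom_jitter(width = 0.15, alpha = 0.5, colour = "forestgreen")

# Violin plot: a mirrored density estimate per group, with the points on top
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin(fill = "grey90") +
  geom_jitter(width = 0.1, alpha = 0.4)

print(p_box)
print(p_violin)
```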


Using the arrows function of base R



It is also possible to add error bars in base R, using the arrows() function. It is, however, much more cumbersome, so I highly recommend the ggplot2 version.
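For comparison, a base R sketch using barplot() and arrows(), with hypothetical pre-computed means and standard deviations:

```r
# Hypothetical pre-computed summary values
means <- c(A = 4.2, B = 6.1, C = 5.3)
sds   <- c(A = 0.8, B = 1.1, C = 0.6)

# barplot() returns the x position of the centre of each bar;
# ylim is widened by hand so the tallest error bar is not clipped
centers <- barplot(means, col = "skyblue",
                   ylim = c(0, max(means + sds) * 1.1))

# arrows() draws segments from (x0, y0) to (x1, y1); angle = 90 turns the
# arrow heads into flat caps and code = 3 draws a cap at both ends
arrows(x0 = centers, y0 = means - sds,
       x1 = centers, y1 = means + sds,
       angle = 90, code = 3, length = 0.1, col = "orange")
```

The extra work compared with geom_errorbar() comes from placing everything yourself: keeping the midpoints returned by barplot(), widening ylim, and turning arrows into error bars with angle and code.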


