Error bars give a general idea of how precise a measurement is or, conversely, how far from the reported value the true (error-free) value might be.

If the value displayed on your barplot is the result of an aggregation (like the mean value of several data points), you may want to display error bars.

Error bars can be added to a ggplot2 barplot with the geom_errorbar function.
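As a minimal sketch of the idea, the snippet below draws a barplot with one error bar per bar. The data frame and its column names (name, value, sd) are made up for the illustration; only the ggplot2 calls come from the library itself.

```r
library(ggplot2)

# Hypothetical summary data: one aggregated value and one spread value per group
data <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 9, 21),
  sd    = c(2, 4, 3, 1.5, 5)
)

# Bars first, then the error bars drawn on top of them
p <- ggplot(data, aes(x = name, y = value)) +
  geom_col(fill = "skyblue", alpha = 0.7) +
  geom_errorbar(aes(ymin = value - sd, ymax = value + sd),
                width = 0.4, colour = "orange", alpha = 0.9)

print(p)
```

The spread column is subtracted from and added to the bar height inside aes(), so the error bars stay in sync with the bars if the data changes.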

The geom_errorbar function


The geom_errorbar function needs at least 3 aesthetics: x, which is the same as for the barplot, and ymin and ymax, which give the position of the bottom and the top of the error bar respectively. The y aesthetic can be shared with the barplot, but geom_errorbar does not require it. You can then customize the error bar with color, width, size (called linewidth since ggplot2 3.4.0) and alpha. A few variations are available to display the error bars: geom_crossbar, geom_linerange and geom_pointrange. In any case, note that the values you want to display in the error bar must be computed before making the plot.
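To compare the variations side by side, here is a short sketch that reuses the same bars with three alternative geoms. The data frame is again hypothetical, and the colours and widths are arbitrary choices.

```r
library(ggplot2)

# Hypothetical summary data (same layout as a pre-computed summary table)
data <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 9, 21),
  sd    = c(2, 4, 3, 1.5, 5)
)

# Shared bar layer; ymin/ymax are mapped only in the error layers
bars <- ggplot(data, aes(x = name, y = value)) +
  geom_col(fill = "skyblue", alpha = 0.5)
err <- aes(ymin = value - sd, ymax = value + sd)

p_crossbar   <- bars + geom_crossbar(err, width = 0.4, colour = "orange")  # box with a middle line
p_linerange  <- bars + geom_linerange(err, colour = "orange")              # plain vertical segment
p_pointrange <- bars + geom_pointrange(err, colour = "orange")             # segment plus a point at y

print(p_crossbar)
print(p_linerange)
print(p_pointrange)
```

Mapping ymin and ymax in the error layer rather than in the top-level aes() keeps them from being picked up by the bar layer.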

Standard deviation, Standard error or Confidence Interval?


Three different types of values are commonly used for error bars, sometimes without even specifying which one is used. I believe it is important to understand how they are calculated, since they give very different results. Let’s look at their definitions and how to calculate them on a simple vector:

  • Standard Deviation (SD). It represents the amount of dispersion of the variable. Calculated as the square root of the variance.

  • Standard Error (SE). It is the standard deviation of the sampling distribution of the mean. Calculated as the SD divided by the square root of the sample size. By construction, SE is smaller than SD as soon as there is more than one data point. With a very big sample size, SE tends toward 0.

  • Confidence Interval (CI). This interval is built so that, over repeated sampling, it contains the true value (here the mean) with a specified probability, for example 95%. Its half-width is calculated as t * SE, where t is the quantile of the Student’s t-distribution for a specific alpha and n - 1 degrees of freedom. For a 95% interval, t is often rounded to 1.96 (its value with a big sample size). If the sample size is small or the distribution is not normal, it is better to calculate the CI using the bootstrap method, however.
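The three definitions above can be sketched in base R as follows. The vector is arbitrary example data, and the 95% level (alpha = 0.05) is an assumption made for the illustration.

```r
# Arbitrary example vector
vec <- c(1, 3, 5, 9, 38, 7, 2, 4, 9, 19, 19)
n   <- length(vec)

# Standard deviation: square root of the variance (same result as sd(vec))
sd_value <- sqrt(var(vec))

# Standard error of the mean: SD divided by the square root of the sample size
se_value <- sd_value / sqrt(n)

# Confidence interval: mean +/- t * SE, with t taken from the
# Student's t-distribution with n - 1 degrees of freedom
alpha         <- 0.05
t_value       <- qt(1 - alpha / 2, df = n - 1)
ci_half_width <- t_value * se_value
ci            <- mean(vec) + c(-1, 1) * ci_half_width

print(c(sd = sd_value, se = se_value, ci_half_width = ci_half_width))
print(ci)
```

With only 11 values, t_value is noticeably larger than 1.96, which is why rounding to 1.96 is only reasonable for large samples.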

After this short introduction, the next step is to compute these 3 values for each group of your dataset and use them as error bars on your barplot. The three choices give error bars of very different lengths, which can greatly influence your conclusions.
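Here is one possible way to do it, sketched on the built-in iris dataset (the choice of dataset, of the Sepal.Length variable and of the 95% level are assumptions for the example). The helper plot_with is a hypothetical name introduced only to avoid repeating the plotting code.

```r
library(ggplot2)

# Per-species summaries of Sepal.Length, computed before plotting
groups <- split(iris$Sepal.Length, iris$Species)
stats <- data.frame(
  Species = names(groups),
  avg     = sapply(groups, mean),
  sd      = sapply(groups, sd),
  n       = sapply(groups, length)
)
stats$se <- stats$sd / sqrt(stats$n)                 # standard error
stats$ci <- qt(0.975, df = stats$n - 1) * stats$se   # 95% CI half-width

# Hypothetical helper: same barplot, error bar length taken from column `spread`
plot_with <- function(spread, label) {
  ggplot(stats, aes(x = Species, y = avg)) +
    geom_col(fill = "forestgreen", alpha = 0.5) +
    geom_errorbar(aes(ymin = avg - .data[[spread]], ymax = avg + .data[[spread]]),
                  width = 0.4, colour = "orange") +
    ggtitle(label)
}

print(plot_with("sd", "Standard deviation"))
print(plot_with("se", "Standard error"))
print(plot_with("ci", "95% confidence interval"))
```

On these data the SD bars are the longest and the SE bars the shortest, so a caption should always state which of the three is shown.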

Don’t hide data


Using a barplot to summarise a group of points (with the mean, for instance) is bad practice, since it hides information. The number of data points per bar is no longer available, and the shape of the distribution is hidden as well.

If you have the information on all the data points, I highly advise that you use a boxplot with jitter, a violin plot or a ridgeline plot to compare your groups.
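As an illustration of the first alternative, here is a minimal boxplot with jitter, sketched on the built-in iris dataset (the dataset and the styling values are assumptions for the example).

```r
library(ggplot2)

# Boxplot for the summary, jittered points for the individual observations
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5) +   # outliers hidden: the jitter already shows every point
  geom_jitter(width = 0.15, height = 0, size = 1, alpha = 0.7) +
  theme(legend.position = "none")

print(p)
```

Replacing geom_boxplot with geom_violin gives the violin version; height = 0 keeps the jitter horizontal so that the plotted values are not altered.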

Using the arrows function of base R


It is also possible to add your error bars in base R, using the arrows function. It is however far more complicated, so I highly advise that you use the ggplot2 version.
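For reference, a minimal base R sketch could look like this. It uses the built-in iris dataset and the standard deviation as the spread, both of which are assumptions for the example.

```r
# Mean and standard deviation of Sepal.Length per species
groups <- split(iris$Sepal.Length, iris$Species)
means  <- sapply(groups, mean)
sds    <- sapply(groups, sd)

# barplot() returns the x position of the middle of each bar
mids <- barplot(means,
                ylim = c(0, max(means + sds) * 1.1),
                col  = "skyblue",
                ylab = "Sepal.Length (mean +/- SD)")

# Vertical segments with a flat head at both ends (angle = 90, code = 3)
arrows(x0 = mids, y0 = means - sds,
       x1 = mids, y1 = means + sds,
       angle = 90, code = 3, length = 0.1)
```

The trick is that an arrow with a 90 degree head drawn at both ends looks like an error bar; the bar positions must be captured from barplot(), since they are not simply 1, 2, 3.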
