LESSON 12: Error bars

FOCUS QUESTION: How can I depict uncertainty and variability in data?

This lesson discusses various ways of putting error bars on graphs.

In this lesson you will:
  • Understand visual methods for conveying confidence in measurements.
  • Display error bars on different types of charts.

Image: Drawing of Iris persica by James Sowerby, c. 1792


DATA FOR THIS LESSON

File Description
fisheriris
  • This data set contains the famous Fisher iris data set. The data set consists of measurements of 150 flower samples, 50 from each of three species of flowers: Iris setosa, Iris versicolor, and Iris virginica. The measurements are in cm.
  • Four features were measured for each sample:
    • The length of the flower sepal
    • The width of the flower sepal
    • The length of the flower petal
    • The width of the flower petal
  • All 150 samples from the Fisher iris data are stored in a single 150 x 4 numeric array called meas:
    • The four columns correspond to the four types of measurements: sepal length, sepal width, petal length and petal width, respectively.
    • The first 50 rows contain data for Iris setosa
    • The second 50 rows contain data for Iris versicolor
    • The third 50 rows contain data for Iris virginica.
  • The species information is kept in a separate 150 x 1 cell array of strings called species.
  • The data is sometimes referred to as Anderson's Iris data in honor of Edgar Anderson, the biologist who collected the data. See http://en.wikipedia.org/wiki/Iris_flower_data_set for additional information.
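
The row layout described above can be checked directly once the data is loaded. The following is a minimal sketch, assuming fisheriris is available on your MATLAB path; the names speciesNames, firstRows, and nRows are introduced here for illustration only.

```matlab
% Assumes the fisheriris data set is available on the MATLAB path.
load fisheriris

% Species labels in order of first appearance, with the row where each starts
[speciesNames, firstRows] = unique(species, 'stable');

% Report the first row and the number of rows for each species
for k = 1:numel(speciesNames)
    nRows = sum(strcmp(species, speciesNames{k}));
    fprintf('%-12s first row %3d, %d rows\n', ...
            speciesNames{k}, firstRows(k), nRows);
end
```

Checking the layout this way matters because the later examples rely on each species occupying a contiguous block of 50 rows.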

Note: This dataset ships with MATLAB's Statistics Toolbox (now called the Statistics and Machine Learning Toolbox), so you don't have to download it separately.

SETUP FOR LESSON 12


EXAMPLE 1: Load the Fisher iris data (comes with MATLAB)

Create a new cell in which you type and execute:

   load fisheriris;

You should see the following 2 variables in your Workspace Browser: meas and species.


EXAMPLE 2: Compute the mean and standard deviation of the sepal lengths for 3 species

Create a new cell in which you type and execute:

   sepalLens = reshape(meas(:, 1), 50, 3);  % Make column 1 into a 50 x 3 array
   sLenMeans = mean(sepalLens);  % Calculate the means for the 3 species
   sLenSDs = std(sepalLens);     % Calculate the standard deviations for the 3 species

You should see the following variables in your Workspace Browser: sepalLens, sLenMeans, and sLenSDs.
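
The reshape call in this example works because MATLAB fills arrays column by column and the rows of meas are grouped by species. Here is a small sketch of the same idea on a hypothetical 6-element vector (v, groups, and groupMeans are illustrative names, not part of the lesson data):

```matlab
% Hypothetical column: two samples from each of three groups, stored one
% group after another (like column 1 of meas, but much shorter).
v = [11; 12; 21; 22; 31; 32];

% reshape fills the result column by column, so each run of 2
% consecutive values becomes one column.
groups = reshape(v, 2, 3);
disp(groups)

% Column-wise statistics then give one value per group.
groupMeans = mean(groups);
disp(groupMeans)
```

If the samples were interleaved rather than stored in contiguous blocks, this reshape would mix groups within each column, so it is worth confirming the layout first.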

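Note: By default, std(sepalLens) normalizes by N - 1, where N is the number of samples. This gives the sample standard deviation, which estimates the standard deviation of the population from which the samples were drawn. Use std(sepalLens, 1) to normalize by N instead.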
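
To see the two normalizations of std side by side, here is a short sketch on a hypothetical sample of 8 values (x and the other variable names are made up for illustration). It compares the built-in results with the values computed from the definitions.

```matlab
x = [2 4 4 4 5 5 7 9];     % Hypothetical sample of 8 measurements
n = numel(x);

sdDefault = std(x);        % Normalizes by n - 1 (MATLAB default)
sdByN     = std(x, 1);     % Normalizes by n

% The same two values computed directly from the definitions
sumSq = sum((x - mean(x)).^2);
sdDefaultManual = sqrt(sumSq / (n - 1));
sdByNManual     = sqrt(sumSq / n);

fprintf('std(x)    = %.4f (manual: %.4f)\n', sdDefault, sdDefaultManual);
fprintf('std(x, 1) = %.4f (manual: %.4f)\n', sdByN, sdByNManual);
```

With 50 samples per species the two values differ only slightly, but the difference grows as the sample gets smaller.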

In the space below: Enter your definitions in this cell and execute the cell to create these variables.


EXAMPLE 3: Plot mean sepal length using standard deviation (SD) error bars

Create a new cell in which you type and execute:

   fisherTitle = 'Comparison of three species in the Fisher Iris data';
   irisSpecies = {'Setosa', 'Versicolor', 'Virginica'}; % Tick labels (row order in meas)
   figure                                      % Open a new figure window
   errorbar(sLenMeans, sLenSDs, 'ks');         % Error bars use black squares
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris')
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Mean (SD error bars)', 'Location', 'Southeast') % Put in lower right

You should see the following 2 variables in your Workspace Browser: fisherTitle and irisSpecies.

You should see a Figure Window with a labeled error bar plot:


Create a new cell right here (a cell begins with %%). Write MATLAB code to display the mean petal widths for the three species, showing standard deviation error bars.


EXAMPLE 4: Plot the SD error bars on a bar chart

Create a new cell in which you type and execute:

   figure
   hold on
   bar(sLenMeans, 'FaceColor', [0.5, 0.5, 1])  % Lighter so error bars show up
   errorbar(sLenMeans, sLenSDs, 'ks');            % Error bars use black squares
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Mean (SD error bars)', 'Location', 'Northwest') % Put in upper left
   box on                                         % Force box around axes
   hold off

You should see a Figure Window with a labeled error bar plot:


EXAMPLE 5: Compute the standard error of the mean (SEM) for sepal lengths

Create a new cell in which you type and execute:

   numSamples = size(sepalLens, 1);  % Number of samples (rows) per species
   sLenSEMs = sLenSDs./sqrt(numSamples); % Compute the standard error of the mean (SEM)

You should see the following 2 variables in your Workspace Browser: numSamples and sLenSEMs.

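The SEM estimates how much a sample mean is expected to vary around the true population mean. It is computed as the standard deviation divided by the square root of the number of samples, so it shrinks as the sample size grows.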
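
As a worked illustration of the SEM calculation, the sketch below uses a hypothetical sample of 8 values rather than the iris data (x, sd, sem, and semFourTimes are illustrative names).

```matlab
x = [2 4 4 4 5 5 7 9];           % Hypothetical sample of 8 measurements
n = numel(x);                    % Number of samples

sd  = std(x);                    % Spread of the individual measurements
sem = sd / sqrt(n);              % Uncertainty of the sample mean

fprintf('mean = %.3f, SD = %.3f, SEM = %.3f\n', mean(x), sd, sem);

% With the same SD, four times as many samples would halve the SEM.
semFourTimes = sd / sqrt(4 * n);
fprintf('SEM with 4x the samples = %.3f\n', semFourTimes);
```

This is why SEM error bars are always narrower than SD error bars for the same data: the SD describes the spread of individual measurements, while the SEM describes the uncertainty in the mean.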


EXAMPLE 6: Plot mean sepal length using (SEM) error bars

Create a new cell in which you type and execute:

   figure
   errorbar(sLenMeans, sLenSEMs, 'rd') % Red diamonds
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Mean (SEM error bars)', 'Location', 'Southeast')

You should see a Figure Window with a labeled error bar plot:


EXAMPLE 7: Plot SD and SEM error bars on the same graph

Create a new cell in which you type and execute:

   xPositions = [1, 2, 3];  % Use these as base x-axis error bar positions
   figure
   hold on
   errorbar(xPositions-0.1, sLenMeans, sLenSDs, 'g^')  % Green up triangles
   errorbar(xPositions+0.1, sLenMeans, sLenSEMs, 'bv') % Blue down triangles
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend({'Mean (SD error bars)', 'Mean (SEM error bars)'}, 'Location', 'Southeast')
   hold off

You should see a Figure Window with two sets of error bars:


EXAMPLE 8: Compute the median and interquartile range (IQR) for sepal lengths

Create a new cell in which you type and execute:

   sLenMedians = median(sepalLens);        % Species median sepal lengths
   sLenIQR = prctile(sepalLens, [25, 75]); % 25th and 75th percentiles

You should see the following 2 variables in your Workspace Browser: sLenMedians and sLenIQR.

The rows of sLenIQR correspond to the percentiles, and the columns correspond to the species.
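
The row and column layout of the prctile result can be seen on a small hypothetical array (data, q, med, loDist, and hiDist are illustrative names). The sketch also converts the quartiles into distances from the median, which is the form that asymmetric error bars need.

```matlab
% Hypothetical data: 8 samples (rows) for each of 3 groups (columns)
data = [1 10 100; 2 20 200; 3 30 300; 4 40 400; ...
        5 50 500; 6 60 600; 7 70 700; 8 80 800];

q   = prctile(data, [25, 75]); % Row 1: 25th percentiles, row 2: 75th
med = median(data);            % One median per column

disp(size(q))                  % Rows are percentiles, columns are groups

% Distances from the median down to the 25th and up to the 75th percentile
loDist = med - q(1, :);
hiDist = q(2, :) - med;
disp([loDist; hiDist])
```

Unlike SD or SEM bars, these two distances need not be equal, so the error bars can be asymmetric when the data are skewed.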


EXAMPLE 9: Plot median sepal length using the interquartile range (IQR) for error bars

Create a new cell in which you type and execute:

   lowerDist = sLenMedians - sLenIQR(1, :);      % Size of bottom bar
   upperDist = sLenIQR(2, :) - sLenMedians;      % Size of top bar
   figure
   errorbar(xPositions, sLenMedians, lowerDist, upperDist, 'm*') % Magenta asterisks
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies)
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Median (IQR error bars)', 'Location', 'Northwest') % Upper left

You should see the following 2 variables in your Workspace Browser: lowerDist and upperDist.

You should see a Figure Window with median/IQR error bars:


EXAMPLE 10: Calculate the means and standard deviations of all characteristics

Create a new cell in which you type and execute:

    setosa = meas(1:50, :);               % First 50 rows are setosa
    versicolor = meas(51:100, :);         % Second 50 rows are versicolor
    virginica = meas(101:150, :);         % Third 50 rows are virginica
    irisMeans = [mean(setosa); mean(versicolor); mean(virginica)];
    irisSTDs =  [std(setosa); std(versicolor); std(virginica)];

You should see the following 5 variables in your Workspace Browser: setosa, versicolor, virginica, irisMeans, and irisSTDs.


EXAMPLE 11: Draw a grouped bar chart of mean iris characteristics

Create a new cell in which you type and execute:

   irisMeas = {'Sepal length', 'Sepal width', 'Petal length', ...
                               'Petal width'}; % Characteristics for legend
   figure
   bar(irisMeans, 'grouped')  % Rows are groups columns are group members
   set(gca, 'XTickLabel', irisSpecies);
   legend(irisMeas, 'Location', 'Northwest')
   xlabel('Species of Iris');
   ylabel('Mean size in cm')
   title(fisherTitle)

You should see the following variable in your Workspace Browser: irisMeas.

You should see a Figure Window with a labeled group bar chart (groups correspond to species):
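
To see how bar maps rows and columns to groups and series, here is a minimal sketch on a hypothetical 2 x 3 matrix (demoVals and hBars are illustrative names). The figure is created hidden only so that the sketch runs without opening a window.

```matlab
% Hypothetical 2 x 3 matrix: 2 groups (rows), 3 members per group (columns)
demoVals = [1 2 3; ...
            4 5 6];

figure('Visible', 'off')              % Hidden figure; drop the option to view it
hBars = bar(demoVals, 'grouped');     % Returns one bar series per column

fprintf('%d groups, %d bar series\n', size(demoVals, 1), numel(hBars));
close(gcf)
```

Because each column becomes one series, the legend needs one entry per column (the four measurements in Example 11) and the x-axis needs one tick label per row (the three species).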


EXAMPLE 12: Plot the means of all characteristics using SD error bars

Create a new cell in which you type and execute:

   figure
   errorbar(irisMeans, irisSTDs, 's') % Use default colors and square markers
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies);
   legend(irisMeas, 'Location', 'Northwest')
   xlabel('Species of Iris');
   ylabel('Mean size in cm')
   title(fisherTitle)

You should see a Figure Window with multiple labeled error bar plots:
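
When errorbar receives matrices, it treats each column as a separate series, just as plot does. The sketch below illustrates this on hypothetical values (demoMeans, demoSDs, and hErr are illustrative names), again using a hidden figure so it runs without opening a window.

```matlab
% Hypothetical values: 3 x-positions (rows) for each of 2 series (columns)
demoMeans = [1.0 4.0; ...
             2.0 5.0; ...
             3.0 6.0];
demoSDs   = [0.1 0.4; ...
             0.2 0.5; ...
             0.3 0.6];

figure('Visible', 'off')                  % Hidden figure; drop the option to view it
hErr = errorbar(demoMeans, demoSDs, 's'); % One error bar series per column

fprintf('%d x-positions, %d error bar series\n', ...
        size(demoMeans, 1), numel(hErr));
close(gcf)
```

The mean and error arrays must have the same size so that every plotted value has a matching bar length.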


EXAMPLE 13: Plot the means of all characteristics using connected SD error bars

Create a new cell in which you type and execute:

   figure
   hold on
   errorbar(irisMeans(:, 1), irisSTDs(:, 1), ':sk')
   errorbar(irisMeans(:, 2), irisSTDs(:, 2), ':og')
   errorbar(irisMeans(:, 3), irisSTDs(:, 3), ':vb')
   errorbar(irisMeans(:, 4), irisSTDs(:, 4), ':^r')
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies);
   legend(irisMeas, 'Location', 'Northwest')
   xlabel('Species of Iris');
   ylabel('Mean size in cm')
   title(fisherTitle)
   hold off

You should see a Figure Window with multiple labeled error bar plots (with bars representing the same characteristic being connected):


SUMMARY OF SYNTAX

MATLAB syntax, each followed by its description:

errorbar(Y, E)
  • Creates a plot of the values of Y similar to plot(Y). The corresponding values in E show error bars at +/- that amount above and below the corresponding values in Y.
errorbar(X, Y, E)
  • Creates a plot similar to errorbar(Y, E) except that this function uses the values of X for the x positions rather than using the integers 1, 2, ... .
errorbar(X, Y, L, U)
  • Creates a plot similar to errorbar(X, Y, E) except that this function uses the values of L and U to determine the span of the error bars. The L array gives the distances below the corresponding values in Y, and the U array gives the distances above the corresponding values of Y.
Y = prctile(X, p)
  • Returns the percentiles of the values in X for the percentages in the vector p. When X is a 2D array, the percentiles are computed for each column, and the i-th row of Y contains the percentiles p(i).
set(gca, PropertyName, PropertyValue)
  • Sets a property of the current axes. The PropertyName argument is a string representing a property. The PropertyValue argument gives the value of the property.


_This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a reproduction of a drawing by James Sowerby that appeared in The Botanical Magazine vol. 1. no. 1 (1792). The drawing is available on Wikipedia at http://en.wikipedia.org/wiki/File:Iris_persica_%28Sowerby%29.jpg._