LESSON 12: Error bars questions

FOCUS QUESTION: How can I depict uncertainty and variability in data?


EXAMPLE 1: Load the Fisher iris data (comes with MATLAB)

   load fisheriris;


EXAMPLE 2: Compute the mean and standard deviation of the sepal lengths for 3 species

   sepalLens = reshape(meas(:, 1), 50, 3);  % Make column 1 into a 50 x 3 array
   sLenMeans = mean(sepalLens);  % Calculate the means for the 3 species
   sLenSDs = std(sepalLens);     % Calculate the standard deviations for the 3 species
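
The reshape above relies on the rows of meas being stored in blocks of 50 per species. As a cross-check, the following sketch computes the same means from the species labels that load fisheriris also provides, without assuming the block layout. The names speciesNames, groupIdx and groupMeans are made up for this illustration.

   load fisheriris;
   % Unique labels in order of first appearance, plus a group index per row
   [speciesNames, ~, groupIdx] = unique(species, 'stable');
   % Mean sepal length (column 1) for each group, as a 1 x 3 row vector
   groupMeans = accumarray(groupIdx, meas(:, 1), [], @mean)';
   disp(speciesNames')   % Shows the column order of the means

The order of speciesNames is also the order of the columns produced by the reshape, which is worth checking before choosing tick labels.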


EXAMPLE 3: Plot mean sepal length using standard deviation (SD) error bars

Create a new cell in which you type and execute:

   fisherTitle = 'Comparison of three species in the Fisher Iris data';
   irisSpecies = {'Setosa', 'Versicolor', 'Virginica'}; % Tick labels, in data order
   figure                                      % Open a new figure window
   errorbar(sLenMeans, sLenSDs, 'ks');         % Error bars use black squares
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris')
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Mean (SD error bars)', 'Location', 'Southeast') % Put in lower right

Questions Answers
What are error bars? Error bars are a visual device that is generally used to convey uncertainty. However, you can choose to use the error bars to represent anything you wish.
What do error bars actually represent in this example? Rather than depicting actual errors, these error bars indicate how widely the sepal lengths are spread around the mean.
What determines the location of the top and bottom of the error bars? The first argument of errorbar specifies the center of the bars. The second argument specifies the distance from the center in either direction.
How many lines and points does errorbar(X, Y) plot? The errorbar function plots a point for each element in X. Each column of X is treated as a data set, and its adjacent points are connected by lines of the same color. The plot shows a different line for each column of X. Note that if X and Y are vectors, errorbar draws one curve, regardless of whether these vectors are row vectors or column vectors.
Must X and Y have the same number of elements for errorbar(X, Y) to work? Yes, X and Y must be the same size.
Where are the error bars for errorbar(x, y) located when x = [10, 30, 20, 10] and y = [5, 2, 3, 1]? The tops of the error bars are located at 15, 32, 23, and 11, respectively. The bottoms of the error bars are located at 5, 28, 17, and 9, respectively. The markers of the error bars are located at 10, 30, 20, and 10, respectively.
Where are the error bars for errorbar(x, y) located when x = [10; 30; 20; 10] and y = [5; 2; 3; 1]? The results are the same as in the previous question.
What does 'XTick' designate? The string 'XTick' is an example of a property. Property arguments are always specified in pairs of property name followed by the property value. XTick specifies the locations of the tick marks on the horizontal axis. Its value should be a vector of locations.
What does 'XTickLabel' designate? XTickLabel is a property specifying the labels of tick marks. Often the value of the XTickLabel property is a vector of strings or a cell array of strings. In this example, it is a 1 x 3 cell array with strings naming the three species.
What is gca used for? The gca function returns a handle to the current axes, allowing us to access and set the axes properties from MATLAB programs.
How many properties can I change with a single call to set? The set function doesn't limit the number of properties that can be specified.
How do I find out what properties are available for setting? Use get(gca) to find out what properties are available for modification.
Will get(gca) give me all the properties of the figure? No, only the properties associated with the current axis are accessible. The figure window itself has its own properties (accessible by get(gcf)). Each axis on a figure with multiple axes has its own properties. In fact each object on the figure has its own properties. We'll access these properties later in the course.
What does 'Location' designate in the legend function? 'Location' is a property specifying where the legend should be placed on the axis. This property is useful for preventing the legend from overlapping with the graph.
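
The bar positions asked about in the questions above can be checked with a little arithmetic before plotting. This is a minimal sketch; tops and bottoms are names chosen for this illustration.

   x = [10, 30, 20, 10];    % Centers of the error bars (marker positions)
   y = [5, 2, 3, 1];        % Distance from each center to the top and bottom
   tops = x + y;            % 15 32 23 11
   bottoms = x - y;         % 5 28 17 9
   figure
   errorbar(x, y, 'ks')     % Markers at x, bars running from bottoms to tops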


EXAMPLE 4: Plot the SD error bars on a bar chart

   figure
   hold on
   bar(sLenMeans, 'FaceColor', [0.5, 0.5, 1])  % Lighter so error bars show up
   errorbar(sLenMeans, sLenSDs, 'ks');            % Error bars use black squares
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Mean (SD error bars)', 'Location', 'Northwest') % Put in upper left
   box on                                         % Force box around axes
   hold off

Questions Answers
What happens if I put the errorbar before the bar in this example? The bar chart is not transparent and will cover the bottom of the error bars.
What is 'FaceColor'? FaceColor is a property of bar that specifies the color of the bars. The color value should be a three-element row vector with values between 0 and 1 specifying how much red, green and blue, respectively, should make up the color. The example has a blue component that is twice as big as the red and green components. White corresponds to a vector of 3 ones, while black corresponds to a vector of 3 zeros.


EXAMPLE 5: Compute the standard error of the mean (SEM) for sepal lengths

   numSamples = size(sepalLens, 1);  % Number of observations per species (50)
   sLenSEMs = sLenSDs./sqrt(numSamples);  % Compute the standard error of the mean (SEM)

Questions Answers
What does SEM stand for? SEM is short for standard error of the mean.
How can I interpret SEM? Here is one interpretation. Suppose I think of my data as a sample of an underlying population. A reasonable way to estimate the true mean of the population is to compute the sample mean. However, it is likely that if you take a different sample of this population, you will get a different value for the sample mean. The SEM estimates (from a single sample) the standard deviation of the error between these sample means and the true mean.
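
The interpretation above can be illustrated with a small simulation. The sketch below draws many samples from a hypothetical normal population (the mean, SD, sample size and trial count are arbitrary choices, not values from the iris data) and compares the spread of the resulting sample means with the SEM computed from one sample alone.

   rng(0);                              % Fix the seed so the run is repeatable
   popMean = 5;  popSD = 2;             % Hypothetical population parameters
   n = 50;  numTrials = 10000;          % Sample size and number of samples
   samples = popMean + popSD*randn(n, numTrials);  % Each column is one sample
   sampleMeans = mean(samples);         % One sample mean per column
   spreadOfMeans = std(sampleMeans);    % Observed SD of the sample means
   semOneSample = std(samples(:, 1))/sqrt(n);  % SEM from the first sample only
   theoretical = popSD/sqrt(n);         % SD of the sample mean in theory
   fprintf('Spread of means %.3f, SEM from one sample %.3f, theory %.3f\n', ...
           spreadOfMeans, semOneSample, theoretical);

The three printed values should be close to one another. The SEM from a single sample is the noisiest of the three because it depends on only 50 values.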


EXAMPLE 6: Plot mean sepal length using (SEM) error bars

   figure
   errorbar(sLenMeans, sLenSEMs, 'rd') % Red diamonds
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Mean (SEM error bars)', 'Location', 'Southeast')

Questions Answers
Will error bars depicting standard deviation be larger or smaller than error bars depicting SEM? Since SEM is only defined for data sets that have at least two elements and SEM is the standard deviation divided by the square root of the number of elements, the SEM error bars will always be smaller than the standard deviation error bars.
How are SEM error bars different from SD error bars? Standard deviation indicates how widely data values are distributed from the data set mean, while SEM indicates how widely estimates of the true population mean vary from the true population mean.
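
As a quick check on the size relationship described above, the following sketch computes both quantities for the sepal lengths and forms their ratio. It assumes the sample size is the number of observations per species (50 in this data set); sds, sems and ratio are names chosen for this illustration.

   load fisheriris;
   sepalLens = reshape(meas(:, 1), 50, 3);  % One column per species
   n = size(sepalLens, 1);                  % Observations per species
   sds = std(sepalLens);                    % SD for each species
   sems = sds./sqrt(n);                     % SEM for each species
   ratio = sds./sems;                       % Every entry equals sqrt(n)
   disp([sds; sems; ratio])

Because the ratio is sqrt(n), the SEM bars shrink as more observations are collected, while the SD bars settle toward the spread of the population.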


EXAMPLE 7: Plot SD and SEM error bars on the same graph

   xPositions = [1, 2, 3];  % Use these as base x-axis error bar positions
   figure
   hold on
   errorbar(xPositions-0.1, sLenMeans, sLenSDs, 'g^')  % Green up triangles
   errorbar(xPositions+0.1, sLenMeans, sLenSEMs, 'bv') % Blue down triangles
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend({'Mean (SD error bars)', 'Mean (SEM error bars)'}, 'Location', 'Southeast')
   hold off

Questions Answers
How can I set the x positions of error bars? Call the errorbar function with three arguments: the x coordinates, the y coordinates and the distance above and below.
Why did this example explicitly set the error bar positions? The example compares the SD error bars and SEM error bars for the same data. We wanted to group the bars for each data point together rather than spacing all the bars out evenly.


EXAMPLE 8: Compute the median and interquartile range (IQR) for sepal lengths

   sLenMedians = median(sepalLens);        % Species median sepal lengths
   sLenIQR = prctile(sepalLens, [25, 75]); % 25th and 75th percentiles

Questions Answers
Does sLenIQR really represent the interquartile range? No, the interquartile range (IQR) is the difference between the 75th percentile and the 25th percentile. The sLenIQR variable holds these two percentiles separately: each column contains the 25th and 75th percentiles for one species.
Why is sLenIQR a 2 x 3 array? The sepalLens variable has 3 columns. The prctile function treats each column as a data set and finds the specified percentiles of each column. The result for each column is a column vector of size 2, since we asked for two percentile values. Putting the results together for the three data sets gives the 2 x 3 result.
What does sLenIQR(2, 1) represent? The value is the 75th percentile of the first column of sepalLens. Therefore, sLenIQR(2, 1) gives the 75th percentile of Iris setosa sepal lengths.
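
If the interquartile range itself is wanted as a single number per species, it can be derived from the two percentile rows. The following is a minimal sketch; pcts and iqrWidths are names chosen for this illustration.

   load fisheriris;
   sepalLens = reshape(meas(:, 1), 50, 3);  % One column per species
   pcts = prctile(sepalLens, [25, 75]);     % 2 x 3: row 1 is 25th, row 2 is 75th
   iqrWidths = pcts(2, :) - pcts(1, :);     % 1 x 3: one IQR per species
   disp(size(pcts))                         % 2 3
   disp(iqrWidths)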


EXAMPLE 9: Plot median sepal length using the interquartile range (IQR) for error bars

   lowerDist = sLenMedians - sLenIQR(1, :);      % Size of bottom bar
   upperDist = sLenIQR(2, :) - sLenMedians;      % Size of top bar
   figure
   errorbar(xPositions, sLenMedians, lowerDist, upperDist, 'm*') % Magenta asterisks
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies)
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(fisherTitle)
   legend('Median (IQR error bars)', 'Location', 'Northwest') % Upper left

Questions Answers
Why do the IQR error bars need different values for the distances above and below? The median does not necessarily fall in the center of the IQR range. In other words, the 75th percentile minus the median is not necessarily the same as the median minus the 25th percentile.
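
The asymmetry is easiest to see with skewed data. The sketch below uses a small made-up data set with a long upper tail (not taken from the iris data) and computes the two distances separately, as errorbar expects them.

   data = [1 2 2 3 3 3 4 9 15 30]';  % Made-up values with a long upper tail
   med = median(data);               % Center of the error bar
   q = prctile(data, [25, 75]);      % 25th and 75th percentiles
   lowerDist = med - q(1);           % Length of the bar below the median
   upperDist = q(2) - med;           % Length of the bar above the median
   figure
   errorbar(1, med, lowerDist, upperDist, 'm*')  % Upper bar is visibly longer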


EXAMPLE 10: Calculate the means and standard deviations of all characteristics

   setosa = meas(1:50, :);               % First 50 rows are setosa
   versicolor = meas(51:100, :);         % Second 50 rows are versicolor
   virginica = meas(101:150, :);         % Third 50 rows are virginica
   irisMeans = [mean(setosa); mean(versicolor); mean(virginica)];
   irisSTDs =  [std(setosa); std(versicolor); std(virginica)];


EXAMPLE 11: Draw a grouped bar chart of mean iris characteristics

   irisMeas = {'Sepal length', 'Sepal width', 'Petal length', ...
                               'Petal width'}; % Characteristics for legend
   figure
   bar(irisMeans, 'grouped')  % Rows are groups, columns are group members
   set(gca, 'XTickLabel', irisSpecies);
   legend(irisMeas, 'Location', 'Northwest')
   xlabel('Species of Iris');
   ylabel('Mean size in cm')
   title(fisherTitle)

Questions Answers
How does bar group data? The bar function groups data from a single row into a cluster or group. Each row forms a different group. The irisMeans variable is a 3 x 4 array so there are 3 groups of 4 bars each. In this example the groups correspond to species, and the bars within each group correspond to different characteristics.
How could I present the characteristics in separate groups rather than the species? You would need to interchange the roles of rows and columns by taking the transpose: bar(irisMeans', 'grouped').
What is the advantage of grouping by species rather than grouping by characteristic? One approach is not necessarily better than the other. You would group characteristics to emphasize a comparison of species and group species to emphasize a comparison of characteristics.
Could I put error bars on this graph? It is possible, but not straightforward. You would need to acquire or explicitly set the bar x-positions and then use those x values for the error bar positions.
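
One possible way to acquire the bar x-positions is sketched below. It assumes a MATLAB release in which bar objects expose the XEndPoints property (R2019b or later); on older releases the positions would have to be computed by hand. The names groupMeans, groupSDs, hBars and xTips are chosen for this illustration, and the tick labels are taken from the species variable so that they follow the order of the data.

   load fisheriris;
   [speciesNames, ~, groupIdx] = unique(species, 'stable');
   numGroups = numel(speciesNames);
   groupMeans = zeros(numGroups, 4);       % Rows: species, columns: characteristics
   groupSDs = zeros(numGroups, 4);
   for g = 1:numGroups
      groupMeans(g, :) = mean(meas(groupIdx == g, :));
      groupSDs(g, :) = std(meas(groupIdx == g, :));
   end
   figure
   hBars = bar(groupMeans, 'grouped');     % One bar object per characteristic
   hold on
   for k = 1:numel(hBars)
      xTips = hBars(k).XEndPoints;         % Bar x-positions (assumes R2019b+)
      errorbar(xTips, groupMeans(:, k)', groupSDs(:, k)', 'k.')
   end
   set(gca, 'XTick', 1:numGroups, 'XTickLabel', speciesNames)
   hold off

Each call to errorbar handles one characteristic, so the loop runs four times and places three error bars per pass.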


EXAMPLE 12: Plot the means of all characteristics using SD error bars

   figure
   errorbar(irisMeans, irisSTDs, 's') % Use default colors and square markers
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies);
   legend(irisMeas, 'Location', 'Northwest')
   xlabel('Species of Iris');
   ylabel('Mean size in cm')
   title(fisherTitle)

Questions Answers
This graph groups characteristics for each species together. How can I group species for each characteristic? You would need to interchange the roles of rows and columns by taking the transpose: errorbar(irisMeans', irisSTDs', 's').
What would a grouping of species for each characteristic emphasize? This would allow you to better see which characteristics are similar for all species and which are not.


EXAMPLE 13: Plot the means of all characteristics using connected SD error bars

   figure
   hold on
   errorbar(irisMeans(:, 1), irisSTDs(:, 1), ':sk')
   errorbar(irisMeans(:, 2), irisSTDs(:, 2), ':og')
   errorbar(irisMeans(:, 3), irisSTDs(:, 3), ':vb')
   errorbar(irisMeans(:, 4), irisSTDs(:, 4), ':^r')
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies);
   legend(irisMeas, 'Location', 'Northwest')
   xlabel('Species of Iris');
   ylabel('Mean size in cm')
   title(fisherTitle)
   hold off

Questions Answers
Why bother connecting the error bars? The connecting error bars provide a visual compromise between grouping by species and grouping by characteristics. Here we have grouped characteristics for each species, but we connect the individual characteristics to emphasize the difference in behavior across species.



_This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a reproduction of a drawing by James Sowerby that appeared in The Botanical Magazine vol. 1. no. 1 (1792). The drawing is available on Wikipedia at http://en.wikipedia.org/wiki/File:Iris_persica_%28Sowerby%29.jpg._