LESSON 12: Error bars questions
FOCUS QUESTION: How can I depict uncertainty and variability in data?
Contents
- EXAMPLE 1: Load the Fisher iris data (comes with MATLAB)
- EXAMPLE 2: Compute the mean and standard deviation of the sepal lengths for 3 species
- EXAMPLE 3: Plot mean sepal length using standard deviation (SD) error bars
- EXAMPLE 4: Plot the SD error bars on a bar chart
- EXAMPLE 5: Compute the standard error of the mean (SEM) for sepal lengths
- EXAMPLE 6: Plot mean sepal length using (SEM) error bars
- EXAMPLE 7: Plot SD and SEM error bars on the same graph
- EXAMPLE 8: Compute the median and inter quartile range (IQR) for sepal lengths
- EXAMPLE 9: Plot median sepal length using the inter quartile range (IQR) for error bars
- EXAMPLE 10: Calculate the means and standard deviations of all characteristics
- EXAMPLE 11: Draw a grouped bar chart of mean iris characteristics
- EXAMPLE 12: Plot the means of all characteristics using SD error bars
- EXAMPLE 13: Plot the means of all characteristics using connected SD error bars
EXAMPLE 1: Load the Fisher iris data (comes with MATLAB)
load fisheriris;
EXAMPLE 2: Compute the mean and standard deviation of the sepal lengths for 3 species
sepalLens = reshape(meas(:, 1), 50, 3); % Make column 1 into a 50 x 3 array
sLenMeans = mean(sepalLens); % Calculate the means for the 3 species
sLenSDs = std(sepalLens); % Calculate the standard deviations for the 3 species
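The reshape call relies on the 150 rows of meas being stored as three contiguous blocks of 50 rows, one block per species. The following is a minimal sketch of how that assumption could be checked using the species cell array that load fisheriris also provides. The names speciesOrder, blockLabels and isContiguous are illustrative and are not part of the lesson.

```matlab
load fisheriris                                  % provides meas (150 x 4) and species (150 x 1 cell array)
speciesOrder = unique(species, 'stable');        % species names in order of first appearance
blockLabels = repmat(speciesOrder', 50, 1);      % 50 x 3 cell array of the labels expected for contiguous blocks
isContiguous = isequal(species, blockLabels(:)); % true if rows 1-50, 51-100 and 101-150 each hold one species
disp(speciesOrder')                              % column order produced by reshape(meas(:, 1), 50, 3)
```

If isContiguous is true, column k of sepalLens holds the sepal lengths of the species named in speciesOrder{k}.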
EXAMPLE 3: Plot mean sepal length using standard deviation (SD) error bars
Create a new cell in which you type and execute:
fisherTitle = 'Comparison of three species in the Fisher Iris data'; % Use for figure titles
irisSpecies = {'Setosa', 'Versicolor', 'Virginica'}; % Use for tick labels (same order as the data)
figure
errorbar(sLenMeans, sLenSDs, 'ks'); % Error bars use black squares
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
xlabel('Species of Iris')
ylabel('Sepal length in cm')
title(fisherTitle) % Label the top
legend('Mean (SD error bars)', 'Location', 'Southeast') % Put in lower right
| Questions | Answers |
| What are error bars? | Error bars are a visual device that is generally used to convey uncertainty. However, you can choose to use the error bars to represent anything you wish. |
| What do error bars actually represent in this example? | Rather than depicting actual errors, these error bars indicate how widely the sepal lengths are spread around the mean. |
| What determines the location of the top and bottom of the error bars? | The first argument of errorbar specifies the center of the bars. The second argument specifies the distance from the center in either direction. |
| How many lines and points does errorbar(X, Y) plot? | The errorbar function plots a point for each element in X. Each column of X is treated as a data set, and its adjacent points are connected by lines of the same color. The plot shows a different line for each column of X. Note that if X and Y are vectors, errorbar draws one curve, regardless of whether these vectors are row vectors or column vectors. |
| Must X and Y have the same number of elements for errorbar(X, Y) to work? | Yes, X and Y must be the same size. |
| Where are the error bars for errorbar(x, y) located when x = [10, 30, 20, 10] and y = [5, 2, 3, 1]? | The tops of the error bars are located at 15, 32, 23, and 11, respectively. The bottoms of the error bars are located at 5, 28, 17, and 9, respectively. The markers of the error bars are located at 10, 30, 20, and 10, respectively. |
| Where are the error bars for errorbar(x, y) located when x = [10; 30; 20; 10] and y = [5; 2; 3; 1]? | The results are the same as in the previous question. |
| What does 'XTick' designate? | The string 'XTick' is an example of a property name. Property arguments are always specified in pairs: the property name followed by the property value. XTick specifies the locations of the tick marks on the horizontal axis. Its value should be a vector of locations. |
| What does 'XTickLabel' designate? | XTickLabel is a property specifying the labels of the tick marks. Often the value of the XTickLabel property is a character array or a cell array of strings. In this example, it is a 1 x 3 cell array with strings naming the three species. |
| What is gca used for? | The gca function returns a handle to the current axes, allowing us to access and set the axes properties from MATLAB programs. |
| How many properties can I change with a single call to set? | The set function doesn't limit the number of properties that can be specified. |
| How do I find out what properties are available for setting? | Use get(gca) to find out what properties are available for modification. |
| Will get(gca) give me all the properties of the figure? | No, only the properties associated with the current axes are accessible. The figure window itself has its own properties (accessible by get(gcf)). Each set of axes on a figure with multiple axes has its own properties. In fact, each object on the figure has its own properties. We'll access these properties later in the course. |
| What does 'Location' designate in the legend function? | 'Location' is a property specifying where the legend should be placed on the axes. This property is useful for preventing the legend from overlapping with the graph. |
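The answers about error bar positions and about set and get can be tried out directly. The following is a minimal sketch using the numbers from the question above. The variable names and the tick labels 'A' through 'D' are illustrative assumptions, not part of the lesson.

```matlab
centers = [10, 30, 20, 10];          % first argument: heights of the markers
halfLengths = [5, 2, 3, 1];          % second argument: distance above and below each marker
tops = centers + halfLengths;        % upper ends of the error bars
bottoms = centers - halfLengths;     % lower ends of the error bars
figure
errorbar(centers, halfLengths, 'ks') % black squares, no connecting line
% A single call to set can change several axes properties at once
set(gca, 'XTick', 1:4, 'XTickLabel', {'A', 'B', 'C', 'D'}, 'YLim', [0, 35])
currentTicks = get(gca, 'XTick');    % read one property back from the current axes
```

Using column vectors for centers and halfLengths should give the same picture, as the table above notes.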
EXAMPLE 4: Plot the SD error bars on a bar chart
figure
hold on
bar(sLenMeans, 'FaceColor', [0.5, 0.5, 1]) % Lighter so error bars show up
errorbar(sLenMeans, sLenSDs, 'ks'); % Error bars use black squares
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
xlabel('Species of Iris');
ylabel('Sepal length in cm')
title(fisherTitle)
legend('Mean (SD error bars)', 'Location', 'Northwest') % Put in upper left
box on % Force box around axes
hold off
| Questions | Answers |
| What happens if I put the errorbar before the bar in this example? | The bar chart is not transparent and will cover the bottom of the error bars. |
| What is 'FaceColor'? | FaceColor is a property of bar that specifies the color of the bars. The color value should be a three-element row vector with values between 0 and 1 specifying how much red, green and blue, respectively, should make up the color. The example has a blue component that is twice as big as the red and green components. White corresponds to a vector of 3 ones, while black corresponds to a vector of 3 zeros. |
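As a small illustration of the RGB color vector and of the drawing order, here is a sketch with made-up bar heights and error lengths. The names heights, errLengths, lightBlue and hBar are assumptions for illustration only.

```matlab
heights = [3, 5, 4];                          % illustrative bar heights
errLengths = [0.5, 0.8, 0.6];                 % illustrative error bar half-lengths
lightBlue = [0.5, 0.5, 1];                    % red, green and blue components, each between 0 and 1
figure
hBar = bar(heights, 'FaceColor', lightBlue);  % same color vector as in the example
hold on
errorbar(heights, errLengths, 'ks')           % drawn after the bars, so the whole error bar stays visible
hold off
```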
EXAMPLE 5: Compute the standard error of the mean (SEM) for sepal lengths
numSamples = size(sepalLens, 1); % Number of observations per species (50)
sLenSEMs = sLenSDs./sqrt(numSamples); % Compute the standard error of the mean (SEM)
| Questions | Answers |
| What does SEM stand for? | SEM is short for standard error of the mean. |
| How can I interpret SEM? | Here is one interpretation. Suppose I think of my data as a sample of an underlying population. A reasonable way to estimate the true mean of the population is to compute the sample mean. However, it is likely that if you take a different sample of this population, you will get a different value for the sample mean. The SEM estimates (from a single sample) the standard deviation of the error between these sample means and the true mean. |
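The interpretation above can be explored with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with mean 5 and standard deviation 2, and the sample size and number of trials are arbitrary choices, none of which come from the iris data.

```matlab
rng(0);                                          % fix the seed so the run is repeatable
n = 50;                                          % observations in each sample
numTrials = 2000;                                % number of independent samples
trueSD = 2;                                      % assumed population standard deviation
samples = 5 + trueSD*randn(n, numTrials);        % each column is one sample from the population
sampleMeans = mean(samples);                     % one sample mean per column
spreadOfMeans = std(sampleMeans);                % observed spread of the sample means
semFromOneSample = std(samples(:, 1))/sqrt(n);   % SEM estimated from the first sample alone
theoreticalSEM = trueSD/sqrt(n);                 % value the SEM is estimating
fprintf('Spread of means: %.3f, SEM from one sample: %.3f, theory: %.3f\n', ...
    spreadOfMeans, semFromOneSample, theoreticalSEM);
```

The three printed numbers should be close to one another, which is the sense in which the SEM computed from a single sample estimates how much sample means vary.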
EXAMPLE 6: Plot mean sepal length using (SEM) error bars
figure
errorbar(sLenMeans, sLenSEMs, 'rd') % Red diamonds
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
xlabel('Species of Iris');
ylabel('Sepal length in cm')
title(fisherTitle)
legend('Mean (SEM error bars)', 'Location', 'Southeast')
| Questions | Answers |
| Will error bars depicting standard deviation be larger or smaller than error bars depicting SEM? | Larger. SEM is only defined for data sets that have at least two elements, and it is the standard deviation divided by the square root of the number of elements. The SEM error bars are therefore never larger than the standard deviation error bars, and they are strictly smaller whenever the standard deviation is nonzero. |
| How are SEM error bars different from SD error bars? | Standard deviation indicates how widely data values are distributed from the data set mean, while SEM indicates how widely estimates of the true population mean vary from the true population mean. |
EXAMPLE 7: Plot SD and SEM error bars on the same graph
xPositions = [1, 2, 3]; % Use these as base x-axis error bar positions
figure
hold on
errorbar(xPositions-0.1, sLenMeans, sLenSDs, 'g^') % Green up triangles
errorbar(xPositions+0.1, sLenMeans, sLenSEMs, 'bv') % Blue down triangles
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
xlabel('Species of Iris');
ylabel('Sepal length in cm')
title(fisherTitle)
legend({'Mean (SD error bars)', 'Mean (SEM error bars)'}, 'Location', 'Southeast')
hold off
| Questions | Answers |
| How can I set the x positions of error bars? | Call the errorbar function with three arguments: the x coordinates, the y coordinates and the distance above and below. |
| Why did this example explicitly set the error bar positions? | The example compares the SD error bars and SEM error bars for the same data. We wanted to group the bars for each data point together rather than spacing all the bars out evenly. |
EXAMPLE 8: Compute the median and inter quartile range (IQR) for sepal lengths
sLenMedians = median(sepalLens); % Species median sepal lengths
sLenIQR = prctile(sepalLens, [25, 75]); % 25th and 75th percentiles
| Questions | Answers |
| Does sLenIQR really represent the inter quartile range? | No, the inter quartile range (IQR) is the difference between the 75th percentile and the 25th percentile. The sLenIQR variable instead holds these two percentiles as separate values in each column. |
| Why is sLenIQR a 2 x 3 array? | The sepalLens variable has 3 columns. The prctile function treats each column as a data set and finds the specified percentiles of each column. The result for each column is a column vector of size 2, since we asked for two percentile values. Putting the results together for the three data sets gives the 2 x 3 result. |
| What does sLenIQR(2, 1) represent? | The value is the 75th percentile of the first column of sepalLens. Therefore, sLenIQR(2, 1) gives the 75th percentile of Iris setosa sepal lengths. |
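To get the inter quartile range itself, subtract the row of 25th percentiles from the row of 75th percentiles. A minimal sketch on a small made-up array follows. The names data, pct and iqrWidth are illustrative, and prctile is assumed to be available as it is elsewhere in this lesson.

```matlab
data = [1 10 5; 2 20 5; 3 30 6; 4 40 9];   % illustrative 4 x 3 array; each column is one data set
pct = prctile(data, [25, 75]);             % 2 x 3: row 1 holds 25th percentiles, row 2 holds 75th
iqrWidth = pct(2, :) - pct(1, :);          % 1 x 3: the inter quartile range of each column
disp(size(pct))                            % two percentiles by three data sets
```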
EXAMPLE 9: Plot median sepal length using the inter quartile range (IQR) for error bars
lowerDist = sLenMedians - sLenIQR(1, :); % Size of bottom bar
upperDist = sLenIQR(2, :) - sLenMedians; % Size of top bar
figure
errorbar(xPositions, sLenMedians, lowerDist, upperDist, 'm*') % Magenta asterisks
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies)
xlabel('Species of Iris');
ylabel('Sepal length in cm')
title(fisherTitle)
legend('Median (IQR error bars)', 'Location', 'Northwest') % Upper left
| Questions | Answers |
| Why do the IQR error bars need different values for the distances above and below? | The median does not necessarily fall in the center of the IQR. In other words, the 75th percentile minus the median does not necessarily equal the median minus the 25th percentile. |
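A skewed sample makes the asymmetry easy to see. The sketch below uses a made-up right-skewed vector (not iris data) and the same lowerDist and upperDist construction as the example. The other names are illustrative.

```matlab
x = [1; 1; 2; 2; 3; 4; 8; 15; 30];             % illustrative right-skewed sample
med = median(x);                               % middle value of the sample
q = prctile(x, [25, 75]);                      % 25th and 75th percentiles
lowerDist = med - q(1);                        % length of the bar below the median
upperDist = q(2) - med;                        % length of the bar above the median
figure
errorbar(1, med, lowerDist, upperDist, 'm*')   % asymmetric bar around the median
```

For this sample the upper distance comes out larger than the lower one, so a single symmetric length could not describe both ends of the bar.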
EXAMPLE 10: Calculate the means and standard deviations of all characteristics
setosa = meas(1:50, :); % First 50 rows are setosa
versicolor = meas(51:100, :); % Second 50 rows are versicolor
virginica = meas(101:150, :); % Third 50 rows are virginica
irisMeans = [mean(setosa); mean(versicolor); mean(virginica)];
irisSTDs = [std(setosa); std(versicolor); std(virginica)];
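An alternative to hard-coding the row ranges is to select rows by their species label. The following is a sketch of that approach, using the species cell array loaded with meas. The names names, meansByLabel and sdsByLabel are illustrative and are not used elsewhere in the lesson.

```matlab
load fisheriris                                   % provides meas and species
names = unique(species, 'stable');                % species names in stored order
numSpecies = numel(names);
meansByLabel = zeros(numSpecies, size(meas, 2));  % one row per species, one column per characteristic
sdsByLabel = zeros(numSpecies, size(meas, 2));
for k = 1:numSpecies
    rows = strcmp(species, names{k});             % logical mask selecting the rows of one species
    meansByLabel(k, :) = mean(meas(rows, :));     % column means for that species
    sdsByLabel(k, :) = std(meas(rows, :));        % column standard deviations for that species
end
```

This version does not depend on each species occupying exactly 50 consecutive rows, and names records which species each row of the result describes.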
EXAMPLE 11: Draw a grouped bar chart of mean iris characteristics
irisMeas = {'Sepal length', 'Sepal width', 'Petal length', ...
'Petal width'}; % Characteristics for legend
figure
bar(irisMeans, 'grouped') % Rows are groups, columns are group members
set(gca, 'XTickLabel', irisSpecies);
legend(irisMeas, 'Location', 'Northwest')
xlabel('Species of Iris');
ylabel('Mean size in cm')
title(fisherTitle)
| Questions | Answers |
| How does bar group data? | The bar function groups data from a single row into a cluster or group. Each row forms a different group. The irisMeans variable is a 3 x 4 array so there are 3 groups of 4 bars each. In this example the groups correspond to species, and the bars within each group correspond to different characteristics. |
| How could I present the characteristics in separate groups rather than the species? | You would need to interchange the roles of rows and columns by taking the transpose: bar(irisMeans', 'grouped'). |
| What is the advantage of grouping by species rather than grouping by characteristic? | One approach is not necessarily better than the other. You would group characteristics to emphasize a comparison of species and group species to emphasize a comparison of characteristics. |
| Could I put error bars on this graph? | It is possible, but not straightforward. You would need to acquire or explicitly set the bar x-positions and then use those x values for the error bar positions. |
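One way to acquire the bar x-positions is sketched below. It assumes MATLAB R2019b or later, where bar objects expose an XEndPoints property. Older releases need a different approach, such as computing the offsets by hand. The means and standard deviations are made-up illustrative values, not the iris statistics.

```matlab
groupMeans = [5.0 3.4; 5.9 2.8; 6.6 3.0];       % hypothetical values: 3 groups x 2 bars per group
groupSDs = [0.35 0.38; 0.52 0.31; 0.64 0.32];   % hypothetical standard deviations, same layout
figure
hBars = bar(groupMeans, 'grouped');             % one bar object per column of groupMeans
hold on
for k = 1:numel(hBars)
    xCenters = hBars(k).XEndPoints;             % x positions of the bars in series k (R2019b or later)
    errorbar(xCenters, groupMeans(:, k)', groupSDs(:, k)', 'k', 'LineStyle', 'none');
end
hold off
```

Each pass through the loop places one error bar at the center of each bar in a series, so the error bars stay aligned even if the group width changes.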
EXAMPLE 12: Plot the means of all characteristics using SD error bars
figure
errorbar(irisMeans, irisSTDs, 's') % Use default colors and square markers
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies);
legend(irisMeas, 'Location', 'Northwest')
xlabel('Species of Iris');
ylabel('Mean size in cm')
title(fisherTitle)
| Questions | Answers |
| This graph groups characteristics for each species together. How can I group species for each characteristic? | You would need to interchange the roles of rows and columns by taking the transpose: errorbar(irisMeans', irisSTDs', 's'). |
| What would a grouping of species for each characteristic emphasize? | This would allow you to better see which characteristics are similar for all species and which are not. |
EXAMPLE 13: Plot the means of all characteristics using connected SD error bars
figure
hold on
errorbar(irisMeans(:, 1), irisSTDs(:, 1), ':sk') % Sepal length: dotted line, black squares
errorbar(irisMeans(:, 2), irisSTDs(:, 2), ':og') % Sepal width: dotted line, green circles
errorbar(irisMeans(:, 3), irisSTDs(:, 3), ':vb') % Petal length: dotted line, blue down triangles
errorbar(irisMeans(:, 4), irisSTDs(:, 4), ':^r') % Petal width: dotted line, red up triangles
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies);
legend(irisMeas, 'Location', 'Northwest')
xlabel('Species of Iris');
ylabel('Mean size in cm')
title(fisherTitle)
hold off
| Questions | Answers |
| Why bother connecting the error bars? | The connecting error bars provide a visual compromise between grouping by species and grouping by characteristics. Here we have grouped characteristics for each species, but we connect the individual characteristics to emphasize the difference in behavior across species. |
_This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a reproduction of a drawing by James Sowerby that appeared in The Botanical Magazine vol. 1. no. 1 (1792). The drawing is available on Wikipedia at http://en.wikipedia.org/wiki/File:Iris_persica_%28Sowerby%29.jpg._