LESSON 10: Histograms
FOCUS QUESTION: How can I understand and compare the distributions of two data sets?
This lesson demonstrates different ways of combining and displaying histograms.
Contents
- DATA FOR THIS LESSON
- SETUP FOR LESSON 10
- EXAMPLE 1: Load the Daphne Island and Santa Cruz Island beak size data
- EXAMPLE 2: Display a histogram of the Daphne Island beak size data
- EXAMPLE 3: Calculate and display a histogram using a bar chart, line graph and stair plot
- EXAMPLE 4: Explore the data cursor feature
- EXAMPLE 5: Use different bin sizes for Daphne Island histograms
- EXAMPLE 6: Compare beak distributions of Daphne and Santa Cruz Islands
- EXAMPLE 7: Calculate histograms with explicit bin positions
- EXAMPLE 8: Use overlapping stair plots to compare scaled distributions
- EXAMPLE 9: Calculate the cumulative distributions for both data sets
- EXAMPLE 10: Plot the cumulative distributions for both data sets
- EXAMPLE 11: Generate "random" numbers from three common probability distributions
- EXAMPLE 12: Display the histograms of the generated distributions
- SUMMARY OF SYNTAX
DATA FOR THIS LESSON
| File | Description |
| DaphneIslandBeaks.txt, SantaCruzIslandBeaks.txt | See http://en.wikipedia.org/wiki/Peter_and_Rosemary_Grant for additional information on the work of Peter and Rosemary Grant. |
SETUP FOR LESSON 10
- Set the Current Directory to Z:\working\MATLAB\Lesson10. (You will need to make a new directory for Lesson10.)
- Download the data file DaphneIslandBeaks.txt to your Lesson10 directory.
- Download the data file SantaCruzIslandBeaks.txt to your Lesson10 directory.
- Create a new script called Lesson10Script.m. (Use File->New->Blank M-File from the main MATLAB menubar.) You will enter each of the examples in a new cell in this script.
EXAMPLE 1: Load the Daphne Island and Santa Cruz Island beak size data
Create a new cell in which you type and execute:
Daphne = load('DaphneIslandBeaks.txt');
SantaCruz = load('SantaCruzIslandBeaks.txt');
You should see the following 2 variables in your Workspace Browser:
- Daphne - a column vector with beak sizes of the Daphne Island finches
- SantaCruz - a column vector with the beak sizes of the Santa Cruz Island finches
- Define a variable called meanDaphne that contains the mean beak size of the Daphne Island finches.
- Define a variable called meanSantaCruz that contains the mean beak size of the Santa Cruz Island finches.
EXAMPLE 2: Display a histogram of the Daphne Island beak size data
Create a new cell in which you type and execute:
nDaphne = length(Daphne);     % Find number of Daphne Island finches
titleDaphne = ['Daphne Island ground finches (n=' num2str(nDaphne) ')'];
figure('Name', titleDaphne);  % Create a titled figure window
hist(Daphne)                  % Calculate and plot the histogram
xlabel('Beak size in mm');
ylabel('Number of birds');
title(titleDaphne);           % Use same title for plot as window
You should see the following 2 variables in your Workspace Browser:
- nDaphne - the number of Daphne Island finches in the study
- titleDaphne - a string to be used for a figure title.
You should also see a labeled and titled histogram plot.
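The MathWorks documentation for newer MATLAB releases marks hist as not recommended and points to histogram instead. The sketch below is a rough equivalent of this example under that assumption (R2014b or later). It uses a small made-up vector, beaks, rather than the lesson's data file so that it runs on its own; the names beaks, nBeaks, titleBeaks and counts are illustrative and are not part of the lesson.

```matlab
% Made-up beak sizes in mm, standing in for the Daphne vector
beaks = [8.1; 8.7; 9.0; 9.2; 9.4; 9.5; 9.9; 10.3; 10.8; 11.4];
nBeaks = length(beaks);
titleBeaks = ['Made-up beak sizes (n=' num2str(nBeaks) ')'];

figure('Name', titleBeaks);   % Create a titled figure window
h = histogram(beaks, 10);     % Plot a 10-bin histogram (assumes R2014b or later)
xlabel('Beak size in mm');
ylabel('Number of birds');
title(titleBeaks);

counts = h.Values;            % Bin counts, comparable to n from [n, xout] = hist(...)
```

The bin edges chosen by histogram may differ from those chosen by hist, so the two plots need not look identical even for the same data.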
EXAMPLE 3: Calculate and display a histogram using a bar chart, line graph and stair plot
Create a new cell in which you type and execute:
[n, xout] = hist(Daphne);  % Calculate histogram but don't display
figure
hold on
bar(xout, n);              % Plot histogram using a bar chart
plot(xout, n, '-ok')       % Plot histogram using a black line graph
stairs(xout, n, 'r')       % Plot histogram using a red stair plot
hold off
xlabel('Beak size in mm');
ylabel('Number of birds');
title(titleDaphne);
datacursormode on          % Turn the data cursor on for exploration
You should see the following 2 variables in your Workspace Browser:
- n - vector with the number of beak sizes counted in the bin centered at each position of xout
- xout - vector with the positions of the centers of the bins for the histogram
You should also see a labeled and titled plot that overlays the bar chart, the line graph and the stair plot.
EXAMPLE 4: Explore the data cursor feature
- Click on one of the lines or bars in the previous graph.
- Observe the result.
- Turn on or off the data cursor using the data cursor icon in the figure window.
EXAMPLE 5: Use different bin sizes for Daphne Island histograms
Create a new cell in which you type and execute:
figure
subplot(3, 1, 1)
hist(Daphne, 10) % Create and plot a 10-bin histogram
title(titleDaphne) % Put title over topmost graph
legend('10 bins')
ylabel('Birds')
subplot(3, 1, 2)
hist(Daphne, 25) % Create and plot a 25-bin histogram
legend('25 bins')
ylabel('Birds')
subplot(3, 1, 3)
hist(Daphne, 100) % Create and plot a 100-bin histogram
legend('100 bins')
xlabel('Beak size in mm')
ylabel('Birds')
You should see a figure with three vertically aligned axes, one for each bin count.
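When hist is given a number of bins, it divides the range from the smallest to the largest value into that many equal-width bins, so more bins means narrower bins and fewer birds per bin. The following sketch illustrates this with a small made-up vector (beaks is a stand-in, not the lesson's data), comparing the spacing of the returned bin centers with the data range divided by the bin count.

```matlab
% Made-up beak sizes in mm, standing in for the Daphne vector
beaks = [8.1; 8.7; 9.0; 9.2; 9.4; 9.5; 9.9; 10.3; 10.8; 11.4];
binCounts = [10, 25, 100];
dataRange = max(beaks) - min(beaks);        % Total spread of the data
expectedWidths = dataRange ./ binCounts;    % Range divided by number of bins
actualWidths = zeros(size(binCounts));
for k = 1:length(binCounts)
    [n, xout] = hist(beaks, binCounts(k));  % Counts and bin centers, no plot
    actualWidths(k) = xout(2) - xout(1);    % Spacing between adjacent centers
    fprintf('%3d bins: width %.3f mm, fullest bin holds %d of %d birds\n', ...
        binCounts(k), actualWidths(k), max(n), length(beaks));
end
```

With very narrow bins most bins hold zero or one measurement, which is why the 100-bin histogram looks much more ragged than the 10-bin one.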
EXAMPLE 6: Compare beak distributions of Daphne and Santa Cruz Islands
Create a new cell in which you type and execute:
nSantaCruz = length(SantaCruz);  % Find number of Santa Cruz Island finches
figure
subplot(1, 2, 1)
hist(Daphne)                     % Histogram of Daphne Island finches
title(['Daphne (n=' num2str(nDaphne) ')'])
xlabel('Beak size (mm)')
ylabel('Number of birds')        % Only use one y label for both axes
subplot(1, 2, 2)
hist(SantaCruz)                  % Histogram of Santa Cruz Island finches
title(['Santa Cruz (n=' num2str(nSantaCruz) ')'])
xlabel('Beak size (mm)')
You should see the following variable in your Workspace Browser:
- nSantaCruz - the number of Santa Cruz Island measurements
You should also see a subplot with two side-by-side axes:
EXAMPLE 7: Calculate histograms with explicit bin positions
Create a new cell in which you type and execute:
% Calculate bin positions explicitly to encompass range of both data sets
minBeak = min([min(Daphne), min(SantaCruz)]);
maxBeak = max([max(Daphne), max(SantaCruz)]);
xBins = minBeak:0.2:maxBeak; % Bin positions will be 0.2 apart
% Calculate the histograms based on these bins
[nD, xD] = hist(Daphne, xBins); % Histogram of Daphne Island
[nS, xS] = hist(SantaCruz, xBins); % Histogram of Santa Cruz Island
You should see the following variables in your Workspace Browser:
- minBeak - smallest beak size in both data sets
- maxBeak - largest beak size in both data sets
- xBins - vector specifying equally spaced bin positions 0.2 mm apart
- nD - vector with counts of Daphne Island beak sizes at specified bin positions
- xD - vector with bin positions
- nS - vector with counts of Santa Cruz Island beak sizes at specified bin positions
- xS - vector of bin positions
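When hist is given a vector of bin positions, it treats them as bin centers and places values beyond the first or last center into the end bins, so no measurement is dropped. A minimal sketch for checking this, using two short made-up vectors (a and b are hypothetical stand-ins for Daphne and SantaCruz):

```matlab
a = [8.1; 8.7; 9.0; 9.4; 9.5; 10.3];   % made-up stand-in for Daphne
b = [9.2; 9.9; 10.8; 11.4; 12.0];      % made-up stand-in for SantaCruz

lo = min([min(a), min(b)]);            % Smallest value in either data set
hi = max([max(a), max(b)]);            % Largest value in either data set
centers = lo:0.2:hi;                   % Shared bin centers, 0.2 apart

[nA, xA] = hist(a, centers);           % Counts for a at the shared centers
[nB, xB] = hist(b, centers);           % Counts for b at the shared centers

samePositions = isequal(xA, xB);       % Both histograms use identical positions
allCountedA = (sum(nA) == length(a));  % Every value in a landed in some bin
allCountedB = (sum(nB) == length(b));  % Every value in b landed in some bin
```

Because both histograms share the same positions, their counts can be compared bin by bin, which is what the later examples rely on.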
EXAMPLE 8: Use overlapping stair plots to compare scaled distributions
Create a new cell in which you type and execute:
legendString = {['Daphne (n=' num2str(nDaphne) ')'], ...
['Santa Cruz (n=' num2str(nSantaCruz) ')']};
figure
hold on
stairs(xD, nD/sum(nD), 'k'); % Daphne in black
stairs(xS, nS/sum(nS), 'r'); % SC in red
xlabel('Beak size in mm');
ylabel('Fraction of birds with this beak size');
legend(legendString, 'Location', 'NorthWest');
title('Scaled comparison of Daphne and Santa Cruz finches')
hold off
You should see the following variable in your Workspace Browser:
- legendString - cell array of strings with the legend entries for the plot
You should also see a labeled plot with two overlapping stair graphs.
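Dividing each count vector by its sum converts counts into fractions, which makes samples of different sizes directly comparable. A small worked sketch with made-up counts (nSmall and nLarge are illustrative, not taken from the finch data):

```matlab
nSmall = [2 3 5];                    % made-up counts from a sample of 10 birds
nLarge = [20 30 50];                 % made-up counts from a sample of 100 birds

fracSmall = nSmall / sum(nSmall);    % Fraction of the small sample in each bin
fracLarge = nLarge / sum(nLarge);    % Fraction of the large sample in each bin

% The raw counts differ by a factor of 10, but the fractions match
sameShape = all(abs(fracSmall - fracLarge) < 1e-12);
```

Each scaled vector sums to 1, so the heights of the two stair plots can be read on the same vertical axis.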
EXAMPLE 9: Calculate the cumulative distributions for both data sets
Create a new cell in which you type and execute:
cumD = cumsum(nD)/sum(nD);   % Cumulative distribution of Daphne Island
cumS = cumsum(nS)./sum(nS);  % Cumulative distribution of Santa Cruz Island
You should see the following two variables in your Workspace Browser:
- cumD - vector with the cumulative probability distribution of Daphne Island beak sizes
- cumS - vector with the cumulative probability distribution of Santa Cruz Island beak sizes
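To see what this calculation does, it can help to work through a tiny case by hand. The sketch below uses made-up counts (counts is a hypothetical vector, not the finch data): the running totals of [2 3 5] are [2 5 10], and dividing by the total of 10 gives [0.2 0.5 1.0].

```matlab
counts = [2 3 5];                     % made-up bin counts
runningTotal = cumsum(counts);        % [2 5 10]
cumFraction = runningTotal / sum(counts);   % [0.2 0.5 1.0]

% A cumulative distribution never decreases and always ends at 1
neverDecreases = all(diff(cumFraction) >= 0);
endsAtOne = abs(cumFraction(end) - 1) < 1e-12;
```

Both properties should also hold for cumD and cumS, which makes them a quick sanity check.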
EXAMPLE 10: Plot the cumulative distributions for both data sets
Create a new cell in which you type and execute:
figure
hold on
plot(xD, cumD, 'k');
plot(xS, cumS, 'r');
title('Beak size distributions at two islands in the Galápagos')
xlabel('Beak size (mm)')
ylabel('Fraction of birds with beak size less than x')
legend(legendString, 'Location', 'NorthWest'); % Reuse legend from last example
hold off
You should see a single labeled plot with the two cumulative distribution curves overlaid.
EXAMPLE 11: Generate "random" numbers from three common probability distributions
Create a new cell in which you type and execute:
yNormal = random('norm', 0, 1, [10000, 1]);      % normal with zero mean and unit sd
yUniform = random('unif', -1, 1, [10000, 1]);    % uniform in the interval [-1, 1]
yExponential = random('exp', 1, [10000, 1]);     % exponential with mean 1
You should see the following variables in your Workspace Browser:
- yNormal - vector with 10,000 normally distributed "random" values
- yUniform - vector with 10,000 "random" values uniformly distributed in [-1, 1]
- yExponential - vector with 10,000 exponentially distributed "random" values
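The random function belongs to the Statistics Toolbox, so it may not be available in every installation. As a sketch of one alternative, the built-in randn and rand functions can produce comparable samples. The exponential line uses the inverse-transform method, which is a technique substituted here and not what the lesson uses; the variable names ending in Base are hypothetical.

```matlab
nSamples = 10000;

% Normal with zero mean and unit sd, using the built-in randn
yNormalBase = randn(nSamples, 1);

% Uniform between -1 and 1: stretch and shift rand, which returns values in (0, 1)
yUniformBase = 2*rand(nSamples, 1) - 1;

% Exponential with mean 1 via inverse transform: -log(U) for U uniform in (0, 1)
yExponentialBase = -log(rand(nSamples, 1));
```

These vectors can be passed to hist in the same way as yNormal, yUniform and yExponential.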
EXAMPLE 12: Display the histograms of the generated distributions
Create a new cell in which you type and execute:
figure
subplot(3,1,1)
hist(yNormal,50)
title('Normal distribution (mean = 0, sd = 1)')
subplot(3,1,2)
hist(yUniform,50)
title('Uniform distribution (on interval [-1, 1])')
subplot(3,1,3)
hist(yExponential,50)
title('Exponential distribution (mean = 1)')
You should see a figure with three vertically aligned axes, one for each distribution.
SUMMARY OF SYNTAX
| MATLAB syntax | Description |
| cumsum(A) | calculates the cumulative sum along the first non-singleton dimension of the array A. |
| cumsum(A, dim) | calculates the cumulative sum of the array A along dimension dim. |
| hist(x) | creates a histogram plot of the values in the vector x. |
| [n, xout] = hist(x) | calculates the histogram of the vector x, but does not plot anything. The n variable contains the counts and the xout variable contains bin locations. |
| random(distName, parameters, [n, m]) | generates an n x m array of numbers randomly selected from the specified probability distribution. The parameters item represents the values of the parameters needed to define the particular probability distribution. For example, a normal distribution is specified by its mean and standard deviation. On the other hand, the exponential distribution is specified only by its mean. |
| stairs(Y) | plots stair-step graphs of the columns of the array Y against the positive integers. |
| stairs(X, Y) | plots stair-step graphs of the columns of the array Y against the columns of the array X. |
_This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image, Medium Ground Finch Geospiza fortis, Santa Cruz, Galapagos, was taken by Mark Putney. The original source is http://www.flickr.com/photos/putneymark/13516124843/in/set-72157601810082531/. The image is available under common license at http://commons.wikimedia.org/wiki/File:Geospiza_fortis.jpg._