LESSON 11: Box plots
FOCUS QUESTION: How can I compare the distributions for data sets that have outliers?
Contents
- DATA FOR THIS LESSON
- SETUP FOR LESSON 11
- EXAMPLE 1: Load the Fisher iris data (comes with MATLAB)
- EXAMPLE 2: Compare the distributions of sepal and petal lengths using box plots
- EXAMPLE 3: Draw a box plot of the sepal lengths by species
- EXAMPLE 4: Draw a notched box plot of the sepal widths
- EXAMPLE 5: Load the data about New York contagious diseases
- EXAMPLE 6: Draw a horizontal box plot of the monthly counts of scaled measles data
- EXAMPLE 7: Draw a compact box plot of the monthly counts of scaled measles data
- EXAMPLE 8: Draw a labeled box plot of the monthly counts of all diseases
- EXAMPLE 9: Load the Daphne Island and Santa Cruz Island beak size data
- EXAMPLE 10: Create a labeled vector of beak sizes for plotting
- EXAMPLE 11: Create a box plot of unequal length data sets using labeled data
- EXAMPLE 12*: Set the box widths to reflect the data set sizes
- SUMMARY OF SYNTAX
DATA FOR THIS LESSON
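| File | Description |
|---|---|
| fisheriris | Fisher iris data. This dataset comes with the MATLAB distribution, so you don't have to download it separately. |
| NYCDiseases.mat | Monthly counts of measles, mumps, and chicken pox in New York City, 1931-1971. |
| DaphneIslandBeaks.txt, SantaCruzIslandBeaks.txt | Beak sizes of Geospiza fortis finches from Daphne Island and Santa Cruz Island. See http://en.wikipedia.org/wiki/Peter_and_Rosemary_Grant for additional information on the work of Peter and Rosemary Grant. |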
SETUP FOR LESSON 11
- Set the Current Directory to Z:\working\MATLAB\Lesson11. (You will need to make a new directory for Lesson11.)
- Download the data file NYCDiseases.mat to your Lesson11 directory.
- Download the data file DaphneIslandBeaks.txt to your Lesson11 directory.
- Download the data file SantaCruzIslandBeaks.txt to your Lesson11 directory.
- Create a new script called Lesson11Script.m. (Use File->New->Blank M-File from the main MATLAB menubar.) You will enter each of the examples in a new cell in this script.
EXAMPLE 1: Load the Fisher iris data (comes with MATLAB)
Create a new cell in which you type and execute:
load fisheriris;
You should see the following 2 variables in your Workspace Browser:
- meas - a 150-by-4 array in which each row holds the 4 measurements for one specimen and each column holds one type of measurement (sepal length, sepal width, petal length, petal width). The three species are combined into a single array.
- species - a cell column vector giving the species of the specimen in the corresponding row of meas. Possible values are 'setosa', 'versicolor', and 'virginica'.
EXAMPLE 2: Compare the distributions of sepal and petal lengths using box plots
Create a new cell in which you type and execute:
lengths = meas(:, [1, 3]);                       % Define a variable for sepal and petal lengths
figure
boxplot(lengths, 'Labels', {'Sepal', 'Petal'})   % Show box plots of lengths
ylabel('Length in cm')
title('Comparison of sepal and petal lengths for Fisher iris data')
You should see the following variable in your Workspace Browser:
- lengths - an array containing 2 columns corresponding to the sepal and petal lengths, respectively.
You should also see a Figure Window with a labeled box plot.
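Each box summarizes one column with a handful of numbers. As a rough illustration, here is a minimal sketch that computes those numbers in base MATLAB for a small, hypothetical vector x. It assumes the median-of-halves convention for the quartiles; boxplot itself computes quartiles with prctile, which interpolates differently, so its values can differ slightly for small samples. The multiplier w = 1.5 matches the default of boxplot's 'whisker' parameter: points more than w times the interquartile range beyond the box are drawn as outliers, and the whiskers stop at the most extreme remaining points.

```matlab
% Hypothetical sample with one unusually large value
x = [4.9 5.0 5.1 5.4 5.6 5.8 6.0 6.3 9.5];

xs   = sort(x(:));                 % sorted column vector
n    = numel(xs);
half = floor(n/2);
q1 = median(xs(1:half));           % median of the lower half (assumed convention)
q2 = median(xs);                   % overall median: the line inside the box
q3 = median(xs(end-half+1:end));   % median of the upper half (assumed convention)
iqr_x = q3 - q1;                   % interquartile range: the height of the box

w = 1.5;                           % whisker multiplier (boxplot default)
lowerFence = q1 - w*iqr_x;
upperFence = q3 + w*iqr_x;

isOutlier   = xs < lowerFence | xs > upperFence;
inliers     = xs(~isOutlier);
whiskerLow  = min(inliers);        % whiskers stop at the most extreme non-outliers
whiskerHigh = max(inliers);

fprintf('Q1 = %.2f, median = %.2f, Q3 = %.2f\n', q1, q2, q3);
fprintf('Whiskers: %.2f to %.2f\n', whiskerLow, whiskerHigh);
fprintf('Outliers: %s\n', mat2str(xs(isOutlier)'));
```

For this sample Q1 is 5.05, the median is 5.6, Q3 is 6.15, the whiskers run from 4.9 to 6.3, and 9.5 is the only point flagged as an outlier.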
Create a new cell right here (a cell begins with %%). Write MATLAB code to make a box plot of the sepal widths. (This plot will consist of a single box.)
EXAMPLE 3: Draw a box plot of the sepal lengths by species
Create a new cell in which you type and execute:
sepalLens = meas(:, 1);          % Define a variable for the sepal lengths
figure
boxplot(sepalLens, species)      % The species vector specifies the groups
ylabel('Sepal length in cm')
title('Comparison of three species in the Fisher iris data')
You should see the following variable in your Workspace Browser:
- sepalLens - a vector containing the sepal lengths of all 150 specimens
You should also see a Figure Window with a labeled box plot.
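When the second argument is a grouping variable, boxplot draws one box for each distinct value of that variable. A minimal sketch of that grouping step, using hypothetical values x and labels g rather than the iris data, finds the distinct groups with unique and computes each group's median with accumarray. Note that unique lists the groups in sorted order, which need not match the order of the boxes in the plot.

```matlab
% Hypothetical measurements and a matching cell array of group labels
x = [5.0; 4.8; 6.4; 6.1; 7.2; 6.9; 5.2];
g = {'setosa'; 'setosa'; 'versicolor'; 'versicolor'; ...
     'virginica'; 'virginica'; 'setosa'};

% groupNames lists each distinct label once (sorted);
% idx(k) is the group number of x(k)
[groupNames, ~, idx] = unique(g);
idx = idx(:);                      % make sure the subscripts form a column

% One median per group: the center line of each box boxplot(x, g) would draw
groupMedians = accumarray(idx, x, [], @median);

for k = 1:numel(groupNames)
    fprintf('%-10s median = %.2f (n = %d)\n', groupNames{k}, ...
            groupMedians(k), sum(idx == k));
end
```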
Create a new cell right here (a cell begins with %%). Write MATLAB code to make a box plot of the sepal widths separated by species. (You should have three boxes.)
EXAMPLE 4: Draw a notched box plot of the sepal widths
Create a new cell in which you type and execute:
sepalWidths = meas(:, 2);        % Define a variable for the sepal widths
figure
boxplot(sepalWidths, species, 'notch', 'on')
ylabel('Sepal width in cm')
title('Comparison of three species in the Fisher iris data')
You should see the following variable in your Workspace Browser:
- sepalWidths - a vector containing the sepal widths of all 150 specimens
You should also see a Figure Window with a labeled box plot.
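MATLAB's boxplot documentation describes the notches as extending to q2 ± 1.57(q3 - q1)/sqrt(n) around the median q2, and states that two medians differ at roughly the 5% significance level when their notches do not overlap. The sketch below applies that formula to two hypothetical groups a and b, again assuming median-of-halves quartiles (boxplot's own quartiles may differ slightly).

```matlab
% Hypothetical sepal widths for two groups (illustrative values only)
a = [3.0 3.2 3.4 3.4 3.5 3.6 3.8 4.0];
b = [2.2 2.5 2.7 2.8 2.9 3.0 3.0 3.2];

% Median-of-halves quartiles of a sorted vector (an assumed convention)
lowerQ = @(v) median(v(1:floor(numel(v)/2)));
upperQ = @(v) median(v(end-floor(numel(v)/2)+1:end));

% Notch half-width: 1.57 * IQR / sqrt(n)
notchHalf = @(v) 1.57 * (upperQ(sort(v)) - lowerQ(sort(v))) / sqrt(numel(v));

notchA = median(a) + [-1 1] * notchHalf(a);
notchB = median(b) + [-1 1] * notchHalf(b);

% Non-overlapping notches suggest the medians differ (about the 5% level)
overlap = notchA(1) <= notchB(2) && notchB(1) <= notchA(2);
fprintf('Notch A: [%.3f, %.3f]\n', notchA);
fprintf('Notch B: [%.3f, %.3f]\n', notchB);
fprintf('Notches overlap: %d\n', overlap);
```

Here the two notches do not overlap, which is the visual cue the notched plot in this example gives you for comparing species.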
EXAMPLE 5: Load the data about New York contagious diseases
Create a new cell in which you type and execute:
load NYCDiseases.mat; % Load the disease data
You should see 4 variables in the Workspace Browser:
- measles - an array containing the monthly cases of measles (one row per year, one column per month)
- mumps - an array containing the monthly cases of mumps, in the same layout
- chickenPox - an array containing the monthly cases of chicken pox, in the same layout
- years - a vector containing the years 1931 through 1971
EXAMPLE 6: Draw a horizontal box plot of the monthly counts of scaled measles data
Create a new cell in which you type and execute:
monthLabels = {'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', ...
'Aug', 'Sep', 'Oct', 'Nov', 'Dec'};
figure
boxplot(measles./1000, monthLabels, 'orientation', 'horizontal')
xlabel('Measles cases in thousands');
ylabel('Month')
title('Measles cases in NYC 1931-1971')
You should see the following variable in your Workspace Browser:
- monthLabels - a cell array with strings corresponding to the 12 months
You should also see a Figure Window with a labeled box plot.
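When the first argument is a matrix, boxplot treats each column as a separate data set, so each of the 12 columns of measles gives one box and monthLabels names the boxes in column order. Checking the center lines yourself amounts to taking column medians, since median works down the columns of a matrix by default. The small counts matrix below is hypothetical and covers only four months.

```matlab
% Hypothetical counts: rows are years, columns are months (Jan..Apr only)
counts = [1200  3400  5600   800;
          2100  4100  7300  1500;
           900  2600  4800   600];
monthLabels = {'Jan', 'Feb', 'Mar', 'Apr'};

scaled = counts ./ 1000;          % same scaling as the example: thousands of cases
colMedians = median(scaled);      % one median per column, i.e., per box

for k = 1:numel(monthLabels)
    fprintf('%s: median = %.1f thousand\n', monthLabels{k}, colMedians(k));
end
```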
EXAMPLE 7: Draw a compact box plot of the monthly counts of scaled measles data
Create a new cell in which you type and execute:
figure
boxplot(measles./1000, monthLabels, 'plotstyle', 'compact')
xlabel('Month')
ylabel('Measles cases in thousands');
title('Measles cases in NYC 1931-1971')
You should see a Figure Window with a compact labeled box plot.
EXAMPLE 8: Draw a labeled box plot of the monthly counts of all diseases
Create a new cell in which you type and execute:
diseases = [measles(:), mumps(:), chickenPox(:)]./1000;   % One column per disease, in thousands
figure
boxplot(diseases, 'labels', {'measles', 'mumps', 'chickenpox'});
ylabel('Monthly cases in thousands')
title('Contagious childhood diseases in NYC 1931-1971');
You should see the following variable in your Workspace Browser:
- diseases - an array whose 3 columns correspond to disease counts for measles, mumps and chicken pox respectively.
You should see a Figure Window with a labeled box plot.
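Example 8 relies on the colon operator: A(:) reads an array column by column into one long column vector, so each disease's entire year-by-month table becomes a single data set and therefore a single box. The tiny matrices here are hypothetical stand-ins for measles and mumps.

```matlab
% Hypothetical 2-by-3 matrices standing in for year-by-month counts
m1 = [1 2 3; 4 5 6];
m2 = [10 20 30; 40 50 60];

% A(:) reads A column by column into a single column vector
stacked = m1(:);
disp(stacked')     % values in column order: 1 4 2 5 3 6

% Placing stacked matrices side by side gives one column per disease,
% so boxplot draws one box per disease instead of one box per month
both = [m1(:), m2(:)];
disp(size(both))   % 6 rows, 2 columns
```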
EXAMPLE 9: Load the Daphne Island and Santa Cruz Island beak size data
Create a new cell in which you type and execute:
Daphne = load('DaphneIslandBeaks.txt');        % Beak sizes, Daphne Island
SantaCruz = load('SantaCruzIslandBeaks.txt');  % Beak sizes, Santa Cruz Island
You should see the following 2 variables in your Workspace Browser:
- Daphne - a column vector with beak sizes of the Daphne Island finches
- SantaCruz - a column vector with the beak sizes of the Santa Cruz Island finches
EXAMPLE 10: Create a labeled vector of beak sizes for plotting
Create a new cell in which you type and execute:
beakSizes = [Daphne; SantaCruz];
% Both labels are 10 characters long so that the rows of the character array line up
islands = [repmat('  Daphne  ', size(Daphne)); repmat('Santa Cruz', size(SantaCruz))];
You should see the following 2 variables in your Workspace Browser:
- beakSizes - a vector containing the beak sizes for all of the birds
- islands - a character array whose rows give the island designations (padded to the same length) corresponding to the values in beakSizes
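Because islands is a character array, every row must have the same number of characters, which is why the Daphne label needs space padding to match the 10 characters of 'Santa Cruz'. The sketch below checks that with hypothetical beak sizes. It also builds a cell array of labels, an alternative that this lesson does not use but that is the same kind of grouping variable as species in Example 3 and needs no padding.

```matlab
% Hypothetical beak sizes for two islands of different sizes
Daphne    = [9.1; 9.4; 8.8];
SantaCruz = [10.2; 9.9];

% Character-array labels: every row must have the same number of characters,
% so 'Daphne' is padded with spaces to the 10 characters of 'Santa Cruz'
charLabels = [repmat('  Daphne  ', size(Daphne)); ...
              repmat('Santa Cruz', size(SantaCruz))];
disp(size(charLabels))              % 5 rows, 10 columns

% Alternative (not the lesson's method): a cell array of labels,
% where each cell holds its own string, so no padding is needed
cellLabels = [repmat({'Daphne'}, numel(Daphne), 1); ...
              repmat({'Santa Cruz'}, numel(SantaCruz), 1)];

beakSizes = [Daphne; SantaCruz];    % one label per value in either form
```

Either label variable can be passed as the second argument of boxplot, provided it has one entry per value in beakSizes.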
EXAMPLE 11: Create a box plot of unequal length data sets using labeled data
Create a new cell in which you type and execute:
figure
boxplot(beakSizes, islands, 'notch', 'on')
ylabel('Beak size in mm')
title('Geospiza fortis from nearby islands in the Galápagos');
You should see a Figure Window with a labeled box plot.
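Labeled data is needed here because the two islands have different numbers of birds. boxplot(X) expects one data set per column, and MATLAB cannot place column vectors of different lengths side by side in a matrix. The sketch below, with hypothetical vectors, confirms that the horizontal concatenation fails, while stacking into one column works for any lengths.

```matlab
% Hypothetical data sets of unequal length
Daphne    = [9.1; 9.4; 8.8];
SantaCruz = [10.2; 9.9];

% Side-by-side columns require equal lengths, so this concatenation errors
canUseMatrix = true;
try
    X = [Daphne, SantaCruz]; %#ok<NASGU>
catch
    canUseMatrix = false;    % columns of different lengths cannot form a matrix
end
fprintf('Matrix form possible: %d\n', canUseMatrix);

% Stacking vertically works for any lengths; a parallel label vector
% then tells boxplot which values belong to which island
values = [Daphne; SantaCruz];
fprintf('Stacked values: %d\n', numel(values));
```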
EXAMPLE 12*: Set the box widths to reflect the data set sizes
Create a new cell in which you type and execute:
boxWidths = 0.75*[length(Daphne)./length(beakSizes), length(SantaCruz)./length(beakSizes)];   % Widths proportional to group sizes
figure
boxplot(beakSizes, islands, 'widths', boxWidths)
ylabel('Beak size in mm')
title('Geospiza fortis from nearby islands in the Galápagos')
You should see the following variable in your Workspace Browser:
- boxWidths - a vector of box widths proportional to the number of values in each data set
You should see a Figure Window with a labeled box plot.
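The widths in Example 12 are simple proportions: each group's share of all observations, multiplied by 0.75. With hypothetical group sizes of 60 and 20 birds the arithmetic works out as follows; with two groups the widths always sum to 0.75, and here the larger group's box is three times as wide.

```matlab
% Hypothetical group sizes (the lesson uses length(Daphne) and length(SantaCruz))
nDaphne    = 60;
nSantaCruz = 20;
nTotal     = nDaphne + nSantaCruz;

% Each box width is 0.75 times that group's share of all observations
boxWidths = 0.75 * [nDaphne / nTotal, nSantaCruz / nTotal];
disp(boxWidths)    % values 0.5625 and 0.1875
```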
SUMMARY OF SYNTAX
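| MATLAB syntax | Description |
|---|---|
| boxplot(X) | Creates a box plot of the values in the array X. Each column of X is treated as a distinct data set and gets its own box. The boxplot function has a large number of optional parameters. We used the following options: a grouping variable G as the second argument (boxplot(X, G)) to draw one box per group; 'labels' to name the boxes; 'notch', 'on' to draw notched boxes; 'orientation', 'horizontal' to draw horizontal boxes; 'plotstyle', 'compact' to draw compact boxes; and 'widths' to set the box widths. |
| repmat(X, n, m) | Creates a new array by tiling the array X in a pattern with n rows and m columns of copies. |
| repmat(X, size(A)) | Creates a new array by tiling the array X in a pattern that has the same size as the array A (i.e., the pattern has the same number of rows and columns as A does). |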
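A small check of the two repmat forms in the table: repmat(X, size(A)) uses only the size of A, so it produces the same tiling as passing the row and column counts directly. The 2-by-2 tile X and the array A are arbitrary illustrations.

```matlab
X = [1 2; 3 4];          % 2-by-2 tile
A = zeros(3, 2);         % only the size of A matters, not its values

T1 = repmat(X, 3, 2);    % 3 copies down, 2 across: a 6-by-4 result
T2 = repmat(X, size(A)); % size(A) is [3 2], so this is the same tiling

disp(size(T1))           % 6 rows, 4 columns
disp(isequal(T1, T2))    % 1 (true)
```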
_This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The photo was taken by Danielle Langlois in July 2005 and is available under public license at http://commons.wikimedia.org/wiki/File:Iris_versicolor_3.jpg._