LESSON 13: Depicting confidence and significance
FOCUS QUESTION: How reliable is my estimate of the population mean?
This lesson introduces some basic techniques for depicting confidence in estimating the mean of a population. The goal is for you to gain a working knowledge of how to apply these tools in standard situations. You will learn the theoretical underpinning of the techniques in your statistics courses.
Contents
- DATA FOR THIS LESSON
- SETUP FOR LESSON 13
- EXAMPLE 1: Load the Fisher iris data
- EXAMPLE 2: Compute the ends of the 95% confidence intervals for sepal length
- EXAMPLE 3: Output population estimates for Iris setosa
- EXAMPLE 4: Plot mean sepal length using 95% confidence interval error bars
- EXAMPLE 5: Plot mean sepal length with SEM and 95% confidence interval error bars
- EXAMPLE 6: Load the circadian rhythm data from Lab 1
- EXAMPLE 7: Estimate hourly temperature mean and 95% CI (horse 1)
- EXAMPLE 8: Plot hourly mean temperature using 95% CI error bars (1 horse)
- EXAMPLE 9: Estimate hourly temperature mean and 95% CI (5 horses)
- EXAMPLE 10: Plot hourly mean temperature using 95% CI error bars (5 horses)
- EXAMPLE 11: Show results from 1 horse and 5 horses on same graph
DATA FOR THIS LESSON
| File | Description |
| fisheriris | Note: This dataset comes with the MATLAB distribution so you don't have to download it separately. |
| circadian.mat | The circadian rhythm of body temperature of the horse. G. Piccione, G. Caola, and R. Refinetti. Biological Rhythm Research, 33(1):113-119, 2002. |
SETUP FOR LESSON 13
- Set the Current Directory to Z:\working\MATLAB\Lesson13. (You will need to make a new directory for Lesson13.)
- Download the data file circadian.mat to your Lesson13 directory.
- Create a new script called Lesson13Script.m. (Use File->New->Blank M-File from the main MATLAB menubar.) You will enter each of the examples in a new cell in this script.
SUGGESTED READING: Wikipedia has a discussion of confidence intervals. Read the introduction and the sections entitled "Conceptual basis" and "Practical example" found at <http://en.wikipedia.org/wiki/Confidence_interval>.
EXAMPLE 1: Load the Fisher iris data
Create a new cell in which you type and execute:
load fisheriris;
You should see the following 2 variables in your Workspace Browser:
- meas - an array in which each column corresponds to a particular type of measurement and each row holds the 4 measurements for a particular specimen. The different species are combined into a single array.
- species - a cell column vector containing the species designation for the specimen in the corresponding row of meas. Possible values are 'setosa', 'versicolor', and 'virginica'.
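A quick way to confirm this layout is to inspect the two variables directly. The sketch below assumes only that fisheriris is available on your MATLAB path; the names nSpecimens, nMeasurements, speciesNames, and firstBlockIsSetosa are illustrative and not part of the lesson.

```matlab
load fisheriris                                   % Provides meas and species

[nSpecimens, nMeasurements] = size(meas);         % Rows are specimens, columns are measurements
speciesNames = unique(species);                   % Distinct species labels, sorted alphabetically
firstBlockIsSetosa = all(strcmp(species(1:50), 'setosa'));  % Are rows 1-50 all setosa?

fprintf('meas is %d x %d\n', nSpecimens, nMeasurements);
fprintf('Species: %s, %s, %s\n', speciesNames{:});
fprintf('Rows 1-50 are all setosa: %d\n', firstBlockIsSetosa);
```

The species are stored in consecutive blocks of 50 rows. If they were not, reshaping a column of meas into a 50 x 3 array would mix species within a column.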
EXAMPLE 2: Compute the ends of the 95% confidence intervals for sepal length
Create a new cell in which you type and execute:
sepalLens = reshape(meas(:, 1), 50, 3);        % Make sepal lengths 50 x 3 (one column per species)
sampleSize = size(sepalLens, 1);               % Number of rows (points in the sample)
sLenMeans = mean(sepalLens);                   % Calculate the mean sepal length of each species
sLenSEMs = std(sepalLens) ./ sqrt(sampleSize); % Calculate the standard error of each mean
sLenCIEnds = sLenSEMs .* 1.96;                 % Calculate the half-width of the 95% confidence interval
You should see the following 5 variables in your Workspace Browser:
- sepalLens - array with the sepal lengths for the three species in 3 columns
- sampleSize - number of points in the sample for each species
- sLenMeans - vector of mean sepal lengths for each of the three species
- sLenSEMs - vector of standard errors for mean sepal lengths of the 3 species
- sLenCIEnds - vector of half-widths of the 95% confidence intervals (the distance from each mean to either end of its interval)
Exercises:
- Define a variable called meanSepalLen containing the overall mean sepal length for this data set.
- Define a variable called stdSepalLen containing the overall standard deviation of the sepal length for this data set.
- Define a variable called numSepalLens that contains the total number of sepal length measurements present in the data set.
- Define a variable called semSepalLen containing the overall standard error of the mean for the sepal lengths in this data set.
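For reference, the quantities computed in Example 2 correspond to the following standard formulas, written here as a sketch under the usual large-sample (normal) approximation. The symbols for the sample mean, sample standard deviation, and sample size are notation introduced here, not names used in the lesson code.

```latex
\mathrm{SEM} = \frac{s}{\sqrt{n}}, \qquad
\text{95\% CI} \approx \left[\, \bar{x} - 1.96\,\mathrm{SEM},\;\; \bar{x} + 1.96\,\mathrm{SEM} \,\right]
```

In the code, sLenMeans plays the role of the sample mean, sLenSEMs holds the SEM, and sLenCIEnds holds 1.96 times the SEM for each species.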
EXAMPLE 3: Output population estimates for Iris setosa
Create a new cell in which you type and execute:
fprintf('Population estimate of mean sepal length for Iris setosa: %g\n', ...
    sLenMeans(1));
fprintf('95%% confidence interval for this estimate: [%g, %g]\n', ...
    sLenMeans(1) - sLenCIEnds(1), sLenMeans(1) + sLenCIEnds(1));
fprintf('Standard error (SE or SEM): %g\n', sLenSEMs(1));
fprintf('Relative standard error (RSE): %g%%\n', ...
    100*sLenSEMs(1)./sLenMeans(1));
You should see the following output in the Command Window:
Population estimate of mean sepal length for Iris setosa: 5.006
95% confidence interval for this estimate: [4.90829, 5.10371]
Standard error (SE or SEM): 0.0498496
Relative standard error (RSE): 0.995796%
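The multiplier 1.96 comes from the normal distribution and is a reasonable approximation for a sample of 50 points, as here. For small samples, a critical value from Student's t distribution is commonly used instead. The following sketch compares the two. It assumes the Statistics Toolbox function tinv is available; the variable names and the two sample sizes are chosen only for illustration.

```matlab
sampleSizes = [50, 10];                    % Illustrative sample sizes
tCrit = tinv(0.975, sampleSizes - 1);      % Two-sided 95% critical values with n-1 degrees of freedom
widening = tCrit ./ 1.96;                  % How much wider a t-based interval is than a 1.96-based one

for k = 1:numel(sampleSizes)
    fprintf('n = %d: t critical value = %.4f (%.1f%% wider than 1.96)\n', ...
        sampleSizes(k), tCrit(k), 100*(widening(k) - 1));
end
```

With 50 points the difference is only a few percent. With 10 points it is roughly 15%, so intervals built from small samples with 1.96 are somewhat too narrow.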
EXAMPLE 4: Plot mean sepal length using 95% confidence interval error bars
Create a new cell in which you type and execute:
irisSpecies = {'Setosa', 'Versicolor', 'Virginica'}; % Tick labels, in the same order as the columns of sepalLens
irisTitle = 'Comparison of three iris species'; % Use a standard title
figure
errorbar(sLenMeans, sLenCIEnds, 'ks'); % Error bars use black squares
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
xlabel('Species of Iris');
ylabel('Sepal length in cm')
title(irisTitle)
legend('Mean (95% CI error bars)', 'Location', 'Southeast') % Put in lower right
You should see a Figure Window with a labeled error bar plot:
EXAMPLE 5: Plot mean sepal length with SEM and 95% confidence interval error bars
Create a new cell in which you type and execute:
xPositions = [1, 2, 3];   % Use these as base x-axis error bar positions
figure
hold on
errorbar(xPositions-0.1, sLenMeans, sLenSEMs, 'g^')     % Green up triangles
errorbar(xPositions+0.1, sLenMeans, sLenCIEnds, 'bv')   % Blue down triangles
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies)       % Set ticks and tick labels
xlabel('Species of Iris');
ylabel('Sepal length in cm')
title(irisTitle)
legend({'Mean (SEM error bars)', 'Mean (95% CI error bars)'}, 'Location', 'Southeast')
hold off
You should see a Figure Window with two sets of error bars:
EXAMPLE 6: Load the circadian rhythm data from Lab 1
Create a new cell in which you type and execute:
load circadian.mat
You should see the following 4 variables in your Workspace Browser:
- horse - an array with the body temperatures (in degrees Celsius) of the 5 horses, one horse per column.
- horseHours - a column vector with the hour at which the corresponding data point of horse was measured, relative to midnight (0 hours) on the starting day of the experiment.
- shrew - an array with the body temperatures (in degrees Celsius) of the same male tree shrew measured at 4 different environmental temperatures, one per column.
- shrewHours - a column vector with the hour at which the corresponding data point of shrew was measured, relative to midnight (0 hours) on the starting day of the experiment.
EXAMPLE 7: Estimate hourly temperature mean and 95% CI (horse 1)
Create a new cell in which you type and execute:
hPtsPerDay = 12;                                        % 12 data points per day
hSampleSz = length(horse(:, 1))./hPtsPerDay;            % Sample size
horse1 = reshape(horse(:, 1), hPtsPerDay, hSampleSz);   % Extract and reshape data from first horse
horse1Means = mean(horse1, 2);                          % Average for each hour over all days
horse1STDs = std(horse1, 0, 2);                         % Unbiased sd for each hour over all days
horse1SEMs = horse1STDs./sqrt(hSampleSz);               % Standard error of mean for each hour over all days
horse1CIEnds = horse1SEMs.* 1.96;                       % 95% confidence interval size
You should see the following 7 variables in your Workspace Browser:
- hPtsPerDay - sampling rate in pts per day
- hSampleSz - number of samples for each sampling time for each horse
- horse1 - array of data for the first horse with each day in a column
- horse1Means - vector of mean temperatures of horse 1 for each sample hour
- horse1STDs - vector of standard deviations of the temperatures of horse 1 for each sample hour
- horse1SEMs - vector of standard errors of mean temperatures of horse 1 for each sample hour
- horse1CIEnds - vector of 95% confidence interval end lengths (half-widths) for horse 1 for each sample hour
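The reshape call in this example relies on MATLAB filling arrays column by column, so the consecutive measurements from one day end up in one column, and each row collects the same time of day across days. A toy sketch with made-up numbers (3 points per day over 2 days; all names here are illustrative) shows the layout:

```matlab
ptsPerDay = 3;                                  % Toy value (the lesson uses 12)
toyData = [10; 20; 30; 11; 21; 31];             % Day 1 readings followed by day 2 readings
numDays = length(toyData) ./ ptsPerDay;         % Number of days in the toy series
byDay = reshape(toyData, ptsPerDay, numDays);   % Each column is one day, each row one time of day
timeMeans = mean(byDay, 2);                     % Average across days for each time of day

disp(byDay)
disp(timeMeans)
```

Averaging along dimension 2 therefore gives one mean per time of day, which is what horse1Means contains for the real data.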
EXAMPLE 8: Plot hourly mean temperature using 95% CI error bars (1 horse)
Create a new cell in which you type and execute:
hHours = horseHours(1:hPtsPerDay);   % Extract hour of day for means
figure
errorbar(hHours, horse1Means, horse1CIEnds, '-ks');   % Error bars use black squares
xlabel('Hours (from midnight)');
ylabel('Body temperature ( ^oC)')
title('Mean temperature variation of thoroughbred horse (10-day average)')
legend('Mean 1 horse (95% CI error bars)', 'Location', 'Southeast') % Put in lower right
You should see a Figure Window with a labeled error bar plot:
EXAMPLE 9: Estimate hourly temperature mean and 95% CI (5 horses)
Create a new cell in which you type and execute:
hAllSampleSz = length(horse(:))./hPtsPerDay;          % Sample size when all horses used
horseR = reshape(horse, hPtsPerDay, hAllSampleSz);    % Each row corresponds to an hour
horseMeans = mean(horseR, 2);
horseSTDs = std(horseR, 0, 2);
horseSEMs = horseSTDs./sqrt(hAllSampleSz);
horseCIEnds = horseSEMs.* 1.96;
You should see the following 6 variables in your Workspace Browser:
- hAllSampleSz - number of samples for each sampling time for all horses combined
- horseR - array of data for all horses, with one day of one horse in each column and one sampling hour in each row
- horseMeans - vector of mean temperatures of the horses for each sample hour
- horseSTDs - vector of standard deviations of the temperatures of the horses for each sample hour
- horseSEMs - vector of standard errors of mean temperatures of horses for each sample hour
- horseCIEnds - vector of 95% confidence interval end lengths (half-widths) for all horses combined for each sample hour
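Pooling the 5 horses raises the sample size at each hour from 10 to 50. If the spread of the temperatures were the same in both cases (an assumption made only for this illustration), the standard error, and with it the error bar, would shrink by a factor of sqrt(50/10). The sketch below uses a made-up standard deviation and illustrative variable names:

```matlab
sigma = 0.2;                        % Hypothetical standard deviation (degrees C), not from the data
n1 = 10;                            % Sample size per hour for 1 horse
n5 = 50;                            % Sample size per hour for 5 horses pooled
sem1 = sigma ./ sqrt(n1);           % Standard error with 10 points
sem5 = sigma ./ sqrt(n5);           % Standard error with 50 points
shrink = sem1 ./ sem5;              % Equals sqrt(n5/n1)

fprintf('SEM with n = %d: %.4f\n', n1, sem1);
fprintf('SEM with n = %d: %.4f\n', n5, sem5);
fprintf('Error bars shrink by a factor of %.2f\n', shrink);
```

In the real data the pooled standard deviation also includes differences between horses, so the actual reduction in the error bars may be smaller than this idealized factor.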
EXAMPLE 10: Plot hourly mean temperature using 95% CI error bars (5 horses)
Create a new cell in which you type and execute:
figure
errorbar(hHours, horseMeans, horseCIEnds, '-ro');   % Error bars use red circles
xlabel('Hours (from midnight)');
ylabel('Body temperature ( ^oC)')
title('Mean temperature variation of thoroughbred horse (10-day average)')
legend('Mean 5 horses (95% CI error bars)', 'Location', 'Southeast') % Put in lower right
You should see a Figure Window with a labeled error bar plot:
EXAMPLE 11: Show results from 1 horse and 5 horses on same graph
Create a new cell in which you type and execute:
figure
hold on
errorbar(hHours, horse1Means, horse1CIEnds, '-ks');   % 1 horse: black squares
errorbar(hHours, horseMeans, horseCIEnds, '-ro');     % 5 horses: red circles
xlabel('Hours (from midnight)');
ylabel('Body temperature ( ^oC)')
title('Mean temperature variation of thoroughbred horse (10-day average)')
legend({'Mean 1 horse (95% CI error bars)', 'Mean 5 horses (95% CI error bars)'}, ...
    'Location', 'Southeast') % Put in lower right
hold off
You should see a Figure Window with both labeled error bar plots on the same axes:
This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a cladogram showing the ancestry of the mammals discussed in Richard Dawkins' book The Ancestor's Tale and generated by Fred Hsu from ITOL: Interactive Tree of Life (http://itol.embl.de/). See http://en.wikipedia.org/wiki/The_Ancestor%27s_Tale for a description of The Ancestor's Tale.
