LESSON 13: Depicting confidence and significance

FOCUS QUESTION: How reliable is my estimate of a population mean?

This lesson introduces basic techniques for depicting how much confidence you can place in an estimate of a population mean. The goal is for you to gain a working knowledge of how to apply these tools in standard situations; you will learn their theoretical underpinnings in your statistics courses.

In this lesson you will:
  • Compute the standard error of the mean (SEM).
  • Compute confidence intervals for estimates of a population mean.
  • Reshape data sets into different configurations so that you can average them in different ways.
  • Use 3D arrays to assist in more complicated rearrangements.
[Image: cladogram showing the ancestry of the mammals in The Ancestor's Tale]

DATA FOR THIS LESSON

File Description
fisheriris
  • This is the famous Fisher iris data set. It consists of measurements of 150 flower samples, 50 from each of three species: Iris setosa, Iris versicolor, and Iris virginica. The measurements are in cm.
  • Four features were measured for each sample:
    • The length of the flower sepal
    • The width of the flower sepal
    • The length of the flower petal
    • The width of the flower petal
  • All 150 samples from the Fisher iris data are stored in a single 150 x 4 array called meas:
    • The four columns correspond to the four types of measurements: sepal length, sepal width, petal length and petal width, respectively.
    • The first 50 rows contain data for Iris setosa.
    • The second 50 rows contain data for Iris versicolor.
    • The third 50 rows contain data for Iris virginica.
  • The species name for each row is kept in a separate 150 x 1 cell array of strings called species.
  • The data is sometimes referred to as Anderson's Iris data in honor of Edgar Anderson, the biologist who collected the data. See http://en.wikipedia.org/wiki/Iris_flower_data_set for additional information.

Note: This data set ships with MATLAB as part of the Statistics Toolbox, so you don't have to download it separately.
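The block layout above is what lets Example 2 split a column of meas into species with a single reshape. If you want per-species means that do not depend on the row order, one alternative is to group by label. The sketch below uses made-up toy values and labels, not the iris measurements; with the real data you would pass a column of meas and the species cell array in their place.

```matlab
% Toy stand-ins for one column of meas and for species (NOT the real iris data)
values = [5.0; 5.2; 6.1; 6.3; 7.0; 7.2];
labels = {'setosa'; 'setosa'; 'versicolor'; 'versicolor'; 'virginica'; 'virginica'};

% names holds the distinct labels (sorted alphabetically);
% idx(k) is the position of labels{k} within names
[names, ~, idx] = unique(labels);

% accumarray applies @mean to all values that share the same group index
groupMeans = accumarray(idx(:), values, [], @mean);

for k = 1:numel(names)
    fprintf('%-10s mean = %g\n', names{k}, groupMeans(k));
end
```

Because unique sorts the names alphabetically, the group order is setosa, versicolor, virginica no matter how the rows are arranged.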

circadian.mat
  • This data set contains core body temperatures (in degrees centigrade) of female horses (Equus caballus) and a male tree shrew (Tupaia belangeri).
  • The horse data is stored in the variables horse and horseHours:
    • Each column of the horse array contains the body temperatures of one of 5 thoroughbred female horses measured at 2-hour intervals for 10 consecutive days.
    • The horseHours variable holds a column vector with the hour that the corresponding data point of horse was measured relative to midnight (0 hours) on the starting day of the experiment.
  • During the experiment, the horses were kept in a climate-controlled environment maintained at 13 degrees centigrade. Lights were on daily from 0800 h to 1700 h. The horse data starts at 0800 h on the first day.
  • The tree shrew data is stored in the variables shrew and shrewHours:
    • Each column of the shrew array contains the body temperature of the same male tree shrew measured for 4 consecutive days at 6-minute intervals. The first column was measured at an environmental temperature of 14 degrees centigrade. The remaining columns were measured at environmental temperatures of 20, 26, and 32 degrees centigrade, respectively.
    • The shrewHours variable holds a column vector with the hour that the corresponding data point of shrew was measured relative to midnight (0 hours) on the starting day of the experiment.
  • For the tree shrew, lights were on daily from 1200 h to 2400 h. The shrew data starts at 0000 h on the first day.
  • The data was taken from the repository at http://www.circadian.org/data.html, maintained by the laboratory of Roberto Refinetti.
  • The horse measurements were reported in the paper:
    The circadian rhythm of body temperature of the horse
    G. Piccione, G. Caola, and R. Refinetti
    Biological Rhythm Research, 33(1):113-119, 2002.
  • The shrew measurements were reported in the paper:
    The effects of ambient temperature on the body temperature rhythm of rats, hamsters, gerbils and tree shrews
    R. Refinetti
    Journal of Thermal Biology, 22(4/5):281-284, 1997.
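The sampling descriptions above fix how many readings make up one day, which is the number the later examples use when reshaping each column. The arithmetic is sketched below. The row counts are derived from the descriptions and assume no missing readings, so after loading circadian.mat you can compare them with size(horse) and size(shrew).

```matlab
% Sampling schedules as stated in the data description
horseIntervalHours = 2;    % horses: one reading every 2 hours
horseDays          = 10;   % for 10 consecutive days
shrewIntervalMin   = 6;    % tree shrew: one reading every 6 minutes
shrewDays          = 4;    % for 4 consecutive days

horsePtsPerDay = 24 / horseIntervalHours;       % readings per day
shrewPtsPerDay = 24 * 60 / shrewIntervalMin;    % readings per day

% Expected rows per column, ASSUMING the recordings are complete
horseRows = horsePtsPerDay * horseDays;
shrewRows = shrewPtsPerDay * shrewDays;

fprintf('Horse: %d points/day, %d rows per column\n', horsePtsPerDay, horseRows);
fprintf('Shrew: %d points/day, %d rows per column\n', shrewPtsPerDay, shrewRows);
```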

SETUP FOR LESSON 13

SUGGESTED READING: Wikipedia has a discussion of confidence intervals. Read the introduction and the sections entitled "Conceptual basis" and "Practical example" found at <http://en.wikipedia.org/wiki/Confidence_interval>.
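For reference, these are the formulas the examples below implement, where x_1, ..., x_n is one sample of size n. The multiplier 1.96 is the large-sample normal approximation for a 95% interval; for small samples a Student t critical value with n - 1 degrees of freedom is commonly used instead.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\mathrm{SEM} = \frac{s}{\sqrt{n}}

\text{95\% CI} \approx \left[\,\bar{x} - 1.96\,\mathrm{SEM},\; \bar{x} + 1.96\,\mathrm{SEM}\,\right],
\qquad
\mathrm{RSE} = 100 \cdot \frac{\mathrm{SEM}}{\bar{x}}\ \%
```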


EXAMPLE 1: Load the Fisher iris data

Create a new cell in which you type and execute:

   load fisheriris;

You should see the following 2 variables in your Workspace Browser: meas and species.


EXAMPLE 2: Compute the ends of the 95% confidence intervals for sepal length

Create a new cell in which you type and execute:

   sepalLens = reshape(meas(:, 1), 50, 3);       % Sepal lengths as 50 x 3: one column per species
   sampleSize = size(sepalLens, 1);              % Number of rows (points in each sample)
   sLenMeans = mean(sepalLens);                  % Mean sepal length for each species
   sLenSEMs = std(sepalLens)./ sqrt(sampleSize); % Standard error of the mean for each species
   sLenCIEnds = sLenSEMs.* 1.96;                 % Half-width of the 95% confidence interval

You should see the following 5 new variables in your Workspace Browser: sepalLens, sampleSize, sLenMeans, sLenSEMs, and sLenCIEnds.

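Example 2 relies on std returning the sample standard deviation (normalized by n - 1) of each column. To see exactly what the SEM and interval lines compute, the following minimal sketch spells the formulas out for a single toy sample (hypothetical values, not iris data) and checks them against the built-in mean and std.

```matlab
% Toy sample: hypothetical values, NOT taken from the iris data
x = [4.9; 5.1; 5.0; 5.3; 4.8];
n = numel(x);

xbar = sum(x) / n;                          % sample mean
s    = sqrt(sum((x - xbar).^2) / (n - 1));  % sample SD with n-1 normalization
sem  = s / sqrt(n);                         % standard error of the mean
halfWidth = 1.96 * sem;                     % 95% CI half-width (normal approximation)

% The built-in functions give the same numbers
assert(abs(xbar - mean(x)) < 1e-12);
assert(abs(s - std(x)) < 1e-12);

fprintf('mean = %g, SEM = %g, 95%% CI = [%g, %g]\n', ...
        xbar, sem, xbar - halfWidth, xbar + halfWidth);
```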


EXAMPLE 3: Output population estimates for Iris setosa

Create a new cell in which you type and execute:

   fprintf('Population estimate of mean sepal length for Iris setosa: %g\n',  ...
           sLenMeans(1))
   fprintf('95%% confidence interval for this estimate: [%g, %g]\n', ...
           sLenMeans(1) - sLenCIEnds(1), sLenMeans(1) + sLenCIEnds(1));
   fprintf('Standard error (SE or SEM): %g\n', sLenSEMs(1));
   fprintf('Relative standard error (RSE): %g%%\n', ...
       100*sLenSEMs(1)./sLenMeans(1));

You should see the following output in the Command Window:

Population estimate of mean sepal length for Iris setosa: 5.006
95% confidence interval for this estimate: [4.90829, 5.10371]
Standard error (SE or SEM): 0.0498496
Relative standard error (RSE): 0.995796%
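As a check on the arithmetic, the interval and RSE above can be recomputed from the printed mean and SEM alone. This sketch uses the rounded values from the output, so the last printed digit may differ slightly from the original run.

```matlab
% Values copied from the Command Window output above (rounded as printed)
meanEst = 5.006;       % estimated mean sepal length, Iris setosa
sem     = 0.0498496;   % standard error of the mean
z       = 1.96;        % multiplier used in Example 2

ciLower = meanEst - z*sem;      % lower end of the 95% CI
ciUpper = meanEst + z*sem;      % upper end of the 95% CI
rse     = 100 * sem / meanEst;  % relative standard error, in percent

fprintf('CI: [%g, %g], RSE: %g%%\n', ciLower, ciUpper, rse);
```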


EXAMPLE 4: Plot mean sepal length using 95% confidence interval error bars

Create a new cell in which you type and execute:

   irisSpecies = {'Setosa', 'Versicolor', 'Virginica'}; % Species names in data order (tick labels)
   irisTitle = 'Comparison of three iris species';      % Use a standard title
   figure
   errorbar(sLenMeans, sLenCIEnds, 'ks'); % Error bars use black squares
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(irisTitle)
   legend('Mean (95% CI error bars)', 'Location', 'Southeast') % Put in lower right

You should see a Figure Window with a labeled error bar plot.


EXAMPLE 5: Plot mean sepal length with SEM and 95% confidence interval error bars

Create a new cell in which you type and execute:

   xPositions = [1, 2, 3];  % Use these as base x-axis error bar positions
   figure
   hold on
   errorbar(xPositions-0.1, sLenMeans, sLenSEMs, 'g^') % Green up triangles
   errorbar(xPositions+0.1, sLenMeans, sLenCIEnds, 'bv') % Blue down triangles
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in cm')
   title(irisTitle)
   legend({'Mean (SEM error bars)', 'Mean (95% CI error bars)'}, 'Location', 'Southeast')
   hold off

You should see a Figure Window with two sets of error bars.
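The two sets of bars differ only by the constant factor 1.96, so the SEM bars are always the narrower ones. Under the normal approximation used throughout this lesson, the sketch below shows where that factor comes from and roughly what coverage a plus-or-minus one SEM bar represents, using only the built-in erf and erfinv.

```matlab
% Normal approximation to the sampling distribution of the mean (an assumption)
coverage95 = 0.95;
z95 = sqrt(2) * erfinv(coverage95);   % two-sided multiplier, about 1.96

% Probability that a +/- 1 SEM interval covers the true mean (about 68%)
coverage1SEM = erf(1 / sqrt(2));

fprintf('z for 95%% coverage: %.4f\n', z95);
fprintf('Coverage of +/- 1 SEM: %.1f%%\n', 100 * coverage1SEM);
```

For samples as small as the 10 days used for a single horse in Example 7, a t-based multiplier would be larger than 1.96, so normal-based bars are somewhat narrower than t-based ones would be.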


EXAMPLE 6: Load the circadian rhythm data from Lab 1

Create a new cell in which you type and execute:

   load circadian.mat

You should see the following 4 variables in your Workspace Browser: horse, horseHours, shrew, and shrewHours.


EXAMPLE 7: Estimate hourly temperature mean and 95% CI (horse 1)

Create a new cell in which you type and execute:

   hPtsPerDay = 12;                                  % 12 data points per day (one every 2 hours)
   hSampleSz = length(horse(:, 1))./hPtsPerDay;      % Sample size = number of days (10)
   horse1 = reshape(horse(:, 1), hPtsPerDay, hSampleSz); % First horse: rows are times of day, columns are days
   horse1Means = mean(horse1, 2);                    % Average for each time of day over all days
   horse1STDs = std(horse1, 0, 2);                   % Sample SD (n-1 normalization) for each time of day
   horse1SEMs = horse1STDs./sqrt(hSampleSz);         % Standard error of the mean for each time of day
   horse1CIEnds = horse1SEMs.* 1.96;                 % Half-width of the 95% confidence interval

You should see the following 7 new variables in your Workspace Browser: hPtsPerDay, hSampleSz, horse1, horse1Means, horse1STDs, horse1SEMs, and horse1CIEnds.
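Example 7 works because reshape fills columns in order: the 120 consecutive readings of one horse run down a single column, so a 12-row reshape puts each day in its own column and each time of day in its own row. A minimal sketch with synthetic values (each encoded as 100*day + reading index, not real temperatures) makes that layout visible.

```matlab
% Synthetic stand-in for horse(:, 1): 12 readings/day for 10 days.
% Each value is 100*day + reading index, so its position is easy to read off.
ptsPerDay = 12;
nDays = 10;
[readingIdx, dayIdx] = ndgrid(1:ptsPerDay, 1:nDays);
col = reshape(100*dayIdx + readingIdx, [], 1);   % 120 x 1, in time order

% Same reshape as Example 7: rows = time of day, columns = days
byHour = reshape(col, ptsPerDay, nDays);

% Row r holds reading r from every day, so averaging along dimension 2
% averages the same time of day across all days
hourMeans = mean(byHour, 2);

disp(byHour(1:3, 1:4))   % first 3 times of day for the first 4 days
```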


EXAMPLE 8: Plot hourly mean temperature using 95% CI error bars (1 horse)

Create a new cell in which you type and execute:

   hHours = horseHours(1:hPtsPerDay);        % Extract hour of day for means
   figure
   errorbar(hHours, horse1Means, horse1CIEnds, '-ks');  % Error bars use black squares
   xlabel('Hours (from midnight)');
   ylabel('Body temperature ( ^oC)')
   title('Mean temperature variation of thoroughbred horse (10-day average)')
   legend('Mean 1 horse (95% CI error bars)', 'Location', 'Southeast') % Put in lower right

You should see a Figure Window with a labeled error bar plot.
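The x-axis in Example 8 runs from hour 8 to hour 30 because the horse data starts at 0800 h on the first day. If you would rather plot against clock time (0 to 22 h), one option is to wrap the hours with mod and reorder the results. The sketch assumes horseHours(1:12) equals 8:2:30, as the data description implies, and uses placeholder values in place of horse1Means.

```matlab
% ASSUMPTION: first-day hours are 8, 10, ..., 30 (readings every 2 h from 0800 h)
hHours = (8:2:30)';

clockHours = mod(hHours, 24);              % 8,10,...,22,0,2,4,6
[clockSorted, order] = sort(clockHours);   % 0,2,...,22 and the matching row order

% Applying the same permutation to any per-hour result keeps each value
% paired with its clock hour
toyMeans = (1:12)';                        % placeholder values, not temperatures
meansByClock = toyMeans(order);
```

With the lesson's own hHours, horse1Means, and horse1CIEnds, the plot would then be errorbar(clockSorted, horse1Means(order), horse1CIEnds(order), '-ks').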


EXAMPLE 9: Estimate hourly temperature mean and 95% CI (5 horses)

Create a new cell in which you type and execute:

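   hAllSampleSz = length(horse(:))./hPtsPerDay;       % Sample size when all horses used (50 horse-days)
   horseR = reshape(horse, hPtsPerDay, hAllSampleSz); % Each row is a time of day, each column one horse-day
   horseMeans = mean(horseR, 2);                      % Mean for each time of day
   horseSTDs = std(horseR, 0, 2);                     % Sample SD (n-1 normalization) for each time of day
   horseSEMs = horseSTDs./sqrt(hAllSampleSz);         % Standard error of the mean for each time of day
   horseCIEnds = horseSEMs.* 1.96;                    % Half-width of the 95% confidence interval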

You should see the following 6 new variables in your Workspace Browser: hAllSampleSz, horseR, horseMeans, horseSTDs, horseSEMs, and horseCIEnds.
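The lesson goals mention 3D arrays. The horse data can also be viewed as a 12 x 10 x 5 array (time of day x day x horse), which keeps track of which horse each reading came from, information the 12 x 50 reshape of Example 9 discards. The sketch below uses synthetic values (the integers 1 to 600, not temperatures) to show that averaging over days and then over horses reproduces the hourly means of the 12 x 50 approach, and that the 3D form also gives per-horse hourly means.

```matlab
% Synthetic stand-in for horse: 120 rows (12 readings x 10 days) by 5 horses
ptsPerDay = 12; nDays = 10; nHorses = 5;
horseToy = reshape(1:ptsPerDay*nDays*nHorses, ptsPerDay*nDays, nHorses);

% Example 9 approach: 12 x 50, each column is one day of one horse
horseR = reshape(horseToy, ptsPerDay, []);
meansR = mean(horseR, 2);

% 3D approach: dimension 1 = time of day, 2 = day, 3 = horse
horse3 = reshape(horseToy, ptsPerDay, nDays, nHorses);
means3 = mean(mean(horse3, 2), 3);        % average over days, then over horses

% Per-horse hourly means (12 x 5): average over days only
perHorseMeans = squeeze(mean(horse3, 2));
```

Averaging the per-horse means gives the pooled means here only because every horse contributes the same number of days.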


EXAMPLE 10: Plot hourly mean temperature using 95% CI error bars (5 horses)

Create a new cell in which you type and execute:

   figure
   errorbar(hHours, horseMeans, horseCIEnds, '-ro');  % Error bars use red circles
   xlabel('Hours (from midnight)');
   ylabel('Body temperature ( ^oC)')
   title('Mean temperature variation of thoroughbred horse (10-day average)')
   legend('Mean 5 horses (95% CI error bars)', 'Location', 'Southeast') % Put in lower right

You should see a Figure Window with a labeled error bar plot.


EXAMPLE 11: Show results from 1 horse and 5 horses on same graph

Create a new cell in which you type and execute:

   figure
   hold on
   errorbar(hHours, horse1Means, horse1CIEnds, '-ks');
   errorbar(hHours, horseMeans, horseCIEnds, '-ro');
   xlabel('Hours (from midnight)');
   ylabel('Body temperature ( ^oC)')
   title('Mean temperature variation of thoroughbred horse (10-day average)')
   legend({'Mean 1 horse (95% CI error bars)' 'Mean 5 horses (95% CI error bars)'}, ...
       'Location', 'Southeast') % Put in lower right
   hold off

You should see a Figure Window with a labeled error bar plot.
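Part of the difference in bar widths comes from sample size alone. If the per-hour standard deviation were the same in both cases, the half-width 1.96 s/sqrt(n) would shrink by a factor of sqrt(10/50) when going from 10 horse-days to 50. A back-of-the-envelope sketch:

```matlab
% Half-width = 1.96 * s / sqrt(n). ASSUMING the same SD s in both cases:
n1 = 10;                      % one horse, 10 days
n5 = 50;                      % five horses x 10 days
widthRatio = sqrt(n1 / n5);   % (5-horse half-width) / (1-horse half-width)

fprintf('Expected width ratio at equal SD: %.3f\n', widthRatio);
```

In the real data the standard deviation changes as well, since pooling horses can add between-horse variation, so the observed per-hour ratio is horseCIEnds./horse1CIEnds rather than this constant.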


This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a cladogram showing the ancestry of the mammals discussed in Richard Dawkins' book The Ancestor's Tale and generated by Fred Hsu from ITOL: Interactive Tree of Life (http://itol.embl.de/). See http://en.wikipedia.org/wiki/The_Ancestor%27s_Tale for a description of The Ancestor's Tale.