LESSON 5: Basic statistical indicators

FOCUS QUESTION: How can I find typical characteristics and central tendencies of data?

This lesson shows you how to calculate and display statistical indicators such as mean, median, maximum and minimum.

In this lesson you will:
  • Calculate mean and median of various data groupings.
  • Calculate maxima and minima.
  • Print results to the screen.
  • Output different types of variables.
  • Develop a utility function from scratch.

DATA FOR THIS LESSON

File: NYCDiseases.mat
Description:
  • The data set contains the monthly totals of new cases of measles, mumps, and chicken pox for New York City during the years 1931-1971.
  • The data is organized into the following variables:
    • measles - an array containing the monthly cases of measles (one row per year, one column per month)
    • mumps - an array containing the monthly cases of mumps
    • chickenPox - an array containing the monthly cases of chicken pox
    • years - a vector containing the years 1931 through 1971
  • The data was extracted from the Hipel-McLeod Time Series Datasets Collection, available at http://www.stats.uwo.ca/faculty/aim/epubs/mhsets/readme-mhsets.html.
  • The data originally appeared in: Yorke, J.A. and London, W.P. (1973). "Recurrent Outbreaks of Measles, Chickenpox and Mumps", American Journal of Epidemiology, Vol. 98, pp. 469.

SETUP FOR LESSON 5


EXAMPLE 1: Load the data about New York contagious diseases

Create a new cell in which you type and execute:

   load NYCDiseases.mat;    % Load the disease data

You should see four new variables in the Workspace Browser: measles, mumps, chickenPox, and years.


EXAMPLE 2: Calculate overall average measles cases occurring each month

Create a new cell in which you type and execute:

   measlesAver = mean(measles(:));   % Average of entire array

You should see a new variable measlesAver holding the overall average number of measles cases per month.
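
The colon index in measles(:) reshapes the whole array into a single column, so mean sees every value at once instead of averaging column by column. A minimal sketch on a small array (the array A and its values are hypothetical, chosen only to make the arithmetic easy to check):

```matlab
% Hypothetical 2-by-3 array standing in for a small piece of the data
A = [1 2 3; 4 5 6];

colVector   = A(:);          % 6-by-1 column, read down the columns: [1; 4; 2; 5; 3; 6]
overallMean = mean(A(:));    % Average of all six values: 3.5
columnMeans = mean(A);       % Without (:), one average per column: [2.5 3.5 4.5]

fprintf('Overall mean: %g\n', overallMean);
```

Without the (:), mean(A) returns one value per column rather than a single overall value.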

In the space below: Enter your definitions in this cell and execute the cell to create these variables.


EXAMPLE 3: Output the overall monthly average number of measles cases

Create a new cell in which you type and execute:

   fprintf('Average measles cases per month: %g\n', measlesAver);

You should see the following output in the Command Window:

Average measles cases per month: 1418.59


Create a new cell right here (beginning of a cell starts with %%). Write MATLAB code to print the overall monthly average number of mumps cases.


EXAMPLE 4: Calculate the individual monthly and yearly averages of measles

Create a new cell in which you type and execute:

   measlesMonthAver = mean(measles);     % Average the columns
   measlesYearAver = mean(measles, 2);   % Average the rows

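You should see two new variables in the Workspace Browser: measlesMonthAver (a row vector with one average per month) and measlesYearAver (a column vector with one average per year).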
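
The second argument of mean selects the dimension to average along, which also determines the shape of the result. A small sketch, assuming a hypothetical 2-by-3 array B in which rows play the role of years and columns the role of months:

```matlab
% Hypothetical array: 2 "years" (rows) by 3 "months" (columns)
B = [10 20 30; 40 50 60];

colAver = mean(B);       % Average down each column: [25 35 45]
rowAver = mean(B, 2);    % Average across each row:  [20; 50]

disp(size(colAver));     % Row vector, one entry per column
disp(size(rowAver));     % Column vector, one entry per row
```

Averaging the columns gives a row vector and averaging the rows gives a column vector, which is why the two results in this example have different shapes.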

In the space below: Enter your definitions in this cell and execute the cell to create these variables.


EXAMPLE 5: Output the individual monthly averages of measles

Create a new cell in which you type and execute:

   fprintf('Monthly averages of measles: [ '); % Output a leading string
   fprintf('%g ', measlesMonthAver);     % Output each element
   fprintf(']\n');                             % Output ending ] and newline

You should see the following output in the Command Window:

Monthly averages of measles: [ 940.39 1816.39 3428.2 3855.12 3159.73 2100.54 696.122 192.195 80.0732 100.854 193.976 459.537 ]
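
The middle fprintf call works because MATLAB reuses the format string for each element when it is given more values than the format has conversions. The %g conversion prints up to six significant digits by default, which is why values such as 696.122 appear above. A short sketch using sprintf, which formats the same way but returns the text instead of printing it (the sample values are hypothetical):

```matlab
values = [1.5 20 300.25];       % Hypothetical sample values

s = sprintf('%g ', values);     % Format '%g ' is applied once per element
fprintf('[ %s]\n', s);          % Wrap the formatted elements in brackets
```

Here s holds the text '1.5 20 300.25 ' (note the trailing space), so the closing bracket lines up the same way as in the output above.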


EXAMPLE 6: Create your own printList function
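
A minimal sketch of such a utility function, assuming it takes a label string and a numeric array and produces the bracketed format shown in EXAMPLE 5. The argument names label and values are assumptions, and the function is assumed to be saved in a file named printList.m in the current folder or on the MATLAB path:

```matlab
function printList(label, values)
% printList  Print a label followed by a bracketed list of numeric values.
%   printList(label, values) outputs label, a colon, and the elements of
%   values separated by spaces and enclosed in square brackets.
%   (Sketch only: the argument names are assumptions.)
    fprintf('%s: [ ', label);    % Output the label and the opening bracket
    fprintf('%g ', values);      % Output each element followed by a space
    fprintf(']\n');              % Output the closing bracket and a newline
end
```

Because fprintf steps through every element of values, the same function handles both row vectors (such as monthly averages) and column vectors (such as yearly maxima).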


EXAMPLE 7: Output the monthly averages of measles by calling printList

Create a new cell in which you type and execute:

    printList('Monthly averages of measles', measlesMonthAver);

You should see the same output in the Command Window as in EXAMPLE 5:

Monthly averages of measles: [ 940.39 1816.39 3428.2 3855.12 3159.73 2100.54 696.122 192.195 80.0732 100.854 193.976 459.537 ]

In the space below: Enter the code in this cell and execute the cell to produce this output.


EXAMPLE 8: Output median, maximum and minimum of measles cases by month

Create a new cell in which you type and execute:

   printList('Median cases of measles by month', median(measles));
   printList('Maximum cases of measles by month', max(measles));
   printList('Minimum cases of measles by month', min(measles));

You should see the following output in the Command Window:

Median cases of measles by month: [ 283 603 1075 1301 1949 1797 596 185 68 79 110 149 ]
Maximum cases of measles by month: [ 6336 13226 25826 22741 8634 6253 1975 453 184 354 1050 2996 ]
Minimum cases of measles by month: [ 39 52 57 78 83 79 35 28 18 11 12 21 ]
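
The medians are much smaller than the corresponding means from EXAMPLE 5 (for the first month, 283 versus 940.39). One likely reason is that a few outbreak years with very large counts pull the mean upward but barely affect the median. A small sketch with hypothetical counts illustrates the effect:

```matlab
% Hypothetical monthly counts: four ordinary years and one outbreak year
cases = [100 120 110 130 5000];

casesMean   = mean(cases);      % (100+120+110+130+5000)/5 = 1092
casesMedian = median(cases);    % Middle value after sorting = 120

fprintf('mean = %g, median = %g\n', casesMean, casesMedian);
```

The single large value raises the mean to 1092, while the median stays at 120, close to the typical year.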

In the space below: Enter the code in this cell and execute the cell to produce this output.


EXAMPLE 9: Output the yearly maxima and minima of measles

Create a new cell in which you type and execute:

   printList('Yearly maxima of measles', max(measles, [], 2));
   printList('Yearly minima of measles', min(measles, [], 2));

You should see the following output in the Command Window:

Yearly maxima of measles: [ 7095 2537 9635 1414 6813 8792 3546 10018 969 2996 25826 557 5760 8498 358 6597 1682 6909 1008 5428 1915 8616 1122 10720 1865 6064 1949 7634 837 6780 1043 7875 1289 3338 1199 2349 83 494 1301 185 844 ]
Yearly minima of measles: [ 43 118 50 45 67 87 84 56 36 90 55 34 142 21 18 56 88 32 100 73 184 40 164 59 98 110 170 47 97 43 109 58 168 49 83 24 11 39 31 39 12 ]
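
The empty brackets in max(measles, [], 2) are required because the second argument of max and min is reserved for a second array to compare against. A sketch on a small hypothetical array C shows the difference:

```matlab
C = [1 5 3; 0 2 8];          % Hypothetical 2-by-3 array

rowMax   = max(C, [], 2);    % Maximum of each row:    [5; 8]
colMax   = max(C);           % Maximum of each column: [1 5 8]
compared = max(C, 2);        % NOT a row maximum: compares each element with 2,
                             % giving [2 5 3; 2 2 8]
```

Leaving out the [] does not cause an error, so the mistake can go unnoticed: max(C, 2) silently returns an array the same size as C.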

In the space below: Enter the code in this cell and execute the cell to produce this output.


EXAMPLE 10: Output the overall maximum and minimum monthly measles cases

Create a new cell in which you type and execute:

   fprintf('Measles: overall max = %g, overall min = %g\n', ...
           max(measles(:)), min(measles(:)));

You should see the following output in the Command Window:

Measles: overall max = 25826, overall min = 11



Create a new cell right here (beginning of a cell starts with %%). Write MATLAB code to plot the average and median number of measles cases by month. Make sure that your graph is labeled properly.


SUMMARY OF SYNTAX

MATLAB syntax
    Description
mean(x)
    Computes the means of the columns of the array x. This could also be written as mean(x, 1).
mean(x, 2)
    Computes the means of the rows of the array x.
median(x)
    Computes the medians of the columns of the array x. This could also be written as median(x, 1).
median(x, 2)
    Computes the medians of the rows of the array x.
max(x)
    Computes the maxima of the columns of the array x. This could also be written as max(x, [], 1).
max(x, [], 2)
    Computes the maxima of the rows of the array x.
min(x)
    Computes the minima of the columns of the array x. This could also be written as min(x, [], 1).
min(x, [], 2)
    Computes the minima of the rows of the array x.
fprintf('My_Message')
    Outputs My_Message to the Command Window.
fprintf('The value %g is larger than zero\n', X)
    Substitutes the value of the variable X at the point where the %g occurs in the message. The value of X must be numeric.
fprintf('Another message is %s\n', A)
    Substitutes the value of the variable A at the point where the %s occurs in the message. A must contain a string.
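
The %s and %g conversions can be combined in one format string, with the arguments supplied in the same order as the conversions. A brief sketch using sprintf so the result can be inspected (the variable names disease and count and their values are hypothetical):

```matlab
disease = 'mumps';    % Hypothetical string value for %s
count   = 42;         % Hypothetical numeric value for %g

msg = sprintf('Disease %s had %g cases\n', disease, count);
fprintf('%s', msg);   % Prints: Disease mumps had 42 cases
```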


This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions.