LESSON 6: Scatter plots, curve fitting and correlation
FOCUS QUESTION: How can I determine whether two variables are related?
This lesson shows you how to determine whether two variables are related by calculating their correlation. This lesson also introduces scatter plots and illustrates how to create a linear model to capture the relationship between two variables.
Contents
- DATA FOR THIS LESSON
- SETUP FOR LESSON 6
- EXAMPLE 1: Load the Darwin Finch beak size parentage data
- EXAMPLE 2: Define variables for child, mother, father, and parent beak sizes
- EXAMPLE 3: Calculate the beak size correlations
- EXAMPLE 4: Plot the average parent beak size against the child beak size
- EXAMPLE 5: Edit the figure of EXAMPLE 4
- EXAMPLE 6: Display the plot in a new Figure Window
- EXAMPLE 7: Calculate and print the best fit lines for beak parentage
- EXAMPLE 8: Evaluate each of the best fit lines to find predicted values
- EXAMPLE 9: Find the difference between actual child beak sizes and predictions
- EXAMPLE 10: Find the root mean squared error (RMS) between actual and predicted sizes
- EXAMPLE 11: Output the root mean squared error (RMS)
- SUMMARY OF SYNTAX
DATA FOR THIS LESSON
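| File | Description |
| DaphneIsland.txt | Beak size parentage data for Daphne Island ground finches. Column 1 holds the child's beak size, column 2 the mother's, and column 3 the father's (as used in EXAMPLE 2). |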
SETUP FOR LESSON 6
- Set the Current Directory to Z:\working\MATLAB\Lesson6. (You will need to make a new directory for Lesson6.)
- Download the data file DaphneIsland.txt to your Lesson6 directory.
- Create a new script called Lesson6Script.m. (Use File->New->Blank M-File from the main MATLAB menubar.) You will enter each of the examples in a new cell in this script.
EXAMPLE 1: Load the Darwin Finch beak size parentage data
Create a new cell in which you type and execute:
beaks = load('DaphneIsland.txt');
You should see a beaks variable in your Workspace.
EXAMPLE 2: Define variables for child, mother, father, and parent beak sizes
Create a new cell in which you type and execute:
child = beaks(:, 1);              % Size of offspring beaks
mother = beaks(:, 2);             % Size of maternal beaks
father = beaks(:, 3);             % Size of paternal beaks
parent = mean(beaks(:, 2:3), 2);  % Average the parent beak sizes
You should see four additional variables in your Workspace Browser:
- child - a column vector containing the offspring beak sizes
- mother - a column vector containing the maternal beak sizes
- father - a column vector containing the paternal beak sizes
- parent - a column vector containing the average of the parents' beak sizes
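The second argument of mean in EXAMPLE 2 is what makes parent a per-family average. The following is a minimal sketch on made-up numbers (the toy matrix and the variable names toyParent and toyColumnMeans are illustrative assumptions, not part of the lesson data):

```matlab
% Toy matrix laid out like beaks: [child, mother, father] (made-up values)
toy = [9.0  8.0 10.0;
       9.5  9.0 10.0;
       8.5  8.0  9.0];

% Average ACROSS columns 2 and 3: one value per row (per family)
toyParent = mean(toy(:, 2:3), 2);

% Default behavior averages DOWN each column: one value per column
toyColumnMeans = mean(toy(:, 2:3));

disp(toyParent);        % 3-by-1 column vector
disp(toyColumnMeans);   % 1-by-2 row vector
```

Without the trailing 2, mean would return the overall mother and father averages rather than one mid-parent value for each child.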
EXAMPLE 3: Calculate the beak size correlations
Create a new cell in which you type and execute:
fprintf('Daphne Island ground finch beak size correlations:\n');
fprintf(' mother-child:\t %g\n', corr(mother, child));
You should see the following output in your Command Window:
Daphne Island ground finch beak size correlations:
 mother-child:   0.75621
Create a new cell right here. (A cell begins with a line that starts with %%.) Write a MATLAB statement to print the correlation between the father's beak size and the child's beak size. Also write a MATLAB statement to print the correlation between the parents' average beak size and the child's beak size.
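To see what a correlation value measures, it can help to compute one by hand. The sketch below uses made-up data (x, y, and the names rManual and rBuiltin are assumptions for illustration) and compares the textbook Pearson formula against corrcoef, which is part of base MATLAB. In current MATLAB releases corr itself ships with the Statistics and Machine Learning Toolbox, so corrcoef is also a fallback if corr is unavailable.

```matlab
% Made-up data with a strong positive relationship
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];

% Deviations of each value from its mean
dx = x - mean(x);
dy = y - mean(y);

% Pearson correlation: co-variation divided by the product of the spreads
rManual = sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2));

% corrcoef returns a 2-by-2 matrix; the off-diagonal entry is the correlation
R = corrcoef(x, y);
rBuiltin = R(1, 2);

fprintf('manual: %g, corrcoef: %g\n', rManual, rBuiltin);
```

The two values should agree to within rounding error.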
EXAMPLE 4: Plot the average parent beak size against the child beak size
Create a new cell in which you type and execute:
pcCor = corr(parent, child);          % Calculate parent-child correlation
% Create a title string
tString = ['Daphne Island ground finches (corr = ' num2str(pcCor) ')'];
figure('Name', tString)               % Put this title on the window
plot(parent, child, 'ko')             % Plot a scatter plot
xlabel('Mean parent beak size (mm)'); % Label the x-axis
ylabel('Child beak size (mm)');       % Label the y-axis
title(tString);                       % Put this title on the graph
You should see a Figure Window with a scatter plot of child beak size against mean parent beak size.
EXAMPLE 5: Edit the figure of EXAMPLE 4
- Make the plot editable. (Use Tools->Edit Plot from the Figure Window menubar.)
- Add a linear fit line. (Use Tools->Basic Fitting from the Figure Window menubar.)
- Save the figure as BeakSize.fig.
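The menu steps above can also be approximated in code. The following is only a sketch on made-up data. The names xDemo, yDemo, coeffs, and the file name FitLineSketch.fig are assumptions, chosen so that running it does not overwrite your BeakSize.fig. It uses polyfit and polyval, which are introduced in EXAMPLE 7 and EXAMPLE 8.

```matlab
% Made-up data standing in for parent and child beak sizes
xDemo = [8; 9; 10; 11];
yDemo = [8.2; 8.9; 10.1; 10.8];

% Fit a straight line and evaluate it at the two ends of the x range
coeffs = polyfit(xDemo, yDemo, 1);
xLine = [min(xDemo); max(xDemo)];
yLine = polyval(coeffs, xLine);

fig = figure('Name', 'Fit line sketch');
plot(xDemo, yDemo, 'ko');          % Scatter plot of the data
hold on;                           % Keep the points while adding the line
plot(xLine, yLine, 'r-');          % Linear fit line
hold off;

saveas(fig, 'FitLineSketch.fig');  % Save in MATLAB figure format
```

The Basic Fitting tool offers more options (for example, displaying the fit equation on the plot), so treat this as a rough equivalent rather than a replacement.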
EXAMPLE 6: Display the plot in a new Figure Window
Create a new cell in which you type and execute:
open BeakSize.fig;
You should see a Figure Window containing the edited plot with its fit line.
EXAMPLE 7: Calculate and print the best fit lines for beak parentage
Create a new cell in which you type and execute:
fprintf('Best fit for beak size of Daphne Island ground finches:\n');
mPoly = polyfit(mother, child, 1);  % Linear fit of mother vs child
fprintf(' mother-child:\t y = %g*x + %g\n', mPoly(1), mPoly(2));
You should see the following variable in the Workspace Browser:
- mPoly - a two-element row vector with the coefficients of a line
You should also see the following output in your Command Window:
Best fit for beak size of Daphne Island ground finches:
 mother-child:   y = 0.658622*x + 2.66147
- Define a variable called fPoly that contains the coefficients for the line fit of the father and child data.
- Define a variable called pPoly that contains the coefficients for the line fit of the parent and child data.
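One way to get a feel for the order of the coefficients that polyfit returns is to fit data that lies exactly on a known line. This is a small sketch with made-up values (xKnown, yKnown, and pKnown are illustrative names, not lesson variables):

```matlab
% Points that lie exactly on the line y = 0.5*x + 2
xKnown = (1:6)';
yKnown = 0.5 * xKnown + 2;

% Degree-1 fit: pKnown(1) is the slope, pKnown(2) is the intercept
pKnown = polyfit(xKnown, yKnown, 1);

fprintf('y = %g*x + %g\n', pKnown(1), pKnown(2));
```

The fit should recover a slope very close to 0.5 and an intercept very close to 2, which confirms that the first element multiplies x and the second is the constant term.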
EXAMPLE 8: Evaluate each of the best fit lines to find predicted values
Create a new cell in which you type and execute:
mPred = polyval(mPoly, mother); % Evaluate linear fit of mother equation
You should see the following variable in the Workspace Browser:
- mPred - predictions from the linear model for the offspring's beak size given the mother's beak size
- Define a variable called fPred that contains predictions from the linear model for the offspring's beak size given the father's beak size.
- Define a variable called pPred that contains predictions from the linear model for the offspring's beak size given the parents' average beak size.
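For a line, polyval does the same arithmetic as writing out slope times x plus intercept. The sketch below checks this on made-up coefficients (pLine, xNew, yPolyval, and yManual are assumed names for illustration):

```matlab
% Coefficients of the line y = 0.5*x + 2, highest power first
pLine = [0.5, 2];
xNew = [8; 9; 10];

yPolyval = polyval(pLine, xNew);          % Evaluate with polyval
yManual = pLine(1) * xNew + pLine(2);     % Evaluate the line by hand

disp([yPolyval, yManual]);                % The two columns should match
```

Using polyval keeps the code unchanged if you later switch to a higher-degree fit, because it handles any number of coefficients.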
EXAMPLE 9: Find the difference between actual child beak sizes and predictions
Create a new cell in which you type and execute:
mErrors = child - mPred; % Actual - predicted by mother's size
You should see the following variable in the Workspace Browser:
- mErrors - difference between the offspring's actual beak size and the value predicted from the mother's beak size
- Define a variable called fErrors that contains the difference between the offspring's actual beak size and the value predicted from the father's beak size.
- Define a variable called pErrors that contains the difference between the offspring's actual beak size and the value predicted from the parents' average beak size.
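Errors from a best fit line are both positive and negative, and for a least-squares line with an intercept they average out to essentially zero. The sketch below illustrates this on made-up data (xR, yR, pR, and res are assumed names), which is one reason a plain average of the errors is not a useful measure of fit quality.

```matlab
% Made-up data that is close to, but not exactly on, a line
xR = [1; 2; 3; 4; 5];
yR = [2.1; 3.9; 6.2; 7.8; 10.1];

pR = polyfit(xR, yR, 1);          % Best fit line
res = yR - polyval(pR, xR);       % Actual - predicted

disp(res');                       % Mix of positive and negative errors
fprintf('mean error: %g\n', mean(res));
```

The mean error should be zero up to rounding, even though the individual errors are not.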
EXAMPLE 10: Find the root mean squared error (RMS) between actual and predicted sizes
Create a new cell in which you type and execute:
mRMS = sqrt( mean(mErrors.* mErrors) );
You should see the following variable in the Workspace Browser:
- mRMS - root mean squared error of the offspring prediction from the mother's data
- Define a variable called fRMS that contains the root mean squared error between the offspring's actual beak size and the value predicted from the father's beak size.
- Define a variable called pRMS that contains the root mean squared error between the offspring's actual beak size and the value predicted from the parents' average beak size.
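The RMS calculation squares each error, averages the squares, and takes the square root. The following sketch works through it on made-up values (actual, predicted, errs, and the three rms names are assumptions for illustration) and shows two equivalent ways to write the same quantity:

```matlab
% Made-up actual and predicted values
actual = [9.0; 9.5; 10.0; 10.5];
predicted = [9.2; 9.3; 10.1; 10.4];
errs = actual - predicted;                 % -0.2, 0.2, -0.1, 0.1

rmsA = sqrt(mean(errs .* errs));           % Form used in EXAMPLE 10
rmsB = sqrt(mean(errs .^ 2));              % Same thing with element-wise power
rmsC = norm(errs) / sqrt(numel(errs));     % Same thing via the vector norm

fprintf('RMS: %g %g %g\n', rmsA, rmsB, rmsC);
```

Here the squared errors are 0.04, 0.04, 0.01, and 0.01, so their mean is 0.025 and the RMS error is the square root of 0.025. Squaring prevents positive and negative errors from canceling.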
EXAMPLE 11: Output the root mean squared error (RMS)
Create a new cell in which you type and execute:
fprintf('Root mean squared error (RMS) for Best fit lines:\n');
fprintf(' mother-child:\t %g\n', mRMS);
You should see the following output in your Command Window:
Root mean squared error (RMS) for Best fit lines:
 mother-child:   0.484323
SUMMARY OF SYNTAX
| MATLAB syntax | Description |
| rho = corr(x, y) | calculates the correlation between the column vectors x and y. The variable rho contains a single value between -1 and 1 indicating how closely the values of x and y follow a straight-line relationship. If the correlation is close to 1, the values of x and y go up and down together. If the correlation is close to -1, the values of x and y go in opposite directions (if one goes up, the other tends to go down). A correlation value close to 0 indicates that the values of x and y are not linearly related. The column vectors x and y must have the same number of elements. |
| p = polyfit(x, y, n) | calculates the coefficients of the polynomial of degree n that best fits the data y versus x. This polynomial minimizes the RMS error and is sometimes called the least-squares approximation. The coefficients of the polynomial appear in the vector p, such that p(1) holds the coefficient of the highest-degree term in the polynomial. |
| Y = polyval(p, X) | evaluates the polynomial whose coefficients are in the vector p at the points contained in the array X. The array Y holds the results of this evaluation. |
This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The drawing is by John Gould (1804-1881) and was cataloged as "Darwin's finches or Galapagos finches." Darwin, 1845. Journal of researches into the natural history and geology of the countries visited during the voyage of H.M.S. Beagle round the world, under the Command of Capt. Fitz Roy, R.N. 2d edition. The copyright has expired on this image.
