LESSON 15 QUESTIONS: Hypothesis testing
FOCUS QUESTION: How can I tell whether the experimental group is different from the control group?
Contents
- EXAMPLE 1: Load the Fisher iris data and extract sepal lengths of Iris setosa
- EXAMPLE 2: Test Iris setosa mean sepal length using a one sample t-test
- EXAMPLE 3: Apply one sample t-test (standard statistical terminology)
- EXAMPLE 4: Test for inequality using the one sample t-test
- EXAMPLE 5: Look at the p-value and the confidence interval
- EXAMPLE 6: Load the Daphne Island and Santa Cruz Island beak size data
- EXAMPLE 7: Are Daphne finch beak sizes different from those of Santa Cruz finches?
- EXAMPLE 8: Look at the p-value and confidence interval for the two-sample t-test
- EXAMPLE 9: Load the consolidated sleep diary data
- EXAMPLE 10: Calculate wake-up hours, separated by gender
- EXAMPLE 11: Do men get up earlier than women on average?
- EXAMPLE 12: Compare average wake-up time of instructor to that of a random student
- EXAMPLE 13: Output IDs of the subjects whose average wake-up is similar to the instructor's
EXAMPLE 1: Load the Fisher iris data and extract sepal lengths of Iris setosa
load fisheriris;
sLenSetosa = meas(strcmp(species, 'setosa'), 1);   % Sepal lengths of Iris setosa
EXAMPLE 2: Test Iris setosa mean sepal length using a one sample t-test
fprintf('Estimated population mean is %g\n', mean(sLenSetosa));
testMean = 5.936;   % Could the true mean sepal length be this value?
fprintf('True population mean of Iris setosa sepal length ');
if ttest(sLenSetosa, testMean) == 1
    fprintf('is likely to be different from %g\n', testMean);
else
    fprintf('could be %g\n', testMean);
end;
Estimated population mean is 5.006
True population mean of Iris setosa sepal length is likely to be different from 5.936
| Questions | Answers |
|---|---|
| What is the purpose of the ttest function? | The ttest function uses a sample to determine whether the mean of the population it was drawn from is likely to be different from a specified value. The sample should be drawn at random from a normally distributed population. |
| What should I conclude if this ttest returns 1? | You can conclude that the true mean sepal length of Iris setosa is unlikely to be 5.936. |
| What should I conclude if this ttest returns 0? | Based on the evidence provided by this sample, it is possible that the true mean sepal length of Iris setosa is 5.936. However, you do not have enough evidence to draw a definite conclusion. (Presume innocent until proven guilty!) |
| Does ttest make any assumptions about the distribution of the population? | Yes, ttest assumes that the population is normally distributed. |
| When should I use ttest? | Use ttest when you want to know whether a sample comes from a population whose mean differs from a specified value. |
| Should I apply the ttest to the sample mean? | No, this is a common point of confusion. The ttest function requires the entire sample, not just the sample mean. |
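Why does ttest need the whole sample? A minimal sketch, assuming the Statistics and Machine Learning Toolbox (which provides ttest and the fisheriris data): the one-sample t statistic is (sample mean - testMean) / (s / sqrt(n)), where s is the sample standard deviation and n is the sample size, so it cannot be computed from the mean alone. The code computes the statistic by hand and compares it with the stats output of ttest; the names x, m0, and tHand are illustrative.

```matlab
% Sketch: the one-sample t statistic by hand versus ttest.
load fisheriris;                                  % meas and species
x  = meas(strcmp(species, 'setosa'), 1);          % Sepal lengths of Iris setosa
m0 = 5.936;                                       % Hypothesized population mean
n  = numel(x);                                    % Sample size
% The statistic uses the sample standard deviation, so the whole sample
% is required; the sample mean by itself is not enough.
tHand = (mean(x) - m0) / (std(x) / sqrt(n));
[h, p, ci, stats] = ttest(x, m0);                 % stats has fields tstat, df, sd
fprintf('t by hand = %g, t from ttest = %g, df = %d\n', tHand, stats.tstat, stats.df);
```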
EXAMPLE 3: Apply one sample t-test (standard statistical terminology)
testMean = 5.936;   % Could the true mean sepal length be this value?
fprintf(['\nNull hypothesis: The true mean Iris setosa sepal length ' ...
    'is %g\n'], testMean);
fprintf(['Alt hypothesis: The true mean Iris setosa sepal length ' ...
    'is different from %g\n'], testMean);
if ttest(sLenSetosa, testMean) == 1
    fprintf('\tReject null hypothesis in favor of alternative ');
else
    fprintf('\tCannot reject the null hypothesis ');
end;
fprintf('at the 0.05 significance level\n');
Null hypothesis: The true mean Iris setosa sepal length is 5.936
Alt hypothesis: The true mean Iris setosa sepal length is different from 5.936
    Reject null hypothesis in favor of alternative at the 0.05 significance level
| Questions | Answers |
|---|---|
| What is the null hypothesis for this ttest? | The null hypothesis is that the true mean sepal length for Iris setosa is 5.936. |
| What is the alternative hypothesis for this ttest? | The alternative hypothesis is that the true mean sepal length for Iris setosa is not 5.936. |
| Why bother with the strange null hypothesis terminology? | Many researchers report the results of standard statistical tests in terms of the null hypothesis, p-values, and confidence intervals. In order to understand these results, you must learn to work with the terminology. |
| What does a ttest return value of 1 mean in terms of the null hypothesis? | A return value of 1 indicates that you should reject the null hypothesis in favor of the alternative hypothesis. In other words, you should conclude that the true mean sepal length of Iris setosa is unlikely to be 5.936. |
| How unlikely? | The default level of significance for ttest is 5%, meaning that fewer than 5% of the samples randomly selected from a normal distribution with mean 5.936 would have a test statistic as extreme as the one computed from this sample. Roughly, you can think of this as being "95% certain" that the true mean sepal length for Iris setosa is not 5.936. |
| How is the ttest return value related to the level of significance? | The level of significance is a threshold set by you. By default, ttest uses a 5% significance level, corresponding to an alpha of 0.05. ttest computes a test statistic for your sample and compares it with the distribution of that statistic for samples drawn from a normal distribution with mean 5.936. If fewer than 5% of those samples have a test statistic more extreme than that of your sample, ttest returns a value of 1. Otherwise, ttest returns a value of 0. |
| What does a ttest return value of 0 mean in terms of the null hypothesis? | A return value of 0 indicates that you cannot reject the null hypothesis. This does not mean that the null hypothesis is true, only that the sample does not provide enough evidence to reject it. |
| Is it necessary to state null and alternative hypotheses when applying the ttest? | No, you can call the ttest function without explicitly stating these hypotheses. |
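To see what the 5% significance level means in practice, the sketch below draws many random samples from a normal population whose mean really is 5.936 (so the null hypothesis is true) and counts how often ttest rejects it anyway. The population standard deviation (0.35), sample size (50), number of trials, and random seed are hypothetical choices. The sketch relies on ttest testing each column of a matrix separately.

```matlab
% Sketch: how often does ttest reject a null hypothesis that is actually true?
rng(0);                                  % Repeatable random numbers
mu0     = 5.936;                         % True mean, equal to the tested value
sigma   = 0.35;                          % Hypothetical population standard deviation
n       = 50;                            % Hypothetical sample size
nTrials = 5000;                          % Number of simulated samples
X = mu0 + sigma * randn(n, nTrials);     % Each column is one random sample
h = ttest(X, mu0);                       % ttest tests each column separately
rejectRate = mean(h);                    % Fraction of samples with h == 1
fprintf('The true null was rejected in %.1f%% of the samples\n', 100 * rejectRate);
```

The rejection rate should come out close to 5%: these rejections are the "bad draws" that the significance level bounds.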
EXAMPLE 4: Test for inequality using the one sample t-test
testMean = 5.936;   % Could the true mean sepal length be this value?
fprintf('\nTrue population mean of Iris setosa sepal length ');
if ttest(sLenSetosa, testMean, 0.05, 'left') == 1
    fprintf('is likely to be less than ');
else
    fprintf('could be greater than or equal to ');
end;
fprintf('%g\n', testMean);
True population mean of Iris setosa sepal length is likely to be less than 5.936
| Questions | Answers |
|---|---|
| Can I use the ttest to determine whether the mean of the population corresponding to sample x is greater than 5.2? | Yes, the one-sided test ttest(x, 5.2, 0.05, 'right') has the alternative hypothesis "The mean of the population represented by sample x is greater than 5.2". |
| Could I have asked for a more statistically significant answer? | Yes, the third argument of ttest, alpha, specifies the significance level. For example, ttest(sLenSetosa, testMean, 0.01, 'left') performs the test of this example at the 1% significance level instead of the default 5% level. |
| What is the difference between the value of alpha and the significance level? | The value of alpha is always a fraction between 0 and 1. The significance level is the corresponding percentage. For example, an alpha of 0.05 corresponds to a 5% significance level, while an alpha of 0.01 corresponds to a 1% significance level. We use the two terms interchangeably, noting that MATLAB requires a fraction rather than a percentage when specifying significance. |
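A short sketch comparing the two-sided test of EXAMPLE 2 with the left-tailed test of this example on the same setosa data. It passes the options as the 'Alpha' and 'Tail' name-value pairs documented for ttest in recent MATLAB releases, rather than the positional form used in this lesson; the variable names are illustrative. Because the sample mean (5.006) lies below 5.936, the left-tailed p-value is half the two-sided one, and changing alpha changes only the decision threshold, not the p-value.

```matlab
% Sketch: two-sided versus left-tailed tests on the same sample.
load fisheriris;                                   % meas and species
x  = meas(strcmp(species, 'setosa'), 1);           % Sepal lengths of Iris setosa
m0 = 5.936;                                        % Hypothesized mean

[hTwo,  pTwo ] = ttest(x, m0);                                % H1: mean ~= m0
[hLeft, pLeft] = ttest(x, m0, 'Tail', 'left');                % H1: mean <  m0
[h01,   p01  ] = ttest(x, m0, 'Alpha', 0.01, 'Tail', 'left'); % Same test, 1% level

fprintf('two-sided p = %g, left-tailed p = %g\n', pTwo, pLeft);
fprintf('left-tailed h at the 5%% level: %d, at the 1%% level: %d\n', hLeft, h01);
```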
EXAMPLE 5: Look at the p-value and the confidence interval
testMean = 5.936;   % Could the true mean sepal length be this value?
fprintf( ...
    '\nIs the true population mean of Iris setosa different from %g?\n', ...
    testMean);
[h, p, ci] = ttest(sLenSetosa, testMean);
fprintf('\t hypothesis = %g\n', h);   % 1 means reject the null hypothesis
fprintf('\t pvalue = %g\n', p);       % Lower value indicates more support for alt hyp
fprintf('\t 95%% confidence interval for population mean: [%g, %g]\n', ci);
Is the true population mean of Iris setosa different from 5.936?
     hypothesis = 1
     pvalue = 6.6085e-24
     95% confidence interval for population mean: [4.90582, 5.10618]
| Questions | Answers |
|---|---|
| What does the pvalue tell me? | The pvalue is the probability that a sample drawn at random from the population would produce a test statistic at least as extreme as the observed one if the null hypothesis were actually true. In other words, the pvalue measures how plausible it is that the observed difference is due to a bad draw of the random sample. The pvalue in this example is minuscule, so it is very unlikely that such a sample could have been drawn by chance from a distribution whose mean was actually 5.936. |
| How is the ttest return value related to the pvalue? | The ttest has a cut-off value called the level of significance, or alpha. If the pvalue is less than alpha, the ttest return value is 1; otherwise, it is 0. |
| How should I interpret ci? | ci holds the 95% confidence interval for the population mean. If you drew many random samples and computed an interval from each one in the same way, about 95% of those intervals would contain the actual population mean. Since ci is [4.90582, 5.10618], you can be 95% confident that the true mean sepal length of Iris setosa is in [4.90582, 5.10618]. |
| What is h? | h indicates whether to reject the null hypothesis in favor of the alternative. If h is 1, reject the null hypothesis. If h is 0, you don't have sufficient evidence to reject the null hypothesis. |
| How is h related to the ttest return value of EXAMPLES 2, 3, and 4? | The h value corresponds to the ttest return value of these examples: when ttest is called with a single output argument, that output is h. |
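The ci answer above can be checked with a sketch like the following, assuming the same toolbox (tinv is the inverse Student's t cumulative distribution function). The interval is the sample mean plus or minus tinv(0.975, n - 1) times the standard error s / sqrt(n). The second half repeats the procedure on many samples from a hypothetical normal population (mean 5.0, standard deviation 0.35) and reports how often the interval contains the true mean, which is what "95% confidence" refers to.

```matlab
% Sketch: the t-based 95% confidence interval by hand, then its coverage.
load fisheriris;                                   % meas and species
x = meas(strcmp(species, 'setosa'), 1);            % Sepal lengths of Iris setosa
n = numel(x);
[~, ~, ci] = ttest(x, 5.936);                      % 95% CI from ttest
halfWidth = tinv(0.975, n - 1) * std(x) / sqrt(n); % t critical value * standard error
ciHand = [mean(x) - halfWidth; mean(x) + halfWidth];
fprintf('ttest CI: [%g, %g]   by hand: [%g, %g]\n', ci, ciHand);

% Coverage: repeat the procedure on many samples from a known population
rng(1);                                            % Repeatable random numbers
muTrue = 5.0;  sigma = 0.35;  nTrials = 5000;      % Hypothetical population
X  = muTrue + sigma * randn(n, nTrials);           % One sample per column
hw = tinv(0.975, n - 1) * std(X) / sqrt(n);        % Half-width for each column
covered = abs(mean(X) - muTrue) <= hw;             % Does the interval contain muTrue?
fprintf('%.1f%% of the intervals contain the true mean\n', 100 * mean(covered));
```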
EXAMPLE 6: Load the Daphne Island and Santa Cruz Island beak size data
Daphne = load('DaphneIsland.txt'); SantaCruz = load('SantaCruzIsland.txt');
EXAMPLE 7: Are Daphne finch beak sizes different from those of Santa Cruz finches?
fprintf(['\nNull hypothesis: The true mean beak sizes of Daphne and ' ...
    'Santa Cruz finches are equal\n']);
fprintf(['Alt hypothesis: The true mean beak size of Daphne finches ' ...
    'is different from the true mean beak size of Santa Cruz finches\n']);
if ttest2(Daphne, SantaCruz) == 1
    fprintf('\tReject null hypothesis in favor of alternative ');
else
    fprintf('\tCannot reject the null hypothesis ');
end;
fprintf('at the 0.05 significance level\n');
Null hypothesis: The true mean beak sizes of Daphne and Santa Cruz finches are equal
Alt hypothesis: The true mean beak size of Daphne finches is different from the true mean beak size of Santa Cruz finches
    Reject null hypothesis in favor of alternative at the 0.05 significance level
| Questions | Answers |
|---|---|
| What is the purpose of the ttest2 function? | The ttest2 function determines whether the means of two distinct populations are likely to be different, based on a random sample from each population. |
| When would I use ttest rather than ttest2? | Use the ttest function when you have a particular mean value in mind and want to determine whether the mean of a single population is likely to be different from that value. Use ttest2 when you want to determine whether the means of two populations are likely to be different. |
| Should I apply the ttest2 to the sample means? | No, this is a common point of confusion. The ttest2 function requires the sample values, not just their respective means. |
| How should I interpret a ttest2 return value of 1? | A ttest2 return value of 1 indicates that the true mean beak sizes of Daphne and Santa Cruz finches are likely to be different. |
| How likely is likely? | The default level of significance for ttest2 is 5% (alpha is 0.05), meaning that if the means of the two populations were the same, there would be less than a 5% probability of observing a test statistic at least as extreme as the one computed from these samples. Roughly speaking, you can be "95% sure" that the two finch populations have different mean beak sizes. |
| What if ttest2 returns 0? | A ttest2 return value of 0 means the test has not provided evidence that the means are different. You cannot then conclude that the means are the same. (Innocent until proven guilty again!) |
| Does ttest2 make any assumptions about the distribution of the population? | Yes, ttest2 assumes that the samples were drawn at random from normally distributed populations. By default, it also assumes that the two populations have equal variances; its 'Vartype' option relaxes this assumption. |
| Can I get a more statistically significant answer? | Yes, the ttest2 function has a third argument, alpha, that specifies the significance level. When omitted, this argument is assumed to be 0.05, meaning the 5% significance level. The call ttest2(x, y, 0.01) tests whether the true means of the populations corresponding to samples x and y, respectively, are different at the 1% significance level. If ttest2 returns a value of 1, you can reject the hypothesis that the true means are equal at the 1% significance level. |
| What is the difference between the value of alpha and the significance level? | The value of alpha is always a fraction between 0 and 1. The significance level is the corresponding percentage. For example, an alpha of 0.05 corresponds to a 5% significance level, while an alpha of 0.01 corresponds to a 1% significance level. The two terms are often used interchangeably. Note: the MATLAB functions always require a fraction rather than a percentage when specifying significance. |
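A minimal sketch of what the default ttest2 computes, using hypothetical random data rather than the finch files: the two sample variances are pooled, and the statistic is compared with a t distribution with nx + ny - 2 degrees of freedom. The last call uses the 'Vartype','unequal' option (Welch's test), which drops the equal-variance assumption. All data, sample sizes, and the seed are illustrative.

```matlab
% Sketch with HYPOTHETICAL data; not the finch measurements.
rng(2);                                        % Repeatable random numbers
x = 10.5 + 0.8 * randn(40, 1);                 % Hypothetical sample 1
y = 11.0 + 0.8 * randn(55, 1);                 % Hypothetical sample 2
nx = numel(x);  ny = numel(y);

% Pooled standard deviation and t statistic (equal-variance assumption)
sp = sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2));
tHand = (mean(x) - mean(y)) / (sp * sqrt(1/nx + 1/ny));
[h, p, ci, stats] = ttest2(x, y);              % stats has fields tstat, df, sd
fprintf('t by hand = %g, ttest2 t = %g, df = %d\n', tHand, stats.tstat, stats.df);

% Welch's test: same call with the 'Vartype' option set to 'unequal'
[hW, pW, ciW, statsW] = ttest2(x, y, 'Vartype', 'unequal');
fprintf('Welch t = %g, df = %g, p = %g\n', statsW.tstat, statsW.df, pW);
```

Welch's degrees of freedom are never larger than nx + ny - 2, so the unequal-variance test is the more conservative choice when the spreads differ.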
EXAMPLE 8: Look at the p-value and confidence interval for the two-sample t-test
fprintf( ...
    '\nIs the true population mean of Daphne finches different from SantaCruz?\n');
[h, p, ci] = ttest2(Daphne, SantaCruz);
fprintf('\t hypothesis = %g\n', h);   % 1 means reject the null hypothesis
fprintf('\t pvalue = %g\n', p);       % Lower value indicates more support for alt hyp
fprintf('\t 95%% confidence interval for difference of population means: ');
fprintf('[%g, %g]\n', ci);
Is the true population mean of Daphne finches different from SantaCruz?
     hypothesis = 1
     pvalue = 2.77109e-11
     95% confidence interval for difference of population means: [-1.4464, -0.795025]
| Questions | Answers |
|---|---|
| What is the pvalue? | The pvalue is the probability of observing a test statistic at least as extreme as your sample's if the population means were actually equal. A smaller pvalue lends more support to the population means being different. |
| What is the relationship between the pvalue and the significance level? | The significance level is a threshold set by the user. The pvalue is the actual probability that a test statistic at least as extreme as your sample's would have been observed if the means were actually equal. If the pvalue is less than the significance level, ttest2 returns an h value of 1. In other words, the pvalue reflects how likely it is that a difference this large arose from a bad draw when the means are actually equal. |
| How should I interpret ci? | ci holds the confidence interval for the test, which estimates the difference between the two population means, (Daphne - SantaCruz). Since ci is [-1.4464, -0.7950], you can be 95% confident that the mean beak size of Daphne finches is between 0.7950 and 1.4464 mm smaller than the mean beak size of Santa Cruz finches. More technically, 95% of random samples will produce a confidence interval that contains the true difference of the population means. |
| Can I use ttest2 to find whether the mean of the population corresponding to sample A is greater than the mean of the population corresponding to sample B? | Yes, the one-sided call ttest2(A, B, 0.05, 'right') has the alternative hypothesis "The mean of the population represented by sample A is greater than the mean of the population represented by sample B". |
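The sign of the ttest2 interval depends on argument order: ci estimates mean(first sample) - mean(second sample), which is why the Daphne - SantaCruz interval above is negative. This sketch uses hypothetical samples A and B to compute the pooled 95% interval by hand and to show that swapping the arguments negates and reverses the interval. The data and seed are illustrative.

```matlab
% Sketch with HYPOTHETICAL samples A and B; not the finch data.
rng(3);                                        % Repeatable random numbers
A = 9.8 + 0.7 * randn(30, 1);                  % Hypothetical sample A
B = 10.9 + 0.7 * randn(35, 1);                 % Hypothetical sample B

[~, ~, ciAB] = ttest2(A, B);                   % 95% CI for mean(A) - mean(B)
[~, ~, ciBA] = ttest2(B, A);                   % 95% CI for mean(B) - mean(A)

% Same interval by hand (pooled variance, as in the default ttest2)
na = numel(A);  nb = numel(B);  df = na + nb - 2;
sp = sqrt(((na - 1) * var(A) + (nb - 1) * var(B)) / df);   % Pooled SD
hw = tinv(0.975, df) * sp * sqrt(1/na + 1/nb);              % Half-width
ciHand = (mean(A) - mean(B)) + [-hw; hw];                   % [lower; upper]

fprintf('CI for A - B: [%g, %g]\n', ciAB);
fprintf('CI for B - A: [%g, %g]\n', ciBA);
```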
EXAMPLE 9: Load the consolidated sleep diary data
load diaries.mat; % Load the sleep diaries
EXAMPLE 10: Calculate wake-up hours, separated by gender
wakeupHours = (wakeTimes - floor(wakeTimes))*24;  % Fractional part of wakeTimes, in hours
men = strcmp('male', gender);             % 1's where gender is 'male'
mensWHours = wakeupHours(:, men);         % Pick columns corresponding to men
women = ~men;                             % 1's where gender is 'female'
womensWHours = wakeupHours(:, women);     % Pick columns corresponding to women
numSubjects = size(wakeupHours, 2);       % Also need number of subjects
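This conversion assumes that wakeTimes holds MATLAB date numbers, in which the integer part counts days and the fractional part is the time of day as a fraction of 24 hours. That is an assumption based on the comment in the code, since the contents of diaries.mat are not shown here. A small sketch with one hypothetical wake time:

```matlab
% Sketch: hours after midnight from a MATLAB date number (hypothetical wake time).
t = datenum(2010, 9, 1, 6, 30, 0);           % Hypothetical wake time: 1-Sep-2010 06:30
hoursAfterMidnight = (t - floor(t)) * 24;    % Same formula as EXAMPLE 10
alsoHours = mod(t, 1) * 24;                  % Equivalent for positive date numbers
hh = floor(hoursAfterMidnight);              % Whole hours
mm = round((hoursAfterMidnight - hh) * 60);  % Remaining minutes
fprintf('%g hours after midnight is %d:%02d\n', hoursAfterMidnight, hh, mm);
```

Read the same way, the instructor's average of 6.38685 hours in EXAMPLE 13 is about 6:23 AM, since 0.38685 * 60 is roughly 23.2 minutes.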
EXAMPLE 11: Do men get up earlier than women on average?
fprintf('\nThe average wake-up time for men ');
[h, p, ci] = ttest2(mensWHours(:), womensWHours(:));
if h == 1
    fprintf('is likely to be different from that of women\n');
else
    fprintf('could be the same as that of women\n');
end;
fprintf('\t hypothesis = %g\n', h);   % 1 means reject the null hypothesis
fprintf('\t pvalue = %g\n', p);
fprintf('\t 95%% confidence interval for difference = [%g, %g]\n', ci);
The average wake-up time for men is likely to be different from that of women
     hypothesis = 1
     pvalue = 0.0114848
     95% confidence interval for difference = [-0.421104, -0.0533088]
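The title of EXAMPLE 11 asks a one-sided question, but the code runs a two-sided test. Because the ttest2 interval estimates mean(men) - mean(women), the entirely negative interval [-0.421104, -0.0533088] hours suggests that men wake up earlier on average, by roughly 3 to 25 minutes. The sketch below asks the one-sided question directly with the 'Tail','left' option. It uses hypothetical data because diaries.mat is not reproduced here; the means, spreads, sample sizes, and seed are illustrative.

```matlab
% Sketch with HYPOTHETICAL wake-up hours; not the diary data.
rng(4);                                        % Repeatable random numbers
menHours   = 6.9 + 0.9 * randn(200, 1);        % Hypothetical men's wake-up hours
womenHours = 7.1 + 0.9 * randn(220, 1);        % Hypothetical women's wake-up hours

% Two-sided: H1 is "the means differ"
[hTwo, pTwo] = ttest2(menHours, womenHours);
% Left-tailed: H1 is "mean(men) < mean(women)", i.e. men wake earlier
[hLeft, pLeft, ciLeft] = ttest2(menHours, womenHours, 'Tail', 'left');

fprintf('two-sided p = %g, left-tailed p = %g\n', pTwo, pLeft);
fprintf('95%% upper bound for mean(men) - mean(women): %g hours\n', ciLeft(2));
```

With the lesson's variables, the corresponding call would be ttest2(mensWHours(:), womensWHours(:), 'Tail', 'left').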
EXAMPLE 12: Compare average wake-up time of instructor to that of a random student
randStudent = randi(numSubjects - 1, 1, 1) + 1;   % Pick a random student
fprintf('\nThe average wake-up time of the instructor ');
if ttest2(wakeupHours(:, 1), wakeupHours(:, randStudent)) == 1
    fprintf('is likely to be different than ');
else
    fprintf('could be the same as ');
end;
fprintf('the average wake-up time for subject %d\n', randStudent);
The average wake-up time of the instructor is likely to be different than the average wake-up time for subject 74
| Questions | Answers |
|---|---|
| What does the first argument of randi represent? | The randi function produces "random" integers between 1 and the value of the first argument. |
| Why use the number of subjects - 1 to pick a random student? | The cohort consists of 143 students and 1 instructor. The instructor is subject 1. Pick a random student by picking a random integer between 1 and 143 and then adding one to it. |
| What do the second and third arguments of randi represent? | The randi function produces an array of "random" integers. The second and third arguments give the number of rows and columns of this array. This example only required one value, so both arguments were 1. |
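A quick sketch that checks the range produced by the expression in EXAMPLE 12, assuming numSubjects is 144 (143 students plus the instructor, as stated above). randi also accepts a two-element [imin, imax] range as its first argument, which picks a student directly; that alternative is not used in the lesson.

```matlab
% Sketch: which subject numbers can EXAMPLE 12's expression produce?
numSubjects = 144;                                 % Assumed: 143 students + instructor
rng(5);                                            % Repeatable random numbers
draws  = randi(numSubjects - 1, 10000, 1) + 1;     % Lesson's expression, many draws
draws2 = randi([2, numSubjects], 10000, 1);        % Alternative [imin, imax] form
fprintf('Lesson form:       %d to %d\n', min(draws),  max(draws));
fprintf('[imin, imax] form: %d to %d\n', min(draws2), max(draws2));
```

Both forms never return 1, so the instructor is never compared with himself or herself.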
EXAMPLE 13: Output IDs of the subjects whose average wake-up is similar to the instructor's
averWakeup = mean(wakeupHours);
fprintf(['\nThe following students have average wake-up times ' ...
    'indistinguishable from instructor''s (%g):\n'], averWakeup(1));
for k = 2:numSubjects
    if ttest2(wakeupHours(:, 1), wakeupHours(:, k)) == 0
        fprintf('\t Subject %g''s average wake-up time = %g\n', k, averWakeup(k));
    end;
end;
The following students have average wake-up times indistinguishable from instructor's (6.38685):
     Subject 6's average wake-up time = 6.44752
     Subject 10's average wake-up time = 6.26992
     Subject 13's average wake-up time = 6.95582
     Subject 32's average wake-up time = 5.87253
     Subject 37's average wake-up time = 6.963
     Subject 47's average wake-up time = 7.18662
     Subject 50's average wake-up time = 7.03739
     Subject 56's average wake-up time = 6.71099
     Subject 60's average wake-up time = 7.12994
     Subject 71's average wake-up time = 7.87514
     Subject 75's average wake-up time = 6.92933
     Subject 79's average wake-up time = 7.06183
     Subject 81's average wake-up time = 7.9929
     Subject 82's average wake-up time = 7.30281
     Subject 91's average wake-up time = 5.50243
     Subject 100's average wake-up time = 6.38685
     Subject 102's average wake-up time = 6.14901
     Subject 111's average wake-up time = 7.35553
     Subject 114's average wake-up time = 6.30092
     Subject 129's average wake-up time = 6.63787
     Subject 133's average wake-up time = 7.05643
     Subject 136's average wake-up time = 7.09694
     Subject 138's average wake-up time = 6.90769
     Subject 140's average wake-up time = 5.68481
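Because ttest2 tests matching columns when given two matrices, the loop above can also be written without a loop. The sketch below uses a hypothetical wake-up matrix W (rows are days, columns are subjects, and subject 1 plays the instructor) and checks that both versions select the same subjects. As with any h = 0 result, these subjects are only indistinguishable from the instructor; the test does not show that their means are equal.

```matlab
% Sketch with a HYPOTHETICAL wake-up matrix; not the lesson's diary data.
rng(6);                                          % Repeatable random numbers
numDays = 60;                                    % Hypothetical number of diary days
numSubjects = 20;                                % Hypothetical cohort size
offsets = repmat(0.5 * randn(1, numSubjects), numDays, 1);  % Per-subject shift
W = 6.5 + 0.8 * randn(numDays, numSubjects) + offsets;      % Wake-up hours

% Loop version, following EXAMPLE 13
loopIDs = zeros(1, 0);
for k = 2:numSubjects
    if ttest2(W(:, 1), W(:, k)) == 0
        loopIDs(end + 1) = k; %#ok<AGROW>
    end
end

% Vectorized version: one column-by-column call to ttest2
instructor = repmat(W(:, 1), 1, numSubjects - 1);  % Instructor's column, repeated
h = ttest2(instructor, W(:, 2:end));               % One test per student column
vecIDs = find(h == 0) + 1;                         % Convert back to subject numbers
fprintf('Indistinguishable subjects: %s\n', mat2str(vecIDs));
```

With the lesson's data, W would be wakeupHours. ttest2 treats NaN entries as missing values, so days without diary entries do not need to be removed first.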
_This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The photo is of Sir Ronald Fisher, founder of modern statistics and namesake of the Fisher Iris dataset. (See http://en.wikipedia.org/wiki/File:R._A._Fischer.jpg.)_