LESSON 15 QUESTIONS: Hypothesis testing

FOCUS QUESTION: How can I tell whether the experimental group is different from the control group?


EXAMPLE 1: Load the Fisher iris data and extract sepal lengths of Iris setosa

   load fisheriris;
   sLenSetosa = meas(strcmp(species, 'setosa'), 1);  % Sepal lengths of Iris setosa


EXAMPLE 2: Test Iris setosa mean sepal length using a one sample t-test

   fprintf('Estimated population mean is %g\n', mean(sLenSetosa));
   testMean = 5.936;    % Could the true mean sepal length be this value?
   fprintf('True population mean of Iris setosa sepal length ');
   if ttest(sLenSetosa, testMean) == 1
       fprintf('is likely to be different from %g\n', testMean);
   else
       fprintf('could be %g\n', testMean);
   end;
Estimated population mean is 5.006
True population mean of Iris setosa sepal length is likely to be different from 5.936

Questions Answers
What is the purpose of the ttest function? The ttest uses knowledge about a sample to determine whether a population mean is likely to be different from a specified value. The sample should be drawn at random from a normally distributed population.
What should I conclude if this ttest returns 1? You can conclude that the true mean sepal length of Iris setosa is unlikely to be 5.936.
What should I conclude if this ttest returns 0? Based on the evidence provided by this sample, it is possible that the true mean sepal length of Iris setosa is 5.936. However, you do not have evidence to make a definitive conclusion. (Presume innocent until proven guilty!)
Does ttest make any assumptions about the distribution of the population? Yes, ttest assumes that the population is normally distributed.
When should I use ttest? Use ttest when you want to know whether a sample comes from a population that has a different mean than a specified value.
Should I apply the ttest to the sample mean? No, this is a common point of confusion. The ttest function requires the entire sample, not just the sample mean.
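
To see why ttest needs the whole sample rather than just its mean, it can help to compute the one-sample t statistic by hand. The following sketch is not part of the original lesson. It assumes the Statistics Toolbox functions tcdf and ttest are available, and the names stdErr, tStat and pManual are illustrative only.

```matlab
load fisheriris;                                   % provides meas and species
sLenSetosa = meas(strcmp(species, 'setosa'), 1);   % sepal lengths of Iris setosa
testMean = 5.936;                                  % hypothesized population mean

n = numel(sLenSetosa);                             % sample size
stdErr = std(sLenSetosa) / sqrt(n);                % standard error of the mean
tStat = (mean(sLenSetosa) - testMean) / stdErr;    % one-sample t statistic
pManual = 2 * tcdf(-abs(tStat), n - 1);            % two-sided p-value, n-1 degrees of freedom

[h, p] = ttest(sLenSetosa, testMean);              % built-in test for comparison
fprintf('t = %g, manual p = %g, ttest p = %g, h = %g\n', tStat, pManual, p, h);
```

The statistic depends on the sample standard deviation and the sample size as well as the sample mean, which is why the mean alone is not enough.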


EXAMPLE 3: Apply one sample t-test (standard statistical terminology)

   testMean = 5.936;    % Could the true mean sepal length be this value?
   fprintf(['\nNull hypothesis: The true mean Iris setosa sepal length ' ...
            'is %g\n'], testMean);
   fprintf(['Alt hypothesis: The true mean Iris setosa sepal length ' ...
            'is different from %g\n'], testMean);
   if ttest(sLenSetosa, testMean) == 1
       fprintf('\tReject null hypothesis in favor of alternative ');
   else
       fprintf('\tCannot reject the null hypothesis ');
   end;
   fprintf('at the 0.05 significance level\n');
Null hypothesis: The true mean Iris setosa sepal length is 5.936
Alt hypothesis: The true mean Iris setosa sepal length is different from 5.936
	Reject null hypothesis in favor of alternative at the 0.05 significance level

Questions Answers
What is the null hypothesis for this ttest? The null hypothesis is that the true mean sepal length for Iris setosa is 5.936.
What is the alternative hypothesis for this ttest? The alternative hypothesis is that the true mean sepal length for Iris setosa is not 5.936.
Why bother with the strange null hypothesis terminology? Many researchers report the results of standard statistical tests in terms of the null hypothesis, p-values, and confidence intervals. In order to understand these results, you must learn to work with the terminology.
What does a ttest return value of 1 mean in terms of the null hypothesis? A return value of 1 indicates that you should reject the null hypothesis in favor of the alternative hypothesis. In other words, you should conclude that the true mean sepal length of Iris setosa is unlikely to be 5.936.
How unlikely? The default level of significance for ttest is 5%, meaning that fewer than 5% of the samples randomly selected from a normal distribution with mean 5.936 would have a test statistic as extreme as the one observed. Roughly, you can think of this as being "95% certain" that the true mean sepal length for Iris setosa is not 5.936, although strictly the 5% describes how often such a sample would arise if the null hypothesis were true.
How is the return value h related to the level of significance? The level of significance is a threshold set by you. By default the ttest uses a 5% significance level corresponding to an alpha of 0.05. A test statistic for your sample is compared with the results from a normal distribution with mean 5.936. If fewer than 5% of the samples from this normal distribution have a test statistic more extreme than that of your sample, the ttest returns a value of 1. Otherwise, the ttest returns a value of 0.
What does a ttest return value of 0 mean in terms of the null hypothesis? A return value of 0 indicates that you cannot reject the null hypothesis. This does not mean that the null hypothesis is true, just that you don't have enough information on which to make an informed decision.
Is it necessary to state null and alternative hypotheses when applying the ttest? No, you can call the ttest function without explicitly stating these hypotheses.
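
One way to make the 5% significance level concrete is to simulate many samples for which the null hypothesis really is true and count how often ttest rejects it anyway. The sketch below is an illustration added here, not part of the original lesson. The sample size, the number of samples, and the standard deviation of 0.35 are arbitrary assumptions.

```matlab
rng(0);                                   % fix the seed so the run is repeatable
trueMean = 5.936;                         % null hypothesis is true by construction
numSamples = 2000;                        % number of simulated samples
sampleSize = 50;                          % observations per sample

% Each column is one random sample from a normal population with mean trueMean.
% The standard deviation 0.35 is an arbitrary illustrative choice.
samples = trueMean + 0.35 * randn(sampleSize, numSamples);

h = ttest(samples, trueMean);             % ttest treats each column as a separate sample
fractionRejected = mean(h);               % proportion of false rejections
fprintf('Fraction of samples rejecting a true null: %g\n', fractionRejected);
```

The printed fraction should land near 0.05, since alpha is the rate at which a true null hypothesis is rejected.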


EXAMPLE 4: Test for inequality using the one sample t-test

   testMean = 5.936;    % Could the true mean sepal length be this value?
   fprintf('\nTrue population mean of Iris setosa sepal length ');
   if ttest(sLenSetosa, testMean, 0.05, 'left') == 1
       fprintf('is likely to be less than ')
   else
       fprintf('could be greater than or equal to ');
   end;
   fprintf('%g\n', testMean);
True population mean of Iris setosa sepal length is likely to be less than 5.936

Questions Answers
Can I use the ttest to determine whether the mean of the population corresponding to sample x is greater than 5.2? Yes, the one-sided test: ttest(x, 5.2, 0.05, 'right') has the alternative hypothesis: "The mean of the population represented by sample x is greater than 5.2".
Could I have asked for a more statistically significant answer?

Yes, the ttest function has a third argument, alpha, that specifies the significance level. When omitted, this argument is assumed to have a value of 0.05, meaning the 5% significance level.

For example, ttest(x, 5.2, 0.01) tests whether the true mean of the population corresponding to sample x is 5.2 at the 1% significance level. If the call to ttest returns a value of 1, you can conclude, roughly, that there is less than a 1% probability that the true population mean is 5.2. (The actual interpretation is that fewer than 1% of the samples would have the observed statistic if the true mean were actually 5.2.)

What is the difference between the value of alpha and the significance level? The value of alpha is always a fraction between 0 and 1. The significance level is the corresponding percentage. For example an alpha of 0.05 corresponds to a 5% significance level, while an alpha of 0.01 corresponds to a 1% significance level. We use the two interchangeably, noting that MATLAB requires a fraction rather than a percentage when specifying significance.
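
The one-sided test and the stricter significance level can be combined in a single call. The sketch below is an addition to the lesson and assumes a MATLAB release in which ttest accepts the 'Alpha' and 'Tail' name-value arguments. The positional form used in EXAMPLE 4 is the older way of passing the same two settings.

```matlab
load fisheriris;
x = meas(strcmp(species, 'setosa'), 1);            % sepal lengths of Iris setosa

% Alternative hypothesis: the population mean is greater than 5.2
hRight = ttest(x, 5.2, 'Alpha', 0.01, 'Tail', 'right');

% Alternative hypothesis: the population mean is less than 5.2
hLeft = ttest(x, 5.2, 'Alpha', 0.01, 'Tail', 'left');

fprintf('right-tailed h = %g, left-tailed h = %g\n', hRight, hLeft);
```

Because the sample mean reported in EXAMPLE 2 is 5.006, which is below 5.2, only the left-tailed test has any chance of rejecting the null hypothesis here.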


EXAMPLE 5: Look at the p-value and the confidence interval

   testMean = 5.936;    % Could the true mean sepal length be this value?
   fprintf( ...
      '\nIs the true population mean of Iris setosa different from %g?\n', ...
       testMean);
   [h, p, ci] = ttest(sLenSetosa, testMean);
   fprintf('\t hypothesis = %g\n', h);  % 1 means reject the null hypothesis
   fprintf('\t pvalue = %g\n', p); % Lower value indicates more support for alt hyp
   fprintf('\t 95%% confidence interval for population mean: [%g, %g]\n', ci);
Is the true population mean of Iris setosa different from 5.936?
	 hypothesis = 1
	 pvalue = 6.6085e-24
	 95% confidence interval for population mean: [4.90582, 5.10618]

Questions Answers
What does the pvalue tell me? The pvalue gives the probability that a sample drawn at random from the population would produce a test statistic at least as extreme as the one observed if the null hypothesis were actually true. In other words, the pvalue measures how plausible it is that the observed difference is due to a bad draw of the random sample. The pvalue in this example is minuscule, so it is very unlikely that such a sample could have been drawn by chance from a distribution whose mean was actually 5.936.
How is the ttest return value related to the pvalue? The ttest has a cut-off value called the level of significance or alpha. If the pvalue is less than alpha, then the ttest return value is 1.
How should I interpret ci? The ci holds the confidence interval for this test. 95% of the samples will have a confidence interval that contains the actual population mean. Since ci is [4.90582, 5.10618], you can be 95% confident that the true mean sepal length of Iris setosa is in [4.90582, 5.10618].
What is h? The h indicates whether to reject the null hypothesis in favor of the alternative. If h is 1, reject the null hypothesis. If h is 0, you don't have sufficient evidence to reject the null hypothesis.
How is h related to the ttest return value of EXAMPLE 2, 3 and 4? The h value corresponds to the ttest return value of these examples.
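
The three outputs of ttest are different views of the same decision. The sketch below, added here as an illustration, rebuilds the confidence interval by hand and checks that h agrees with both the p-value rule and the interval rule. It assumes the Statistics Toolbox function tinv is available, and the names halfWidth, ciManual, rejectByP and rejectByCI are illustrative.

```matlab
load fisheriris;
sLenSetosa = meas(strcmp(species, 'setosa'), 1);   % sepal lengths of Iris setosa
testMean = 5.936;
alpha = 0.05;                                      % default significance level of ttest

[h, p, ci] = ttest(sLenSetosa, testMean);

% Rebuild the 95% confidence interval from the sample itself
n = numel(sLenSetosa);
halfWidth = tinv(1 - alpha/2, n - 1) * std(sLenSetosa) / sqrt(n);
ciManual = [mean(sLenSetosa) - halfWidth; mean(sLenSetosa) + halfWidth];

rejectByP = p < alpha;                                 % decision from the p-value
rejectByCI = testMean < ci(1) || testMean > ci(2);     % decision from the interval
fprintf('h = %g, p < alpha: %d, testMean outside ci: %d\n', h, rejectByP, rejectByCI);
fprintf('ttest ci = [%g, %g], manual ci = [%g, %g]\n', ci, ciManual);
```

For a two-sided test, rejecting the null hypothesis is equivalent to the tested value falling outside the confidence interval.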


EXAMPLE 6: Load the Daphne Island and Santa Cruz Island beak size data

    Daphne = load('DaphneIsland.txt');
    SantaCruz = load('SantaCruzIsland.txt');


EXAMPLE 7: Are Daphne finch beak sizes different from those of Santa Cruz finches?

   fprintf(['\nNull hypothesis: The true mean beak sizes of Daphne and ' ...
            'Santa Cruz finches are equal\n']);
   fprintf(['Alt hypothesis: The true mean beak size of Daphne finches ' ...
            'is different from the true mean beak size of Santa Cruz finches\n']);
   if ttest2(Daphne, SantaCruz) == 1
       fprintf('\tReject null hypothesis in favor of alternative ');
   else
       fprintf('\tCannot reject the null hypothesis ');
   end;
   fprintf('at the 0.05 significance level\n');
Null hypothesis: The true mean beak sizes of Daphne and Santa Cruz finches are equal
Alt hypothesis: The true mean beak size of Daphne finches is different from the true mean beak size of Santa Cruz finches
	Reject null hypothesis in favor of alternative at the 0.05 significance level

Questions Answers
What is the purpose of the ttest2 function? The ttest2 determines whether the means of two distinct populations are likely to be different based on a random sample from each population.
When would I use ttest rather than ttest2? Use the ttest function when you have a particular mean value in mind and want to determine whether the mean of a single population is likely to be different from that value. Use the ttest2 when you want to determine whether the means of two populations are likely to be different.
Should I apply the ttest2 to the sample means? No, this is a common point of confusion. The ttest2 function requires the sample values, not just their respective means.
How should I interpret a ttest2 return value of 1? A ttest2 return value of 1 indicates that the true mean beak sizes of Daphne and Santa Cruz finches are likely to be different.
How likely is likely? The default level of significance for ttest2 is 5% (alpha is 0.05), meaning that there is less than a 5% probability that you would observe a test statistic this extreme if the means of the two populations were the same. Roughly speaking, you can be "95% sure" that the two populations have different mean beak sizes.
What if the ttest2 returns 0? A ttest2 return value of 0 means the test has not provided evidence that the means are different. You cannot then conclude that the means are the same. (Innocent until proven guilty again!)
Does ttest2 make any assumptions about the distribution of the population? Yes, ttest2 assumes that the samples were drawn at random from normally distributed populations.
Can I get a more statistically significant answer? Yes, the ttest2 function has a third argument, alpha, that specifies the significance level. When omitted, this argument is assumed to be 0.05, meaning the 5% significance level. The call ttest2(x, y, 0.01) tests whether the true means of the populations corresponding to samples x and y, respectively, are different at the 1% significance level. If ttest2 returns a value of 1, fewer than 1% of sample pairs would show a test statistic this extreme if the true population means were the same.
What is the difference between the value of alpha and the significance level? The value of alpha is always a fraction between 0 and 1. The significance level is the corresponding percentage. For example an alpha of 0.05 corresponds to a 5% significance level, while an alpha of 0.01 corresponds to a 1% significance level. The two terms are often used interchangeably. Note: the MATLAB functions always require a fraction rather than a percentage when specifying significance.
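
The two-sample test can also be reproduced by hand, which shows why ttest2 needs both full samples. The sketch below is an added illustration that uses synthetic data rather than the finch files, so the values of x and y are hypothetical. It assumes the default behavior of ttest2, which pools the variances of the two samples.

```matlab
rng(1);                                  % repeatable synthetic data
x = 10.0 + randn(30, 1);                 % hypothetical sample from population 1
y = 10.5 + randn(40, 1);                 % hypothetical sample from population 2

nx = numel(x);
ny = numel(y);
df = nx + ny - 2;                                          % degrees of freedom
pooledVar = ((nx - 1) * var(x) + (ny - 1) * var(y)) / df;  % pooled variance estimate
stdErr = sqrt(pooledVar * (1/nx + 1/ny));                  % standard error of the difference
tManual = (mean(x) - mean(y)) / stdErr;                    % two-sample t statistic
pManual = 2 * tcdf(-abs(tManual), df);                     % two-sided p-value

[h, p, ci, stats] = ttest2(x, y);        % default call assumes equal variances
fprintf('manual t = %g, ttest2 t = %g\n', tManual, stats.tstat);
fprintf('manual p = %g, ttest2 p = %g, h = %g\n', pManual, p, h);
```

The fourth output, stats, is a structure holding the test statistic and degrees of freedom, which makes it easy to compare against the manual calculation.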


EXAMPLE 8: Look at the p-value and confidence interval for the two-sample t-test

   fprintf( ...
      '\nIs the true population mean of Daphne finches different from SantaCruz?\n');
   [h, p, ci] = ttest2(Daphne, SantaCruz);
   fprintf('\t hypothesis = %g\n', h);  % 1 means reject the null hypothesis
   fprintf('\t pvalue = %g\n', p); % Lower value indicates more support for alt hyp
   fprintf('\t 95%% confidence interval for difference of population means: ');
   fprintf('[%g, %g]\n', ci);
Is the true population mean of Daphne finches different from SantaCruz?
	 hypothesis = 1
	 pvalue = 2.77109e-11
	 95% confidence interval for difference of population means: [-1.4464, -0.795025]

Questions Answers
What is the pvalue? The pvalue is the probability that a test statistic at least as extreme as your sample's would have been observed if the population means were actually equal. A smaller pvalue lends more support to the population means being different.
What is the relationship between the pvalue and the significance level? The significance level is a threshold set by the user. If the probability that your sample's test statistic would have been observed if the means were actually equal is less than the significance level, ttest2 returns an h value of 1. The pvalue is the actual probability that your sample's test statistic would have been observed if the means were actually equal. In other words, the pvalue reflects how likely it is that your sample was due to a bad draw if the means were actually equal.
How should I interpret ci? The ci holds the confidence interval for the test. The confidence interval provides an estimate of the difference between the two population means, computed as (Daphne - SantaCruz). Since ci is [-1.4464, -0.7950] and both endpoints are negative, you can be 95% confident that the mean beak size of Daphne finches is between 0.7950 and 1.4464 mm smaller than the mean beak size of Santa Cruz finches. More technically, 95% of the samples will produce a confidence interval that contains the true difference of the population means.
Can I use ttest2 to find whether the mean of the population corresponding to sample A is greater than the mean of the population corresponding to sample B? Yes, the following one-sided call: ttest2(A, B, 0.05, 'right') has an alternative hypothesis "The mean of the population represented by sample A is greater than the mean of the population represented by sample B".
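
The sign of the interval depends on the order of the arguments, which is easy to get backwards. The sketch below is an added illustration using synthetic samples a and b (hypothetical data, not the finch measurements). It assumes a MATLAB release in which ttest2 accepts the 'Alpha' and 'Tail' name-value arguments.

```matlab
rng(2);                                  % repeatable synthetic data
a = 9.0 + randn(40, 1);                  % hypothetical sample A (smaller true mean)
b = 10.0 + randn(40, 1);                 % hypothetical sample B (larger true mean)

[~, ~, ciAB] = ttest2(a, b);             % interval for mean(A) - mean(B)
[~, ~, ciBA] = ttest2(b, a);             % interval for mean(B) - mean(A)
fprintf('A - B: [%g, %g]\n', ciAB);
fprintf('B - A: [%g, %g]\n', ciBA);

% One-sided test. Alternative hypothesis: mean of A is less than mean of B
hLeft = ttest2(a, b, 'Alpha', 0.05, 'Tail', 'left');
fprintf('left-tailed h = %g\n', hLeft);
```

Swapping the arguments mirrors the interval around zero, so a negative interval for ttest2(a, b) means the first population has the smaller mean.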


EXAMPLE 9: Load the consolidated sleep diary data

    load diaries.mat;  % Load the sleep diaries


EXAMPLE 10: Calculate wake-up hours, separated by gender

   wakeupHours = (wakeTimes - floor(wakeTimes))*24; % Get fractional part of wakeTimes
   men = strcmp('male', gender);           % 1's where gender is 'male'
   mensWHours = wakeupHours(:, men);      % Pick columns corresponding to men
   women = ~men;                          % 1's where gender is 'female'
   womensWHours = wakeupHours(:, women);   % Pick columns corresponding to women
   numSubjects = size(wakeupHours, 2);     % Also need number of subjects
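
The conversion above can be checked on a tiny made-up array. The sketch below is an added illustration. The values in wakeTimes and gender are hypothetical, and it assumes each wake time is stored as whole days plus a fraction of a day, which is what the fractional-part calculation implies.

```matlab
% Hypothetical data: rows are days, columns are subjects.
% Each value is assumed to be a serial date number (whole days plus a day fraction).
wakeTimes = [734503.25   734503.5;
             734504.3125 734504.375];
gender = {'male', 'female'};             % one entry per column (subject)

wakeupHours = (wakeTimes - floor(wakeTimes)) * 24;   % day fraction -> hour of day
men = strcmp('male', gender);                        % logical mask over columns
mensWHours = wakeupHours(:, men);                    % keep only the men's columns
disp(wakeupHours);
disp(mensWHours);
```

A day fraction of 0.25 corresponds to 6 hours after midnight, and the logical mask selects whole columns, so every day for a given subject stays together.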


EXAMPLE 11: Do men get up earlier than women on average?

    fprintf('\nThe average wake-up time for men ');
    [h, p, ci] = ttest2(mensWHours(:), womensWHours(:));
    if h == 1
        fprintf('is likely to be different from that of women\n');
    else
        fprintf('could be the same as that of women\n');
    end;
    fprintf('\t hypothesis = %g\n', h);  % 1 means reject the null hypothesis
    fprintf('\t pvalue = %g\n', p);
    fprintf('\t 95%% confidence interval for difference = [%g, %g]\n', ci);
The average wake-up time for men is likely to be different from that of women
	 hypothesis = 1
	 pvalue = 0.0114848
	 95% confidence interval for difference = [-0.421104, -0.0533088]


EXAMPLE 12: Compare average wake-up time of instructor to that of a random student

    randStudent = randi(numSubjects - 1, 1, 1) + 1; % Pick a random student
    fprintf('\nThe average wake-up time of the instructor ');
    if ttest2(wakeupHours(:, 1), wakeupHours(:, randStudent)) == 1
        fprintf('is likely to be different than ');
    else
        fprintf('could be the same as ');
    end;
    fprintf('the average wake-up time for subject %d\n', randStudent);
The average wake-up time of the instructor is likely to be different than the average wake-up time for subject 74

Questions Answers
What does the first argument of randi represent? The randi function produces "random" integers between 1 and the value of the first argument.
Why use the number of subjects - 1 to pick a random student? The cohort consists of 143 students and 1 instructor. The instructor is subject 1. Pick a random student by picking a random integer between 1 and 143 and then adding one to it.
What do the second and third arguments of randi represent? The randi function produces an array of "random" integers. The second and third arguments give the number of rows and columns of this array. This example only required one value, so both arguments were 1.
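
The shift by one can be checked by drawing many values and looking at the range. The sketch below is an added illustration. It hard-codes numSubjects as 144 (1 instructor plus 143 students, as described above) instead of loading the diary data.

```matlab
rng(3);                                      % repeatable draws
numSubjects = 144;                           % 1 instructor (subject 1) plus 143 students
draws = randi(numSubjects - 1, 1, 10000) + 1;   % 1-by-10000 array of student IDs
fprintf('smallest ID drawn = %d, largest ID drawn = %d\n', min(draws), max(draws));

% randi also accepts an explicit [low high] range, which avoids the "+ 1" shift
oneStudent = randi([2 numSubjects], 1, 1);
fprintf('one random student ID = %d\n', oneStudent);
```

Either form keeps subject 1, the instructor, out of the draw.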


EXAMPLE 13: Output IDs of the subjects whose average wake-up time is similar to the instructor's

    averWakeup = mean(wakeupHours);
    fprintf(['\nThe following students have average wake-up times ' ...
            'indistinguishable from instructor''s (%g):\n'], averWakeup(1));
    for k = 2:numSubjects
       if ttest2(wakeupHours(:, 1), wakeupHours(:, k)) == 0
          fprintf('\t Subject %g''s average wake-up time = %g\n', k, averWakeup(k));
       end;
    end;
The following students have average wake-up times indistinguishable from instructor's (6.38685):
	 Subject 6's average wake-up time = 6.44752
	 Subject 10's average wake-up time = 6.26992
	 Subject 13's average wake-up time = 6.95582
	 Subject 32's average wake-up time = 5.87253
	 Subject 37's average wake-up time = 6.963
	 Subject 47's average wake-up time = 7.18662
	 Subject 50's average wake-up time = 7.03739
	 Subject 56's average wake-up time = 6.71099
	 Subject 60's average wake-up time = 7.12994
	 Subject 71's average wake-up time = 7.87514
	 Subject 75's average wake-up time = 6.92933
	 Subject 79's average wake-up time = 7.06183
	 Subject 81's average wake-up time = 7.9929
	 Subject 82's average wake-up time = 7.30281
	 Subject 91's average wake-up time = 5.50243
	 Subject 100's average wake-up time = 6.38685
	 Subject 102's average wake-up time = 6.14901
	 Subject 111's average wake-up time = 7.35553
	 Subject 114's average wake-up time = 6.30092
	 Subject 129's average wake-up time = 6.63787
	 Subject 133's average wake-up time = 7.05643
	 Subject 136's average wake-up time = 7.09694
	 Subject 138's average wake-up time = 6.90769
	 Subject 140's average wake-up time = 5.68481


_This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The photo is of Sir Ronald Fisher, founder of modern statistics and namesake of the Fisher Iris dataset. (See http://en.wikipedia.org/wiki/File:R._A._Fischer.jpg.)_