LESSON 19: Input and output

FOCUS QUESTION: How can I save and retrieve my data?

Data from the real world is often complicated by messy formats, missing data, and input errors. This lesson uses the MATLAB data import tools to read and manipulate real data. The lesson also introduces the MATLAB vectorized logical operations for manipulating data.

In this lesson you will:
  • Read and write data files using formatted I/O.
  • Create directories and test whether they exist.
  • Remove an identifying header.
  • Use the isnan function to test rows for invalid numerical values.
  • Work with realistic data.
[Image: 5.25" floppy diskette]


DATA FOR THIS LESSON

File Description
diaries.mat contains the extracted, cleaned, and consolidated sleep diary data in MATLAB variables. The arrays have a column for each person. The vectors have an element for each person. The values in column n correspond to the same person as the value in position n of each vector. The file contains the following variables:
  • bedTimes - array of bed times in decimal-date format.
  • dayCaffeine - array of daytime caffeine indicators.
  • gender - vector of male/female gender designators.
  • nightCaffeine - array of evening caffeine indicators.
  • section - vector of section indicators. The possible section numbers are 0, 1, 3 and 4. Section 0 contains only a single instructor. The remaining section numbers correspond to the actual course section numbers.
  • toSleepMinutes - an array of number of minutes to fall asleep.
  • useAlarm - array of alarm use indicators.
  • wakeTimes - array of wakeup times in decimal-date format.
Doe_Jack_male.csv contains a sample sleep diary. The third entry in the alarm wake-up field is a NaN.
Doe_Jane_female.csv contains a sample sleep diary. The third entry in the alarm wake-up field is empty.
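
The column-to-person correspondence described for diaries.mat can be illustrated with a small sketch. The arrays below (toSleepDemo, sectionDemo, genderDemo) are hypothetical stand-ins, not the contents of diaries.mat; the sketch only assumes, as the examples in this lesson do, that gender is stored as a cell array of strings.

```matlab
% Hypothetical miniature data set: 3 days (rows) by 4 people (columns).
toSleepDemo = [10 20  5 15;
               30 10  5 25;
               20 30  5 35];
sectionDemo = [1 3 1 4];                         % One section number per person
genderDemo  = {'male', 'female', 'female', 'male'};

inSection1 = (sectionDemo == 1);                 % Logical vector, one entry per column
section1Minutes = toSleepDemo(:, inSection1);    % Keep only the section 1 columns
isFemale = strcmp(genderDemo, 'female');         % Logical vector from a cell array
femaleMean = mean(mean(toSleepDemo(:, isFemale)));

fprintf('Section 1 has %g people\n', sum(inSection1));
fprintf('Mean minutes to fall asleep (females): %g\n', femaleMean);
```

Because position n of each vector describes the person in column n of each array, a logical vector built from section or gender can be used directly as a column index.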

SETUP FOR LESSON 19



EXAMPLE 1: Load the sleep diary data and output the number of rows and columns

Create a new cell in which you type and execute:

    load diaries.mat;  % Load the sleep diaries
    [numDays, numDiaries] = size(bedTimes);  % How many rows and columns?
    fprintf('\n\n%g Sleep diaries kept for %g days\n', numDiaries, numDays);

You should see bedTimes, dayCaffeine, gender, nightCaffeine, section, toSleepMinutes, useAlarm, wakeTimes, numDays, and numDiaries variables in your Workspace. You should also see the following output in the Command Window:


144 Sleep diaries kept for 21 days
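
As a possible variation, load can return the file contents as a structure instead of placing the variables directly in the workspace, which makes it easy to see what a MAT-file contains. The sketch below uses a hypothetical temporary file and a stand-in array (bedTimesDemo) rather than diaries.mat itself.

```matlab
demoFile = [tempname '.mat'];            % Hypothetical temporary file name
bedTimesDemo = zeros(21, 3);             % Stand-in array: 21 days, 3 diaries
save(demoFile, 'bedTimesDemo');          % Write the variable to a MAT-file

S = load(demoFile);                      % Load into a structure, not the workspace
names = fieldnames(S);                   % Names of the variables stored in the file
[numDaysDemo, numDiariesDemo] = size(S.bedTimesDemo);
fprintf('%s: %g rows, %g columns\n', names{1}, numDaysDemo, numDiariesDemo);

delete(demoFile);                        % Clean up the temporary file
```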


EXAMPLE 2: Create a diaries directory in the current directory

Create a new cell in which you type and execute:

    directoryName = './diaries';
    if ~isdir(directoryName)
       mkdir(directoryName);
    else
       fprintf('The %s directory already exists\n', directoryName);
    end;

You should see a directoryName variable in your Workspace Browser. The first time you execute this cell, a diaries directory will appear in your Current Directory. Subsequent executions of this cell produce the following output in the Command Window:

The ./diaries directory already exists
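
The same test can be written with exist, which returns 7 when its argument names a directory. Newer MATLAB releases also document isfolder as the recommended replacement for isdir. The sketch below is one possible variation; demoDir is a hypothetical temporary directory, and the two-output form of mkdir is used so that a failure can be reported rather than raising an error.

```matlab
demoDir = tempname;                              % Hypothetical directory in the temp area
existedBefore = (exist(demoDir, 'dir') == 7);    % 7 means "is a directory"
if ~existedBefore
    [ok, msg] = mkdir(demoDir);                  % ok is 1 on success; msg describes a failure
    if ~ok
        fprintf('Could not create %s: %s\n', demoDir, msg);
    end
end
existsAfter = (exist(demoDir, 'dir') == 7);
rmdir(demoDir);                                  % Remove the demonstration directory
```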


EXAMPLE 3: Write the individual sleep diaries as tab-delimited text files

Create a new cell in which you type and execute:

    sleepHours = (wakeTimes - bedTimes)*24;  % Also compute hours of sleep
    for k = 1:numDiaries                     % For each subject
        thisFile = ['./diaries/subject' num2str(k), '.txt'];
        fid = fopen(thisFile, 'w');          % Create a file
        fprintf(fid, '%g subject is a %s from section %g\n\n', ...
            k, gender{k}, section(k));       % Write gender and section info
        fprintf(fid, '%20s\t%20s\t%s\t%s\t%s\t%s\t%s\n', ...
            'Bed-time', 'Wake-time', '2Sleep', 'Hours', 'Alarm', 'Day', 'Night');
        for j=1:numDays                      % For each day
           fprintf(fid, '%s\t %s\t %d\t %6.3g\t %d\t%d\t%d\n', ...
                datestr(bedTimes(j, k), 0), datestr(wakeTimes(j, k), 0), ...
                toSleepMinutes(j, k), sleepHours(j, k), ...
                useAlarm(j, k), dayCaffeine(j, k), nightCaffeine(j, k));
        end;                                % Done writing days for this subject
        fclose(fid);
    end;                                    % Done with all of the subjects

You should see sleepHours, thisFile, fid, k, and j variables in your Workspace Browser. You should also see an individual sleep diary file for each subject in the diaries subdirectory.
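
When every column of a row is numeric, the inner loop over days is optional: fprintf reuses its format string until all of the supplied values are consumed, taking them in column order. The sketch below illustrates this with hypothetical data (hoursDemo, alarmDemo) and a temporary file. It does not apply directly to the diary rows above, because those rows also contain date strings produced by datestr.

```matlab
demoFile = [tempname '.txt'];            % Hypothetical temporary output file
hoursDemo = [7.5 8 6.25];                % Hours of sleep for three days
alarmDemo = [1 0 1];                     % Alarm indicator for the same days

fid = fopen(demoFile, 'w');
fprintf(fid, '%s\t%s\n', 'Hours', 'Alarm');            % Header row
fprintf(fid, '%6.3g\t%d\n', [hoursDemo; alarmDemo]);   % Format is reused for each column
fclose(fid);

fileText = fileread(demoFile);           % Read the whole file back as one string
numLinesDemo = sum(fileText == sprintf('\n'));
fprintf('%s', fileText);
delete(demoFile);                        % Clean up the temporary file
```

Stacking the two row vectors into a 2-by-3 matrix matters here: fprintf walks down each column, so each pass through the format receives one hours value followed by the matching alarm value.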


EXAMPLE 4: Catch the error when trying to open a bad file

Create a new cell in which you type and execute:

   badFile = 'BadName.csv';
   try
      fid = fopen(badFile);   % Trying to open a non-existent file
      fclose(fid);
   catch theError;
      fprintf('%s for file %s\n', theError.identifier, badFile);
   end;

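You should see badFile and fid variables in the Workspace Browser and the following output in your Command Window. Here fopen does not itself raise an error; it returns -1, and the error occurs when fclose is called with that invalid handle. The try-catch construct allows you to handle errors without terminating your script.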

MATLAB:FileIO:InvalidFid for file BadName.csv
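
An alternative to catching the error is to test the handle returned by fopen before using it. The sketch below is one way to do that; the file name is a hypothetical one built from tempname so that it is very unlikely to exist, and the second output of fopen is a system-dependent message describing the failure.

```matlab
badFile = [tempname '_missing.csv'];     % Hypothetical name, assumed not to exist
[fid, msg] = fopen(badFile);             % fid is -1 when the open fails
if fid == -1
    fprintf('Could not open %s: %s\n', badFile, msg);
else
    fclose(fid);                         % Only close a file that actually opened
end
```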


EXAMPLE 5: Read a sleep diary containing a NaN

Create a new cell in which you type and execute:

   fName = 'Doe_Jack_male.csv';    % File is in the current directory
   fid = fopen(fName);   % The fid is a handle to the open file
   formatString = '%s %s %n %s %n %n %n'; % Assume 7 items on each line
   dataJohn = textscan(fid, formatString, 'HeaderLines', 1, 'Delimiter', ',');
   fclose(fid);

You should see fName, fid, formatString, and dataJohn variables in the Workspace Browser.


EXAMPLE 6: Read a sleep diary containing an empty field

Create a new cell in which you type and execute:

   fName = 'Doe_Jane_female.csv';    % File is in the current directory
   fid = fopen(fName);   % The fid is a handle to the open file
   formatString = '%s %s %n %s %n %n %n'; % Assume 7 items on each line
   dataJane = textscan(fid, formatString, 'HeaderLines', 1, ...
        'Delimiter', ',');
   fclose(fid);

You should see fName, fid, formatString, and dataJane variables in the Workspace Browser.
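
To see how textscan treats a literal NaN and an empty numeric field, the following sketch writes a tiny comma-delimited file and reads it back. The file name, the three-column layout, and the variable names are hypothetical; the real diary files have seven columns.

```matlab
demoFile = [tempname '.csv'];            % Hypothetical temporary diary file
fid = fopen(demoFile, 'w');
fprintf(fid, 'Date,Alarm,Caffeine\n');   % Header line
fprintf(fid, '9/23/2009,0,1\n');         % Ordinary row
fprintf(fid, '9/24/2009,NaN,1\n');       % Literal NaN in the alarm field
fprintf(fid, '9/25/2009,,0\n');          % Empty alarm field
fclose(fid);

fid = fopen(demoFile);
demoData = textscan(fid, '%s %n %n', 'HeaderLines', 1, 'Delimiter', ',');
fclose(fid);
delete(demoFile);                        % Clean up the temporary file

alarmDemo = demoData{2};                 % Numeric column vector for the alarm field
fprintf('Alarm values:');
fprintf(' %g', alarmDemo);
fprintf('\n');
```

Each cell of the result holds one column of the file: the %s column comes back as a cell array of strings, and each %n column comes back as a numeric column vector in which both the literal NaN and the empty field appear as NaN.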


EXAMPLE 7: Check that myAlarm is valid

Create a new cell in which you type and execute:

   myAlarm = dataJane{5};    % myAlarm for this diary is in 5th column of data;
   if sum(isnan(myAlarm))
       fprintf('Alarm field has missing or NaN values\n');
   end;
   if sum(myAlarm ~= 0 & myAlarm ~= 1)
       fprintf('Alarm field is not all 0''s or 1''s\n');
   end;

You should see a myAlarm variable in the Workspace Browser and the following output.

Alarm field has missing or NaN values
Alarm field is not all 0's or 1's
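
Both messages appear because any comparison of NaN using ~= evaluates to true, so a NaN entry also counts as "not 0 and not 1". The sketch below shows this on a hypothetical alarm vector (alarmDemo) and uses any, which expresses "at least one element is true" a little more directly than testing a sum.

```matlab
alarmDemo = [0; 1; NaN; 9; 1];                       % Hypothetical alarm column
nanFlags = isnan(alarmDemo);                         % True where the value is NaN
badFlags = (alarmDemo ~= 0 & alarmDemo ~= 1);        % True for NaN entries as well

if any(nanFlags)
    fprintf('Alarm field has missing or NaN values\n');
end
if any(badFlags)
    fprintf('Alarm field is not all 0''s or 1''s\n');
end
fprintf('NaN ~= 0 evaluates to %d\n', NaN ~= 0);
```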


EXAMPLE 8: Output the invalid positions in myAlarm

Create a new cell in which you type and execute:

   myAlarm = dataJane{5};    % myAlarm for this diary is in 5th column of data;
   nanPositions = find(isnan(myAlarm));  % Find the positions of NaN's
   if ~isempty(nanPositions)
       fprintf('Alarm field has NaN''s in positions ');
       fprintf(' %g', nanPositions);
       fprintf('\n');
   end;
   badPositions = find(myAlarm ~= 0 & myAlarm ~= 1);
   if ~isempty(badPositions)
       fprintf('Alarm field has bad values in positions ');
       fprintf(' %g', badPositions);
       fprintf('\n');
   end;

You should see myAlarm, nanPositions, and badPositions variables in the Workspace Browser and the following output.

Alarm field has NaN's in positions  3
Alarm field has bad values in positions  3 8 14
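
Position 3 appears in both lists because the NaN entry also fails the 0-or-1 test. If the out-of-range values should be reported separately, one option is to build a logical mask that excludes NaN and use it both with find and as an index. The names below (alarmDemo, badMask, cleanedDemo) are hypothetical, and marking bad entries as NaN is only one possible cleanup policy.

```matlab
alarmDemo = [0; 1; NaN; 0; 9; 2];                    % Hypothetical alarm column
badMask = ~isnan(alarmDemo) & alarmDemo ~= 0 & alarmDemo ~= 1;

badPositionsDemo = find(badMask);                    % Indices of out-of-range values
badValuesDemo = alarmDemo(badMask);                  % The offending values themselves
fprintf('Position %g holds %g\n', [badPositionsDemo'; badValuesDemo']);

cleanedDemo = alarmDemo;
cleanedDemo(badMask) = NaN;                          % Treat bad entries as missing
```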


EXAMPLE 9: Count the lines in a sleep diary using low-level MATLAB I/O

Create a new cell in which you type and execute:

   fName = 'Doe_Jane_female.csv';    % File is in the current directory
   fid = fopen(fName);   % The fid is a handle to the open file
   lineCount = 0;
   tLine = fgetl(fid);   % Read the first line
   while ischar(tLine)   % fgetl returns the number -1 at end of file
        lineCount = lineCount + 1;    % Successfully read a line
        fprintf('%g: %s\n', lineCount, tLine);  % Output it
        tLine = fgetl(fid);   % Get another line
   end;
   fclose(fid);          % Must close the file when done

You should see fName, fid, lineCount, and tLine variables in your workspace and the following output in the Command Window:

1: Wake-up Date,Bed-time       (24-hour time),Minutes to fall asleep,Wake-up time           (24 -hour time),"Alarm wake-up?    (1 = Yes, 0 = No)","Daytime caffeine? (1 = Yes, 0 = No)","Evening caffeine? (1 = Yes, 0 = No)"
2: 9/23/2009,22:20,10,5:49,0,1,0
3: 9/24/2009,22:30,120,6:00,1,1,0
4: 9/25/2009,22:00,5,4:30,,1,0
5: 9/26/2009,22:00,5,7:30,0,1,0
6: 9/27/2009,0:58,10,7:18,0,1,0
7: 9/28/2009,23:00,10,6:00,1,1,0
8: 9/29/2009,22:10,10,6:00,1,1,0
9: 9/30/2009,22:50,10,6:00,9,1,0
10: 10/1/2009,21:38,40,6:00,0,1,0
11: 10/2/2009,23:00,10,6:00,1,1,0
12: 10/3/2009,22:00,10,7:18,0,1,0
13: 10/4/2009,22:00,10,6:00,0,1,0
14: 10/5/2009,21:50,10,5:00,0,1,0
15: 10/6/2009,22:30,60,6:00,2,1,0
16: 10/7/2009,22:40,10,6:00,1,1,0
17: 10/8/2009,22:25,10,6:00,1,1,0
18: 10/9/2009,22:35,10,6:00,1,1,0
19: 10/10/2009,23:00,5,7:41,0,1,0
20: 10/11/2009,22:00,5,7:30,0,1,0
21: 10/12/2009,22:30,5,6:00,1,1,0
22: 10/13/2009,0:15,5,6:15,0,1,0
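
A detail worth knowing about this kind of loop: fgetl returns an empty character array for a blank line, and a comparison such as tLine ~= -1 on an empty array yields an empty result, which a while statement treats as false. Testing ischar(tLine) avoids stopping early. The sketch below demonstrates this on a hypothetical three-line temporary file whose second line is blank.

```matlab
demoFile = [tempname '.txt'];            % Hypothetical temporary file
fid = fopen(demoFile, 'w');
fprintf(fid, 'first line\n\nthird line\n');   % The second line is blank
fclose(fid);

fid = fopen(demoFile);
lineCountDemo = 0;
tLine = fgetl(fid);                      % Read the first line
while ischar(tLine)                      % fgetl returns numeric -1 at end of file
    lineCountDemo = lineCountDemo + 1;   % Blank lines are counted too
    tLine = fgetl(fid);                  % Get another line
end
fclose(fid);
delete(demoFile);                        % Clean up the temporary file
fprintf('%g lines read\n', lineCountDemo);
```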


SUMMARY OF SYNTAX

MATLAB syntax - Description
  • fclose(fid) - Closes the file represented by the handle fid, causing system resources to be released and allowing other programs to access the file.
  • tline = fgetl(fid) - Returns the next line of the open file represented by fid after discarding newline characters.
  • Y = find(X) - Returns the indices of the nonzero elements of X.
  • fid = fopen(filename) - Readies the file represented by filename for reading (opens the file) and returns a handle for future operations. If the fid handle is -1, MATLAB could not successfully open the file.
  • fid = fopen(filename, 'w') - Readies the file represented by filename for writing (opens the file) and returns a handle for future operations. If the fid handle is -1, MATLAB could not successfully open the file.
  • Y = isnan(X) - Returns a logical array that is the same size as X. The array Y has 1's (true) where the corresponding elements of X are NaN and 0's elsewhere.
  • isdir(filename) - Returns 1 (true) if filename represents a directory and 0 (false) otherwise.
  • mkdir(directoryName) - Creates a directory with the name directoryName.
  • data = textscan(fid, formatString, 'HeaderLines', 1, 'Delimiter', ',') - Returns a cell array containing the result of a formatted read. The formatString specifies the format. This example specifies that the first line should be treated as a header and ignored. The values in the file are treated as comma-delimited.
  • The while loop - Repeatedly executes statements as long as expression is true (non-zero):

    while (expression)
        statements to execute each time
    end;


This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is of a 5.25" floppy diskette taken by Andreas Frank on July 29, 2005 and available at http://commons.wikimedia.org/wiki/File:5.25%22-Diskette.jpg.