echo on % *********************************************************************** % INTRODUCTION TO MATLAB % *********************************************************************** % Matlab is a system for doing numerical and graphical computation. % It was initially written to provide a interface for the LINPACK % and EISPACK linear algebra packages. It has evolved over the years % to become one of the premier high level languages for doing % mathematical computation. % The goal of this tutorial to describe some basic features of the % Matlab language with particular emphasis on its abilities in the % areas of data manipulation, data analysis, and graphing. %*********************************************************************** % NOTE %----------------------------------------------------------------------- % WHENEVER YOU SEE THE WORD PAUSE THE M-FILE WILL PAUSE % PRESS ANY KEY TO CONTINUE. %*********************************************************************** pause %------------------------------------------------------------------------ % MATRIX %------------------------------------------------------------------------ % The basic structure in Matlab is a matrix. Let's look at a dataset % that is discussed in Chapters 3 and 4. For 30 students in a statistics % class, we record the grade (A, B, C, D, F) and the score on the SAT % exam that was taken during high school. % We represent this data by a matrix named 'class' that consists of 30 rows % and 2 columns -- the first column contains the grades (coded 1 = F, 2 = D, % 3 = C, 4 = B, 5 = A) and the second column contains the SAT scores. % We use square brackets are used to describe the matrix. Carriage % returns to describe different rows. Alternatively, a semicolon % can be used to separate rows. Thee command is concluded with % a semicolon. echo off class=[ 3 525 2 533 3 545 4 582 2 581 1 576 3 572 4 609 2 559 1 543 3 576 4 525 0 574 1 582 2 574 3 471 3 595 2 557 4 557 4 584 3 599 2 517 4 649 2 584 1 463 3 591 2 488 3 563 3 553 4 549 ]; echo on pause % We display 'class' by typing it without a semicolon at the end. class pause %------------------------------------------------------------------------ % MATRIX MANIPULATIONS %------------------------------------------------------------------------ % Let's illustrate some basic manipulations of this data matrix. % We define a list of numbers by a colon: 1:5 pause % We look at the data for students 1-5 by use of the command class(1:5,:) pause % Here we are extracting rows 1-5 and all columns (indicated by a :). % Similarly, suppose we wish to extract the first column of the % data matrix and assign it a column vector called 'grade'. We extract % the second column of 'class' and assign it to a variable called 'sat'. grade=class(:,1); sat=class(:,2); pause % Note that we can have multiple statements on a line -- we separate the % statements by ; (or ,) % We can perform various computations on vectors and matrices. % Suppose that we wish to scale the grades by multiplying by 10 and adding 5 % -- we'll call the new grade 'new_grade': new_grade=10*grade+5 pause % Suppose that we wish to square each grade -- we do this by the .^ operator. grade.^2 pause % grade is currently a column vector, we can convert it to a row vector by % the transpose (') operator: grade' pause % Suppose that we wish to add 'new_grade' to 'sat'. The addition (+) % operator works for matrices if the dimensions line up. new_grade+sat pause % Matlab also supports logical operators. % Suppose that we want to define a passing grade -- if grade is 3 or higher, % then pass = 1, otherwise pass = 0. We define this using >= pass=(grade>=3) pause % We want to define a variable 'C', which is equal to 1 if the grade is 3 % and 0 otherwise. We use the logical equal (==) operator: C=(grade==3) pause %------------------------------------------------------------------------ % DATA ANALYSIS COMMANDS %------------------------------------------------------------------------ % Matlab supports many data analysis operations. The ones described below % are intrinsic to Matlab. Many others are available in the Statistics % Toolbox. % The 'length' command gives the number of entries in a vector: length(grade) pause % The commands below find the mean and standard deviation of the column % vector 'sat': mean(sat), std(sat) pause % Here is an alternative way of computing a standard deviation -- it % illustrates the 'sum' and 'sqrt' commands: sqrt(sum((sat-mean(sat)).^2/(length(grade)-1))) pause % Suppose we wish to tally the different grades in the vector 'grade'. % This can be done using the Matlab 'hist' function (the graphical version % of this command will be illustrated later). freq=hist(grade,0:4) % We see that there is 1 F, 4 D's, 8 C's, 10 B's, and 7 A's. pause % We can find the number of D's by use of a == logical operator and a 'sum': num_D=sum(grade==1) pause % Likewise, the proportion of A's is given by prop_A=sum(grade==4)/length(grade) pause %------------------------------------------------------------------------ % GRAPHING COMMANDS %------------------------------------------------------------------------ % Matlab has a large number of graphing commands. Here we illustrate some % basic Matlab commands that are helpful in data analysis. % A histogram of the SAT scores is constructed by the 'hist' command. hist(sat) pause % Suppose that we wish to use bin midpoints of 460, 480, ..., 660. We first % define a variable, called 'bins', which contains 460 to 660 in steps of 20: bins=460:20:600; pause % The histogram using these bin midpoints is given by hist(sat,bins) pause % Suppose that we wish to construct a scatterplot of 'grade' (vertical axis) % against 'sat' (horizontal axis). We use the 'plot' command. The horizontal % variable goes first, and we indicate a plotting symbol at the end. plot(sat,grade,'o') pause % We label this plot by the 'xlabel', 'ylabel', 'title' commands: xlabel('SAT'); ylabel('GRADE'); title('SCATTERPLOT OF GRADE AGAINST SAT') pause % We look at this graph -- we see a slight positive relationship between % SAT and GRADE. This motivates us to fit a line. % We first form a regression matrix. The matrix X contains two columns - the first % column is a ones vector (called ones(30,1)) and the second column are the sat scores. X=[ones(30,1) sat]; pause % A least-squares fit of SAT on GRADE is found in Matlab using a matrix divide notation -- % the column vector 'b' contains the regression coefficients: b=X\grade pause % Let's put this least-squares fit on the graph. We define points on the line: xpt=[450 750]; ypt=b(1)+b(2)*xpt; pause % The 'line' command will connect the points and put the line on the current plot: line(xpt,ypt) pause %------------------------------------------------------------------------ % WORKING WITH SIMULATED DATA %------------------------------------------------------------------------ % Let's review some of the above commands and introduce some new ones by % doing some data analysis on simulated data. % We simulate numbers from a standard normal distribution. We create a % matrix of 10 rows and 100 columns and assign it to the matrix 'rand_norm': rand_norm=randn(10,100); pause % Let's construct a histogram of the first row of this matrix -- it should % look bell shaped and centered about the value 0. hist(rand_norm(1,:)) pause % Another property of these simulated values is that they are uncorrelated. % To show this, we plot the entire stream of simulated values. We collapse % the matrix 'rand_norm' to a column vector 'rand_c' (using the colon % operation and then plot the elements of 'rand_c' as a line graph. rand_c=rand_norm(:); plot(rand_c) % Note that there is no pattern or trend in the graph which indicates that % they are uncorrelated. pause % Next, let's explore characteristics of sample means of normal samples. % The following command finds the mean of each column of 'rand_norm' -- the % resulting row vector is stored in 'means': means=mean(rand_norm); pause % The values in 'means' are normally distributed with mean 0 and standard % deviation sqrt(1/10) -- let's display the distribution. hist(means) pause % We can use the random matrix 'rand_norm' to illustrate other functions of % normals. First we'll square each element of the matrix (by the .^ operator) % and put the result in 'chisqs' -- the elements are independent % chi-square(1): chisqs=rand_norm.^2; pause % If we sum across rows, we'll get a random sample from a chi square % distribution with 10 degrees of freedom. chisq10=sum(chisqs); pause % We'll display a histogram of these values. hist(chisq10) pause % Looking at a chi-square table, we see that the upper 10 percentage point % of a chi-square (10) is 14.68. So we expect that roughly 10 percent of % our random chi-square(10)'s are larger than 14.68. Let's check: sum(chisq10>14.68) % This is the number of simulated values larger than 14.68 out of 100. % Hopefully this is about 10 percent. echo off