Here's one more statistic that we can use to identify relationships between variables. This is called the Pearson correlation coefficient. Now, the formula presented here includes lots of information that you are now familiar with. The n represents the number of observations in your sample. The x and y represent two different variables; x and y could be height and weight or something like this. The x bar represents the sample mean for the x variable, and y bar represents the sample mean for the y variable. The s sub x and the s sub y represent the sample standard deviations of the x and the y variables, respectively. So you take the product of x minus x bar and y minus y bar for each observation, sum those products over all n observations, divide by the product of the standard deviations of x and y, and then multiply by 1 over the number of observations minus 1, and you get the Pearson r.
Now, the Pearson r is a number between -1 and 1. If the number is between 0.5 and 1, or between -0.5 and -1, you have a relatively high correlation. A medium correlation would show a statistic of 0.3 to 0.5, or -0.3 to -0.5. A low correlation is 0.1 to 0.3, or -0.1 to -0.3.
Let me show you how to calculate this Pearson correlation coefficient in Excel. Here we have some data for a basketball player, Karl Malone. He played with Utah for a number of years before moving to Los Angeles for his last year. He was a very good player, he's a Hall of Famer, and so I've grabbed some of his statistics. What we're going to do is look at some correlations between a couple of his different statistics. We'll first do a correlation between two variables, and then we can look at correlations between multiple variables, just to show you how these things work in concert. So I'm going to highlight his points and his field goal percentage. These are the last two columns, AE and AF.
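Before going through the Excel steps, here is a minimal sketch of the formula described above, written in Python rather than Excel. The function names and the sample numbers are assumptions made up for illustration; they are not Karl Malone's actual statistics. The cutoff for "negligible" (below 0.1) and the handling of values that fall exactly on a boundary are also assumptions, since the lecture only gives the ranges.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson r as described in the lecture:
    r = 1/(n - 1) * sum((x - x_bar) * (y - y_bar)) / (s_x * s_y)
    """
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of observations")
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    s_x = statistics.stdev(xs)
    s_y = statistics.stdev(ys)
    # Sum of the products of deviations, one product per observation.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Note: this fails with a division by zero if either variable is constant.
    return total / (s_x * s_y) / (n - 1)


def correlation_strength(r):
    """Label r using the ranges from the lecture (the sign is ignored)."""
    size = abs(r)
    if size >= 0.5:
        return "high"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "low"
    return "negligible"  # assumption: the lecture does not name this range


# Hypothetical season averages, invented for illustration only.
points = [25.0, 27.0, 29.0, 31.0, 28.0]
fg_pct = [0.50, 0.52, 0.53, 0.55, 0.51]

r = pearson_r(points, fg_pct)
print(round(r, 3), correlation_strength(r))
```

A negative r gets the same label as a positive r of the same size; the sign only tells you whether the two variables move together or in opposite directions.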
And then what I'm going to do is use the Data tab at the top of Excel, and I'm going to click on Data Analysis. Now, I've already highlighted Correlation here. There are a number of different tools we can use in this Data Analysis dialog, but here, I've got Correlation. So when I've highlighted these columns and I click on Correlation, it will allow me to identify the Input Range. I also make sure it knows that I have my labels in the first row, so it knows what's going on. And now it's going to output the correlation coefficients in a new worksheet. So let's make this a little bit larger and I'll tell you what's going on.
Here we have two different variables, an x and a y: points and field goal percentage. The rows are labeled points and field goal percentage, and the columns are labeled points and field goal percentage. So each correlation coefficient matches up a row with a column. Now, of course, points are perfectly correlated with points, as they should be. Field goal percentage is perfectly correlated with field goal percentage, as it should be. Now, look here: this says that field goal percentage and points are correlated at 0.684, so this is a high correlation. It means that there is some statistical evidence that these two variables move together in a positive way. So the higher Karl Malone's field goal percentage is, the higher his points are likely to be. Or, when Karl Malone scored more points during a season, it's likely that his field goal percentage was also high.
Now, one of the things that we should note here is that correlation does not mean causation. We cannot say that as the result of a high field goal percentage he had lots of points, or that as a result of lots of points he had a high field goal percentage. We can draw inferences about how the two move together. So in the years where he had a low field goal percentage, he had a low number of points.
In the years where he had a high field goal percentage, he had a high number of points. But this doesn't imply causation.
Let's look at a number of variables using his statistics and build a much larger correlation coefficient table. We've got some offensive statistics for Karl Malone here towards the end of this chart. We have offensive rebounds, ORB; defensive rebounds, DRB; total rebounds; assists per game; steals; blocks; and we have free throw percentage. So I click on Correlation again, and then I have to click on this Input Range box. I'm going to delete the current input range and select everything from free throw percentage all the way through, let's say, total blocks. I'm going to highlight this whole range. It's grouped by columns, I've got my labels in the first row, and I'm going to hit OK. Now, let's make this a little bit larger and try to evaluate what's going on here.
So, free throw percentage is obviously perfectly correlated with free throw percentage. And we'll notice that we've got this diagonal of ones, which shows that the row and the column of the same variable are perfectly correlated, as they should be. Now, some things are more highly correlated than others. Some of the correlations are negative. Some of them are much larger and some of them are much smaller. So if we look at rebounds, defensive rebounds are correlated with offensive rebounds at about 0.5. What this means is that there's a medium correlation between the number of rebounds he gets when he's playing defense and the number of rebounds he gets when he's playing offense. But look, there's a much higher correlation between total rebounds and offensive rebounds, and between total rebounds and defensive rebounds. And this makes sense, because offensive rebounds go into the total and defensive rebounds go into the total.
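To make the table's structure concrete, here is a small sketch in Python of how a correlation table like the one Excel produces could be built, and why a total tends to correlate strongly with its own parts. The helper names and the rebound numbers are assumptions invented for illustration; they are not Karl Malone's actual statistics, and this is not how Excel computes the table internally.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from sums of deviations; equivalent to the lecture's formula,
    because the (n - 1) terms cancel."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_table(columns):
    """Return {row_label: {column_label: r}} for every pair of columns."""
    names = list(columns)
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in names}
        for a in names
    }


# Hypothetical per-game rebound averages over five seasons (made up).
orb = [1, 2, 3, 4, 5]
drb = [3, 1, 4, 2, 5]
# Total rebounds are the sum of the two parts, as in the lecture.
trb = [o + d for o, d in zip(orb, drb)]

table = correlation_table({"ORB": orb, "DRB": drb, "TRB": trb})
for row_label, row in table.items():
    print(row_label, [round(value, 3) for value in row.values()])
```

With these made-up numbers, ORB and DRB are correlated at 0.5, while each is correlated with TRB at about 0.87, and every variable is correlated with itself at 1. The table is also symmetric, which is why Excel only fills in the lower triangle.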
It also makes sense that defensive rebounds and offensive rebounds aren't perfectly correlated, because a basketball player might be much stronger going for the ball on defense than on offense. And we'll notice there's a very low correlation between defensive rebounds and free throw percentage. Again, this makes sense: there's nothing that suggests that being able to block out a player or go up for a ball makes you a good shot from the free throw line.
What you'll also notice is that some of these things have very strong negative correlations. For this player, the number of assists per game and the number of offensive rebounds are very highly negatively correlated. So in the years where Karl Malone was getting more offensive rebounds, he was not getting as many assists per game. And in the years where he had higher assists per game, he was getting fewer offensive rebounds. There is a much weaker negative correlation between assists and defensive rebounds: here, assists and defensive rebounds are negatively correlated at -0.27.
Now, this statistic is used quite a bit in MBA programs. It is easy to just point, click, grab your data, and have a table displayed. It's much more difficult to tell a story using these numbers, because you have to make sure you are communicating that correlation doesn't necessarily mean causation. You have to make that very, very clear. But it can tell a very compelling story. Here the story might be this: when Karl Malone is getting lots of offensive rebounds, there's a negative relationship with assists. If Karl Malone dishes the ball off to somebody and they miss and he gets the rebound, well, then he didn't get the assist, because there wasn't a basket, right?
And when he's getting lots of assists, he's getting fewer chances for offensive rebounds, because an assist means dishing the ball off and the other player actually making the basket. If the basket is made, there's no chance for an offensive rebound. So there's a plausible reason why these two are negatively correlated, and why the negative correlation might be very high. You can tell that story, and you could look into the data some more to see if what you seem to be seeing is in fact how the game played out.
The correlation coefficient is used quite a bit, and it is a good place to start understanding and analyzing relationships in data. A little later in this program, we're going to look at regression analysis, which is another way to identify relationships between variables.