Let's look at the median. The median is the value of the absolute middle observation when the population is ranked in order. On this first sheet, I have already ranked all of the games from the shortest game to the longest game. So if I know the number of games, I can identify where the median is. What I'm going to do is type equals count, and when I use count in Excel, it tells me the number of observations that I have. Of course I know that I have 162, but I want to confirm that, and in fact, I do: equals count of my population tells me I have 162 games. The median is the one that's absolutely in the middle. If I take 162 and divide it by two, I get 81, so game 81 looks like the middle of this thing, right? But since we have an even number of games, the problem is that games 1 through 81 are the first half and games 82 through 162 are the second half, so no single game sits exactly in the middle. With an even number of games, I have to take the two middle games, in this case the 81st and 82nd, add up their values, and then divide by two. If I have an odd number of values in the population, then I have an equal number on either side of the middle, and I can take the observation that is smack in the middle. So let's go to games 81 and 82 and look at the game durations to find our median. Right here, these two are games 81 and 82, and the duration of each one of these games is three hours, period. So the median game duration for the Chicago Cubs was three hours. Now in this situation, our median value is not dramatically different from our average, our mean value. The mean game duration was three hours and four minutes. The median game duration was three hours.
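The even-versus-odd rule for the median can be sketched outside of Excel as well. This is a minimal Python illustration, not the spreadsheet steps from the lecture; the function name `median_of` and the sample durations (in minutes) are hypothetical and are not the actual Cubs data.

```python
def median_of(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    ranked = sorted(values)          # rank from shortest to longest
    n = len(ranked)                  # same role as =COUNT(...) in Excel
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one observation has an equal number on either side.
        return ranked[mid]
    # Even count: average the two middle observations
    # (the 81st and 82nd when there are 162 games).
    return (ranked[mid - 1] + ranked[mid]) / 2


# Hypothetical game durations in minutes (illustrative only).
even_sample = [175, 201, 198, 183, 177, 180]
odd_sample = [175, 201, 198, 183, 177]

print(median_of(even_sample))
print(median_of(odd_sample))
```

With six values the function averages the third and fourth ranked durations, and with five values it returns the third ranked duration directly.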
In this case, the mean and the median are both giving me a pretty consistent picture of how long a game takes. Now this isn't always the case, but in this situation it is. Let's use our Excel spreadsheet to find the last calculation, and this is called the mode. This is the observation that occurs with the most frequency. Now, we could look at each of these values, three hours, three hours and 21 minutes, three hours and 18 minutes, three hours and three minutes, two hours and 57 minutes, and count each time that each one of these observations comes up. I'm going to use a slightly different technique: I'm going to build a pivot table that allows me to do a count on each one of these game lengths. So the first thing I'm going to do is highlight an area of the table, and this is my time and attendance, all the way down to game 162, right here. Then I'm going to go to insert and insert a pivot table, given my range. Now, what I'm going to do here is have time as my rows, and I'm going to put the value here as my count. So what I've done is essentially create a table that allows me to look for the highest count. When we do this, the game time is in the first column, A, and the total number of times that that game time occurred is in the second column, B. So in this case, we look here and say, an hour and 15 minute game: that only occurred one time during the season. A two hour and 17 minute game, that occurred twice during the season. So I'm going to scroll down and look for the game time that occurred with the highest frequency, and a very quick glimpse will show me that there are a couple of values that occurred with very, very high frequency. These right here: two hours and 56 minutes and three hours and eight minutes occurred with a higher frequency than any other game time.
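The pivot table count can be approximated in a few lines of Python. This is only a sketch of the same idea, counting how often each game length occurs and keeping every value that ties for the highest count; the function name `modes_of` and the durations (in minutes) are hypothetical, chosen so that 176 (two hours 56 minutes) and 188 (three hours eight minutes) tie.

```python
from collections import Counter


def modes_of(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)            # value -> number of occurrences
    top = max(counts.values())          # the highest frequency in the table
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical game durations in minutes (illustrative only).
durations = [176, 188, 176, 180, 188, 201, 135]

print(modes_of(durations))
```

Returning a list rather than a single value keeps both modes when two game times share the top count.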
Is it possible for us to have two modes in a population? Yes. Like in this case, we've got two game times that occurred with the same highest frequency: two hours and 56 minutes and three hours and eight minutes. So what's the mode? We've got two modes here. In this case, we notice that our mean, our median and our mode are all slightly different values. Which one is a better representation of what's going on in this sample? I would say the mean in this example is probably a very good representation of what we could expect if we were to go to a Cubs baseball game, and say, "Okay, the average length of the game is three hours and four minutes. That's my arithmetic mean." That's probably a good estimate of what's going on. My median is three hours, so would it be unreasonable for somebody to suggest that, "Hey, the Cubs baseball game is going to last three hours"? No, it's right on the money. They would be completely accurate. Or, going by the mode, they would say, "Well, it's going to last between two hours and 56 minutes and three hours and eight minutes, something like that." I mean, those occur with greater frequency. Yes, maybe a little bit longer or a little bit less than three hours. But each one of these is identifying a slightly different piece of information. Let me give you another example where it is much clearer that the mean and the mode are not identical, and where it might not be appropriate to use the mean and might be more appropriate to use the mode. All right. I want to show you an example where it is clearer that one of the mean, the median and the mode is a better answer. In the baseball example, the mean and the median and the mode all give us a very good understanding of the central tendency of that data, so there may not be a clear winner in terms of which one of those descriptive statistics is the best way to describe your data. Here I have the winners of the Indianapolis 500, from 2014 dating all the way back to 1946.
I also have the starting position of the winner and the average speed that the winner was traveling. So in this case I might say, "All right, so what's the average starting position of the eventual Indianapolis 500 winner?" Now, if I took the average of that, that is the arithmetic average, the mean, I would get this number right here, 6.188. All right. There's a problem with this. Not that it's calculated incorrectly, because it absolutely is calculated correctly. The average starting position of the Indianapolis 500 winner is position 6.18. The problem is a car can't exactly start in position 6.18. A car can find itself in position 6 and it can find itself in position 7, but it can't find itself in position 6.18. Should we ignore the 0.18 and say, okay, the average starting position of Indianapolis 500 winners is position 6? Maybe. Let's look at the median and let's look at the mode. Now, again, I want to go to the mode first because I think this is going to tell me much more relevant information, much quicker, and then we'll go back to the median. So I'm going to do another pivot table here with my starting positions over here. This is my year and driver. Now, I'm going to highlight all of these right here, and I'm going to insert a pivot table. My starting position is going to be both my rows and my values, with the values set to count. Okay. So here I've created a pivot table. Each one of my rows identifies a starting position, and the count tells me how often, or the frequency with which, that starting position has come up. Now look, when we look at this, we notice that the mode, the starting position that comes up most frequently among the winners of the Indianapolis 500, is right here, position 1, and that's what we call the pole position. So the driver that started in the pole position in the Indianapolis 500 has won 17 times between, what was it? 1946 and 2014. This is the mode.
So if I was a betting person, I would say, "Look, I would bet on the pole position rather than the driver in position 6 or 6.1882 to win the Indianapolis 500." As a matter of fact, you'll notice that the driver in position 6 has only won the Indianapolis 500 two times. How could the arithmetic mean and the mode be so radically different? Well, because of the way that these values work. See, the pole position has a value of 1, right, while the position at the far end has a value of 25. So even though 1 has come up 17 different times, that barely offsets the one time that position 25 has come up. So the averaged value is different from the typical starting position, because position 1 and position 25 don't really match up as numbers, not in a way that reveals something about what the data is trying to explain. When you are calculating your measures of central tendency, you have to be aware of the system, of what the numbers are representing. By just taking a raw average here, the arithmetic average, I'm giving the same kind of weight, if you will, to position 1 and position 25, and I am just adding all these numbers up and dividing by the total number of winners, that is, the total number of races here, right? So it's just giving me a different kind of glimpse of the data. In this example, I would stay away from the mean and I would move towards the mode to describe what's happening in this data. I did promise you that we would look at the median, so we're going to go back to the data and I'm going to rank them from highest to lowest or lowest to highest and find the absolute middle starting position.
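The gap between the mean and the mode for starting positions can be sketched with Python's standard `statistics` module. The list below is hypothetical and deliberately small; it is not the actual Indianapolis 500 data, only an illustration of how a few large position numbers pull the mean away from the most frequent value.

```python
from statistics import mean, median, mode

# Hypothetical starting positions of race winners (illustrative only,
# not the actual Indianapolis 500 results).
starting_positions = [1, 1, 1, 1, 2, 3, 5, 6, 13, 25]

avg = mean(starting_positions)          # arithmetic mean, can be fractional
mid = median(starting_positions)        # middle of the ranked positions
most_common = mode(starting_positions)  # most frequent starting position

print(avg, mid, most_common)
```

On this made-up list the mean comes out to 5.8, a position no car can start in, the median is 2.5, and the mode is 1, the pole position. The single 25 and the 13 account for most of the distance between the mean and the mode.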