So that brings me to what we call principal components analysis. It has its own restriction, because it works only with numerical data, but it's a cool tool to have in your toolkit, one that lets you use a lens to project your data. To make this more visual, I'll give you some examples.

Let's take a coordinate system. This example again [inaudible] let's take a point in three dimensions, and we'll call these dimensions i, j and k. So i, j and k correspond to each of the axes you can see: the i-axis, the j-axis, the k-axis. Now, if you take a point in the three-dimensional space, we can see what value it takes on the width, what value it takes on the length and what value it takes on the height, and that describes a point in the space. So basically you are looking at a point and representing it as some x1 in the direction of i, some x2 in the direction of j, and some x3 in the direction of k. Ignore the rest of the information on this slide if you want. That we know how to do.

But let us say one of these dimensions is more important. For example, look at the number 3,528. Well, it's the 3,000 that's important. Maybe it's more or less important that it's 3,500; 3,500 already communicates a lot about this number. Twenty-eight? The twenty may be important, but what about the eight? Least important. So basically, the way we represent a number says that the first dimension is more important than the next, which is more important than the third, which is more important than the fourth.

So what we really want to do is take the data and extract dimensions out of it, so that the first one captures a lot of the information about the data, like the 3,000. The next one captures the next most important amount of information, like the 500, and so forth. That's the idea of what we want to do. But there is a word there which I want you to notice: orthogonal. Basically, the first dimension captures all the variation in one direction.
The next one captures all the variation in a second direction. The third one, it's a 90-degree story. So the first and the second do not measure the same type of variation. The third one captures variation too, but in a different direction, which is orthogonal, or at 90 degrees if you will, to the other two. Note that it's at 90 degrees to both. So this is the idea of an orthogonal basis.

So here's the basic idea. This you will understand. Say you're playing cricket. These are the stumps, and you have a torch and you're shining the torch on the stumps. The stumps are three sticks stuck into the ground, like that. Imagine shining a light on them. When you shine a light on them, they will project as three lines on the surface, you can imagine. So that's the idea of projection: you take an object in a higher dimension and shine a light on it so that it casts a shadow of the object, if you will, in a lower dimension.

So the direction in which I project matters, doesn't it? You have the data of the stumps in three-dimensional space, but where your torch is will affect the shadow you see. So you have the projected data in 2D space, but it changes with where your torch is. As your torch moves, the projection gets distorted. Look at this: there are three stumps, and you shine the torch from the side. All you see is one line, and that doesn't give you a lot of information. So from this example, I want you to take away the idea that some projections are more useful and convey a lot more information than other projections. That doesn't mean the other projections are not useful; we will talk about that. So some projections almost mimic the object, whereas some projections do not give as much information.

Here you go. So which one is the best? We say the best one is the one that preserves the most information. As you can see, A is probably the best, in the sense that it's saying, "Okay.
They are parallel lines, they're straight lines, they are equal lines in three-dimensional space." D is the worst because it's saying, "There's only one line," when actually there are three. So the best projection is the one which captures the most variability in the data. So how does the data vary? The actual three-dimensional data varies because the stumps are at equal distances, but they're shifted along a line. So what you're really trying to capture is that variability in three dimensions, in two dimensions.

Now, let's think of data as a cloud of points. Interestingly, after all I've been telling you, there is no best projection here, because it's all circular. Wherever your torch is shining, it's a cloud, it's a sphere, and when you shine light through it, it's just going to project as a circle in two dimensions. It doesn't matter where your torch is; whether the torch goes to the left or on top, you get the same image. So as far as we're concerned, for an object like this, I don't benefit at all from choosing where to project. The measure we are using here is the variation in the cloud that we are capturing, and we are saying that for this sort of symmetric object, it doesn't matter.

So we want to find a projection which maximizes the information content. Instead, let's say I distort the sphere and make it a sort of elliptical sphere. Now you'll agree with me that one of these is the best projection and one of these is the worst. I don't think we'll have to debate very long: the one that shows the long axis of the ellipse, the one at the bottom, preserves the most variation in the data, whereas the other one, which shows only the shorter axis and appears as the circle at the top there, does not capture as much of the variability of the data. So you have the best projection and the worst projection. Are we done? No.
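As an aside, the torch picture for the sphere and the elliptical cloud can be sketched numerically. This is only a minimal illustration in Python with NumPy, not something from the slides: the point clouds are made up, the stretch factor of 3 is arbitrary, and `projected_variance` is a hypothetical helper name. It works in two dimensions, so each "shadow" is the data projected onto a single line, and the variance of the shadow stands in for how much information that projection keeps.

```python
import numpy as np

rng = np.random.default_rng(0)


def projected_variance(points, direction):
    """Variance of the points after projecting them onto a direction (a line)."""
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)  # make it unit length
    return float(np.var(points @ direction))


# A round cloud: the same spread in every direction (made-up data).
round_cloud = rng.normal(size=(5000, 2))
# An elongated cloud: the same points stretched by 3 along the first axis
# (the factor 3 is an arbitrary assumption for illustration).
ellipse_cloud = round_cloud * np.array([3.0, 1.0])

# Lines to cast the shadow onto: 0, 30, 60, 90, 120 and 150 degrees.
angles = np.linspace(0.0, np.pi, 6, endpoint=False)
directions = [(np.cos(a), np.sin(a)) for a in angles]

round_vars = [projected_variance(round_cloud, d) for d in directions]
ellipse_vars = [projected_variance(ellipse_cloud, d) for d in directions]

for angle, rv, ev in zip(np.rad2deg(angles), round_vars, ellipse_vars):
    print(f"{angle:5.0f} deg  round: {rv:5.2f}  ellipse: {ev:5.2f}")
```

For the round cloud the printed variances should be nearly the same at every angle, while for the stretched cloud the line along the long axis (0 degrees here) should keep the most variance and the line along the short axis (90 degrees) the least.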
We're not done, because what we really want to do is project and project and project until we capture the entire variation of the data. This is called a complete and orthogonal projection. So basically, we are projecting onto the left and we are projecting onto the top. Using just those two projections, I can recover the ellipse. These two projections are orthogonal because they're at 90 degrees to each other. They're complete because, using the two, I can recover the data. So the idea is to take the data and try to project it into complete and orthogonal projections, with one more caveat. You'll agree this set of projections has something special going for it. It's complete, it's orthogonal, but on top of that, the first projection, which is along the major axis of this ellipse, captures most of the variation in the data. The next one, after the first has captured that, captures the rest. So basically, the idea is to project in such a way that the first component of the projection captures the most variation, the next one, orthogonal to it, captures the next largest amount of variation, and so forth. That is the principal component method in words.
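To put the words into something you can run, here is one common way to compute such projections: take the eigenvectors of the data's covariance matrix. The lecture describes the method only in words, so treat this as a sketch under assumptions rather than the method from the slides. The data set is synthetic (an elongated cloud rotated by 30 degrees), and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: an elongated 2-D cloud, rotated by 30 degrees.
theta = np.deg2rad(30.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
data = (rng.normal(size=(2000, 2)) * np.array([3.0, 1.0])) @ rotation.T

# Center the data and compute its covariance matrix.
mean = data.mean(axis=0)
centered = data - mean
cov = np.cov(centered, rowvar=False)

# Eigenvectors of the symmetric covariance matrix are orthogonal directions;
# eigenvalues are the variances captured along them. Sort largest first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
variances = eigenvalues[order]
components = eigenvectors[:, order]  # each column is one direction

# Project onto the components, then undo the projection.
scores = centered @ components
reconstructed = scores @ components.T + mean

explained = variances / variances.sum()
print("share of variation per component:", np.round(explained, 3))
print("orthogonal:", np.allclose(components.T @ components, np.eye(2)))
print("complete:", np.allclose(reconstructed, data))
```

The three properties from the lecture show up directly: the directions are at 90 degrees to each other (orthogonal), the two projections together recover the original points (complete), and the first component carries the largest share of the variation, with the second picking up the rest.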