So let's go back. What we have seen is that we can use principal component analysis together with regression, but there are caveats in the books about dropping components. Is it right to just pick the first two components? There are examples where the least important component actually turns out to be more significant, because remember, the components were extracted from the data alone. What you're really trying to do is relate them to some response variable. So it's quite likely that the least important of the two components, or [inaudible] together, might be able to predict a specific response variable, like how well you do in football; for that, these principal components may not be very good. But for how you do at graduation, maybe the first two components are good. Still, what the principal components do is at least tell you where the maximum variability in the data is, and put it into orthogonal columns.

So I would like to do two things. First of all, you've got to do it yourself. My suggestion is to do an exercise: go through what we did, but try to run a regression with the first three components, not just the first two. You will have to change the script in R. Now, this is a difficult exercise, I know, but instead of PC1 plus PC2, just make it PC1 plus PC2 plus PC3, and you should see your adjusted R squared become 66 percent. So that's a little exercise for you.

I would be remiss if I did not run this on the Iris dataset, so I'm going to quickly run it and show you what it does. I'll run it on the Iris dataset and just show you the result. You just have to step through the commands; I'll let you read them, there is some fancy stuff in there, nice graphics, you may like to read it. The Iris dataset doesn't seem to be here, so let me copy the command. There is no Iris dataset in the workspace, but don't worry about it: the Iris dataset will work because it is in the R library, so don't worry if it gives us an error. It knows what the Iris dataset is. One thing I will do is clean all the data, and then I'm going to run it. So you see it took the command. The reason it took the command is that the Iris dataset is in the R library, so it knows where it is; it goes, searches, and finds it.

I'm going to run it. It shows you that there are four principal components. Yes, there are four features, remember: sepal and petal, width and length, so there are four principal components. The first two components account for 95 percent of the variability. Now I'm just plotting them, and with this plot I just want to show you what it does: it plots the same data, but only in two dimensions. So what we have done is taken the four-dimensional data and projected it into two dimensions. Which two dimensions have we chosen? Principal component one versus principal component two. You can clearly see that there are three clusters, and you can see they're all separated out. This is very different from the scatter plots that you saw. But once again you will see that the first cluster is very different: the setosas are very different from the versicolors and the virginicas. But you can't separate out the last two clusters unless you knew the class variable.

So, going back, what about Iris? Even with the principal components, once you separate them out, two components are as good as four. So two legs are better than four; is that the way to remember this session? That's what it means. But even that doesn't solve the problem of identifying the clusters, because you need labels.
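To make this concrete, here is a minimal sketch in R of the kind of script just described: PCA on the built-in iris dataset, a PC1-versus-PC2 plot, and a commented-out line for the three-component regression exercise. The `response` name in the regression is a stand-in for whatever response variable the course dataset uses; it is not part of iris, and the exact script shown in the lecture may differ.

```r
# Minimal sketch of the Iris demo, assuming only base R and the built-in iris data.
# prcomp() centers and scales the four numeric features before extracting components.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)  # proportion of variance: the first two components cover about 95%

# Project the observations onto the first two principal components and plot them.
scores <- as.data.frame(pca$x)
plot(scores$PC1, scores$PC2,
     col = iris$Species, pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = "Iris projected onto the first two principal components")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)

# The exercise from the lecture, sketched on a hypothetical course dataset:
# regress the response on the first three component scores instead of the first two.
# 'response' is a stand-in name, not part of iris.
# fit <- lm(response ~ PC1 + PC2 + PC3, data = data.frame(scores, response = response))
# summary(fit)$adj.r.squared
```

Stepping through this should reproduce what the demo shows: setosa clearly separated along PC1, with versicolor and virginica forming the two remaining, harder-to-separate clusters.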
Finally, when do you use PCA? My friend Shyly provides this guidance, and I'm going to pass it on. First of all, the data must be numeric. Second, when there are lots of features. Third, when the data is unimodal; unimodal means it has only one peak. Fourth, when class labels are ignored: we don't care about the class labels, we only look at the data and at how it should be projected. To visualize the data, use the top two or three principal components, as we did with the Iris dataset. PCA is also used to reduce the number of dimensions, but you have to be very careful that you're not throwing out things which are actually important, because remember, you are only projecting the data without looking at the purpose. It is also sometimes used to remove noise in the data, because the noise may get cancelled out when you create these components, and you get much cleaner data than what you had before.

In closing, there are a couple of other interesting methods for projecting data from a higher-dimensional space to a lower-dimensional space. One is MDS, and a very interesting version of it is called t-SNE, which has come up very fast; it's a variation of multidimensional scaling which is very effective and is used to reduce dimensionality. There is another method called self-organizing maps, which is another popular way of doing exactly what we did. So there are many projection methods; I just selected two of them because they are interesting for you. A small sketch of MDS on the Iris data appears after the module summary below.

In this module, what have we done? We have looked at the curse of dimensionality. We have said, look, there are problems when there are too many features, and we will have them; there are problems because we may not be able to extend the results to additional data. Second, if you want to visualize, you can, but it may be better to project and then visualize. Scatter plots are good, but they have limitations, so you can project and visualize, or visualize and project. PCA is one method we used, and I talked about other methods, but generally it is useful on numeric data to identify the important features. The main point of this module was that with big data comes big responsibility. You have to make sure that you're not using garbage. So one of the ideas is to understand which features you can combine, which you can eliminate, and which you can use for prediction.
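As a point of comparison with PCA, here is a minimal sketch of classical multidimensional scaling (MDS) on the same Iris features, using `cmdscale()` from base R. t-SNE and self-organizing maps would need extra packages (for example Rtsne and kohonen), so this sketch sticks to the base-R method; it is an illustration under those assumptions, not the script used in the lecture.

```r
# Minimal sketch of classical MDS on the Iris features, assuming base R only.
d   <- dist(scale(iris[, 1:4]))  # pairwise Euclidean distances on standardized data
mds <- cmdscale(d, k = 2)        # classical multidimensional scaling into two dimensions

plot(mds[, 1], mds[, 2],
     col = iris$Species, pch = 19,
     xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Iris projected with classical MDS")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

With Euclidean distances on standardized data, classical MDS recovers essentially the same two-dimensional layout as the first two principal components, which is a useful sanity check before moving on to the nonlinear methods mentioned above.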