Here, we want to classify Iris flowers. This is what's called a multivariate dataset covering three classes of the Iris flower: Setosa, Versicolor, and Virginica. The data was introduced in 1936 by Ronald Fisher, the statistician, from measurements a botanist, Edgar Anderson, collected in the field. There are 150 instances in total, 50 of each of the three variations, and four features were measured: the length and width of the petals, and the length and width of the sepals. When I first got to this, I didn't know much about flower anatomy. Then I saw sepals and thought, "What's a sepal?" The sepal is the part that sticks out down below the petal.

You can see in this graph, and we'll see how we generate it in a moment: there are 150 samples total, and I extracted 135 of them and plotted them to look for similarities in the features. You can see they cluster quite nicely. The Setosas are way up over here in a group by themselves, and the Versicolor and the Virginica still separate pretty well; you can imagine a plane passing through there. So, there is some structure in the dataset.

Without all the math, here's a summary. A linear support vector machine is a type of classifier where the features fall into one of several categories; that's my little cartoon example there, with two categories shown. This happens in multiple dimensions. Andrew referred to the divider as a hyperplane, but as we'll see in an example in a minute, it can be a curved surface as well, a potentially complex curved surface that divides one class's feature locations from another's. It turns out that the data points closest to this separating hyperplane are what are called the support vectors.

So, here's yet another library of learning algorithms: LIBSVM, which has an interactive demo we can play with. You pick your colors here. I'll use yellow, place some points, then change color and place some more over here. Then you run it, and it finds the hyperplane that divides one set of points from the other using a support vector machine classifier. You can clear that and say, "Wow, that was an easy one to solve." Okay, what if it looks like this? Back to blue here, some up in here. Now you get a curved hyperplane. It's not really a plane; it's a surface, a hypersurface, that divides those features from each other. My students last year asked me, "What about a doughnut? One set of points all around the outside, and some more in the center?" Well, we did that. Change to blue, put a bunch in the middle of the doughnut, and again it can discriminate those. So, that was cool.
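You can reproduce that doughnut experiment outside the demo. Here's a minimal sketch, assuming scikit-learn's SVC with an RBF kernel and its make_circles helper to synthesize the ring-plus-center data; the talk used the interactive tool, so this is just the same idea in code:

# The "doughnut" experiment: a ring of points around the outside, a
# cluster in the center, and an RBF-kernel SVM learning the curved
# boundary between them.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# 200 points in two concentric rings; y is 0 for the outer ring, 1 for the inner.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM has no plane that separates a ring from its center, but the
# RBF kernel implicitly maps the points into a space where one exists.
clf = SVC(kernel="rbf", gamma=2.0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))         # ~1.0 on this easy set
print("support vectors per class:", clf.n_support_)  # points nearest the boundary

That kernel trick is what buys you the curved hypersurface: the separator is still a plane, just in a higher-dimensional space the kernel defines.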
So, I used a support vector machine on the Iris dataset. I import the Iris dataset; it's available through SKLearn as a module, so you can pull it in and get all the data, poke around in it, and look at the individual features. I imported the linear support vector classifier and created an instance of it. Now, don't get too hung up on this next part right now: I'm using a method from the cross-validation module to split the data. Think about it like this. You have a whole bunch of data points. The split selectively and randomly pulls a certain set of them out to form your training data, and the other ones become your test dataset. You have this big dataset, you randomly select some of it for training, and you use the rest to test your algorithm. That's all it does. So, I pass in the Iris dataset with some parameters, and it passes back a feature matrix for training, a feature matrix for testing, the y training values, and the y test values. That's what happens here.

Then I trained the model: I pass it the training samples along with the corresponding expected values and call fit. Again, that's where the learning is happening. Then I run the test data through it to make predictions: I call predict and pass it the X test matrix. There's quite a bit in these libraries. At this point, we want to measure the performance of the predictions, so I import a metrics library, calculate an accuracy score by passing in the y test values and the predicted values, and print out the accuracy. There's also a thing called the confusion matrix. I have no idea who made that name up; I don't know why it wasn't called an error matrix. But it's a confusion matrix.

So, it got one wrong. The expected values were... oh, I skipped over one thing. The classes are assigned numerical values: Setosa is zero, Versicolor is one, and Virginica is two. They're just assigned; I suppose they're called labels in machine learning, and we have to turn those labels into some kind of numerical representation for the algorithms to work with. I should have had that in the slide earlier. So, this is what we were expecting; this is the real data. We expected 0022110022120, and you can see the prediction was different in one position. So, I had an accuracy of 0.9333, about 93 percent; it made one wrong prediction.

You could go back, and I was really keen on this, and play with variations on the SVM classifier, changing around its hyperparameters. So, I load the Iris dataset again and create an instance of the linear support vector classifier. Oh, there's that random_state variable; I knew I'd seen it in here. I started using it at one point, because I was learning as I was doing machine learning, and I'm holding the random state constant here for this set of runs. Then I use the cross-validation split again to create training and test feature matrices, and training and test expected outcomes. I pass the training values into the fit method to train it, pass the test matrix to the predict method, and it makes its predictions, which I can print out here. Then I measured the performance, calculated some error metrics and so forth, and printed the confusion matrix there.
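Here's roughly what that whole walkthrough looks like as code, folding the two runs together. This is a sketch assuming a current scikit-learn, where the split helper lives in sklearn.model_selection; in older releases it sat in a module literally named sklearn.cross_validation, which is why I keep calling this step cross validation. The split size and random_state values are illustrative.

# A sketch of the Iris walkthrough with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn import metrics

iris = load_iris()            # a bundle with data, target, target_names, etc.
print(iris.target_names)      # ['setosa' 'versicolor' 'virginica'] -> labels 0, 1, 2

# Randomly hold out 10% of the 150 samples for testing (135 train / 15 test).
# Fixing random_state keeps the split the same from run to run.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.1, random_state=0)

clf = LinearSVC(random_state=0)   # the linear support vector classifier
clf.fit(X_train, y_train)         # this is where the learning happens

predicted = clf.predict(X_test)   # predictions on the held-out samples
print("expected: ", y_test)
print("predicted:", predicted)
print("accuracy: ", metrics.accuracy_score(y_test, predicted))
print(metrics.confusion_matrix(y_test, predicted))

Holding random_state constant in both the split and the classifier is what makes a set of runs comparable while you vary the other hyperparameters.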
The second thing is, don't get too hung up on this thing called PCA, Principal Component Analysis. Raise your hands if you've heard of Principal Component Analysis. Okay, I never had either. Oh, you did? You've heard of it? Okay. We're going to get into what that is in the big data analytics part, but for right now, just think of it as a way of reducing dimensionality. If you have a dataset that has many, many dimensions, sometimes it's useful to reduce the number of dimensions to make the dataset more manageable. So, I squeezed the training data down to two dimensions and plotted it; this is the graph it produced when I ran it. This is the training dataset with three classes and known outcomes, and we expected it to divide into three groups.
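As a sketch of that squeeze, here's what the PCA step might look like, assuming scikit-learn for the decomposition and matplotlib for the scatter plot; n_components=2 is the "reduce it to two dimensions so we can draw it" choice.

# Squeeze the four Iris features down to two principal components and plot them.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

pca = PCA(n_components=2)             # keep the two directions of greatest variance
X_2d = pca.fit_transform(iris.data)   # 150 x 4 -> 150 x 2

# One color per class; the three species should fall into three clusters.
for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    plt.scatter(X_2d[mask, 0], X_2d[mask, 1], label=name)

plt.xlabel("first principal component")
plt.ylabel("second principal component")
plt.legend()
plt.show()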