So, the next algorithm we're going to look at is what's called Naive Bayes, and as a learning example, suppose we want to detect spam emails. You can use Naive Bayes to do that. First, I want to talk about Bayes' theorem. It's named after Reverend Thomas Bayes. Who's heard of this before? This was new to me. Did you learn about it in statistics? That was my problem: I never took statistics when I was at university, so this was new to me last year when I was learning about this stuff. I'll buzz through it pretty quickly here.

In essence, it's an equation that allows new evidence to update a set of beliefs. A and B are events, and the notation requires that the probability of B is not equal to zero. The equation says the probability of A given B equals the probability of B given A, times the probability of A, divided by the probability of B. In this scheme, probability measures a degree of belief, and the theorem links the degree of belief in a proposition before and after accounting for evidence. So, for a proposition A and evidence B, P(A) is the initial degree of belief in A, called the prior, and P(A given B) is the degree of belief in A after the evidence B has been discovered, called the posterior. The ratio of P(B given A) to P(B) represents the support B provides for A.

I snapped an example from Wikipedia. The entire output of a factory is produced by three machines, accounting for 20%, 30%, and 50% of the output respectively, and the percentages of defective items produced are 5%, 3%, and 1% respectively for each of the machines. An item coming out of the factory is chosen at random and is found to be defective. We ask: what is the probability that it was produced on the third machine?
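The update rule above can be sketched as a one-line function. This is a minimal illustration, not anything from the lecture; the function and argument names are my own.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).

    p_b_given_a -- likelihood of the evidence B under proposition A
    p_a         -- prior degree of belief in A
    p_b         -- overall probability of the evidence B (must be nonzero)
    """
    if p_b == 0:
        raise ValueError("P(B) must be nonzero")
    return p_b_given_a * p_a / p_b
```

The ratio `p_b_given_a / p_b` is exactly the "support" factor: when it is greater than 1, the evidence raises our belief in A; when less than 1, it lowers it.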
So, let A sub i denote the event that a randomly chosen item was made on the i-th machine, for i = 1, 2, 3, and let B denote the event that a randomly chosen item is defective. We have our priors for the three machines: P(A1) = 0.2, P(A2) = 0.3, and P(A3) = 0.5. The probability that machine one produced a defective part is P(B given A1) = 0.05, for machine two it's P(B given A2) = 0.03, and for machine three it's P(B given A3) = 0.01. So we compute P(B): we crank that out, multiplying each defect rate by its machine's share and summing, and we get 0.05 × 0.2 + 0.03 × 0.3 + 0.01 × 0.5 = 0.024. So 2.4% of the factory output is defective. Now we are given that B has occurred: we've sampled output from the production line and found a defective component, and we ask, what is the conditional probability of machine three given B? We write out the equation, fill in the numbers from the previous calculation, and find that the probability that the defective part was produced on machine three is about 0.21. So, the knowledge that the sampled item was defective allows us to replace the prior probability of 50%, or 0.5, with a posterior probability of 0.21.