[SOUND] This lecture is about syntagmatic relation discovery and conditional entropy. In this lecture, we're going to continue the discussion of word association mining and analysis. We're going to talk about conditional entropy, which is useful for discovering syntagmatic relations.

Earlier, we talked about using entropy to capture how easy it is to predict the presence or absence of a word. Now, we'll address a different scenario where we assume that we know something about the text segment. So now the question is: suppose we know that "eats" occurred in the segment. How would that help us predict the presence or absence of another word, like "meat"? In particular, we want to know whether the presence of "eats" helps us predict the presence of "meat". If we frame this using entropy, that means we are interested in knowing whether knowing the presence of "eats" can reduce uncertainty about "meat", or reduce the entropy of the random variable corresponding to the presence or absence of "meat". We can also ask: what if we know of the absence of "eats"? Would that also help us predict the presence or absence of "meat"? These questions can be addressed by using another concept called conditional entropy.

To explain this concept, let's first look at the scenario we had before, when we know nothing about the segment. We have these probabilities indicating whether a word like "meat" occurs or doesn't occur in the segment, and we have an entropy function that looks like what you see on the slide. Now suppose we know "eats" is present, so now we know the value of another random variable that denotes "eats". That would change all these probabilities to conditional probabilities, where we look at the presence or absence of "meat" given that we know "eats" occurred in the context. As a result, if we replace these probabilities with their corresponding conditional probabilities in the entropy function, we get the conditional entropy. So this equation here would be the conditional entropy, conditioned on the presence of "eats". You can see this is essentially the same entropy function as you have seen before, except that all the probabilities now have a condition. This then tells us the entropy of "meat" after we know that "eats" occurs in the segment. Of course, we can also define this conditional entropy for the scenario where we don't see "eats": if we know it did not occur in the segment, then this conditional entropy would capture the uncertainty about "meat" in that condition.

Now, putting the different scenarios together, we have the complete definition of conditional entropy as follows. Basically, we consider both possible values of "eats", zero and one, and this gives us the probability that "eats" is equal to zero or one, basically whether "eats" is present or absent. And this, of course, is the conditional entropy of "meat" in that particular scenario. If you expand this entropy, you get the following equation, where you see the involvement of those conditional probabilities.

In general, for any discrete random variables X and Y, the conditional entropy of X given Y is no larger than the entropy of X. So the entropy of X is an upper bound for the conditional entropy. That means that by knowing more information about the segment, we can never increase uncertainty; we can only reduce uncertainty.
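The slide equations aren't reproduced in the transcript, so here is the standard definition being described, written out for the "meat"/"eats" example as a sketch; the random-variable names X_meat and X_eats and the base-2 logarithm are notational choices for illustration, not taken from the slide:

```latex
H(X_{\text{meat}} \mid X_{\text{eats}})
  = \sum_{v \in \{0,1\}} p(X_{\text{eats}} = v)\, H(X_{\text{meat}} \mid X_{\text{eats}} = v)
  = -\sum_{v \in \{0,1\}} p(X_{\text{eats}} = v)
      \sum_{u \in \{0,1\}} p(X_{\text{meat}} = u \mid X_{\text{eats}} = v)\,
      \log_2 p(X_{\text{meat}} = u \mid X_{\text{eats}} = v)
```

The upper bound mentioned above is then H(X_meat | X_eats) ≤ H(X_meat), with equality when the two words are independent.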
And that intuitively makes sense, because knowing more information should always help us make the prediction, and it cannot hurt the prediction in any case. Now, what's also interesting here is to think about the minimum possible value of this conditional entropy. We know that the maximum value is the entropy of X, but what about the minimum? What do you think? I hope you can reach the conclusion that the minimum possible value is zero. And it's interesting to think about under what situation we would achieve this.

So, let's see how we can use conditional entropy to capture syntagmatic relations. Of course, conditional entropy directly gives us one way to measure the association of two words, because it tells us to what extent we can predict one word given that we know the presence or absence of another word. Before we look at the intuition of conditional entropy in capturing syntagmatic relations, it's useful to think of a very special case, listed here: the conditional entropy of a word given itself. We listed this conditional entropy in the middle, so it's here. What is its value? This means we know whether "meat" occurs in the segment, and we hope to predict whether "meat" occurs in the segment. Of course, this is zero, because there is no uncertainty anymore: once we know whether the word occurs in the segment, we already know the answer to the prediction. So this is zero, and that's also when the conditional entropy reaches its minimum.

Now, let's look at some other cases. This is the case of knowing "the" and trying to predict "meat", and this is the case of knowing "eats" and trying to predict "meat". Which one do you think is smaller? Remember that a smaller entropy means it is easier to make the prediction. Well, if you look at the uncertainty, then in the first case, "the" doesn't really tell us much about "meat". Knowing the occurrence of "the" doesn't really help us reduce the entropy that much, so it stays fairly close to the original entropy of "meat". Whereas in the case of "eats", "eats" is related to "meat", so knowing the presence or absence of "eats" would help us predict whether "meat" occurs. It can help us reduce the entropy of "meat". So we should expect the second term, namely this one, to have a smaller entropy, and that means there is a stronger association between "meat" and "eats".

So we now know that when this word W is the same as "meat", the conditional entropy reaches its minimum, which is zero. And for what kind of words would it reach its maximum? Well, that's when the word is not really related to "meat". A word like "the", for example, would give a value very close to the maximum, which is the entropy of "meat" itself.

This suggests that an algorithm for using conditional entropy to mine syntagmatic relations would look as follows. For each word W1, we enumerate all the other words W2. Then, we compute the conditional entropy of W1 given W2. We sort all the candidate words in ascending order of the conditional entropy, because we want to favor words with a small conditional entropy, meaning that they help us predict the presence or absence of the word W1. Then, we take the top-ranked candidate words as words that have potential syntagmatic relations with W1. Note that we need to use a threshold to find these words. The threshold can be the number of top candidates to take, or an absolute value for the conditional entropy.
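Here is a minimal Python sketch of this mining procedure. The segment representation (each segment as a set of words), the helper names conditional_entropy and top_candidates, and the unsmoothed count-based probability estimates are all illustrative assumptions, not code from the lecture:

```python
import math
from collections import Counter

def conditional_entropy(segments, w1, w2):
    """Estimate H(X_w1 | X_w2) from a collection of text segments.

    Each segment is a set of words; X_w is 1 if word w appears in the
    segment and 0 otherwise. Probabilities are raw relative counts with
    no smoothing (an illustrative simplification).
    """
    n = len(segments)
    counts = Counter()
    for seg in segments:
        v = int(w2 in seg)   # value of X_w2 (the conditioning word)
        u = int(w1 in seg)   # value of X_w1 (the word to predict)
        counts[(v, u)] += 1

    h = 0.0
    for v in (0, 1):
        n_v = counts[(v, 0)] + counts[(v, 1)]
        if n_v == 0:
            continue
        p_v = n_v / n                        # p(X_w2 = v)
        for u in (0, 1):
            p_uv = counts[(v, u)] / n_v      # p(X_w1 = u | X_w2 = v)
            if p_uv > 0.0:
                h -= p_v * p_uv * math.log2(p_uv)
    return h

def top_candidates(segments, w1, vocabulary, k=10):
    """Rank candidate words W2 by H(X_w1 | X_w2), smallest first."""
    scored = [(conditional_entropy(segments, w1, w2), w2)
              for w2 in vocabulary if w2 != w1]
    scored.sort()   # ascending: lower conditional entropy = stronger association
    return scored[:k]
```

In practice you would typically smooth the probability estimates and restrict the candidate vocabulary to reasonably frequent words, since estimates for rare words are unreliable; the threshold mentioned in the lecture corresponds to either the parameter k or a cutoff on the returned entropy values.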
Now, this would allow us to mine the words most strongly correlated with a particular word, W1 here. But this algorithm does not help us mine the strongest K syntagmatic relations from an entire collection, because in order to do that, we would have to ensure that these conditional entropies are comparable across different words. In the case of discovering syntagmatic relations for a target word like W1, we only need to compare the conditional entropies of W1 given different words, and in that case they are comparable. That is, the conditional entropy of W1 given W2 and the conditional entropy of W1 given W3 are comparable: they both measure how hard it is to predict W1. But if we think about two pairs that share W2 as the condition, where we try to predict W1 and W3 respectively, then the conditional entropies are actually not comparable. You can think about this question: why are they not comparable? Well, that's because they have different upper bounds. Those upper bounds are precisely the entropy of W1 and the entropy of W3, and since they have different upper bounds, we cannot really compare them in this way. So how do we address this problem? Well, later we'll discuss how we can use mutual information to solve this problem. [MUSIC]
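As a preview, the connection to mutual information is the standard information-theoretic identity (not a formula shown on this lecture's slides):

```latex
I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
```

Because mutual information measures the reduction in entropy rather than the entropy that remains, it is symmetric in the two words, which is what makes it better suited for comparing associations across different target words; this is discussed later.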