So, as we now start to become more comfortable with the concept of word embeddings, the idea of mapping every word to a vector, we might want to think about how we could utilize those word vectors to do analysis, to do natural language processing. What we mean by natural language processing is to take natural text or language and to try to make inferences or predictions based upon that text. Elsewhere in our lessons, we've introduced the concept of a Convolutional Neural Network. So, what we would like to do for a moment is examine how we might use a Convolutional Neural Network in the context of natural language processing, utilizing this Word2Vec, or word-to-vector, representation. So, at this point, let's assume that we have a codebook, which maps every word to a vector. Now, we still haven't discussed how we are going to learn those embeddings, the mapping from a word to a vector; we'll get to that. But before we do, it is worthwhile to examine what we might do with those word vectors if we had them, because that will help motivate us for the work that it takes to understand how we learn these word embeddings. So, let's assume for now that we have learned the codebook: we have learned a mapping from every word to a vector. If we've done that, then any document that is composed of N words can be mapped to N vectors, each of which is m-dimensional in size. Once we have achieved this, we have taken our natural language and mapped it into vectors of numbers, and then we can do various types of analyses on them: for example, sentiment analysis, question answering, translation between languages, et cetera. So, let's think a little bit about how we might achieve this. Let's assume that we have a document which is composed of a sequence of N words.
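The codebook lookup just described can be sketched in a few lines of code. Everything here is made up for illustration: a tiny hypothetical vocabulary, a small dimension m = 4, and arbitrary vector values standing in for learned embeddings.

```python
import numpy as np

# A hypothetical codebook: every word in the vocabulary maps to an
# m-dimensional vector (here m = 4, with made-up values for illustration).
m = 4
codebook = {
    "the":   np.array([0.1, -0.3, 0.2, 0.0]),
    "movie": np.array([0.5, 0.1, -0.2, 0.4]),
    "was":   np.array([0.0, 0.2, 0.1, -0.1]),
    "great": np.array([0.7, -0.1, 0.3, 0.6]),
}

def embed(words, codebook):
    """Map a document of N words to an N x m matrix of word vectors."""
    return np.stack([codebook[w] for w in words])

doc = ["the", "movie", "was", "great"]
X = embed(doc, codebook)
print(X.shape)  # (4, 4): N = 4 words, each an m = 4 dimensional vector
```

Each row of the resulting matrix is one word's vector, so the document is now a stack of N vectors ready for numerical analysis.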
W_1, W_2, W_3 through W_N. Through this Word2Vec concept, those N words will be mapped to N m-dimensional vectors. What we would like to do is apply the concept of a CNN, a Convolutional Neural Network, to text. So, if you recall the Convolutional Neural Network, which we've considered elsewhere in the context of images, now we're going to consider it in the context of text. Recall that with a Convolutional Neural Network, we take a filter and shift it through the data. In the context of images, that corresponded to a two-dimensional shift across the image. Here, we're going to consider filters, each of which is m by d in size: m corresponds to the dimension of the word vector, and d corresponds to the number of words spanned by the filter. So, for example, if we consider d equal to three, each convolutional filter is length three in the dimension of words, or, thinking in terms of time, three time points along a sequence of words. Then the m dimensions, which is the height, correspond to the dimensionality of the word vector. Here, we have k such convolutional filters. So, what we're going to do is take our filter (the filter is highlighted) and convolve it, or shift it, to multiple positions along the length of our text. The way to think about this is that the filter, in this case of width three, corresponds to a concept related to three consecutive words. Then we're going to take that filter and shift it through our text. Whenever the concept reflected in the filter of three consecutive words is aligned with, or related to, the text at the corresponding shift location, we would expect a high correlation or connection between the filter and the text.
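A minimal sketch of this one-dimensional convolution over text, using random placeholder values in place of real word vectors and learned filters. The sizes N, m, d, and k are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N, m, d, k = 10, 4, 3, 5              # words, embedding dim, filter width, num filters
X = rng.normal(size=(N, m))           # document as N word vectors (placeholder values)
filters = rng.normal(size=(k, d, m))  # k filters, each spanning d words x m dimensions

# Slide each filter along the word axis; each shift position yields one
# number per filter: the inner product between the filter and a d-word window.
num_positions = N - d + 1
features = np.empty((num_positions, k))
for t in range(num_positions):
    window = X[t:t + d]  # d x m slice of the text starting at position t
    features[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))

print(features.shape)  # (8, 5): one k-dimensional vector per shift position
```

A large value in `features[t, j]` means filter j matched the three-word window starting at position t; this is the "high correlation" between a filter's concept and the text described above.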
So, through this process, we're going to take each of our k filters and shift it to each possible location in our text. For every shift location, we get a number, and that number is reflective of the match between the filter and the text. Remember, the text is now represented by the word vectors. Therefore, through this process, we take the original text and map it into a set of vectors, one per shift position, each of which is k-dimensional. Then, we might do something like pooling. The idea here is that for each of those k dimensions of the output of this convolutional process, we might take the maximum value across all of the shift positions. Each column of this N-by-k matrix corresponds to the degree of match between one filter and the text at each shift location. For each column, we take the maximum value. This is called a pooling step, specifically max pooling, which quantifies which location across the text gives us the largest correlation between the filter and the original text. We do this for each of the filters, and then we get a k-dimensional vector, which tells us the maximum correlation, across the entire text, between each of the k filters and that text. So, this gives us a k-dimensional vector, and ultimately, once we have taken the text and mapped it to a k-dimensional vector, that vector can be sent through tools that we already have, which map feature vectors to classification decisions. So, we can take that k-dimensional vector and send it through logistic regression or a multilayer perceptron to make a decision. The idea here is that once we have mapped words to vectors, we have significant opportunity to leverage tools that we have developed elsewhere in our lessons, particularly Convolutional Neural Networks.
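The max-pooling step and the handoff to a classifier can be sketched as follows. The convolutional outputs, the logistic-regression weights, and the sizes are all placeholder values for illustration, not learned quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
num_positions, k = 8, 5
features = rng.normal(size=(num_positions, k))  # convolutional outputs (placeholder)

# Max pooling: for each filter (each column), keep only the strongest
# response across all shift positions, giving one k-dimensional vector
# that summarizes the whole document.
pooled = features.max(axis=0)
print(pooled.shape)  # (5,)

# That fixed-length vector can feed a standard classifier; here, a
# logistic-regression sketch with hypothetical weights w and bias b.
w = rng.normal(size=k)
b = 0.0
p = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))  # probability of the positive class
print(0.0 <= p <= 1.0)  # True
```

Note that max pooling also removes the dependence on document length: regardless of N, the classifier always sees a k-dimensional input.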
So, the Convolutional Neural Network, which previously was applied to images through two-dimensional convolutions, is manifested in terms of one-dimensional convolutions when we analyze text. Once we can do this, we can use the power of the Convolutional Neural Network to analyze text. One thing that I'll briefly note is that if we're going to learn using a Convolutional Neural Network, we will require labeled text, which means we need to know, for example, the sentiment of every document that we're analyzing, because with a Convolutional Neural Network, if you recall, we need labeled data for the training process. This is actually expensive in practice, because it implies that for every document we learn with, we would have to have the corresponding label, for example, the sentiment. It is very expensive and time-consuming for humans to read every document and then provide what we call the truth for the label. Therefore, learning in a supervised way using labeled data is expensive. So, as we move forward in our analysis of natural language processing, we're going to be particularly interested in situations for which we can learn, for example, the word embeddings or word vectors based upon unlabeled data. What that means is just natural text: we directly take the natural text, without requiring any human labeling, and do learning from that. Now, in the discussion that we've had thus far, it was assumed that the word vectors were available, i.e., they were already learned. However, we can generalize this. In particular, we can treat the word vectors as additional parameters that we need to learn. Then, in the context of the Convolutional Network, we can learn the word vectors and the other parameters of the Convolutional Network simultaneously.
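Treating the word vectors as learnable parameters can be sketched as below: the codebook becomes an embedding matrix with one row per vocabulary word, and a supervised gradient step updates exactly the rows that were looked up. The vocabulary size, word indices, and gradient values are all hypothetical stand-ins; in practice the gradient would come from backpropagating the classification loss.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab_size, m = 1000, 4

# Treat the codebook itself as a parameter matrix E: one row per word,
# initialized randomly and trained jointly with the convolutional filters.
E = rng.normal(scale=0.1, size=(vocab_size, m))

word_ids = np.array([12, 7, 903, 41])  # a 4-word document as vocabulary indices
X = E[word_ids]                        # embedding lookup: N x m word vectors

# During supervised training, the loss gradient with respect to X flows
# back into exactly the rows of E that were looked up. Here a random
# array stands in for that backpropagated gradient.
grad_X = rng.normal(size=X.shape)
lr = 0.01
np.subtract.at(E, word_ids, lr * grad_X)  # gradient step on only those rows
```

This is the sense in which the embeddings and the filters are learned simultaneously: both receive gradients from the same labeled-data loss, which is precisely why the labels become the bottleneck.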
The challenge of this, as was hinted and as I'll now highlight further, is that doing so requires labeled data. What that means is that for every document we analyze, we have to have the so-called true label that we wish to predict from that text. As I said, access to labeled data is very expensive because it requires a human to read every document and then provide a label. So, as we move forward in our analysis of methods for learning word vectors, we're going to be particularly interested in methods that do not require labeled data, which means that we can learn the word vectors directly on a corpus of text without the need for any human labeling. It turns out that there are some very powerful methods to do that, which we will consider next.