In this lecture, we take the development of the convolutional neural network further still, focusing just on a single band of data, by considering the evolution of its topology. The concept we adopted in the last lecture for the connections between layers is similar to the common process of convolution used to filter an image to detect spatial features. We haven't covered that material in this course, but it is moderately straightforward. In spatial convolution, a window called a kernel is moved over an image row by row and column by column. A new brightness value is created for the pixel under the center of the kernel by taking the products of the pixel brightness values and the kernel entries and then summing the result. That is exactly the same operation implemented by a processing element in the hidden layer of the convolutional neural network, just before the offset is added and the activation function is applied. It is because of that similarity that the partially connected neural network as described is called a convolutional neural network. However, in the convolutional neural network the kernel is usually called a filter, and the set of input pixels covered by the filter is called a local receptive field. Note that any size of filter and receptive field can be used.

Even though we are exploring the smaller number of connections as a way of simplifying the network, and thus the number of unknowns that need to be found during training, it is of interest to think a bit further about the practical significance of choosing a spatial neighborhood kernel of weights for that purpose. Being important in the analysis of spatial context, this has particular relevance to picture processing and object recognition, fields in which convolutional neural networks have been used extensively over the past five years or so. In spatial filtering, say for detecting the edges in an image, the kernel or filter entries are selected by the analyst for that purpose, as seen in this very simple example: a three by three filter can be used to find the edges in an image. In the convolutional neural network, the kernel entries, that is the weights prior to the application of the activation function, are initially chosen randomly. However, by training they take on values that match the image features that are characterized by the spatial nature of the training samples. If the training images strongly feature edges, it is expected that the weights will tend towards those of an edge-detecting filter, for example. The strength of the convolutional neural network is that, with sufficient numbers of layers, it can learn the spatial characteristics of an image. That is why it is an important tool for performing context classification and for picture processing in general.

We now introduce some more operations used in convolutional neural networks, along with the associated nomenclature. The first is the concept of stride. When we looked at feeding just nine outputs from one layer into a single processing element of the next layer, we did so with single pixel shifts along rows and down columns. Some authors choose to have larger shifts, the result of which is that the number of nodes in the next layer is reduced. The number of pixel shifts is what defines the stride. This slide shows a stride of two. Another topological element often used is to add so-called pooling layers, as seen on the right hand side of this slide.
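To make the filter, receptive field, and stride ideas concrete, here is a minimal NumPy sketch of the sum-of-products operation just described. It is my own illustration rather than code from the lecture; the function and variable names (convolve2d, edge_kernel, and so on) are invented for the example. The stride argument shows how larger shifts reduce the number of nodes in the next layer.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide a kernel (filter) over an image, taking the sum of products of the
    pixel values and the kernel entries at each position. With stride > 1 the
    window shifts by more than one pixel, so the output is smaller."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1          # output rows
    ow = (iw - kw) // stride + 1          # output columns
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            patch = image[r * stride : r * stride + kh,
                          c * stride : c * stride + kw]   # local receptive field
            out[r, c] = np.sum(patch * kernel)            # the convolution sum
    return out

# A simple 3x3 edge-detecting kernel, of the kind an analyst might choose by hand.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

image = np.random.rand(8, 8)                           # stand-in for a single band of data
print(convolve2d(image, edge_kernel).shape)            # stride 1 -> 6x6 output
print(convolve2d(image, edge_kernel, stride=2).shape)  # stride 2 -> 3x3 output
```

In a convolutional layer the kernel entries would be trainable weights rather than values fixed by the analyst, and an offset and activation function would be applied to each output.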
Pooling strengthens the dependence on neighborhood spatial information and further reduces the number of parameters to be found through training, particularly when more than a single convolutional or hidden layer is used. Pooling is sometimes called downsampling.

We now have a decision as to how to proceed further and ultimately construct an output for the convolutional neural network. There are four common options. First, we can keep going by feeding the output of the pooling layer into another convolutional layer, to provide a deeper network; we can, in principle, have as many layers as we wish, just as with a fully connected network we can have as many hidden layers as we like. Secondly, we could feed the output of the pooling layer into a set of output layer processing elements and thus terminate the network. Thirdly, we could have the output of the pooling layer act as the inputs to a normal, fully connected neural network. In this case, the convolutional neural network acts as a feature selector for the fully connected network. This is a common approach, especially in remote sensing. Finally, we could have the output of the convolutional neural network generate a set of class probabilities. In the next slide, we are going to examine the last two options.

Here we show a network with two convolutional layers and one pooling layer, feeding into a much smaller fully connected neural network of the type we considered a couple of lectures ago. In effect, the convolutional neural network is acting as a feature selector for the fully connected network. Note that we have introduced another term, flattening. That is just the process of straightening out the matrix into a vector, as needed for the neural network input. Note also that here the last convolutional layer is not followed by a pooling layer; the output of a convolutional neural network can come from either layer type.

After flattening, rather than feeding the results into a fully connected network, another very common option is to use the convolutional neural network outputs to generate a set of class probabilities called softmax probabilities. They are defined in the slide. The convolutional neural network outputs are exponentiated and normalized as shown, so that the set of softmax values replicates a set of posterior probabilities.

Finally, the sigmoid activation function is usually replaced by a simpler activation function called the ReLU, the Rectified Linear Unit, which has the characteristic shown here. This choice speeds up training by improving the efficiency of the gradient descent operation used in back propagation.

Note that the use of stride and pooling successively reduces the number of unknowns to be found by training. Also note that convolutional and pooling layers can be cascaded. The second of these points leads to one of the design equations used with convolutional neural networks.
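To tie together pooling, flattening, the ReLU, and the softmax outputs described above, the following NumPy sketch may help. It is an illustrative example only, with invented names, shapes, and values; it is not the specific network shown in the slides.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)

def max_pool(feature_map, size=2):
    """Max pooling (downsampling): keep the largest response in each size x size window."""
    h, w = feature_map.shape
    fm = feature_map[: h - h % size, : w - w % size]          # trim to a multiple of the window
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(z):
    """Exponentiate and normalize the network outputs so that they behave like
    a set of posterior class probabilities."""
    e = np.exp(z - z.max())            # subtracting the maximum improves numerical stability
    return e / e.sum()

# Illustrative use: activate and pool a feature map, flatten it to a vector for a
# fully connected layer, and convert a set of final outputs to softmax probabilities.
feature_map = np.random.randn(6, 6)        # pretend output of a convolutional layer
pooled = max_pool(relu(feature_map))       # 3x3 after 2x2 pooling
flat = pooled.flatten()                    # "flattening": matrix -> vector

outputs = np.array([2.0, 1.0, 0.1])        # pretend final-layer outputs for three classes
print(softmax(outputs))                    # non-negative values that sum to 1.0
```

The softmax values are not true posterior probabilities, but they sum to one and can be interpreted in the same way when assigning a class label to a pixel.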