In this lecture, we confront the problem of multidimensional images: color pictures made up of the three color primaries, and multispectral and hyperspectral images in remote sensing.

In this slide, we see the three color primaries of a color picture. Alternatively, they could be three bands of a multispectral image. We describe the image pixels as shown by the three equations at the top of the slide. At the bottom, we show the corresponding three filter entries. In both cases, we have used three indices. The first refers to the individual band, while the other two are the pixel position indices.

The simplest way to treat the three-band image is to carry out three separate convolutions, as shown by the equations at the top of the slide. Generally, only a single offset, theta, is used. The three convolution results are added, the offset is added to that sum, and then the activation function is applied, as in the first code sketch following this passage. We now have three times the number of weights to learn by training. While this is the approach most often adopted for color pictures, we will see later how multispectral and hyperspectral images are treated.

Here we show another variation, often used in the convolutional neural network. Several convolutions can be performed in parallel in order to extract more spatial information from an image. As noted, the filters can be of the same or different sizes.

Because of the complexity introduced by the various options we have discussed, it is difficult to come up with a standard form of diagram with which to represent the convolutional neural network. Most authors use their own forms of diagram, but the representation shown here is common to many and simple to understand. Here we show convolutions in parallel, as just discussed on the previous slide. We also show several layers, each of which is composed of a convolution operation followed by pooling. Of course, the pooling operations are not essential, but are included here for completeness. Finally, we show the flattening operation often used at the output. As indicated, some authors even have cross connections between the parallel paths, but that can defeat one of the benefits of having several separate parallel paths in the convolutional neural network: the network can be programmed to run on a multiprocessor machine.

We now come to an important practical consideration, similar to the one we met with the maximum likelihood classifier when considering the Hughes phenomenon. That is the problem of over-fitting, which is illustrated on this slide. The concern arises because we have so many weights and offsets to be found through training, and the availability of training data determines how effectively those unknowns can be found. We must have sufficient training samples available to get reliable estimates of the unknown parameters; otherwise the network will not generalize well. In other words, it will not perform well on previously unseen pixels. It is not sufficient to have just the minimum number of samples needed to estimate the unknowns; otherwise over-fitting will occur. This is illustrated in the curve-fitting example shown in the diagrams on the slide, and in the second code sketch below. Fitting a high-order curve through just three points will guarantee good fits at those points, but the behavior between the points can be wildly wrong when asked to represent intervening points not used in generating the curve. If many training samples are used, then the function found interpolates, or generalizes, well, as indicated in the right-hand diagram.
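To make the three-band operation concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the lecture's own code: the image size, kernel values, single offset theta, and the choice of ReLU as the activation function are all hypothetical, and scipy.ndimage.convolve stands in for the per-band 2D convolution.

```python
# Minimal sketch: three per-band 2D convolutions, summed, plus a single
# offset theta, followed by an activation function (ReLU assumed here).
import numpy as np
from scipy.ndimage import convolve

def three_band_conv(image, weights, theta):
    """image: (3, H, W); weights: (3, k, k), one kernel per band; theta: scalar."""
    # One 2D convolution per band; the three responses are added together
    response = sum(convolve(image[b], weights[b], mode="nearest") for b in range(3))
    # The single offset is added, then the activation function is applied
    return np.maximum(response + theta, 0.0)

rng = np.random.default_rng(0)
img = rng.random((3, 8, 8))            # three bands of a hypothetical 8x8 image
w = rng.standard_normal((3, 3, 3))     # one hypothetical 3x3 kernel per band
print(three_band_conv(img, w, theta=0.1).shape)   # -> (8, 8)
```

The curve-fitting illustration of over-fitting can also be reproduced numerically. The sketch below is a hypothetical construction: the sine-wave target, noise level, polynomial degree, and sample counts are chosen only to show the effect described on the slide. With the bare minimum of samples the fit is exact at the samples but wild between them, while many samples give a curve that generalizes.

```python
# Hypothetical over-fitting demo: a degree-9 polynomial fitted with the
# bare minimum of samples versus many samples.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)        # assumed underlying function
deg = 9

x_few = np.linspace(0, 1, deg + 1)         # just enough samples for an exact fit
p_few = np.polyfit(x_few, f(x_few) + 0.1 * rng.standard_normal(x_few.size), deg)

x_many = np.linspace(0, 1, 200)            # many more samples than unknowns
p_many = np.polyfit(x_many, f(x_many) + 0.1 * rng.standard_normal(x_many.size), deg)

x_test = np.linspace(0, 1, 1000)           # points not used in generating the curves
print("max error, minimum samples:", np.abs(np.polyval(p_few, x_test) - f(x_test)).max())
print("max error, many samples:   ", np.abs(np.polyval(p_many, x_test) - f(x_test)).max())
```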
Clearly, we need many more training pixels than the minimum to ensure we do not strike the same problem when training the neural network.

Consider now the numerical complexity of analyzing hyperspectral image data so that we can make use of both its spectral and spatial properties. Several approaches have been used in practice, as we will see shortly in some examples. One is to analyze the spectral information content alone. Another is to analyze the spatial information content alone, that is, spatial context. Another is to do both together, but there is a processing challenge.

We could treat the problem of processing hyperspectral data with a convolutional neural network by allocating one convolutional filter to each band, as we did previously for the three-band color picture. But that requires about 200 times as many weights as for a single-band image. For an image with 200 bands and 3x3 kernels, the total number of unknowns, that is weights plus offsets, connecting the input image to the first convolutional layer is 2,000: 200 bands times nine weights per kernel gives 1,800 weights, with the offsets making up the balance. Note that the same weights are used in each filter right across a particular band. This, of course, gets multiplied upwards by the number of filters used in the convolutional layer.

Often we take the path of reducing the spectral dimensionality of the hyperspectral image before applying the convolutional neural network. Although that partly defeats the purpose of using hyperspectral imagery in the first place, transforms such as the principal components transform do allow us to concentrate the variance, or information content, in a small number of components: three as shown here, but more might be necessary if we wish to retain, say, at least 95% of the image variance. A code sketch of this step follows below.

If we want to analyze hyperspectral data for its spectral properties alone, we can use the convolutional neural network to define the label for each pixel based just upon its spectrum, and thus implicitly the correlations between bands. This, of course, ignores any benefit of spatial context. A sketch of such a one-dimensional spectral convolution also follows.

Here we summarize how multiband images can be handled, right through to data as complex as hyperspectral imagery. Importantly, the need to avoid over-fitting must be kept in mind at all times.

The first question here asks you to propose a simple formula, based on the discussion in this lecture, on using principal components analysis.
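As a sketch of the dimensionality-reduction step, the fragment below uses scikit-learn's PCA, whose n_components argument accepts a fraction so that enough components are kept to retain at least that share of the variance. The stand-in data, pixel count, and band count are assumptions for illustration.

```python
# Minimal sketch: keep only the principal components needed to retain at
# least 95% of the variance, before handing the reduced image to a CNN.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in data: 10,000 pixels of 200 highly correlated bands (rank ~3),
# mimicking the band-to-band correlation of real hyperspectral imagery
pixels = rng.random((10_000, 3)) @ rng.random((3, 200))
pixels += 0.01 * rng.standard_normal(pixels.shape)

pca = PCA(n_components=0.95)             # keep >= 95% of the variance
reduced = pca.fit_transform(pixels)
print(pca.n_components_, "components retained")   # a small number, ~3 here
```

For the spectral-only option, one common reading is that the first layer slides a one-dimensional kernel along each pixel's spectrum rather than a two-dimensional kernel across the scene. A minimal sketch of that operation, with a hypothetical kernel length, offset, and ReLU activation:

```python
# Minimal sketch: a first-layer 1D convolution over one pixel's spectrum.
import numpy as np

rng = np.random.default_rng(2)
spectrum = rng.random(200)               # one pixel's 200-band spectrum
kernel = rng.standard_normal(7)          # hypothetical 1D spectral kernel
theta = 0.05                             # hypothetical offset

feature = np.maximum(np.convolve(spectrum, kernel, mode="same") + theta, 0.0)
print(feature.shape)                     # -> (200,) spectral feature map
```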