Hi everyone. Welcome to this course called Amazon SageMaker: Build an Object Detection Model Using Images Labeled with Ground Truth. My name is Denis Batalov, I'm a worldwide tech leader in Artificial Intelligence and Machine Learning at AWS. My Twitter handle is shown here; please follow. In this course, we're going to go through an end-to-end example of using Amazon SageMaker to build an object detection model, specifically identifying the location of honey bees in photos. To train the model, we will use images that are labeled by SageMaker Ground Truth, and you will see how to go from raw images to a trained and deployed model.

First, I will tell you about the dataset and where to get images of honey bees. Then, I will do a quick overview of the SageMaker and Ground Truth functionality, giving us enough of a foundation to start the hands-on work with the tool. We will use Ground Truth to identify the exact location of bees in each image in the dataset. I will then give an overview of how object detection algorithms work. This is because we will use the labeled images to train an object detection machine-learning model built into SageMaker so that bees can be automatically identified in new images. Finally, we will cover hyperparameter optimization, or automated model tuning, showing you how to find an optimal set of hyperparameters in a smart way.

Let us now talk about the machine learning problem that we'll be solving and the dataset that we'll use. To begin, let's take a look at a quick overview of typical tasks in computer vision. It's not an exhaustive list, but it gives you a good idea of the domain. At the simpler end of the spectrum, we want to merely determine if the image does or does not contain a certain single object, in this case a kitten. The next level of complexity is actually identifying where in the image the object is, for example, with a simple bounding box. More generally, instead of a single object, we would like to identify as many different objects in a scene as possible, giving rise to the object detection problem. This is precisely what we're after in this course, though we will be focusing on only one object class, namely honey bees. In some applications, it might be necessary to be more precise than a rectangular bounding box, and this leads to an instance segmentation task, where each pixel is classified as either belonging or not belonging to a particular object.

For the purposes of this course, we will use 500 photos of honey bees. Where can we get such photos? There's an easier way than spending a lot of time near a beehive. It turns out there's the iNaturalist crowdsourcing project, which comes with a website and a handy smartphone app allowing people to upload their photos of plants, animals, mushrooms, and so on, tagged with location, date, etc. Here, you see an image of the California poppy that I submitted via the app earlier this year. The purpose is to record the sighting, but also to help identify the species based on established biological taxonomy, relying on a community of experts. iNaturalist also exposes a handy export functionality where you can download observation details based on selection criteria such as species, geography, etc. This is exactly how I obtained the 500 bee images. When people upload their photos, they choose what license they prefer to share them under. For the purposes of this course, I only used photos with the CC0 license, known as the Creative Commons public domain license.
Here is the iNaturalist website, and if you choose to explore the observations, you can get to the export screen by clicking the download button. Say we want to get all images of honey bees in Canada; we can specify additional filters, insisting on observations that do have images. When it comes to downloading, we can choose all the desired attributes of an observation. For instance, to know under which license the images are provided, you should choose the license attribute.

Here's the iNaturalist website. You can choose to explore observations by specifying the species that you're looking for, honey bees in our case, and then the location, say Canada, and additionally, you can specify more filters. You can insist on observations that have photos, and finally, you can choose to download. You get taken to a screen where you can provide additional details. You can see the count of observations, roughly 1,600, and you can choose specific attributes. If you care about the license, make sure you select the license attribute, so you know under which license the images are exported. There are many useful attributes, as you can see. Then, finally, create the export, and you will obtain a spreadsheet (CSV) file that contains all the information that you need.

Before we jump into the hands-on part, let's spend a few minutes reviewing the Amazon SageMaker service and its Ground Truth functionality specifically. Amazon SageMaker is designed to eliminate the heavy lifting from all parts of a typical process for building, training, and deploying machine-learning models. First off, it comes with a Jupyter notebook server that lets you manage notebooks. There are dozens of pre-built notebooks to help you start solving many common machine learning problems. You can customize these to your task, taking advantage of the many built-in ML algorithms, or create your own machine learning recipe using one of many popular ML frameworks such as MXNet, TensorFlow, or PyTorch. Second, you can kick off training of a developed model using as much computational power as you need by choosing the right size cluster of GPU or CPU based machines available in the AWS cloud. Even the most sophisticated algorithms still have many parameters one must specify in order to start training. Finding the right values for such parameters often feels more like an art than a science. This is why SageMaker provides the so-called hyperparameter optimization, abbreviated as HPO, also known as automated model tuning. Finally, once a satisfactory model is trained, SageMaker makes it easy to create a scalable production inference endpoint by taking care of model deployment. As you have come to expect from AWS, autoscaling is supported out of the box. SageMaker is a powerful tool, and that's the reason many customers choose to adopt it, including the well-known companies shown on screen.

In addition to the three main functional blocks in SageMaker of notebooks, training, and inference, many additional features are built into the service. In this course, we will use SageMaker Ground Truth to label the raw image dataset, identifying the exact location of honey bees in each one. For many practical problems, we may have collected the data needed for training, yet this data is missing the correct target attribute which is required for machine learning. In our case, such a target attribute is the bounding box or, generally, the many bounding boxes around each honey bee in an image. Labeling a dataset requires human input.
After all, we're training a machine learning model to mimic human decisions. Not only is there the labor cost involved in labeling each image, but we want to involve many human labelers in parallel so that a large dataset can be processed as fast as possible. SageMaker Ground Truth makes all of this possible, and that is why, for many real-life problems, Ground Truth lets you turn a potentially intractable problem, due to cost and the needed orchestration, into a solvable one, enabling more and more machine learning applications.

SageMaker Ground Truth supports these labeling tasks straight out of the box. You will recognize the three different image-based tasks, namely image classification, object detection, and semantic segmentation. This includes the labeling user interfaces required for such tasks. Additionally, Ground Truth supports labeling jobs for text classification, and if you have a different need, SageMaker also lets you build a custom labeling workflow. For example, you may need to simultaneously label an image and a related piece of text. A custom labeling template would be required in such a case. The UI can be constructed using Crowd HTML Elements, a tag library defined by Amazon Mechanical Turk. Moreover, you can provide logic in the form of Lambda functions that do pre- and post-processing of labeling data. For example, you may want to use a custom algorithm to score the results of the labeling task.

If you need the help of many humans to label the dataset, where do you get these helpers? SageMaker Ground Truth supports three different types of labelers. First off, you can rely on a public workforce, an on-demand 24/7 workforce of over 500,000 independent contractors worldwide, powered by Amazon Mechanical Turk. Second, you can create a private pool of workers. These could be employees of your company, for example. Sometimes the data that needs labeling contains sensitive information that is not to be exposed publicly. Finally, Ground Truth supports external third-party vendors of labeling services whom you can contract via the tool. Another important value out of Ground Truth is the ability to learn while the human labeling is in progress. This is called active learning. If the active learning model, built in real time, is confident that it can automatically label an image, it will do so; if it's not, it will still send the image to a human labeler. For large datasets, this feature can dramatically reduce the overall labeling cost.

All right, after this quick overview, we're ready to jump into the AWS console and actually start the hands-on part. The very first thing we need to do is download the Jupyter notebook from the link shown. Jupyter notebooks are also known as IPython notebooks, hence the .ipynb file extension. We will use this file shortly. Once you're inside the AWS console, the first thing to look at is the region. In my case, I'm using the Oregon region. That's where I will also create all the buckets to access my data. The next thing we want to do is go to Amazon SageMaker. I have it conveniently here in the recently visited services, but you can also type it directly. Looking in the left navigation bar, you see the four sections that we've already talked about. Ground Truth allows you to label the data. The notebook section manages the notebooks. Training is about kicking off model training, and finally, inference is about deployment of models and running predictions against those endpoints. What we need to do first is create a notebook. So we'll click on notebook instances.
You can see that I already have some notebooks here in a stopped state. I will then go to create a notebook instance and give it a name, something like SageMaker-course. We can leave the instance type as the default, and the other parameters as well. The important thing for us is to supply the IAM role. If you already have a role that's suitable, great, but usually you may need to create one. So we'll click Create a new role. In this case, you can leave all the settings as the default except for this one, 'Choose any S3 bucket'. This will allow you to access S3 buckets and not be restricted to some specific ones. Later on, you can be more restrictive. So we'll create this role. Success. Leave the other attributes at their defaults. We'll click 'Create notebook instance'. You can see that this notebook is starting. It may take a couple of minutes for it to start.

As we can see, our notebook SageMaker-course is ready to go, so we'll just click on the 'Open JupyterLab' link and let it all load. This is the place where we need to upload the Demo.ipynb file. So we'll click on the upload button and choose the demo notebook. This is our notebook; we just need to double-click on it, close the Launcher, and there we go. What you're seeing here is a Jupyter notebook. If you're not familiar with Jupyter notebooks, a notebook is just a way for you to combine text explanations with code that you can run right inside the notebook. We're not going to be spending time reading the text here. Instead, I'll be explaining things, and you can take time and go through the explanations at home.

The very first thing that we need to do is scroll down to the very first cell, and you can see that cells are highlighted with the blue bars on the left. There's our first code cell. In fact, it's not even Python, it's just shell commands to download the dataset.zip file and unzip all the contents. To execute it, we need to press Shift and Enter. You will see an asterisk. While the asterisk is showing, it means the code is running. Once that's done, you get to see a number, indicating that this was the first cell to be executed, and of course the output as well. On the left, you see all the images that have been unzipped. In fact, you can just take a look; double-click on any of these and you can see images of bees. That's a good sign. These are the images we want.

To understand exactly what was in that zip, and its structure a little bit better, we can go to the next cell and just list the contents. In addition to the 500 images of bees (we only see the tail of them here), we also have a test subfolder. In there, we have 10 test images that we'll use for testing later on. Additionally, we have the output.manifest file. This contains the results of a Ground Truth labeling job for all 500 images of bees that are present in this dataset. You see, while we'll kick off a labeling job through SageMaker, we'll just have it label a sample of images, say 10. But to train a model, we need many more images, and therefore a complete labeled dataset is provided with all 500 images labeled.

Moving right along, in the next cell we need to supply the S3 bucket that will be used both for reading the images and for writing the output of the labeling job, model training, and so on. This is our working bucket. We must create it in the same region as SageMaker. Remember, we chose Oregon. I already have my denisb-sagemaker-oregon bucket created. You will need to create your own bucket with your own name in the region that you chose.
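For reference, here is a minimal sketch of the notebook cells covered so far. It is not the exact course notebook: the dataset URL is a placeholder from the course materials, the bucket name should be replaced with your own, and it assumes the archive unpacks JPEG images into the notebook's working directory.

```python
# Jupyter (IPython) notebook cells, sketched. Lines starting with "!" run as
# shell commands inside the notebook. <DATASET_URL> and the bucket name are
# placeholders; substitute your own values.

# --- Cell 1: download and unpack the dataset ---
!wget -q <DATASET_URL>/dataset.zip
!unzip -qo dataset.zip

# --- Cell 2: inspect what was unpacked ---
!ls *.jpg | tail            # the 500 bee images (assumed to be JPEGs)
!ls test/                   # 10 held-out test images
!head -n 1 output.manifest  # labels from the completed Ground Truth job

# --- Cell 3: working bucket, created in the same region as the notebook ---
BUCKET = "denisb-sagemaker-oregon"   # replace with your own bucket name

# --- Cell 4: upload the images to the bucket's input/ prefix ---
# sync only copies files that are not already in the bucket
!aws s3 sync . s3://{BUCKET}/input/ --exclude "*" --include "*.jpg"
```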
So let's execute that bucket cell. Finally, what we need to do before we start labeling is upload all of those 500 images into the bucket using the S3 sync command, which I've just done. Now, in fact, because I've uploaded these images into this bucket before, the sync doesn't need to do much work. In your case, you may see many, many lines, essentially one line for every one of those files. Great. We're now ready to move on to labeling the images. I have provided a handy cheat sheet where you can grab some pieces of useful information when we're configuring the labeling job, but we'll see that in a moment.

So I'm going to go back to the console. Instead of the notebook section, I'm going to look at the labeling jobs section. You can see some previously executed labeling jobs that have been completed, but we will click the Create labeling job button and give it a name, something like 'bees-sample'. The next thing we need to do is provide the location of all those images. Well, remember, we've just uploaded those images. But instead of providing just the location, a manifest file is usually required that describes or contains that information. We have a handy wizard link here. So we click on Create manifest file. We'll point to the location of the images, denisb-sagemaker-oregon/input/. Make sure to add a forward slash at the end, and the default data type of images is right. So we'll just click Create. We need to wait a little bit while the manifest file is created, and then we will confirm it. All right. We can see that the manifest has been created; 500 objects were found, which is great. These are the images we want. So we'll just say Use this manifest. You can see immediately that this manifest file was filled into the box for us.

Next, we need to provide the location of the output. I can just copy this location and append output. For the IAM role, choose the same role you created earlier. Now, for additional configuration, recall that instead of doing a full 500-image labeling job, we're going to pick a subsample just for demonstration purposes. So I'll say a random sample of size two, and that's in percentages. Two percent of 500 is ten images, and we'll click the Create subset button. There you go, ten objects selected; we'll hit the Use this subset button. It's actually important that you do that, because otherwise the subset is not chosen. So once we click here, you should see that the manifest file is replaced with the sample manifest. The reason I say that's important is because if you don't do it, then the labeling job for the entire set of 500 images will be kicked off, and that's potentially more costly; we'll see what the cost is in a moment.

Next, we need to supply the task type. As we've discussed earlier, we're talking about object detection here and bounding boxes around objects. So we'll need to pick the bounding box task and click the Next button. The next decision for us is which worker pool to choose, who are going to be the people labeling your images. We can choose the public workforce; these are the contractors powered by Amazon Mechanical Turk, a 24/7 workforce. You can also choose a private workforce. It's possible that your images contain some sensitive information that you don't want to reveal to the public, and that could be a better option. You can create that pool yourself by inviting people and then curating it, so it's fully managed by you. Then you also have an option of contracting a vendor.
There are several vendors that integrate with SageMaker Ground Truth, and each vendor has their own specific features and benefits that you can choose from. We're going to go with the public workforce for this demonstration, and clearly we need to decide how much we're going to pay the labelers for each labeling task. There is a guideline here: if your time estimate is, let's say, eight to ten seconds, then 3.6 cents is a reasonable price to pay, and so on and so forth. For some very challenging tasks you can pay more money, and of course, the more money you're paying for a job, the more labelers are going to be motivated to complete the task faster, and the more labelers are going to be drawn to your task, and therefore your labeling will be completed faster overall. The next thing we need to do is certify that there's no adult content in those images, and that we understand that this is a public workforce and we don't have any personally identifiable information in the images, and so on. So that's what we do.

There's a small box here to enable automated data labeling. Now, this is useful when you have a very large dataset, or a moderately large dataset; in our case, we only have 500 images, and in fact, for this sample labeling job we'll just use 10. So we are not going to be able to benefit from automated data labeling, and we'll leave this off. Clicking on additional configuration, this is a bit of interesting information, because we can decide how many workers get to see the same exact image and label it. By default, five different people will label the same exact image, and then a consolidated label is computed to provide a better overall label, a more accurate label. But of course, you're going to be paying every one of those five workers the same amount specified here, 3.6 cents. Therefore, you need to decide what's the right number of labelers to get the accuracy that you need to achieve, and five is a reasonable option. You can go with three, you can go with more, but because we're doing a demonstration here, we'll just drop it to one and leave it as such.

Moving on to the next section, this is actually an interesting section, because we're configuring the visual tool that the labelers will use, and you can see the different sections here. First, we need to describe what the task is. So there's a brief description, and here's where the handy cheat sheet is useful, to copy and paste those task descriptions and other information. We're asking labelers to draw a bounding box around a bee. Then we need to provide the label here; we only have one class, so it's going to be bee. For good and bad examples, we need to provide descriptions as well. What constitutes a good example? We want to make sure that all parts of the insect are included and the bounding box is tight around the insect. Whereas a bad example is when parts of the bee are excluded or the box is too big. Okay, so these are just text descriptions, but we can also provide images, and what you need to do is click to the right of this gray box, hit the Delete button, and then insert a new image. We need to copy the URL. There is the example of a good label. We do the same here: delete, insert. That's an example of a bad label; you see the box is a bit wider and some parts of the bee, the wing and the leg, are not included. Great. Not only can we configure it, but we can also preview it.
So if I hit the Preview button, we see the UI that the labelers will see, and moreover, we can try to draw bounding boxes. The box tool is set up, and I can just try to imitate this, make the box a little tighter, and now this looks good. I can hit the Submit button, but nothing happens since it's just a demonstration; we only see the JSON document that will be produced as a result, with the details of the bounding box dimensions, the class label, and the overall image dimensions. Great. So once we are satisfied that the UI looks good, we can hit the Submit button, and our bees-sample labeling job is now in progress. This may take some time. It could be minutes, it could be even longer; much depends on how available the labelers in the public pool are. So there's a bit of a gamble here, but usually these jobs complete in a matter of minutes. We can wait until this is done, but I can also look at the results of a previous, similar labeling job, so we can move on.

If I look in the console, for example, there is the bees job; I can click on it. You can see that all 10 images have resulting bounding boxes, and they're pretty good. This is an interesting image because it's an entire beehive. There are lots and lots of tiny bees, so we can't really expect a labeler to draw thousands of boxes here, and as a result, I think that person just chose to draw a big box around the beehive itself. Again, this is a normal situation: you will encounter these exceptional cases in any dataset; that's real machine learning.

So we can review the results here in the console, but I've also prepared some cells in the notebook to achieve the same thing, just to show you that you can do a lot of things programmatically as well as through the console. The one thing we need to do is supply the labeling job name. I believe that was bees, exactly. We just need to run the cell. One thing the cell is doing is actually printing the lines from the manifest file. This is the manifest file from the labeling job. This manifest is in the JSON Lines format, where every line is a well-formed JSON document, and each line corresponds to the result of a particular image being labeled. You can see over here that there is a source-ref attribute that contains the S3 location of the image. Then we have the name of the labeling job, which is bees, pointing to a structure that contains the results of labeling, called annotations. You have the class_id, which is zero; zero is just the first class, and we only have one class, bees. We have the dimensions and the position of the bounding box, and then, of course, we have the image size. So this is not very different from what we've seen earlier in the output from the labeling UI. All right. In the next cell, let's define a function to show the annotated image, using the matplotlib library in Python. Now that the function is defined, we can actually go through all the lines in the manifest file and show the 10 images that were labeled, and there it is.
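For reference, here is a minimal sketch of what those notebook cells do: read the JSON Lines manifest and draw the boxes with matplotlib. It is not the exact course code; the label attribute name ("bees") and the assumption that the image files sit next to the manifest locally are placeholders.

```python
import json
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

LABEL_ATTRIBUTE = "bees"   # assumption: matches the labeling job name

def show_annotated_image(image_path, record):
    """Draw the Ground Truth bounding boxes stored in one manifest record."""
    img = Image.open(image_path)
    fig, ax = plt.subplots()
    ax.imshow(img)
    for box in record[LABEL_ATTRIBUTE]["annotations"]:
        # Each annotation holds the top-left corner plus width and height.
        ax.add_patch(patches.Rectangle(
            (box["left"], box["top"]), box["width"], box["height"],
            fill=False, edgecolor="red", linewidth=2))
    plt.show()

# Iterate over the JSON Lines manifest and plot each labeled image,
# assuming the image files were unzipped into the current directory.
with open("output.manifest") as f:
    for line in f:
        record = json.loads(line)
        image_name = record["source-ref"].split("/")[-1]
        show_annotated_image(image_name, record)
```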
Now that we have labeled the dataset, in preparation for model training, it's time for us to review some details about the computer vision problem of object detection. Given a photo containing one or more objects, our goal is to find a tight bounding box around every object that the model can identify, together with the corresponding class label. In this picture, three objects of different classes are identified: a dog, a bicycle, and a car. Each bounding box is described by the position of the top-left corner, the x and y coordinates, as well as height and width. These could be expressed in absolute numbers of pixels, or as percentages relative to the overall image width and height. This means that for a neural network to detect many different objects, it must be able to produce a separate output for each object, incorporating the bounding box location and the class label, as well as the corresponding classification confidence.

Intuitively, we can imagine the process of object detection as occurring in two separate steps. One step is proposing interesting regions where an object might be, and another is classifying objects in the region while also generating bounding boxes. One approach is to first apply pixel-level sub-segmentation and then apply a greedy algorithm for merging together similar regions. Once larger sub-regions have been obtained, candidate regions for where the object might reside are proposed. We can see that a TV, a TV stand, and a woman are already contained wholly within their own respective sub-regions. It turns out that such a two-step process can be optimized to be completed in a single forward pass, namely via the so-called single-shot detectors, or SSDs. All of this happens within a single neural network. Here's an example of a network topology for an SSD network based on the VGG-16 convolutional neural network. Different SSD topologies also exist, for example, based on the ResNet-50 network. Such SSD systems have shown superior performance, both in terms of accuracy and inference speed.

Speaking of accuracy, how do we measure if the model produced good results? Remember that, unlike traditional machine learning classifiers, not only does the model need to produce an accurate classification, for example a dog, but it also has to generate an accurate, tight bounding box around the object. The bounding box could be off by a few pixels, could be omitting part of the object, or could be in a completely wrong place. How do we then compare the accuracy of different algorithms? One idea is to take the predicted bounding box and the true bounding box and measure the degree to which these boxes overlap. For instance, we could compute the ratio of two areas: the intersection area divided by the union area, or IoU (intersection over union). It's obvious to see that for perfectly matching bounding boxes, this ratio will be one, whereas for non-intersecting bounding boxes, it will be zero. Now, in fact, the metric that is actually used is called mAP, or mean average precision. It's outside the scope of this course to describe exactly how the mAP metric is computed, but suffice it to say that it's also based on the IoU concept, so you should have the right intuition about how it works.
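To make the IoU idea concrete, here is a small worked example for two axis-aligned boxes given as (x, y, width, height); the sample coordinates are made up purely for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (area is zero if the boxes don't overlap)
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou((10, 10, 100, 100), (10, 10, 100, 100)))  # 1.0: identical boxes
print(iou((10, 10, 100, 100), (60, 60, 100, 100)))  # partial overlap
print(iou((0, 0, 50, 50), (100, 100, 50, 50)))      # 0.0: no overlap
```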
Okay. Now we're ready to train the object detection model in Amazon SageMaker. For that, we're going to use a built-in object detection algorithm based on the ResNet-50 neural network topology. We will see the details of how to configure that in a moment. But first, we need to split our dataset into two parts. One part is going to be used for training the model: out of the 500 images, we will take 400 for training. The 100 remaining images are going to be used for validation, which means that every time a new epoch has gone by in training, we will evaluate the model on that validation dataset to see how well it's doing. This gives us an assessment of whether we should continue training or stop. So I will execute this cell. We see training samples, 400, validation samples, 100, as expected. Now we need to upload the two manifest files that we've produced in the process to our S3 bucket, which is the standard location where SageMaker looks for things. We can see that our uploads have succeeded. We'll be using these handy S3 URLs shortly.
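Here is a minimal sketch of what that split-and-upload step can look like. The 400/100 split follows the walkthrough, while the manifest file names, the bucket name, and the exact code are assumptions rather than the course notebook verbatim.

```python
import random
import boto3

BUCKET = "denisb-sagemaker-oregon"   # placeholder: your working bucket

# Read the fully labeled manifest (500 JSON lines) and shuffle it so the
# train/validation split is random.
with open("output.manifest") as f:
    lines = f.readlines()
random.shuffle(lines)

train_lines, validation_lines = lines[:400], lines[400:]

with open("train.manifest", "w") as f:
    f.writelines(train_lines)
with open("validation.manifest", "w") as f:
    f.writelines(validation_lines)

# Upload both manifests to S3 so the training job can read them.
s3 = boto3.client("s3")
for name in ("train.manifest", "validation.manifest"):
    s3.upload_file(name, BUCKET, name)
    print(f"s3://{BUCKET}/{name}")
```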
I will be using the console to start the training job, but I've also provided a code option where you could just execute a cell, and exactly the same things I will do in the console will be run in code. That's often faster. So once you've done this once, you can rerun the cell many times to kick off many different training jobs with the exact same configuration. From the console, we've gone through the Labeling section and we've seen the Notebook section. Now we move on to the Training section. I click on Training jobs, and click Create training job. I need to, as usual, give it some kind of name, bees-training-job. I will pick the same IAM role, and we will be using a built-in SageMaker algorithm that we'll need to specify out of the many algorithms that we have. We're going to use Object Detection.

Next, we need to select an input mode. This defines how your training algorithm obtains its data, images in our case. If you choose File, it means that the entire dataset needs to be downloaded onto the box that is performing the training. But we can choose Pipe mode, which delivers the data in a just-in-time streaming fashion. That's, in fact, what we need to supply here for the object detection algorithm. Based on the algorithm that was selected, SageMaker already knows what metrics are going to be published. For us, two metrics are going to be especially interesting. First, there's the train:progress metric: how many epochs of training have already been completed, so I can know how many more are to go. And there's validation:mAP, mean average precision. This determines, as I was explaining, the accuracy. The closer this is to one, the more accurate the predictions and the bounding boxes are, and the closer to zero, the less accurate or completely inaccurate the results are. Next, we need to select the instance type, which will be used for training. Remember, when we're kicking off training in SageMaker, it's going to start a completely new instance of the type specified here. For object detection, we actually need a machine with a GPU, so I'll select p2.xlarge and leave the other attributes the same.

We're now moving to the Hyperparameters section. This is an important section because we get to decide here how to configure the training algorithm. First of all, we get to choose the network topology. There are two available: VGG-16 and ResNet-50. I've already mentioned ResNet-50; this is what we're going to go with. Next, we need to decide if we're going to train from scratch, in other words from no previous knowledge, or start with a pre-trained model. Now, we only have 400 images that we're going to be using for training. That's usually not enough to achieve meaningful results. Just imagine that a machine has never seen any images of anything, and you are showing it 400 images of bees expecting it to identify bees. That's an extremely difficult task. What typically is done is that we pre-train this model on a large variety of images, sometimes millions of images of different objects. It could be cars, birds, bicycles, and so on. This network has learned to recognize certain features in those images, maybe certain curves, certain shapes, and these become the rudimentary building blocks out of which other objects can be recognized. So we're going to use a pre-trained network, and we'll set that option to 1 for yes.

Next, we need to supply the number of classes; that's simple, we only have one class of bees. Remember, I was talking about epochs: how many epochs, how many rounds of training are needed? We can leave the 30 here as the default. We'll leave the learning rate, optimizer, momentum, and others at the defaults. The minibatch size I'll adjust. This basically defines how many images are used in a minibatch during a training round. If you have lots and lots of images, it's more efficient to have a larger batch size; it also utilizes your GPU better. But because we have a relatively small dataset here, I'll just pick a minibatch size of one image being shown to the network at a time. We also need to supply the number of training samples; well, it's 400. We'll leave the rest of the attributes at their defaults. Great.

In the next section, we need to supply the location of the training data. We need to define two channels. One is the train channel. Again, the input mode is going to be Pipe. We're going to wrap the records in the RecordIO format. Don't worry about this; it's just a requirement of the algorithm that, if you're using object detection, the records need to be wrapped. Luckily, this is all handled for us automatically by SageMaker. Next, the data type is AugmentedManifestFile. That's how the data is provided. We need to explain to SageMaker what attributes should be expected in that augmented manifest file. Remember, we looked at them earlier: one was source-ref, the other was the name of the labeling job. Because I'm using the results of the full labeling job, a full 500 images, the labeling job name at that time was bees-500, and that's exactly what you need to supply, given the dataset and the output manifest file that is provided to you. Finally, we need to supply the S3 location of those manifest files. Remember, in our Jupyter notebook, the URLs were printed, so I can just copy and paste the train one. Then I will need to add a second channel, which is the validation channel: change the name to validation, choose Pipe again, RecordIO, AugmentedManifestFile, the same exact attributes, and the location we can copy from our notebook, and we're done with that section. Now, for the output of our training job, this is going to be the model artifacts. We can just supply the output folder that we've used before. No need to supply encryption, leave tags empty, and I'll just hit Create training job.
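The notebook's code option does essentially the same thing. Here is a hedged sketch of what it can look like with the SageMaker Python SDK; the bucket, manifest locations, and the bees-500 label attribute are the values from this walkthrough and should be treated as placeholders for your own.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Container image for the built-in object detection algorithm in this region.
image_uri = sagemaker.image_uris.retrieve("object-detection", session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.p2.xlarge",      # GPU instance, as chosen in the console
    input_mode="Pipe",                 # streaming input, required here
    output_path="s3://denisb-sagemaker-oregon/output",   # placeholder bucket
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    base_network="resnet-50",
    use_pretrained_model=1,
    num_classes=1,
    epochs=30,
    mini_batch_size=1,
    num_training_samples=400,
)

def channel(manifest_uri):
    # Augmented manifest channels stream each image together with its labels.
    return TrainingInput(
        manifest_uri,
        distribution="FullyReplicated",
        content_type="application/x-recordio",
        record_wrapping="RecordIO",
        s3_data_type="AugmentedManifestFile",
        attribute_names=["source-ref", "bees-500"],   # label attribute name
    )

estimator.fit({
    "train": channel("s3://denisb-sagemaker-oregon/train.manifest"),
    "validation": channel("s3://denisb-sagemaker-oregon/validation.manifest"),
})
```

Calling fit kicks off the same kind of managed training job that we just created in the console.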
So we'll see our training job is starting. It can definitely take some time; if you're dealing with very large datasets, some training jobs could even take weeks. This one is expected to complete in tens of minutes, so instead of us waiting for the job to complete, I can show you the results from a previous job that was completed. If we go back to our notebook, I can provide a job name here and describe its status. So this job is currently running; I pasted it here, and it's in progress. I can find some previous job that I've run, for example bees-training, and that has certainly completed, and we can try to review the results of this training job. What does that mean? Training has completed, which means we have the model artifacts.

We then need to create the model, package it, and deploy it to an endpoint, for example, to be able to run some predictions, or some object detection tasks, against it. In my next cell, this is exactly what happens. Given the training job name provided earlier, the model artifacts will be fetched, and we'll create a configuration for an endpoint, which is a real-time endpoint against which you can run your predictions. Once the configuration is ready, we need to actually invoke create endpoint; this step may take some minutes, so the next cell simply prints the status of the endpoint that I defined earlier. If we want the one that was just created, we can enter it here, comment out the previous one, and we can see that this is now creating. Let's wait for our endpoint to finish creation. Let's refresh it a few times, and we can see that the endpoint is now in service; therefore, we can start running predictions against it.

So the first thing that we're going to do is look at the 10 test images, just to confirm that they're there; they're there for a reason. These are the 10 test images we're going to use. They were never seen by the machine learning model before or during training; they are completely new. I need a bit of a helper function that will unpack the results of an endpoint call, so we can then plot those results nicely: figure out what the class label is, where the box is, its height, its width, and so on. Now, moving on to the actual plotting, let's just execute the cell, and there are the images. Well, if you look carefully, we don't see that many bounding boxes. There's one here that's not too terrible, but in a lot of other photos you don't see much. There's one that's not so bad here as well. The reason is that we have a parameter here that determines the minimum confidence threshold to use, and 0.5 could be a bit high, so we could try showing bounding boxes with lower confidence. The other parameter here is how many of those best bounding boxes over the threshold should be shown. If we rerun this again, we can see bounding boxes have started to appear, and this red one is not so bad. There are clearly some misses, like that green one; there's no bee there. These two point in the right direction, and here we still see no bounding box, even at the lower confidence. This image is also a bit difficult, clearly. That one is pretty accurate. Similar here: one of them is doing a good job. There are many bees here, but it sees some in the image. We can control the output of the prediction; we can decide what confidence threshold to use for our purposes. But again, remember that we had just 400 images of bees for training, and when we configured our training job, all those hyperparameters, we pretty much kept the defaults; we didn't change much. In fact, when you don't know where to start, stick with the defaults and see what the results are.

All right. At this point, what we typically do is try to figure out if there's a better set of hyperparameters that we could use to achieve more accurate results, or maybe 400 images is just not enough, and we need to go back, obtain more images, and train for a longer time, for many more epochs, before we get the desired level of accuracy.
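For reference, here is a hedged sketch of the endpoint call behind those predictions. The endpoint name and image path are placeholders, and the parsing follows the [class, confidence, xmin, ymin, xmax, ymax] format (with normalized coordinates) that the built-in object detection algorithm returns.

```python
import json
import boto3

ENDPOINT_NAME = "bees-detection-endpoint"   # placeholder: your endpoint's name
runtime = boto3.client("sagemaker-runtime")

# Send one test image as raw JPEG bytes.
with open("test/sample_bee.jpg", "rb") as f:   # placeholder image path
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="image/jpeg",
    Body=payload,
)
detections = json.loads(response["Body"].read())["prediction"]

# Keep only detections above a confidence threshold; lowering it reveals
# more tentative boxes, as discussed above.
THRESHOLD = 0.3
for class_id, confidence, xmin, ymin, xmax, ymax in detections:
    if confidence >= THRESHOLD:
        print(f"bee with confidence {confidence:.2f} at "
              f"({xmin:.2f}, {ymin:.2f}) - ({xmax:.2f}, {ymax:.2f})")
```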
This is exactly the time for us to talk about hyperparameter optimization, which is the next section. As you have seen in the demo, we need to supply many different hyperparameters for our built-in object detection algorithm. If you don't have lots of experience with a particular algorithm and machine learning problem, it will likely be very difficult for you to decide what set of parameters is best. The reality is that even the most experienced and trained practitioners often need to explore the space of hyperparameters to see which ones affect the model performance the most. That's why we have added a way to automatically find the best set of hyperparameters, a feature called automated model tuning. In the industry, this is also known as hyperparameter optimization, or HPO.

Hyperparameters exist in most machine learning algorithms. In the case of deep learning, which is based on neural networks, typical hyperparameters consist of the learning rate, the network topology, for instance the number of layers, and dropout or regularization as means of dealing with overfitting. For more traditional decision-tree-based algorithms, such as the popular XGBoost, these could be the number of trees in the ensemble, the maximum depth of a tree, or the boosting step size. For clustering, it could be the number of clusters, and so on. Each hyperparameter has a corresponding set of possible values, and altogether the hyperparameters for a particular algorithm form a hyperparameter space that we need to explore to find the best point in that space.

The effect on performance can be quite dramatic. What you see here is the effect of different combinations of two hyperparameters, embedding size and hidden size, on the resulting validation F1 score, which characterizes the accuracy of the resulting model. Embedding size in this case could be the dimensionality of the word embedding in a natural language processing problem, while the hidden size could be the size of the hidden layer in a neural network. The resulting difference between a validation F1 score of 87 versus 95 could be the difference between an acceptable and an unacceptable machine learning model. What's shown here is the space for just two hyperparameters, but usually we have many more. It quickly becomes difficult to understand and explore the space by hand, which is what researchers often had to do before automated hyperparameter optimization. Additionally, hyperparameters are typically not independent of each other in terms of their combined effect on the model. Changing one will likely affect how another influences the model. Often, the only way for you to know how the model is going to react to something different is to make changes to the hyperparameters and train a new model. Exploring the space exhaustively in this way is usually going to be quite costly.

So what are the typical approaches to model tuning? We've already talked about the problems with the manual approach: at best, it's inconsistent and relies on a lot of previous experience. Early attempts at automation relied on various brute-force approaches. One idea is to simply divide the ranges of the hyperparameters into same-size intervals, and then effectively do a grid search, as shown in the top-right picture. The problem with this approach is that usually some parameters are highly important to improving the performance, and others are not. But the grid search approach, as shown in this picture, only samples three different values of the important parameter. If instead we were to use the same total number of nine measurements, but instead of a grid chose the positions randomly, we would sample the important parameter in nine different places.
There's also the Sobol algorithm, used to generate so-called quasi-random numbers. These are better than pseudorandom numbers in that they sample the space more evenly, avoiding clustering. While also offering random search, SageMaker provides a smarter approach based on Bayesian optimization. The exact details of this algorithm are outside the scope of this course, but it's based on Gaussian process regression, which provides estimated bands for the objective metric in the yet-to-be-explored areas, and the algorithm chooses the most promising area to explore next. Now that we understand the theory of hyperparameter optimization, let's see how this actually works in SageMaker.

For model tuning, I will go back to the console, and we'll kick off another set of training jobs, or rather a model tuning job that initiates many training jobs that sample the hyperparameter space and let you find the best combination of parameters. We'll go to the same Training section, except we'll choose hyperparameter tuning jobs. We'll go and create a hyperparameter tuning job by clicking that button. Of course, as always, we need to provide a name and choose the IAM role. A lot of the things that we need to supply here are very similar to what we've done with the regular training, so I will go ahead and pick the same values and will spend more time talking about what's new and what's different about hyperparameter tuning. We still need to supply the algorithm, which is going to be object detection. It's still going to be Pipe mode. Click Next. When we described the hyperparameter optimization process, we talked about different options: Bayesian methods and random search are both supported in SageMaker, but we'll stick with Bayesian as the potentially more intelligent way of going about things. Our objective metric is automatically suggested for us, which is the mean average precision, and we want to maximize that metric. We can say that training jobs can stop early if SageMaker is able to determine that there is no reason to continue, that is, if the accuracy for a specific training job seems to be dropping off. So we'll set this to Auto.

We're now getting to the hyperparameter configuration screen, except that it's a little bit more complicated than last time. Let's look at the similar parts first. For each of those hyperparameters, you can still supply your choices. So for the number of classes we have one class, and for epochs we have 30 epochs. But instead of choosing a static, fixed value, we're able to supply a range. By doing so, we're letting SageMaker sample values in that range to figure out what the best value is. So we'll choose continuous here, and we need to supply the min and max values for the learning rate. Okay. So what are these values? How do I find them? If we go to the SageMaker documentation for object detection hyperparameter tuning, and I have this tab open, it does give you suggestions. In fact, it tells you that the biggest impact is typically seen by adjusting the minibatch size, learning rate, and optimizer. It doesn't mean that you should never look into other parameters, but it's just a suggestion. Here, for the typical hyperparameters to be tuned, there is a suggestion of the min and max values. Therefore, for the learning rate, these are the values that we can provide. Going back to the console, I'll enter them, with 0.5 as the max. Because of the dramatic difference in scale here, I think it's good to leave it on a logarithmic scale. We also have the optimizer.
In the previous training job, the optimizer we chose was SGD, or stochastic gradient descent. There are some other methods that are also available, like Adam and so on. Again, if you don't know which optimizer is better, just let the system figure it out for you by supplying the entire list of four available optimizers, and that means it's yet another hyperparameter that SageMaker will tune. Momentum and weight decay, in this case, I've decided not to optimize; we'll rely on the static values, the ones that we supplied last time. But the minibatch size is interesting, so it can go from one to, say, 32. Let's see if we can get better performance or a more optimal value than the one that we were previously supplying. Finally, we provide the same 400 as the number of training samples and leave the rest of the attributes the same. I click the Next button, and here you see the same channel configuration that we need to go through. Let's do it quickly; it's going to be exactly the same as last time. The location, let me copy. There is also a handy possibility for you to clone a previous training job, in which case a lot of the parameters that you are filling in here will be automatically set to the same values you've used before.

Okay, hit the Next button, and we move to the next section, which is somewhat familiar. First, we need to choose the instance type, and for this, we're going to choose the same p2.xlarge. The next section, resource limits, is actually quite important. In the maximum number of training jobs, you get to decide how many training jobs the hyperparameter optimization will kick off, and that's your way to control how much of the hyperparameter space will be explored. Also, of course, it controls how much it's going to cost overall, because every training job will consume a certain amount of GPU time and will be contributing to the overall cost. So for the purposes of demonstration, I'll just use 10. The next setting is maximum parallel training jobs, which just configures how many of these 10 can run in parallel if you have access to many GPUs. But be careful about the limits: you may have certain limits in your account on the number of GPU instances you can use concurrently. So I'll keep this as one for the purposes of the demonstration, and I'll create the job. That kicks off each of the training jobs, and the entire hyperparameter tuning job will finish when all those training jobs have been completed. So we might need to wait here for quite some time.
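Here is a hedged sketch of launching the same tuning job from code with the SageMaker Python SDK, reusing the estimator and channel helper from the earlier training sketch. The learning-rate bounds, the list of optimizers, and the batch-size range mirror the choices in this walkthrough; treat them as assumptions to verify against the object detection tuning documentation.

```python
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    CategoricalParameter,
    IntegerParameter,
)

hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-6, 0.5, scaling_type="Logarithmic"),
    "optimizer": CategoricalParameter(["sgd", "adam", "rmsprop", "adadelta"]),
    "mini_batch_size": IntegerParameter(1, 32),
}

tuner = HyperparameterTuner(
    estimator=estimator,                  # the Estimator defined earlier
    objective_metric_name="validation:mAP",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",
    max_jobs=10,                          # total training jobs to launch
    max_parallel_jobs=1,                  # mind your account's GPU limits
    early_stopping_type="Auto",           # stop jobs that are clearly behind
)

tuner.fit({
    "train": channel("s3://denisb-sagemaker-oregon/train.manifest"),
    "validation": channel("s3://denisb-sagemaker-oregon/validation.manifest"),
})
```

Once the tuning job finishes, the best training job and its hyperparameters can be inspected in the console, just as we do next.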
Therefore, instead of waiting, we can just look at the results of a previously completed tuning job, like this one here, to see what the outcome was. If I click on the test tuning job, you can see that it took an hour and 44 minutes. The interesting thing to look at is the objective metric value for all those training jobs. We can see the duration of time that each of those took, and we can also see the best outcome right here, 0.47. That's higher than what we had with the single training job that we examined earlier. The other attempts were not so successful. You can see that after six minutes of running, the value of the objective metric in one job was so poor that SageMaker decided to stop it: there's no point in continuing epoch after epoch when we get much better results somewhere else. This is another benefit of kicking off hyperparameter optimization. Early stopping means that you're actually saving yourself time and saving costs by stopping earlier and not continuing to explore paths that are not going to lead to great results.

Okay, so if this job produced the best results, what are the relevant hyperparameters? If we click on the job itself and scroll down, we can see that the learning rate for this tuning job was 0.002. That's higher than the default that was chosen, which means the neural network was converging faster, and the minibatch size was picked to be two instead of the one that we supplied earlier. Okay, interesting. So how can we see the results of this? Well, we can do the same thing here: take the model artifacts that this training job produced, package them, deploy them to an endpoint, and examine the outcome. In fact, I have done that previously. So let's try this. If I look at the endpoints, I believe it's this one here, so I can just copy the name of my endpoint that is in service, go back to my notebook, and scroll to the section where we checked the endpoint status. In fact, it's exactly the one that I have here in the comments too. The status is in service. We can just move on to the plotting part, where we're going to run those 10 test images against this endpoint and see what the outcome is.

We're definitely seeing better quality results. You can see that this bee is well taken care of. There are some boxes here that clearly do not point to a bee, but that's also possible. This one is not so bad. This one is decent. Even in this image, it sees a bee somewhere here; the entire flower looks a little bit like a bee, so I think the system is confused. This image remains hard. But here it did a good job. Okay, and also not so bad. Even here, where you have many bees, it has highlighted some and provided a general area for the bees. So you can see that we've managed to achieve more accurate results by executing automatic model tuning with SageMaker, where we didn't really have to know any kind of background on what each hyperparameter means or how it affects the accuracy. We just let SageMaker do the job. I think the next step from here would be to maybe try different network topologies, and to try to grab more images from the iNaturalist website. So instead of 400, let's grab 2,000 more and try to rerun the training. But I will leave this as a take-home exercise for you.

One thing I almost forgot: the cleanup section. Once you've done your experimentation, there are lots of resources that you should actually shut down. Every endpoint that's running is actually an instance that's running and consuming resources. So here's a quick way to clean it up: delete all the unnecessary resources so you don't have to pay money for unused stuff. This completes our demonstration.

We have now come to the end of our course. I hope you're eager to try SageMaker Ground Truth and object detection on your own dataset. To give you an idea, why don't you complement the images of honey bees with those of wasps, similarly downloaded from the same iNaturalist website, and then build an object detector that can tell apart bees from wasps? Wouldn't that be cool? So I hope you learned something useful. Thanks for watching. Again, my name is Denis Batalov; don't forget to connect on Twitter or LinkedIn. I would love to hear your feedback about this course.