Hello and welcome back to this course. In this video, we're going to start talking about traffic analysis. The reason why we care about traffic analysis in this particular context is for command and control, and for concealing that command and control among the normal traffic flowing over a network. There are a lot of different ways in which we can send data out of the network and receive commands back from C2 infrastructure. Of all of those different ways, it's often important to find one that hides the command and control traffic well within normal traffic. For example, if you're using some obscure protocol to send data in and out of the network, it's going to stand out from normal traffic. However, if you use something like HTTP, DNS, etc., in a network that commonly uses those, then it might be a little bit more difficult to detect your command and control traffic and differentiate it from the rest of the normal traffic on the network. In this video, we're going to look through a Python file that tries to help inform that decision of what type of traffic is best suited to command and control in this case. We're going to be looking for a few different features. One, we want a protocol that's common. Something like HTTP or DNS would be well-suited in many networks, because those are the protocols that people use when they're browsing the Internet, so you're very used to having those protocols going out to random places. However, in other networks, those two protocols may be very unusual, say if you don't have a lot of web browsing in that particular network. It's important to identify a protocol that actually works for that network. In addition to commonality, we need something that can actually carry some data out, and we need something where that data flowing out doesn't look too unusual. We'll talk about how to measure some of those values in the following few videos.
In this Python file, let's go down to the bottom where we've got our main section and we can start talking about what we're seeing here. For the purposes of this demonstration, we're going to be using the sniff command in offline mode. That just means we're going to be reading from a pcap file rather than live off the wire, to make sure that we have a nice variety of traffic here to look at. Otherwise I'd have to be generating the traffic as I do the video, and that's a bit more difficult. We have commented out here a sniff command for live analysis, something that would grab 100 packets and call analyzeTraffic on each one. In this case, all we're changing is that we're reading a packet capture file in and calling analyzeTraffic on each of those packets instead. 100 is probably a bit of a low number for real analysis anyway, but that's something that you can configure. Regardless of which version of this we're using, we're going to call this analyzeTraffic function, which is essentially just a helper function that runs everything else in this Python file and a couple of the modules that we'll be pulling in. As the first step here, we're going to perform just high-level protocol analysis, looking for different network protocols that are common on the network and that are also well-suited for command and control. Let's scroll up and take a look at that first. Here we're looking for a few different target layers. These are just layers of a packet that could carry data. If we have just a raw payload, if we have some DNS traffic, or if we have an HTTP request or response, we're going to investigate hiding data in some of the fields in those various layers. First we need to determine if a particular packet that we're looking at contains one of those layers. When we send a packet, p, to this protocol analysis function, we can use the layers command in Scapy to get a list of all of the different layers contained within that packet.
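The choice between offline and live capture described above can be sketched roughly as follows. This is a minimal illustration, not the course's actual file: the names analyze_traffic, build_sniff_args, and run_capture are hypothetical, and the placeholder callback does nothing. Only Scapy's sniff function and its offline, count, prn, and store arguments are real.

```python
def analyze_traffic(packet):
    """Placeholder callback (hypothetical); the real helper would run
    the protocol and field analysis on each packet."""
    return None


def build_sniff_args(pcap_path=None, live_count=100):
    """Build keyword arguments for Scapy's sniff().

    With a pcap path, packets are read from that file (offline mode).
    Without one, live_count packets are captured live off the wire.
    """
    args = {"prn": analyze_traffic, "store": False}
    if pcap_path is not None:
        args["offline"] = pcap_path
    else:
        args["count"] = live_count
    return args


def run_capture(pcap_path=None):
    """Run the capture; Scapy is imported here so the helper above
    can be used and tested without it installed."""
    from scapy.all import sniff
    sniff(**build_sniff_args(pcap_path))
```

Calling run_capture("traffic.pcap") would read the capture file, while run_capture() would sniff 100 live packets, which usually requires elevated privileges.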
Here we're going to store in the variable protos the layers that match our desired list. We might have multiple ones. For example, an HTTP request or response is likely to have a payload layer, which they call Raw in Scapy. Say we've got a response with a Raw layer; we want to analyze both for potential opportunities to hide data in them. Here we're going to be using a Python list comprehension to look through the layers that we have and determine the names of the ones that match our criteria. We've got nesting here: we have an outer list comprehension and an inner one. Let's start with the outer one. In the outer comprehension, we're trying to get the name of a particular layer, and we're looping over the result of the inner comprehension, which is another list. In the inner comprehension, we want to get the layer at a particular index, for indices in the range of the number of layers. For example, if we have five layers here, we'll loop over each of those five layers, pull them out, and return the name of each layer. However, we're only doing so if that layer name is in our list of target layers, so Raw, DNS, HTTP request, or HTTP response. For each of these protocols, we want to know how common that particular protocol is. We might just so happen to find something that's perfect for carrying data, really good for holding the type of data that we need, but it only happens once in a blue moon, and if that's the case, then using it for command and control is going to stick out. We're just going to loop over the list of protocols that we pulled out here and then add them to a dictionary up here. That's just going to store a count of each particular protocol that we've observed, so that we know how common the various protocols are. Then at the end we're going to return the layers that we have that match our desired ones.
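As a rough sketch of the filtering and counting just described, assuming the packet object exposes Scapy's layers() method (which returns a list of layer classes): the names TARGET_LAYERS, protocol_counts, and protocol_analysis are illustrative, the target names are assumed to be the layer class names, and the nested comprehension from the video is flattened into two simpler ones here.

```python
from collections import Counter

# Assumed class names for the layers of interest.
TARGET_LAYERS = ["Raw", "DNS", "HTTPRequest", "HTTPResponse"]

# Running count of how often each target layer has been observed.
protocol_counts = Counter()


def protocol_analysis(p):
    """Return the names of the target layers present in packet p
    and update the per-protocol counts."""
    # p.layers() yields layer classes; __name__ gives each class name.
    layer_names = [layer.__name__ for layer in p.layers()]
    # Keep only the layers that could carry data for our purposes.
    protos = [name for name in layer_names if name in TARGET_LAYERS]
    protocol_counts.update(protos)
    return protos
```

A Counter is used in place of a plain dictionary only to avoid the explicit "seen before or not" check; a regular dict with the same keys would work equally well.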
Coming back to our analyzeTraffic function, we've done this, so now we're going to loop over that list of protocols and look for potential opportunities to hide data in the fields of each one, using this field analysis function. It's going to take the packet and the name of that particular layer, so something like DNS, HTTP request, HTTP response, or Raw. Here, in our field analysis function, we're going to identify which fields within this particular packet could be used to store useful information. We're going to loop over the list of top-level fields within the packet. For example, if we are at the IP layer, we'd see things like the source and destination IP address. If we're in a DNS layer, we'll see some of its various fields, and we're also going to see a couple of fields that actually contain another structure within them. Things like the queries and the responses in a DNS packet are really a top-level item that we can then expand into lower-level items. We're going to deal with those down here. For each of these fields within this packet, we're going to look up the value associated with that field name and call it v. Then the first thing we want to know about this field is how much opportunity we have to store unstructured, or even random or random-looking, data. For example, if this is a field that always has readable text in it, then it might not be the best suited for our command and control if we need to be able to, say, export data or import some form of executable code. If you're looking at a field that should have readable text and it has executable code in it, that's going to set off some red flags. One way to measure how much opportunity we have to put that less structured data in a field is by calculating its entropy, or the amount of randomness in its values. We'll come back to this field entropy function in a later video. It's in a standalone Python file.
But that's what we're calculating here: the amount of randomness we have in a particular field. We have a few criteria on this field entropy calculation. We need the field to be of a certain type. For example, we would prefer to have something that's a string or byte array rather than just a single integer for command and control, because it's hard to move much data in and out of a network if you've only got a single integer to play with. We also have some length constraints on it. We want a minimum length to ensure that we actually can move that data efficiently. If we meet our criteria, we'll have an entropy value e; otherwise this will be set to None and we'll skip this one. If we've decided this is a usable field, we're also going to check the encoding. What this means is that some fields within packets might automatically have some form of encoding used to obfuscate their contents or to make it possible for unstructured data to be carried over a text-only protocol. This potential support for encoding in a particular field would be really useful to us, because if we can hide the data that we want behind an encoding scheme, rather than sending it out in plain text over the network, that's better for us. We're just going to perform a couple of different checks that, again, we'll get to in a later video, to see if this field is likely to carry encoded data regularly. Then this n here is just building a name for storing the information that we extract in a fields variable here. We'll get back to that once we've run the code. For each record that we're creating, we're just going to store the entropy, the length of the field, because ideally longer fields allow us to have more data transfer, and then also the result of calling the check encoding function. If we've already seen that particular field in a previous packet, we append the record here. If it's a brand new field that we haven't looked at before, we'll add it to the dictionary here.
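A minimal sketch of the per-field bookkeeping described above might look like the following. It assumes that the entropy is Shannon entropy in bits per byte and that the encoding checks are simple pattern tests; the real field entropy and check encoding functions are covered in later videos and may differ. MIN_LENGTH, the function names, and the record layout are all illustrative assumptions.

```python
import math
import re
from collections import Counter

MIN_LENGTH = 8  # assumed threshold; the real value is not given here
fields = {}     # field name -> list of [entropy, length, encoding] records


def field_entropy(value, min_length=MIN_LENGTH):
    """Shannon entropy in bits per byte, or None if the field is
    the wrong type or too short to be useful."""
    if isinstance(value, str):
        value = value.encode("utf-8", errors="replace")
    if not isinstance(value, (bytes, bytearray)) or len(value) < min_length:
        return None
    total = len(value)
    counts = Counter(value)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def check_encoding(value):
    """Rough heuristic (assumed): return 'url', 'base64', or None."""
    text = value.decode("latin-1") if isinstance(value, (bytes, bytearray)) else value
    if re.search(r"%[0-9A-Fa-f]{2}", text):
        return "url"
    if len(text) % 4 == 0 and re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", text):
        return "base64"
    return None


def record_field(name, value):
    """Store one [entropy, length, encoding] record for a usable field."""
    e = field_entropy(value)
    if e is None:
        return  # field does not meet the type or length criteria
    record = [e, len(value), check_encoding(value)]
    # setdefault covers both cases: a new field name and one seen before.
    fields.setdefault(name, []).append(record)
```

Note that any alphanumeric string whose length is a multiple of four passes this Base64 test, so a positive result is only a hint that the field might carry encoded data, not proof.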
As I mentioned before, we have two different types of fields in our packet. We can call the first kind top-level fields, things like integers, where the actual value is stored at that particular variable name. Then we have things like DNS responses, where there's a bit of nesting going on. These nested ones we can access by calling packet fields on this particular layer and looping over them. I'm going to skip over this for now because all of this functionality is really the same as what we've just looked at. We're just looking at the various parts within those packet fields rather than the top-level ones. At the end of the day, when we're done calling protocol analysis and field analysis, we should have information on which of these layers are the most common. We should also know, for each field within the layers of interest, what their entropy, length, and encoding are for each of the packets where we've observed them. After we've done that, in this case we'll print out the information so that we can take a look at it, rather than perhaps passing it on to another function for processing and some automated decision-making about what we'll use for command and control. First we'll loop over the protocols and print out each one with its count. Then we'll loop over the fields and calculate the average entropy of the field, the average length of the field, and how many instances of each encoding we've seen, URL encoding and Base64 encoding. Then we'll print out all of that information. At this point, let's take a look at what we get when we run the code. We're going to run the traffic analyzer Python script against that traffic.pcap file. Give this a moment to actually execute our code. In the end, what we should see is that list of the various layers that we're looking at and the counts of the instances of each layer.
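The reporting step described above could be sketched as follows. It assumes each field maps to a list of [entropy, length, encoding] records, where encoding is "url", "base64", or None; that layout and the names summarize and print_report are illustrative assumptions rather than the course's code.

```python
def summarize(fields):
    """Compute per-field averages and encoding counts.

    fields maps a field name to a list of [entropy, length, encoding]
    records (assumed layout).
    """
    summary = {}
    for name, records in fields.items():
        if not records:
            continue  # nothing observed for this field
        n = len(records)
        summary[name] = {
            "count": n,
            "avg_entropy": sum(r[0] for r in records) / n,
            "avg_length": sum(r[1] for r in records) / n,
            "url": sum(1 for r in records if r[2] == "url"),
            "base64": sum(1 for r in records if r[2] == "base64"),
        }
    return summary


def print_report(protocol_counts, fields):
    """Print protocol counts first, then the per-field statistics."""
    for proto, count in protocol_counts.items():
        print(f"{proto}: {count}")
    for name, s in summarize(fields).items():
        print(
            f"{name}: count={s['count']} "
            f"avg_entropy={s['avg_entropy']:.2f} "
            f"avg_length={s['avg_length']:.1f} "
            f"url={s['url']} base64={s['base64']}"
        )
```

Returning the summary as a dictionary, separate from printing it, keeps open the option mentioned above of passing the results on to another function for automated decision-making.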
When it's completed running, we'll also be able to see those field values and the average values for each of them. Here we go. Scrolling up to the top here, we see we've got a lot of different fields. Here are our various protocols. We see that there were 38 DNS layers, 61 HTTP requests, 44 responses, and a bunch of just Raw layers. The Raw layers are very common because they can show up in a bunch of different types of packets. However, when we get into these various fields, we're specifying the exact stack of layers that gets us to a particular value. Here is an example of our first one. We have an Ethernet layer with an IP packet encapsulated inside it, and inside that is a UDP packet. Finally, inside that, we find a DNS packet. Then within the queries, we see that the qname field, the domain name that we're requesting information about, is the actual field where we might be able to store some information. We see 38 instances of this. The reason for that is that there are 19 requests and 19 responses. Add those together and you get 38, because the question shows up in both the request and the response. We see that in this particular case they have an average length of a little over 17 bytes and an average entropy of about 2.4. In this case, higher entropy values are better. Then we see that none of it looks like it's URL encoded. About a third may or may not be Base64 encoded. I'll come back in a later video to why we're uncertain about that. Moving through these, we see a variety of different fields that we might be able to use, based on their counts. For example, here we've got a better entropy, 2.7, but there's only one instance of this particular header within HTTP requests, so maybe it's not a great choice. But lower down, let's find an instance of a raw payload. Here we go. For an HTTP response with a payload, we see that there were 43 of them.
On average, these payloads are very large, which gives us an opportunity to perform a great deal of data exfiltration or to bring data, malware commands, etc., into the malware or into the network. We see that on average the entropy is relatively high, 4.5, and there's the potential that we have some Base64 encoding there in about half of the results. Looking at this, none of it guarantees that a particular field is good for command and control or data exfiltration. But we see a few options that we might want to look into further when building a packet for command and control and some command and control infrastructure. In the next few videos, we're going to take a look at how we calculated some of these values, the entropy, the encodings, etc. Then after that, we'll look into building packets based on the results that we see here. Thank you.