So now that I've introduced you to virtual network functions, and given you load balancers as one concrete example, let's talk about the performance issues in implementing virtual network functions. I gave you a preview of these performance issues in the previous lecture, so we'll revisit that and see what happens when network functions are implemented as virtual network functions on top of a hypervisor. One of the key things in implementing a virtual network function is that we have to get rid of the overhead of virtualization, right? And the reason is that a network function is on the critical path of packet processing, so it is super important to eliminate or mitigate the overhead of virtualization. I introduced you to technologies that are available from vendors, like Intel VT-d, which allow the NIC to bypass the VMM, that is, the virtual machine monitor or hypervisor, and get directly into the user-space buffers. The way that is done is: when the NIC gets a packet, it wants to DMA it into memory, and by directly mapping the user-space buffers for DMA by the NIC, you can bypass the VMM. Similarly, we can pass the device interrupt directly to the VM above the virtual machine monitor. So that's the way we can eliminate the overhead of virtualization so far as the virtual machine monitor or hypervisor is concerned. Now, is that enough? Unfortunately, the answer is no. And in order to fully understand why this is so, we really have to look at the path of packet processing in an operating system like Unix. Remember what I told you about network functions: the typical way these network functions are implemented is on top of an operating system such as Linux, which is the popular platform for implementing such network functions. Therefore, it is important to understand exactly what is happening in terms of packet processing when you have a network application sitting on top of Linux.
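To make the device-assignment idea concrete, here is a hedged sketch of how VT-d-style NIC passthrough is commonly exercised on Linux, using the VFIO framework and QEMU. This is not from the lecture itself; the PCI address 0000:03:00.0 and the disk image name are placeholders, and the host must have the IOMMU enabled (e.g., the intel_iommu=on kernel parameter).

```shell
# Detach the NIC from its host driver and hand it to vfio-pci.
# 0000:03:00.0 is a placeholder PCI address for the NIC.
modprobe vfio-pci
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo vfio-pci      > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

# Launch a VM with the NIC assigned directly to the guest.
# The guest's driver now DMAs and takes interrupts without VMM mediation.
qemu-system-x86_64 -enable-kvm -m 2G \
    -device vfio-pci,host=0000:03:00.0 \
    disk.img
```

With this in place, the guest's packet path no longer traps into the hypervisor on every packet, which is exactly the bypass the lecture describes.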
So if you look at packet processing in Linux, what happens is there is this NIC, and it has receive buffers and transmit buffers. Let's focus on the receive side for a minute. Basically, when a network packet comes in, the network interface has these receive buffers, which have been allocated to the NIC, and it's going to DMA the incoming packet into a receive buffer. Once this is done, the NIC will generate an interrupt, and that will be delivered to the operating system by the CPU. What the operating system is going to do is take this packet that came in, allocate a kernel buffer for it, and copy the DMAd packet from the NIC's ring buffer into the kernel buffer, so that the higher layers of the operating system stack, in particular IP and TCP, can start processing this packet. So what we've done is we've taken this thing that was in the ring buffer and put it into a buffer that is available for those layers of the software to look at, right? The NIC is the hardware part, and it is the OS kernel that is doing the copying, as well as handling the interrupt from the NIC, in order to do the necessary work. After the protocol processing is done by the IP and TCP layers, the packet that has been received is copied to the application buffer. Now we're getting into user space for processing by the application. So after the processing is done by the protocol layers, through the socket interface you're going to get the packet into a buffer that's available for application processing. An application process that is running here can access the packet that has come into user space, because the operating system has moved this packet into the socket buffer, for instance. Okay, so this is what typical packet processing in Linux looks like. Before I get to the problem, I'll tell you what exactly is going on.
If you think about a networking app that is living on top of the Linux kernel — and for this purpose, let's think about a web server running on top of Linux — what happens is, as I said, the incoming packet goes through the kernel, then through the TCP/IP layers, and finally it gets delivered up to the application, which in this case is the web server. The pie chart that you're seeing is a breakdown of the time spent by the CPU in serving a 64-byte file by the web server, on a particular instance of Linux 3.10. This, of course, is from a published paper that appeared in a USENIX conference, and it gives a breakdown of the time spent in the Linux kernel for networking applications such as a web server. What you see is that over 80% of the CPU time is spent in the kernel — accumulating the kernel and TCP processing time, it comes to 83% — and the application gets only 17% of the CPU time. This is not good news even for networking apps such as a web server, and it is really bad news for an application such as a network function, whose job is to do packet-level processing. So what we're going to do now is look at what would happen if you run network functions on top of the Linux kernel, and the sources of performance hits that come about — this comes from the paper I just mentioned, which quantified these different sources of overhead. If we look at the performance hits you pay for a networking app living on top of the Linux kernel, there is an interrupt on every incoming packet, right? There'll be an interrupt on every incoming packet, because the packet is received by the NIC and DMAd into the NIC's buffer, and once it is DMAd in there, it results in an interrupt up into the operating system.
Then, at that point, there is dynamic memory allocation that will be done on a per-packet basis by the operating system for the kernel packet buffer. Then there is the interrupt service time associated with this incoming interrupt, and a context switch into the operating system kernel in order to move the packet from the DMAd NIC buffer into kernel space. And then again, a context switch up into the application implementing the network function, right? So if you just think about a network function, this is what is going on: there's a context switch into the kernel, and then a context switch up into the application, which is implementing the network function. And there's copying of packets, which can happen multiple times: from the DMA buffer into the kernel buffer, and from the kernel buffer into the user-space application buffer. And depending on the type of network function that you want to implement, you may or may not need to do the TCP/IP protocol stack traversal in the kernel. But this slide is just to tell you all the sources of overhead that creep in when you implement a network function on top of an operating system such as Linux.