So having identified the performance issues in implementing Virtual Network Functions, let's now talk about performance-conscious implementation of these Virtual Network Functions; that's going to be the next lesson. When we circle back to this big picture of virtualizing Network Functions, what we want to ask is: what support can we get from the platform? We already know that the vendors are providing some support in terms of smarts in the NIC. For instance, Intel VT-d provides a means by which you can bypass the hypervisor and go directly to the VM, that is, the Guest Kernel, which is usually Linux. And of course, the Network Function is living on top of Linux. So this is good news in the sense that we bypass the VMM and go directly into the VM.

But the problem is, as I said, the Guest Kernel presents a new bottleneck, and that is what we revisited in the previous lesson, specifically for Network Functions implemented as applications on top of Linux. Remember that the sole purpose of a Network Function application, such as a load balancer, is to read packets from the NIC and write packets to the NIC. The slowdown due to the kernel is prohibitive for such Network Functions, and I have to tell you a little bit about what is going on in terms of hardware technology.

As we know, networks are getting very fast, but on the other hand, CPUs are not keeping up. This is partly because there was a time when Moore's Law worked perfectly: it predicted a doubling of transistor density every two years, and with that doubling you also shrink the feature size, so you can clock the transistors faster, and all of that was expected to result in faster CPUs. There was also another result called Dennard scaling, which said that as we go from one technology generation to the next, the power consumption per unit area remains the same. Dennard scaling gave a lot of hope that Moore's Law would keep working for a long time, but unfortunately Dennard scaling broke down in 2006. Again, Dennard scaling essentially says that the power per unit area remains the same; therefore, for a given chip size in terms of silicon real estate, the CPU's power stays the same while the clock frequency can go up, and so on. But that broke down in 2006, and one of the primary reasons is that the leakage current started dominating. The implication is that CPU clock frequencies are not increasing significantly from generation to generation, as was expected with Moore's Law, and as a result, CPU speeds are not keeping up with network speeds.

In other words, the NICs can handle more packets per second, because the Network Interface Cards are getting faster, and unfortunately that puts increasing pressure on the CPU to do the packet processing. So it is not sufficient to bypass the VMM for Network Function virtualization. To make this really efficient, you have to cut down the amount of CPU processing that happens, along with all the overheads associated with packet processing that we just mentioned. In particular, we should bypass the kernel as well: it is not enough to bypass the VMM, we have to bypass the kernel too.
That way, all the things we mentioned as performance limitations in implementing Virtual Network Functions on a Linux operating system can be mitigated. There are alternatives being offered for performance-conscious packet processing, and basically all of these alternatives have one thing in common: they bypass the Linux kernel. There have been several proposals for that: Netmap is one, PF_RING ZC is another, and DPDK, the Data Plane Development Kit, originally proposed by Intel and then taken over by the Linux Foundation, is yet another way of doing performance-conscious packet processing. All of these alternatives share certain common features. They rely on polling to read packets instead of interrupts, that's one thing. They also pre-allocate the buffers for the packets, so that you're not allocating buffers on the fly. They do zero-copy packet processing; in other words, the NIC uses DMA to write the packets directly into pre-allocated application-level buffers, so we're bypassing the kernel buffers and getting the packets straight into the application buffers. And the other thing these alternatives do is process packets in batches, as opposed to individually. These are all ways by which you can reduce the impact of traversing the kernel stack for every packet that comes in. That's the idea behind the performance-conscious packet processing alternatives.
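To make those common features a little more concrete, here is a minimal sketch of what a DPDK-style poll-mode receive loop might look like. This is not code from the lesson; it is an illustrative sketch that assumes device and queue setup (rte_eth_dev_configure, rte_eth_rx_queue_setup, rte_eth_dev_start) have already been done, and the port number, pool sizes, and burst size are arbitrary choices for illustration.

```c
/* Illustrative sketch of a DPDK-style poll-mode receive loop.
 * Assumes port/queue configuration has already been performed and the
 * RX queue was attached to the pre-allocated mbuf pool at setup time. */
#include <stdint.h>
#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define BURST_SIZE 32    /* process packets in batches, not one at a time */
#define NUM_MBUFS  8191  /* packet buffers pre-allocated once at startup */
#define MBUF_CACHE 250

static void rx_loop(uint16_t port)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll the NIC instead of waiting for interrupts; the NIC has
         * already DMA'd packets into mbufs drawn from the pre-allocated
         * pool, so no per-packet kernel buffer copy is involved. */
        uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++) {
            /* ... network-function logic on bufs[i] would go here ... */
            rte_pktmbuf_free(bufs[i]);  /* return the buffer to the pool */
        }
    }
}

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    /* Pre-allocate the packet-buffer pool once, up front. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "MBUF_POOL", NUM_MBUFS, MBUF_CACHE, 0,
        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        return -1;

    /* Port and RX-queue setup using this pool is omitted for brevity. */
    rx_loop(0 /* port id, assumed for illustration */);
    return 0;
}
```

Notice how the sketch lines up with the common features just listed: the infinite for loop polls rather than taking interrupts, rte_eth_rx_burst pulls packets in batches of BURST_SIZE, the mbuf pool is created once at startup rather than allocating buffers per packet, and the NIC DMAs packet data directly into those application-owned buffers without going through kernel socket buffers.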