VXLAN and Multiple Receive Threads explained

The (Dynamic) Netqueue feature in ESXi, which is enabled by default if the physical NIC (pNIC) supports it, allows incoming network packets to be distributed over different queues. Each queue gets its own ESXi thread for packet processing. One ESXi thread represents a CPU core.

However, (Dynamic) NetQueue and VXLAN are not the best of friends when it comes to distributing network I/O over multiple queues. That is because of the way Virtual Tunnel End Points (VTEP) are set up. Within a VMware NSX implementation, each ESXi host in the cluster contains at least one VTEP, dependent upon the NIC load balancing mode chosen. The VTEP is the component that provides the encapsulation and decapsulation for the VXLAN packets. That means all VXLAN network traffic from a VM perspective will traverse the VTEP and the receiving VTEP on another ESXi host.

Therein lies the problem when it comes to NetQueue and the ability to distribute network I/O streams over multiple queues. This is because a VTEP will always have the same MAC address and the VTEP network will have a fixed VLAN tag. MAC address and VLAN tag are the filters most commonly supported by pNICs with VMDq and NetQueue enabled. It will seriously restrict the ability to have multiple queues and thereby will possibly restrict the network performance for your VXLAN networks. VMware NSX now supports multi-VTEPs per ESXi host. This helps slightly as a result of the extra MAC addresses, because of the increased number of VTEPs per ESXi host. NetQueue can therefore have more combinations to filter on. Still, it is far from perfect when it comes to the
desired network I/O parallelism handling using multiple queues and CPU cores. To overcome that challenge, there are some pNICs that support the distributing of queues by filtering on inner (encapsulated) MAC addresses. RSS can do that for you.


Read More