Even though the jumbo frame and the possible gain and risk trade-offs discussion is not new, we found ourselves discussing it yet again. Because we had different opinions, it seems like a good idea to elaborate on this topic.
Let’s have a quick recap on what jumbo frames actually are. Your default MTU (Maximum Transmission Unit) for a ethernet frame is 1500. A MTU of 9000 is referred to as a jumbo frame.
Jumbo frames or 9000-byte payload frames have the potential to reduce overheads and CPU cycles.
Typically, jumbo frames are considered for IP storage networks or vMotion networks. A lot of performance benchmarking is already described on the web. It is funny to see a variety of opinions whether to adopt jumbo frames or not. Check this blogpost and this blogpost on jumbo frames performance compared to a standard MTU size. The discussion if ‘jumbo frames provide a significant performance advantage’ is still up in the air.
There are other techniques to improve network throughput and lower CPU utilization next to jumbo frames. A modern NIC will support the Large Segment Offload (LSO) and Large Receive Offload (LRO) offloading mechanisms. Note: LSO is also referenced as TSO (TCP Segmentation Offload). Both are configurable. LSO/TSO is enabled by default if the used NIC hardware supports it. LRO is enabled by default when using VMXNET virtual machine adapters.
Let’s put the performance aspects aside, and let us look into the possible risks involved when implementing jumbo frames. The thing is, in order to be effective, jumbo frames must be enabled end to end in the network path. The main risk when adopting jumbo frames, is that if one component in the network path is not properly configured for jumbo frames, a MTU mismatch occurs.
Your main concern should be if the network and storage components are correctly configured for jumbo frames. The situation that is interesting here in my opinion, is the following one:
I did a quick lab setup to capture the behavior of MTU mismatches in my IP storage (layer-2) network. The Guest OS in this scenario is a Windows 2012R2 VM running with a iSCSI LUN attached to it. I captured iSCSI (TCP 3260) frames on this VM on the dedicated iSCSI network adapter connected to the dedicated iSCSI VLAN.
My Cisco network switch is configured with a standard MTU size of 1500. So this is where the MTU mismatch occurs. Looking into the captured data, we immediately see TCP ACK unseen segments when doing writes to the LUN. Not what you want to see in your production environment. This can lead to misbehavior of your virtual machines and possible data corruption.
This behavior is expected though. Within a layer-2 network, the ethernet switch simply drops frames bigger that the port MTU. So you are silently black-holing traffic which exceeds the 1500 frame size. It is important to know that fragmentation will occur at layer-3, but not on layer-2 networks! This is a characteristic of layer-2 ethernet.
Typically, IP storage networks and vMotion networks are layer-2 networks (although vMotion now supports layer-3).
The same situation would occur if my storage interfaces on my NAS were mismatching in MTU. But, when I have the following situation…:…no harm would be done, because a frame of 1500 bytes simply fits very well in 9000. 🙂
Once I configured every component with jumbo frames / MTU 9000, we’re all good as the following capture shows:
To use jumbo frames or not to use jumbo frames, that is the question
Depending on the situation, I always used to implement jumbo frames where I saw fit. For instance, a greenfield implementation of an infrastructure would be configured with jumbo frames if the tests showed an improved performance. The risk that (future) infrastructure components are mis-configured, MTU-wise, can be mitigated by introducing strict change and configuration management along with configuration compliance checks.
The funny thing is, while we are adopting more and more overlay services in our infrastructures, the MTU sizes must change accordingly. Think about your transport network(s) in your VMware NSX environment. Because of the VXLAN encapsulation, you will be required to ‘up’ your MTU to 1600.
So changing the MTU on your infrastructure components is no longer a question…
Thinking about network performance; It will be good to see the impact on network performance using the LRO/LSO offloading compared to, or in combination with, jumbo-frames. Let’s see if I can define some sensible tests on that…