vSphere Networking: Traffic Marking

vSphere network quality control features like Network I/O Control (NIOC) are focused on the virtual networking layer within a VMware virtual data center. But what about the physical network layer, and how can the two cooperate?

In converged infrastructures or enterprise networking environments, Quality of Service (QoS) is commonly configured in the physical network layers. QoS is the ability to provide different priorities to network flows, or to guarantee a certain level of performance to a network flow by using tags. In vSphere 6.7, you have the ability to create flow-based traffic marking policies to mark network flows for QoS.

Quality of Service

vSphere 6.7 supports Class of Service (CoS) and Differentiated Services Code Point (DSCP). Both are QoS mechanisms used to differentiate traffic types so that network traffic flows can be policed.

As related to network technology, CoS is a 3-bit field that is present in an Ethernet frame header when 802.1Q VLAN tagging is present. The field specifies a priority value between 0 and 7, more commonly known as CS0 through CS7, that can be used by quality of service (QoS) disciplines to differentiate and shape/police network traffic. Source: https://en.wikipedia.org/wiki/Class_of_service

One of the main differentiators is that CoS operates at the data link layer in an Ethernet-based network (layer 2), while DSCP operates at the IP network layer (layer 3).

Differentiated services or DiffServ is a computer networking architecture that specifies a simple and scalable mechanism for classifying and managing network traffic and providing quality of service (QoS) on modern IP networks. DiffServ uses a 6-bit differentiated services code point (DSCP) in the 8-bit differentiated services field (DS field) in the IP header for packet classification purposes. Source: https://en.wikipedia.org/wiki/Differentiated_services

When a traffic marking policy is configured for CoS or DSCP, its value is carried in the frame or packet headers towards the physical network layer, creating an end-to-end QoS path.
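To make the two markings concrete, the following shell sketch (with arbitrary example values for the VLAN ID, CoS and DSCP) shows where they end up: CoS is the 3-bit priority (PCP) field inside the 16-bit 802.1Q tag control information, and DSCP occupies the upper 6 bits of the 8-bit DS field (the former ToS byte) in the IP header, with the remaining 2 bits used for ECN.

    # CoS: 3-bit priority (PCP) carried in the 802.1Q VLAN tag, values 0-7
    cos=5          # example value, often used for voice traffic
    vlan_id=100    # arbitrary example VLAN
    # 802.1Q TCI = PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits)
    tci=$(( (cos << 13) | vlan_id ))
    printf 'CoS %d on VLAN %d -> TCI 0x%04x\n' "$cos" "$vlan_id" "$tci"

    # DSCP: 6-bit code point in the 8-bit DS field of the IP header
    dscp=46        # example value, Expedited Forwarding (EF)
    ds_field=$(( dscp << 2 ))
    printf 'DSCP %d -> DS field 0x%02x\n' "$dscp" "$ds_field"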

Traffic marking policies are configurable on distributed port groups or on the DvUplinks. To match certain traffic flows, a traffic qualifier needs to be set. A qualifier can match very specific flows, using specific IP addresses and TCP/UDP ports, or match a selected traffic type. The qualifier options are extensive.


TCP Segmentation Offload in ESXi explained

TCP Segmentation Offload (TSO) is the equivalent of the TCP/IP Offload Engine (TOE), but better suited to virtual environments; TOE is the actual NIC vendor hardware enhancement. TSO is also known as Large Segment Offload (LSO). But what does it do?

When an ESXi host or a VM needs to transmit a large data packet to the network, the packet must be broken down into smaller segments that can pass through all the physical switches, and possible routers, along the way to the packet's destination. TSO allows a TCP/IP stack to emit larger frames, up to 64 KB, even when the Maximum Transmission Unit (MTU) of the interface is configured for smaller frames. The NIC then divides the large frame into MTU-sized frames and prepends an adjusted copy of the initial TCP/IP headers. This process is referred to as segmentation.
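As a rough illustration of the numbers involved (assuming a standard 1500-byte MTU and therefore a TCP MSS of 1460 bytes), a single 64 KB TSO transmit ends up as roughly 45 wire-sized segments, produced by the NIC instead of by the host CPU:

    # MSS = MTU 1500 - 20 bytes IP header - 20 bytes TCP header
    tso_send=65536
    mss=1460
    echo $(( (tso_send + mss - 1) / mss ))   # ~45 MTU-sized segments per 64 KB TSO send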

When the NIC supports TSO, it handles the segmentation instead of the host OS itself. The advantage is that the CPU can present up to 64 KB of data to the NIC in a single transmit request, resulting in fewer cycles being burned on segmenting the network packet using the host CPU. To fully benefit from the performance enhancement, you must enable TSO along the complete data path on an ESXi host. If TSO is supported by the NIC, it is enabled by default.

The same goes for TSO in the VMkernel layer and for the VMXNET3 VM adapter, but not necessarily for the TSO configuration within the guest OS. To verify that your pNIC supports TSO and that it is enabled on your ESXi host, use the following command: esxcli network nic tso get. The output will look similar to the following screenshot, where TSO is enabled for all available pNICs or vmnics.
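Beyond that single command, a minimal sketch of checking TSO at each layer of the data path could look like the following. The advanced option names are the documented ESXi settings for hardware TSO; the guest interface name eth0 is just an assumption for a Linux VM with a VMXNET3 adapter.

    # pNIC layer: list TSO support and state per vmnic
    esxcli network nic tso get

    # VMkernel layer: hardware TSO is governed by the /Net/UseHwTSO advanced setting
    # (and /Net/UseHwTSO6 for IPv6); a value of 1 means enabled
    esxcli system settings advanced list -o /Net/UseHwTSO
    esxcli system settings advanced list -o /Net/UseHwTSO6

    # Guest OS layer (Linux guest, interface name assumed to be eth0)
    ethtool -k eth0 | grep tcp-segmentation-offload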


Virtual Networking: Poll-mode vs Interrupt

The VMkernel relies on the physical device, the pNIC in this case, to generate interrupts to process network I/O. This traditional style of I/O processing incurs additional delays on the entire data path, from the pNIC all the way up into the guest OS. Processing I/Os using interrupt-based mechanisms saves CPU cycles because multiple I/Os are combined in one interrupt. Using poll mode, the driver and the application running in the guest OS constantly spin waiting for an I/O to be available. This way, an application can process the I/O almost instantly instead of waiting for an interrupt to occur, allowing for lower latency and a higher packets-per-second (PPS) rate.

An interesting fact is that the world is moving towards poll-mode drivers. A clear example of this is the NVMe driver stack.

The main drawback is that the poll-mode approach consumes much more CPU time because of the constant polling for I/O and the immediate processing. Basically, it consumes all the CPU you offer to the vCPUs used for polling. Therefore, it is primarily useful when the workloads running on your VMs are extremely latency sensitive. It is a perfect fit for data plane telecom applications, like a Packet Gateway (PGW) node as part of an Evolved Packet Core (EPC) in an NFV environment, or other real-time latency-sensitive workloads.

Using the poll-mode approach, you need a poll-mode driver in your application which polls a specific device queue for I/O. From a networking perspective, Intel's Data Plane Development Kit (DPDK) delivers just that. You could say that the DPDK framework is a set of libraries and drivers that allow for fast network packet processing.
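As a sketch of what that looks like in practice (assuming a Linux guest, the standard DPDK usertools and an example PCI address of 0000:0b:00.0 for the network adapter, a VMXNET3 device in a VM), the NIC is taken away from the kernel driver and handed to a DPDK poll-mode driver:

    # Reserve hugepages for the DPDK application (example: 1024 x 2 MB pages)
    echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

    # Show which driver currently owns each network device
    ./usertools/dpdk-devbind.py --status

    # Unbind the adapter from the kernel driver and bind it to a DPDK-capable driver
    # (PCI address and target driver are example values; vfio-pci is commonly used)
    ./usertools/dpdk-devbind.py --bind=vfio-pci 0000:0b:00.0

From that point on, the application linked against the DPDK libraries polls the device queues directly, bypassing the guest kernel network stack.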

Data Plane Development Kit (DPDK) greatly boosts packet processing performance and throughput, allowing more time for data plane applications. DPDK can improve packet processing performance by up to ten times. DPDK software running on current generation Intel® Xeon® Processor E5-2658 v4, achieves 233 Gbps (347 Mpps) of L3 forwarding at 64-byte packet sizes. Source: http://www.intel.com/content/www/us/en/communications/data-plane-development-kit.html

DPDK in a VM

Using a VM with a VMXNET3 network adapter, you already have the default paravirtual network connectivity in place. The following diagram shows the default logical paravirtual device connectivity.


Distributed Storage Network Topology

This is a short write-up about why you should consider a certain network topology when adopting scale-out storage technologies in a multi-rack environment. Without going into too much detail, I want to accentuate the need to follow the scalable distributed storage model when it comes to designing your Ethernet storage network. To be honest, it is probably the other way around: the networking experts of this world have been delivering scalable network architectures, while maintaining consistent and predictable latency, for a long time now. The storage world is just catching up.

Today, we have the ability to create highly scalable distributed storage infrastructures, following Hyper-Converged Infrastructure (HCI) innovations. Because the storage layer is distributed across ESXi hosts, a lot of point-to-point Ethernet connections between ESXi hosts will be utilized for storage I/O. Typically, when a distributed storage solution (like VMware vSAN) is adopted, we tend to create a pretty basic layer-2 network, preferably using 10GbE or faster NICs and line-rate capable components in a non-blocking network architecture, with enough ports to support our current hosts. But once we scale to an extensive number of ESXi hosts and racks, we face challenges in how to provide the required network interfaces to connect our ESXi hosts, and how to connect the multiple Top of Rack (ToR) switches to each other. That is where the so-called spine-leaf network architecture comes into play.

Spine-Leaf

In a spine-leaf network architecture, each leaf switch connects to every spine switch in the fabric. Using this topology, the connection between two ESXi hosts always traverses the same number of network hops when the hosts are distributed across multiple racks. Such a network topology provides predictable latency, and thus consistent performance, even as you keep scaling out your virtual data center. It is this consistency in performance that makes the spine-leaf network architecture so suitable for distributed storage solutions.
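To put a number on it (using assumed example port counts, not a recommendation), the oversubscription ratio of a leaf switch is simply the aggregate downlink bandwidth towards the ESXi hosts divided by the aggregate uplink bandwidth towards the spines:

    # Example leaf switch: 48 x 10GbE downlinks to ESXi hosts, 4 x 40GbE uplinks to spines
    downlink_gbps=$(( 48 * 10 ))
    uplink_gbps=$(( 4 * 40 ))
    echo "oversubscription ratio $(( downlink_gbps / uplink_gbps )):1"   # 480/160 = 3:1

Keeping that ratio consistent across racks is what preserves predictable host-to-host storage latency as the environment grows.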

An example of a logical spine-leaf network architecture is shown in the following diagram:


VXLAN Offloading Support

Modern physical NICs (pNICs) have several offloading capabilities. If you are running VMware NSX, which uses VXLAN, you could benefit from the VXLAN offloading feature. VXLAN offloading allows you to use TCP offloading mechanisms like TCP Segmentation Offload (TSO) and Checksum Offload (CSO), because the pNIC is able to ‘look into’ encapsulated VXLAN packets. That results in lower CPU utilization and a possible performance gain. But how do you determine what is actually supported by your pNIC and the driver used in ESXi?

It is recommended to follow these three steps to verify whether the VXLAN offload feature you are looking for is supported and enabled.

Step 1: Check the support of the pNIC chipset
Step 2: Check the support of the driver module
Step 3: Check if the driver module needs configuration

The first step is to check the vendor information about the features supported on their pNIC product. Let's take the combination of a 10GbE Broadcom QLogic 57810 NIC and the VXLAN offload feature as an example. The datasheet of the QLogic 57810 NIC clearly states that VXLAN offloading is supported.
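The following commands sketch how the three steps could be verified on the ESXi host itself, assuming vmnic0 is the QLogic 57810 port and bnx2x is its driver module. The exact module parameter that toggles VXLAN offload (if any) differs per driver and driver version, so check the parameter list output rather than assuming a name.

    # Step 1: identify the pNIC, its driver module and firmware version
    esxcli network nic list
    esxcli network nic get -n vmnic0

    # Step 2: check which parameters the driver module exposes
    esxcli system module parameters list -m bnx2x

    # Step 3: if the list shows a VXLAN offload parameter that is disabled,
    # enable it and reboot the host, for example:
    # esxcli system module parameters set -m bnx2x -p "<parameter>=1"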


IoT – Everything and everyone smartly connected

The subject of the Internet of Things (IoT) is intriguing to me, both from a technical and a personal point of view. IoT, to me, is where all current cloud initiatives and network virtualization innovations come together, complemented by new technologies. Today, we are realizing highly scalable cloud platforms that are secure and easy to manage as they serve our workloads and store data. Building these scalable cloud platforms and seeing traction on cloud-native applications is another push to support IoT innovations. It is fun to see that we are moving towards adopting cloud technologies, but are already on the verge of entering new technology rapids.

Innovation is moving at a scarily fast pace. – Bill Gates

Looking at IoT and its applications, one will quickly notice that the possibilities are immense! The future will be data driven. All our devices, be it a mobile phone, smartwatch, car, plane or refrigerator, will gather data that will lead to more insights on how we as human beings operate and consume. That will lead to services being specifically tailored per individual. And what about enterprises and industrial components? Imagine what can be done in every industry vertical. We can only try to understand the opportunities.

Data at the Edge

We may be somewhat dismissive about every device being connected to the ‘internet’, but IoT is already happening. Think about ordering sushi or fast food in selected restaurants where the order is entered using a tablet or similar device. Or what about a hospital where patient data is gathered by sensors and is accessible on mobile devices for the nurses? Think about a car that is collecting data about the driver and his driving characteristics, or the data that is generated by the vehicle itself. Each sensor in a device will collect data; we are talking about a lot of data here… Think gigabytes of data from a single car drive, or terabytes per plane per flight.

The future includes Brontobytes and Geobytes – Michael Dell

That raises the question: is all this data consumable? How are we going to collect all that data? How do we process, analyze and store that data? At first glance, you may think that all the data is ingested into a central cloud environment. But that is not efficient. Because we are talking about a prodigious amount of data, it only seems logical to process, analyze and even compress the data locally before sending the important pieces of data over to a centralized platform.

Local data processing will require local compute and storage resources, also referred to as edge computing. So even though we have cloud computing as a tool to execute data processing, it seems inevitable that data will use local compute and storage power on edge devices. It will be interesting to track the application and development of efficient edge compute solutions (like nano-servers, further usage of Field-Programmable Gate Array (FPGA) chipsets, etc.) in the near future.

Moving to edge computing is interesting because today we are in the process of centralizing IT in cloud solutions, while IoT innovations will lead to decentralized systems, albeit accompanied by central systems. They will complement each other.

NFV

A very important factor for IoT and edge devices will be the telecom providers, as they will provide 5G services. The rollout of 5G is a key driver for IoT, as it allows for very low and consistent latency while increasing the bandwidth required to support connectivity to the edge devices. This will be a necessity, as we will be transferring a lot of data from a massive number of IoT devices to a centralized cloud platform. The ability to create highly scalable and performant 5G platforms will depend strongly on Network Functions Virtualization (NFV).

Telco operators are working hard on moving from a monolithic telco workload landscape to fully virtualized platforms. VMware helps drive NFV adoption with vCloud for NFV, which includes vSphere, NSX and related components. It is very important to squeeze every bit out of the hardware and the ESXi / VM layer to realize high consolidation ratios and lower costs, while building a consistently performant foundation for NFV. That was one of the key reasons for us (Frank and myself) to write the Host Deep Dive book.

In the process of working towards NFV and IoT, are we finally forced to adopt IPv6? I would say ‘yes’! IPv4 will simply not cut it anymore, as we all said well before 2010.

To conclude…

We have enough challenges to deal with, but we can already see new solutions being developed to address the security and manageability of IoT and edge components. Cloud (native) platforms are gaining traction, and so is NFV.

I am still trying to wrap my head around the endless possibilities and to understand the impact of the IoT and edge wave, powered by powerful NFV and cloud platforms. Very exciting times ahead! We are only scratching the surface of what can be done with IT.