Distributed Storage Network Topology

This is a short write-up on why you should consider a specific network topology when adopting scale-out storage technologies in a multi-rack environment. Without going into too much detail, I want to emphasize the need to follow the scalable distributed storage model when designing your Ethernet storage network. To be honest, it is probably the other way around: networking experts have been delivering scalable network architectures with consistent and predictable latency for a long time now. The storage world is just catching up.

Today, we have the ability to create highly scalable distributed storage infrastructures, following Hyper-Converged Infrastructure (HCI) innovations. Because the storage layer is distributed across ESXi hosts, a lot of point-to-point Ethernet connections between ESXi hosts will be utilized for storage I/O. Typically, when a distributed storage solution (like VMware vSAN) is adopted, we tend to create a fairly basic layer-2 network, preferably using 10GbE or faster NICs and line-rate capable components in a non-blocking network architecture with enough ports to support our current hosts. But once we scale to a large number of ESXi hosts and racks, we face the challenge of how to provide enough network interfaces for our ESXi hosts and how to interconnect the multiple Top of Rack (ToR) switches. That is where the so-called spine-leaf network architecture comes into play.


In a spine-leaf network architecture, each leaf switch connects to every spine switch in the fabric. Using this topology, the connection between two ESXi hosts always traverses the same number of network hops when the hosts are distributed across multiple racks. Such a network topology provides predictable latency, and thus consistent performance, even as you keep scaling out your virtual datacenter. It is this consistency in performance that makes the spine-leaf network architecture so suitable for distributed storage solutions.
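The hop-count argument can be sketched in a few lines of Python. The definition of a "hop" (number of switches traversed) and the rack numbers are illustrative assumptions, not taken from any particular fabric:

```python
def switch_hops(rack_a: int, rack_b: int) -> int:
    """Switches traversed between two hosts in a two-tier leaf-spine fabric."""
    if rack_a == rack_b:
        return 1    # host -> ToR leaf -> host
    return 3        # host -> leaf -> spine -> leaf -> host

# Hosts in different racks are always exactly 3 switch hops apart, no matter
# how many racks you add, which is what keeps east-west storage latency
# predictable as the fabric scales out.
print(switch_hops(1, 1), switch_hops(1, 7))  # → 1 3
```

Contrast this with daisy-chained ToR switches, where the hop count (and latency) between two hosts depends on how far apart their racks sit in the chain.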

The following diagram shows an example of a logical spine-leaf network architecture:



VXLAN Offloading Support

Modern physical NICs (pNICs) have several offloading capabilities. If you are running VMware NSX, which uses VXLAN, you could benefit from the VXLAN offloading feature. VXLAN offloading allows you to use TCP offloading mechanisms like TCP Segmentation Offload (TSO) and Checksum Offload (CSO) because the pNIC is able to ‘look into’ encapsulated VXLAN packets. That results in lower CPU utilization and a possible performance gain. But how do you determine what is actually supported by your pNIC and the driver used in ESXi?

It is recommended to follow these three steps to verify whether the VXLAN offload feature you are looking for is supported and enabled.

Step 1: Check the support of the pNIC chipset
Step 2: Check the support of the driver module
Step 3: Check if the driver module needs configuration

The first step is to check the vendor information about the features supported by the pNIC product. Let’s take the combination of a 10GbE Broadcom/QLogic 57810 NIC and the VXLAN offload feature as an example. The datasheet of the QLogic 57810 NIC clearly states that VXLAN offloading is supported.
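Steps 2 and 3 typically come down to inspecting the driver module on the ESXi host (for example with `esxcli system module parameters list -m bnx2x`) and looking for an offload-related parameter. Below is a minimal Python sketch of that check; the sample table and the `enable_vxlan_ofld` parameter name are assumptions modeled on typical bnx2x output, so verify the exact name on your own host:

```python
# Hand-made stand-in for `esxcli system module parameters list -m bnx2x`
# output; the real table on your host may use different parameter names.
SAMPLE = """\
Name               Type  Value  Description
-----------------  ----  -----  -----------------------------------------
enable_vxlan_ofld  int          Allows vxlan TSO/CSO offload support (1: enable)
num_queues         int          Number of queues (1..N, 0: auto)
"""

def offload_params(table: str, keyword: str = "vxlan") -> list[str]:
    """Return parameter names whose row mentions the given keyword."""
    hits = []
    for row in table.splitlines()[2:]:    # skip header and separator rows
        if keyword.lower() in row.lower():
            hits.append(row.split()[0])   # first column is the parameter name
    return hits

print(offload_params(SAMPLE))  # → ['enable_vxlan_ofld']
```

If the parameter exists but its Value column is empty, step 3 applies: the module may need explicit configuration (setting the parameter and rebooting) before the offload is active.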



IoT – Everything and everyone smartly connected

The subject of the Internet of Things (IoT) is intriguing to me, both from a technical and a personal point of view. IoT, to me, is where all current cloud initiatives and network virtualization innovations come together, complemented by new technologies. Today, we are realizing highly scalable cloud platforms that are secure and easy to manage as they serve our workloads and store data. Building these scalable cloud platforms, and seeing traction for cloud-native applications, is another push towards supporting IoT innovations. It is fun to see that we are still in the middle of adopting cloud technologies, yet already on the verge of entering new technology rapids.

Innovation is moving at a scarily fast pace. – Bill Gates

Looking at IoT and its applications, one will quickly notice that the possibilities are immense! The future will be data-driven. All our devices, be it a mobile phone, smartwatch, car, plane or refrigerator, will gather data that will lead to more insights into how we as human beings operate and consume. That will lead to services being tailored specifically to each individual. And what about enterprises and industrial components? Imagine what can be done in every industry vertical. We can only try to understand the opportunities.

Data at the Edge

We may be somewhat dismissive about every device being connected to the ‘internet’, but IoT is already happening. Think about ordering sushi or fast food in selected restaurants where the order is entered on a tablet or similar device. Or what about a hospital where patient data is gathered by sensors and made accessible on mobile devices for the nurses? Think about a car that collects data about the driver and their driving characteristics, or the data generated by the vehicle itself. Each sensor in a device will collect data; we are talking about a lot of data here… Think gigabytes of data from a single car drive, or terabytes per plane per flight.

The future includes Brontobytes and Geobytes – Michael Dell

That raises the question: is all this data consumable? How are we going to collect all that data? How do we process, analyze and store it? At first glance, you may think that all the data is ingested into a central cloud environment. But that is not efficient. Because we are talking about a prodigious amount of data, it only seems logical to process, analyze and even compress the data locally before sending the important subsets over to a centralized platform.

Local data processing will require local compute and storage resources, also referred to as edge computing. So even though we have cloud computing as a tool to execute data processing, it seems inevitable that data will use local compute and storage power on edge devices. It will be interesting to track the application and development of efficient edge compute solutions (like nano-servers, further usage of Field-Programmable Gate Array (FPGA) chipsets, etc.) in the near future.
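The idea of processing locally and forwarding only the important subsets can be sketched as follows. The sensor values, threshold and summary fields are purely illustrative assumptions:

```python
from statistics import mean

def summarize_locally(samples: list[float], alert_threshold: float) -> dict:
    """Reduce a raw sample stream to a compact summary plus notable outliers."""
    return {
        "count": len(samples),
        "mean": round(mean(samples), 2),
        # Keep only the 'important catches' worth shipping upstream.
        "peaks": [s for s in samples if s > alert_threshold],
    }

# e.g. temperature readings collected on an edge device
raw = [21.0, 21.3, 20.9, 35.7, 21.1]
print(summarize_locally(raw, alert_threshold=30.0))
# → {'count': 5, 'mean': 24.0, 'peaks': [35.7]}
```

Instead of streaming every raw sample to the cloud, the edge device sends a handful of values, which is the whole point of doing the heavy lifting locally.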

Moving to edge computing is interesting because today we are in the process of centralizing IT in cloud solutions, while IoT innovations will lead to decentralized systems, albeit accompanied by central systems. They will supplement each other.


A very important factor for IoT and edge devices will be the telecom providers, as they will deliver 5G services. The rollout of 5G is a key driver for IoT, as it allows for very low and consistent latency while increasing the bandwidth required to support connectivity to the edge devices. This will be a necessity, as we will be transferring a lot of data from a massive number of IoT devices to a centralized cloud platform. The ability to create highly scalable and performant 5G platforms will depend strongly on Network Functions Virtualization (NFV).

Telco operators are working hard on moving from a monolithic telco workload landscape to fully virtualized platforms. VMware helps drive NFV adoption with vCloud for NFV, which includes vSphere, NSX and related components. It is very important to squeeze every bit out of the hardware and the ESXi / VM layer to realize consolidation ratios and lower costs while building a consistently performant foundation for NFV. That was one of the key reasons for us (Frank and myself) to write the Host Deep Dive book.

In the process of working towards NFV and IoT, are we finally forced to adopt IPv6? I would say ‘yes’! IPv4 will simply not cut it anymore, as we all said well before 2010.

To conclude…

We have plenty of challenges ahead, but we can already see new solutions being developed to address the security and manageability of IoT and edge components. Cloud (native) platforms are gaining traction, and so is NFV.

I am still trying to wrap my head around the endless possibilities and to understand the impact of the IoT and edge wave, powered by powerful NFV and cloud platforms. Very exciting times ahead! We are only scratching the surface of what can be done with IT.



VXLAN and Multiple Receive Threads explained

The (Dynamic) NetQueue feature in ESXi, which is enabled by default if the physical NIC (pNIC) supports it, allows incoming network packets to be distributed over different queues. Each queue gets its own ESXi thread for packet processing, and each of those threads can consume up to one CPU core.

However, (Dynamic) NetQueue and VXLAN are not the best of friends when it comes to distributing network I/O over multiple queues. That is because of the way VXLAN Tunnel Endpoints (VTEPs) are set up. Within a VMware NSX implementation, each ESXi host in the cluster contains at least one VTEP, depending on the NIC load balancing mode chosen. The VTEP is the component that provides the encapsulation and decapsulation of the VXLAN packets. That means all VXLAN network traffic, from a VM perspective, will traverse its local VTEP and the receiving VTEP on another ESXi host.

Therein lies the problem with NetQueue and its ability to distribute network I/O streams over multiple queues: a VTEP will always have the same MAC address, and the VTEP network will have a fixed VLAN tag. Outer MAC address and VLAN tag are the filters most commonly supported by pNICs with VMDq and NetQueue enabled. This seriously restricts the ability to use multiple queues and can thereby limit the network performance of your VXLAN networks. VMware NSX now supports multiple VTEPs per ESXi host, which helps slightly: the extra VTEPs mean extra MAC addresses, so NetQueue has more combinations to filter on. Still, it is far from perfect when it comes to the desired parallelism of network I/O handling across multiple queues and CPU cores. To overcome that challenge, some pNICs support distributing traffic over queues by filtering on the inner (encapsulated) MAC addresses. RSS can do that for you.
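A toy model makes the filter shortage concrete. The MAC strings, VLAN ID and VM counts below are illustrative assumptions; the point is only how many distinct filter keys each scheme exposes to the pNIC:

```python
def outer_filter_keys(num_vteps: int, vtep_vlan: int = 100) -> int:
    """Distinct (outer MAC, VLAN) keys for VXLAN traffic: one MAC per VTEP,
    all sharing the single VTEP transport VLAN."""
    return len({(f"vtep-mac-{i}", vtep_vlan) for i in range(num_vteps)})

def inner_filter_keys(num_vms: int) -> int:
    """Distinct inner (encapsulated) MAC keys, visible to RSS-capable pNICs
    that can look inside the VXLAN header: one per VM."""
    return len({f"vm-mac-{i}" for i in range(num_vms)})

# One VTEP gives a single filter key (one queue's worth of traffic); adding
# a second VTEP only doubles that; inner-MAC filtering scales with the VMs.
print(outer_filter_keys(1), outer_filter_keys(2), inner_filter_keys(40))  # → 1 2 40
```

With only one or two outer keys, all VXLAN traffic funnels into one or two receive queues (and CPU cores), which is exactly the bottleneck the inner-MAC/RSS capability removes.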



Using CPU limits in vCloud for NFV

Looking at the VMware vCloud for NFV proposition, you will notice that vCloud Director (vCD) is one of the options for the Virtualized Infrastructure Manager (VIM) layer based on the ETSI framework.

VMware vCloud NFV supports two integrated virtualized infrastructure managers (VIMs): native VMware vCloud Director or VMware Integrated OpenStack, a full OpenStack implementation that is completely tested and integrated. Both VIMs support templated service descriptions as well as multi-tenancy and robust networking, enabling the automation of on-boarding VNFs with the acceleration of configuring and allocating compute, storage and networking resources.

As mentioned, vCD is used for multi-tenancy and provides a management layer for tenants to spin up new workloads within their own set of resources. How compute resources are provided to the workloads is very interesting in some telco workload cases. vCD provides three different ways to deliver compute resources to tenant vApps:

  • Allocation Pool
    A percentage of the resources you allocate are committed to the Organization virtual DataCenter (OvDC). You can specify the percentage, which allows you to overcommit resources.
  • Pay-As-You-Go
    Allocated resources are only committed when users create vApps in the OvDC. You can specify the maximum amount of CPU and memory resources to commit to the OvDC.
  • Reservation Pool
    All of the resources you allocate are committed to the OvDC.

More information on the allocation models in vCD can be found in Duncan’s blog post; it’s one of his older posts but still accurate. The Pay-As-You-Go allocation model seems to be a popular choice because it enforces the entitlement to specific resources per Virtual Machine (VM). It does so by setting a reservation and a limit on each VM in the vApp / OvDC, using a configurable vCPU speed. That means a VM can only consume CPU cycles as configured in the OvDC. See the following example to get a better feel for what is configurable within a Pay-As-You-Go OvDC.
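The per-VM arithmetic behind Pay-As-You-Go can be sketched as follows. The vCPU speed and guarantee percentage are illustrative assumptions, not vCD defaults:

```python
def payg_cpu(vcpus: int, vcpu_speed_mhz: float, guarantee_pct: float):
    """Return the (reservation_mhz, limit_mhz) that a Pay-As-You-Go OvDC
    places on a VM: the limit follows the configured vCPU speed, and the
    reservation is the guaranteed fraction of that limit."""
    limit = vcpus * vcpu_speed_mhz
    reservation = limit * guarantee_pct / 100.0
    return reservation, limit

# A 4-vCPU VM with a 1000 MHz vCPU speed and a 20% resource guarantee
# gets an 800 MHz reservation and a hard 4000 MHz cap.
print(payg_cpu(4, 1000, 20))  # → (800.0, 4000)
```

That hard cap is what matters for the discussion below: no matter how busy the host is not, the VM can never consume more than its limit.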


Now combine the fact that limits and reservations are placed on each VM with the fact that network I/O, and the CPU time required to process that network I/O, is accounted to the VM. Check the Virtual Machine Tx threads explained post to get a better understanding of how the VMkernel behaves when VMs transmit network packets. Because the CPU cycles used by the Tx threads are accounted to the VM, you should be very careful with applying CPU limits to the VM. CPU limits can seriously impact the network I/O performance and packet rate capability!

A severe performance impact can be expected when you apply CPU limits to NFV telco workloads that are known for high network utilization. A clear example is a virtual Evolved Packet Core (vEPC) node, for instance the Packet Gateway (PGW). VMs used in these nodes are known to have a large appetite for network I/O and may use Intel DPDK to drive high network packet rates.

Several vCPUs in these VMs will be configured by DPDK to poll a queue, and those vCPUs will claim all the cycles they can get their hands on! Combine that with the additional CPU time required to process the transmitted network I/O to fully understand the behaviour of such a VM and its need for CPU time. Only then can you make the correct decisions about allocation models.
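A back-of-the-envelope model shows why the limit caps the packet rate. The cycles-per-packet figure is an illustrative assumption; real numbers depend on the pNIC, driver, packet size and offloads:

```python
def max_packet_rate(cpu_limit_mhz: float, cycles_per_packet: float) -> float:
    """Packets/second a VM can push when all Tx processing cycles are
    charged against its CPU limit."""
    return cpu_limit_mhz * 1_000_000 / cycles_per_packet

# With a 2000 MHz limit and ~2000 cycles spent per packet, the VM tops out
# around 1 Mpps, regardless of how fast the pNIC or the fabric is.
print(f"{max_packet_rate(2000, 2000):,.0f} pps")  # → 1,000,000 pps
```

In other words, the packet-rate ceiling scales linearly with the CPU limit, which is why a seemingly generous limit can still strangle a vEPC-class workload.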

So be aware of the possible implications that CPU limits may have on network performance. For certain telco NFV workloads, it may be better to opt for another allocation model.

More information on how the VMware vCloud for NFV proposition is helping the telco industry and the LTE/4G and 5G innovations can be found here: https://vmware.regalixdigital.com/nfv-ebook/automation.html. Be sure to check it out!


VMworld 2017 session picks

VMworld is upon us. The schedule builder went live and boy am I excited about VMworld 2017!


This year Frank and I are presenting the successor to last year’s session at both VMworlds. We are listed in the schedule builder as SER1872BU – vSphere 6.5 Host Resources Deep Dive: Part 2. We are planning to bring even more ESXi epicness, with a slight touch of vSAN and networking information that allows you to prep ESXi to run NFV workloads that drive IoT innovations. Last year we were lucky to have packed rooms in both Vegas and Barcelona.

The enthusiasm for our book that we have witnessed so far shows us there is still a lot of love out there for the ESXi hypervisor and ‘under-the-hood’ tech! We are working hard to have an awesome session ready for you!

VMworld Session Picks

This year I want to learn more about NFV, IoT and edge, as I find innovation in these areas intriguing. I found some sessions that look very interesting and supplemented them with talks by industry titans on various topics. If my schedule allows, I want to see the following sessions:

  • Leading the 5G and IoT Revolution Through NFV [FUT3215BU] by Constantine Polychronopoulos
  • vSAN at the Edge: HCI for Distributed Applications Spanning Retail to IoT [STO2839GU] by Kristopher Groh
  • VMware Cloud Foundation Futures [PBO2797BU] by Raj Yavatkar
  • Machine Learning and Deep Learning on VMware vSphere: GPUs Are Invading the Software-Defined Data Center [VIRT1997BU] by Uday Kurkure and Ziv Kalmanovich
  • Managing Your Hybrid Cloud with VMware Cloud on AWS [LHC2971BU] by Frank Denneman and Emad Younis
  • The Top 10 Things to Know About vSAN [STO1264BU] by Duncan Epping and Cormac Hogan

There are way more interesting sessions going on! Be sure to find and schedule your favorite ones as rooms tend to fill up quickly!

See you in Vegas and Barcelona!!

