Assignable Hardware in vSphere 7

–originally authored and posted by me at–

A wide variety of modern workloads benefit greatly from hardware accelerators that offload certain capabilities, save CPU cycles, and improve overall performance. Think of the telco industry, for example: Network Function Virtualization (NFV) platforms that use NICs and FPGAs. Or customers that use GPUs for graphics acceleration in their Virtual Desktop Infrastructure (VDI) deployment. The AI/ML space is another example, where applications use GPUs to offload computations. To use a hardware accelerator with vSphere, typically a PCIe device, the device needs to be exposed to the guest OS running inside the virtual machine.

In vSphere versions prior to vSphere 7, a virtual machine specifies a PCIe passthrough device by using its hardware address. This is an identifier that points to a specific physical device at a specific bus location on that ESXi host. This restricts the virtual machine to that particular host. The virtual machine cannot easily be migrated to another ESXi host, even one with an identical PCIe device. This can impact the availability of the application using the PCIe device in the event of a host outage. Features like vSphere DRS and HA are not able to place that virtual machine on another, or surviving, host in the cluster. It takes manual provisioning and configuration to move that virtual machine to another host.

We do not want to compromise on application availability and ease of deployment. Assignable Hardware is a new feature in vSphere 7 that overcomes these challenges.

Introducing Assignable Hardware

Assignable Hardware in vSphere 7 provides a flexible mechanism to assign hardware accelerators to workloads. This mechanism identifies the hardware accelerator by attributes of the device rather than by its hardware address. This allows for a level of abstraction of the PCIe device. Assignable Hardware implements compatibility checks to verify that ESXi hosts have assignable devices available to meet the needs of the virtual machine.
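To illustrate the difference between addressing a device by its bus location and describing it by attributes, here is a minimal Python sketch. It is not the vSphere implementation or API: the `PciDevice` class, the `hosts_with_matching_device` helper, and the vendor, model, and host names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PciDevice:
    """Hypothetical device record; not a vSphere data structure."""
    address: str          # bus location, only meaningful on one host
    vendor: str
    model: str
    in_use: bool = False


def hosts_with_matching_device(hosts, vendor, model):
    """Return names of hosts that have a free device with the requested attributes."""
    return [
        name
        for name, devices in hosts.items()
        if any(
            d.vendor == vendor and d.model == model and not d.in_use
            for d in devices
        )
    ]


# Hypothetical inventory: the same model sits at different bus addresses per host.
hosts = {
    "esxi-01": [PciDevice("0000:3b:00.0", "ExampleVendor", "Accel-100", in_use=True)],
    "esxi-02": [PciDevice("0000:5e:00.0", "ExampleVendor", "Accel-100")],
    "esxi-03": [PciDevice("0000:3b:00.0", "OtherVendor", "Nic-25G")],
}

# The request names attributes, never a bus address.
candidates = hosts_with_matching_device(hosts, "ExampleVendor", "Accel-100")
print(candidates)
```

Because the request never mentions a bus address, any host with a free matching device is a valid candidate, which is what makes placement on another host possible.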

It integrates with Distributed Resource Scheduler (DRS) for initial placement of workloads that are configured with a hardware accelerator. This also means that Assignable Hardware brings back the vSphere HA capabilities to recover workloads (that are hardware accelerator enabled) if assignable devices are available in the cluster. This greatly improves workload availability.


The Assignable Hardware feature has two consumers: the new Dynamic DirectPath I/O and NVIDIA vGPU.


Improved DRS in vSphere 7

–originally authored and posted by me at–

The first release of Distributed Resource Scheduler (DRS) dates back to 2006. Since then, data centers and workloads have changed significantly. The new vSphere 7 release ships with DRS enhancements that better support modern workloads through improved DRS logic and a new accompanying UI in the vSphere Client.

The enhanced DRS logic is now workload-centric rather than cluster-centric, as it was in earlier releases. The DRS logic has been completely rewritten to provide more fine-grained resource scheduling, with the main focus on workloads. This blog post goes into detail on the new DRS algorithm and explains how to interpret the metrics shown in the new UI.

The Old DRS

vSphere DRS used to focus on the cluster state, checking whether it needed rebalancing because one ESXi host could be over-consumed while another ESXi host had fewer resources consumed. DRS ran every 5 minutes, and if the DRS logic determined it could improve the cluster balance, it would recommend and execute a vMotion depending on the configured settings. That way, DRS used to achieve cluster balance by using a cluster-wide standard deviation model.
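The idea of a cluster-wide standard deviation model can be sketched in a few lines of Python. This is only an illustration of the concept, not the actual DRS code: the host names, the load figures, and the `threshold` value are assumptions made up for this example.

```python
from statistics import pstdev


def cluster_imbalance(host_loads):
    """Population standard deviation of per-host load (0.0 means perfectly balanced)."""
    return pstdev(list(host_loads.values()))


def needs_rebalancing(host_loads, threshold):
    """True when the spread of host loads exceeds the (hypothetical) threshold."""
    return cluster_imbalance(host_loads) > threshold


# Hypothetical load per host, expressed as a fraction of capacity.
balanced = {"esxi-01": 0.50, "esxi-02": 0.55, "esxi-03": 0.45}
skewed = {"esxi-01": 0.90, "esxi-02": 0.30, "esxi-03": 0.30}

print(needs_rebalancing(balanced, threshold=0.1))
print(needs_rebalancing(skewed, threshold=0.1))
```

Note that this metric only describes the hosts. It says nothing about whether any individual virtual machine is actually short on resources.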

The New DRS

The new DRS logic takes a very different approach. It computes a VM DRS score on each host and moves the VM to the host that provides the highest VM DRS score.
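As a rough sketch of "score the VM on each host, then pick the highest score", consider the Python below. The scoring function is a stand-in invented for this example (the share of the VM's demand that a host's free capacity can satisfy); it is not the real VM DRS score calculation, and the host names and numbers are hypothetical.

```python
def vm_score(vm_demand, host_capacity, host_used):
    """Hypothetical score in [0, 1]: share of the VM's demand the host can satisfy."""
    if vm_demand <= 0:
        return 1.0
    free = max(host_capacity - host_used, 0.0)
    return min(free / vm_demand, 1.0)


def best_host(vm_demand, hosts):
    """Score the VM on every host and return (best host name, all scores)."""
    scores = {
        name: vm_score(vm_demand, capacity, used)
        for name, (capacity, used) in hosts.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical hosts as (capacity, already used), in arbitrary resource units.
hosts = {
    "esxi-01": (100.0, 95.0),
    "esxi-02": (100.0, 60.0),
    "esxi-03": (100.0, 85.0),
}

target, scores = best_host(vm_demand=20.0, hosts=hosts)
print(target, scores)
```

The point of the sketch is the shape of the decision: the calculation is made per virtual machine and per host, rather than from a single cluster-wide number.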

The biggest change from the old DRS version is that it no longer balances host load directly. Instead, it improves balancing by focusing on the metric that you care most about: virtual machine happiness. It is important to note that the improved DRS now runs every minute, providing a more granular way to calculate workload placement and balancing. This results in better overall performance of the workloads.

VM DRS Score



vMotion Enhancements in vSphere 7

–originally authored and posted by me at–

The vSphere vMotion feature enables customers to live-migrate workloads from source to destination ESXi hosts. Over time, we have developed vMotion to support new technologies. The vSphere 7 release is no exception to that, as we greatly improved the vMotion feature. The vMotion enhancements in vSphere 7 include a reduced performance impact during the live migration and a reduced stun time. This blog post will go into detail on how the vMotion improvements help customers to be comfortable using vMotion for large workloads.

To understand what we improved for vMotion in vSphere 7, it is imperative to understand the vMotion internals. Read the vMotion Process Under the Hood to learn more about the vMotion process itself.


How is Virtual Memory Translated to Physical Memory?

–originally authored and posted by me at–


Memory is one of the most important host resources. For workloads to access global system memory, we need to make sure virtual memory addresses are mapped to the physical addresses. Several components are working together to perform these translations as efficiently as possible. This blog post will cover the basics on how virtual memory addresses are translated.

Memory Translations

The physical address space is your system RAM, the memory modules inside your ESXi hosts, also referred to as the global system memory. When talking about virtual memory, we are talking about the memory that is controlled by an operating system, or a hypervisor like vSphere ESXi. Whenever workloads access data in memory, the system needs to look up the physical memory address that matches the virtual address. This is what we refer to as memory translations or mappings.

To map virtual memory addresses to physical memory addresses, page tables are used. A page table consists of numerous page table entries (PTE).

One memory page in a PTE contains data structures consisting of different sizes of ‘words’. Each type of word contains multiple bytes of data: WORD (16 bits/2 bytes), DWORD (32 bits/4 bytes), and QWORD (64 bits/8 bytes). Executing memory translations for every possible word, or virtual memory page, into a physical memory address is not very efficient, as this could potentially require billions of PTEs. We need PTEs to find the physical address space in the system’s global memory, so there is no way around them.

To make memory translations more efficient, we use page tables to group chunks of memory addresses in one mapping. Take the example of a DWORD entry of 4 bytes: a page table covers 4 kilobytes instead of just the 4 bytes of data in a single page entry. For example, using a page table, we can translate virtual address space 0 to 4095 and say it is found in physical address space 4096 to 8191. Now we no longer need to map all the PTEs separately, which makes translation far more efficient.
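The 0 to 4095 example can be expressed as a small Python sketch. It assumes a 4 KB page size and a single flat table keyed by virtual page number, which is a simplification for illustration (real x86-64 page tables are multi-level); the `page_table` dictionary and `translate` function are names made up for this example.

```python
PAGE_SIZE = 4096  # 4 KB covered by a single mapping

# Hypothetical flat table: virtual page number -> physical page number.
# Virtual page 0 (addresses 0..4095) maps to physical page 1 (addresses 4096..8191).
page_table = {0: 1}


def translate(virtual_address):
    """Translate a virtual address to a physical address using one lookup per page."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise LookupError(f"no mapping for virtual page {page_number}")
    # The offset within the page is unchanged; only the page number is swapped.
    return page_table[page_number] * PAGE_SIZE + offset


print(translate(0))
print(translate(4095))
```

One entry now serves all 4096 addresses in the range. At this page size, mapping 4 GiB of address space takes 1,048,576 entries rather than one entry per address.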

