This is part 3 of the VMware Stretched Cluster on IBM SVC blogpost series.
In part 2, I explained how an SVC Split Cluster reacts to certain failure conditions. Now that we know how the storage layer behaves, let’s take a closer look at how this all ties in with the VMware layer. This is by no means a complete guide to every setting or configuration option involved; it is more of an excerpt of the ones I consider important. This post is based on vSphere 5.5.
VMware Stretched Cluster isn’t a feature you enable by ticking some boxes; it’s a design built around the workings of HA, DRS and a couple of other mechanisms.
First, I would like to briefly explain the concepts APD (All Paths Down) and PDL (Permanent Device Loss).
In an All Paths Down scenario, the ESXi host loses all paths to the storage device. The host is unable to communicate with the storage array. Examples of failures that can trigger APD are a failing HBA or a failing SAN.
figure 1. APD
PDL is triggered when the ESXi host can still communicate with the storage array, but the array tells the host that the LUN is permanently unavailable by returning a specific SCSI sense code. PDL can occur in certain split-brain scenarios.
figure 2. PDL
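The difference between the two conditions can be sketched in a few lines of Python. This is a simplified model, not how ESXi implements it: the `DeviceState` class and `classify` function are hypothetical names, and the sense code 0x5/0x25/0x0 (ILLEGAL REQUEST, LOGICAL UNIT NOT SUPPORTED) is used as one commonly cited example of a PDL sense code, not as a complete list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (sense key, ASC, ASCQ) triples treated as PDL in this sketch.
# Assumption: only one commonly cited example is listed here;
# a real array may return other codes.
PDL_SENSE_CODES = {(0x5, 0x25, 0x0)}


@dataclass
class DeviceState:
    """Hypothetical view of one storage device as seen by a host."""
    live_paths: int                                # paths that still reach the array
    sense: Optional[Tuple[int, int, int]] = None   # last sense data returned, if any


def classify(state: DeviceState) -> str:
    """Return 'APD', 'PDL' or 'OK' for the given device state."""
    if state.live_paths == 0:
        # The host cannot talk to the array at all, so it cannot
        # know whether the device will ever come back.
        return "APD"
    if state.sense in PDL_SENSE_CODES:
        # The array is reachable and explicitly reports the LUN as gone.
        return "PDL"
    return "OK"


print(classify(DeviceState(live_paths=0)))
print(classify(DeviceState(live_paths=2, sense=(0x5, 0x25, 0x0))))
```

The key point the model captures is that APD is the absence of any answer, while PDL is an explicit answer from the array.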
High Level Design
figure 3. High Level Design Stretched Cluster
Hosts in SITE 1 and SITE 2 are configured to use the VRRP VIP as das.isolationaddress.
Disk.terminateVMOnPDLDefault needs to be set to “true”. With this setting, a VM is killed the moment it issues I/O to a device that is in a PDL state. The setting has been renamed to VMkernel.Boot.terminateVMOnPDL in vSphere 5.5.
das.maskCleanShutdownEnabled is set to “true” (default since vSphere 5.1). This setting allows HA to differentiate between VMs that have been shut down properly and VMs that have been killed.
Disk.AutoremoveOnPDL is a vSphere 5.5 setting and should be set to “0”. This prevents the PDL device from being removed, since in a Stretched Cluster environment we know the ‘permanently’ lost device will eventually return.
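To keep track of these values across hosts, a small audit helper can be handy. The sketch below is only an illustration: the `RECOMMENDED` table and the `audit` function are hypothetical, the input is assumed to be a plain dictionary you collected yourself, and host-level and HA cluster-level options are flattened into one table for brevity even though they are configured in different places. das.isolationaddress is left out because its value (the VRRP VIP) is specific to each environment.

```python
# Recommended values taken from the settings described above (vSphere 5.5 names).
RECOMMENDED = {
    "VMkernel.Boot.terminateVMOnPDL": "true",
    "das.maskCleanShutdownEnabled": "true",
    "Disk.AutoremoveOnPDL": "0",
}


def audit(current: dict) -> dict:
    """Return {setting: (actual, recommended)} for every deviation.

    `current` is a hypothetical dict of setting name -> value,
    gathered by whatever means you prefer.
    """
    deviations = {}
    for name, wanted in RECOMMENDED.items():
        # Normalise booleans, integers and strings to lowercase text.
        actual = str(current.get(name, "<unset>")).lower()
        if actual != wanted:
            deviations[name] = (actual, wanted)
    return deviations


host_settings = {
    "VMkernel.Boot.terminateVMOnPDL": True,
    "das.maskCleanShutdownEnabled": "true",
    "Disk.AutoremoveOnPDL": 1,   # deviates from the recommendation
}
print(audit(host_settings))
```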
mirrorwritepriority has two possible settings, “latency” (default) and “redundancy”. In latency mode, the write cache destage is considered complete as soon as the volume copy in only one site has been updated.
The recommended setting in a Stretched Cluster deployment is “redundancy”. The setting is only configurable through the SVC CLI (chvdisk) and is set on a per volume basis.
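The difference between the two modes can be modelled as a simple completion rule. This is a conceptual sketch only: `destage_complete` is a hypothetical function, and the rule for “redundancy” (both sites must be updated) is my reading of the setting as the counterpart of latency mode, not a statement of the SVC internals.

```python
def destage_complete(mode: str, site1_updated: bool, site2_updated: bool) -> bool:
    """Model when a write cache destage counts as complete.

    latency    -> one updated copy is enough (as described above)
    redundancy -> assumed here to require both copies to be updated
    """
    if mode == "latency":
        return site1_updated or site2_updated
    if mode == "redundancy":
        return site1_updated and site2_updated
    raise ValueError(f"unknown mirrorwritepriority mode: {mode!r}")


# Only the copy in SITE 1 has been written so far.
print(destage_complete("latency", True, False))
print(destage_complete("redundancy", True, False))
```

Under this model, latency mode can report a destage as complete while one site still holds stale data, which is exactly the window you want to avoid in a Stretched Cluster.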
Scenario 1 (isolation response)
figure 4. Scenario 1 (isolation response)
- LAN that carries the HA heartbeats fails on ESXi01
- Secondary heartbeat mechanism still functions (datastore heartbeats) and confirms the host is isolated
- Isolation response is triggered
- HA is not triggered
Scenario 2
figure 5. Scenario 2
- SVC node in SITE 1 fails
- ESXi01 accesses DS1 via SVC NODE 2
- SVC NODE 2 goes into write-through mode, write performance is degraded
- No isolation response or HA trigger
Scenario 3 (APD)
figure 6. Scenario 3 (APD)
- Storage network in SITE 1 fails
- ESXi01 loses all storage paths, ergo APD
- HA is not triggered (primary HA heartbeat mechanism still functions)
- VMs on ESXi01 enter a zombie state
Scenario 4 (PDL)
Let’s say all connectivity between SITE 1 and SITE 2 is lost and the QUORUM selects SITE 2 as the active site, placing SITE 1 in standby. This means ESXi01 is still able to communicate with SVC NODE 1, but is unable to access DS1.
To simulate this event, I unmap DS1 from ESXi01.
figure 7. Scenario 4 (PDL)
- Connectivity between SITE 1 and SITE 2 is lost (simulated by UNMAP)
- DS1 in SITE 1 triggers a PDL
- VMs on ESXi01 are killed and restarted on ESXi02
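The four scenarios can be summarised as a small decision function. This is a sketch of the outcomes observed in this design, with the settings listed earlier in place; the `Outcome` enum and `expected_outcome` function are hypothetical names, and combinations of failures that were not tested in this post are deliberately reported as not covered instead of being guessed.

```python
from enum import Enum


class Outcome(Enum):
    NO_ACTION = "no isolation response or HA trigger"
    ISOLATION_RESPONSE = "isolation response triggered, HA not triggered"
    APD_ZOMBIE = "APD: VMs enter a zombie state, HA not triggered"
    PDL_RESTART = "PDL: VMs killed and restarted on another host"
    NOT_COVERED = "combination not covered by the four scenarios"


def expected_outcome(ha_network_up: bool,
                     storage_paths_up: bool,
                     device_in_pdl: bool) -> Outcome:
    """Map the state of one host to the outcome seen in the scenarios."""
    if ha_network_up and storage_paths_up:
        # Scenario 4 (PDL) or scenario 2 (surviving SVC node still serves I/O).
        return Outcome.PDL_RESTART if device_in_pdl else Outcome.NO_ACTION
    if not ha_network_up and storage_paths_up and not device_in_pdl:
        # Scenario 1: datastore heartbeats confirm the host is isolated.
        return Outcome.ISOLATION_RESPONSE
    if ha_network_up and not storage_paths_up and not device_in_pdl:
        # Scenario 3: all paths down, network heartbeats still work.
        return Outcome.APD_ZOMBIE
    return Outcome.NOT_COVERED


for state in [(False, True, False), (True, True, False),
              (True, False, False), (True, True, True)]:
    print(state, "->", expected_outcome(*state).value)
```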
The most important new vSphere 6 feature with regard to Stretched Cluster is VM Component Protection (VMCP). VMCP lets you select the appropriate response in case of APD and PDL. This alone makes vSphere 6 a must-have for Stretched Cluster deployments. You can read more about VMCP in this blogpost by Duncan Epping.
This concludes part 3. Stay tuned for part 4 in which I will take a closer look at vSphere’s site unawareness and the implications this may have on the site preference of SVC and networking components.
Thanks for reading.