Stretched Cluster on IBM SVC (Part 3)

This is part 3 of the VMware Stretched Cluster on IBM SVC blogpost series.

PART 1 (intro, SVC cluster, I/O group, nodes)
PART 2 (split I/O group, deployment, quorum, config node)
PART 3 (HA, PDL, APD)

I explained how an SVC Split Cluster reacts to certain failure conditions in part 2. Now that we know how the storage layer behaves, let’s take a closer look at how this all ties in with the VMware layer. This is by no means a complete guide to every setting or configuration option involved; it is more of an excerpt of the ones I consider important. This post is based on vSphere 5.5.

VMware Stretched Cluster isn’t a feature you enable by ticking some boxes; it’s a design built around the workings of HA, DRS and a couple of other mechanisms.

First, I would like to briefly explain the concepts of APD (All Paths Down) and PDL (Permanent Device Loss).

APD

In an All Paths Down scenario, the ESXi host loses all paths to the storage device. The host is unable to communicate with the storage array. Examples of failures that can trigger APD are a failing HBA or a failing SAN.

figure 1. APD

PDL

PDL is triggered when the ESXi host can still communicate with the storage array, but the array reports that the LUN is permanently unavailable by returning a specific SCSI sense code. PDL can occur in certain split-brain scenarios.

figure 2. PDL
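
The difference between the two states can be sketched in a few lines of Python. This is only an illustration of the definitions above, not how ESXi implements it: the DeviceState class and the classify function are hypothetical names, and the set of sense codes contains just one well-known example (sense key 0x5, ASC 0x25, ASCQ 0x0, LOGICAL UNIT NOT SUPPORTED). The real list is longer.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Sense data (sense key, ASC, ASCQ) treated as PDL in this sketch.
# 0x5/0x25/0x0 is ILLEGAL REQUEST / LOGICAL UNIT NOT SUPPORTED.
# Illustrative only: the list ESXi actually honours is longer.
PDL_SENSE_CODES = {(0x5, 0x25, 0x0)}


@dataclass
class DeviceState:
    """Hypothetical view of one storage device as seen from one host."""
    live_paths: int                               # paths that still respond
    sense: Optional[Tuple[int, int, int]] = None  # last sense data from the array, if any


def classify(state: DeviceState) -> str:
    """Return 'PDL', 'APD' or 'OK' for a device."""
    # PDL: the array is reachable and explicitly reports the LUN as gone.
    if state.sense in PDL_SENSE_CODES:
        return "PDL"
    # APD: no path answers and the array never told the host why.
    if state.live_paths == 0:
        return "APD"
    return "OK"
```

The point of the sketch is the order of the checks: PDL needs an explicit answer from the array, while APD is the absence of any answer.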

High Level Design

figure 3. High Level Design Stretched Cluster

vSphere settings

das.isolationaddress
Hosts in SITE 1 and SITE 2 are configured to use the VRRP VIP as das.isolationaddress.

Disk.terminateVMOnPDLDefault
Disk.terminateVMOnPDLDefault needs to be set to “true”. The moment a VM issues I/O to a device that is in a PDL state, the VM is killed. The setting has been renamed to VMkernel.Boot.terminateVMOnPDL in vSphere 5.5.

das.maskCleanShutdownEnabled
das.maskCleanShutdownEnabled is set to “true” (default since vSphere 5.1). This setting allows HA to differentiate between VMs that have been shut down properly and VMs that have been killed.

Disk.AutoremoveOnPDL
Disk.AutoremoveOnPDL is a vSphere 5.5 setting and should be set to “0”. This prevents the PDL device from being removed, since in a Stretched Cluster environment we know the ‘permanently’ lost device will eventually return.
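
To keep these values consistent across hosts, a small check like the one below can help. It is a sketch under assumptions: the audit function is a hypothetical helper, and it expects you to have collected the settings into a flat dictionary with your own tooling first. It does not talk to vCenter or ESXi itself. The isolation address is site specific, so it is only checked for presence.

```python
# Recommended values taken from the settings described above.
RECOMMENDED = {
    "VMkernel.Boot.terminateVMOnPDL": "true",
    "das.maskCleanShutdownEnabled": "true",
    "Disk.AutoremoveOnPDL": "0",
}


def audit(settings: dict) -> list:
    """Return a list of deviations from the recommended values."""
    problems = []
    for key, wanted in RECOMMENDED.items():
        # Compare as lower-case strings so True/"true" and 0/"0" both match.
        actual = str(settings.get(key, "<unset>")).lower()
        if actual != wanted:
            problems.append(f"{key}: expected {wanted}, found {actual}")
    # The VRRP VIP differs per environment, so only check that one is set.
    if not settings.get("das.isolationaddress"):
        problems.append("das.isolationaddress: not set (expected the VRRP VIP)")
    return problems


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address used as a placeholder here.
    collected = {
        "das.isolationaddress": "192.0.2.1",
        "VMkernel.Boot.terminateVMOnPDL": True,
        "das.maskCleanShutdownEnabled": "true",
        "Disk.AutoremoveOnPDL": 1,
    }
    for line in audit(collected):
        print(line)
```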

SVC settings

mirrorwritepriority
mirrorwritepriority has two possible settings: “latency” (default) and “redundancy”. In latency mode, the write cache destage is considered complete when the volume copy at only one site has been updated.

The recommended setting in a Stretched Cluster deployment is “redundancy”. The setting is only configurable through the SVC CLI (chvdisk) and is set on a per volume basis.
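
The difference between the two modes comes down to when a write is considered done. The function below is a toy model, not SVC code: write_complete is a hypothetical name, and the rule for “redundancy” (both copies must be updated) is my reading of the setting rather than something spelled out above.

```python
def write_complete(priority: str, site1_acked: bool, site2_acked: bool) -> bool:
    """Model when a mirrored write destage counts as complete."""
    if priority == "latency":
        # One updated copy is enough, so the copies can drift apart.
        return site1_acked or site2_acked
    if priority == "redundancy":
        # Assumption: both copies must be updated before completion.
        return site1_acked and site2_acked
    raise ValueError(f"unknown mirrorwritepriority: {priority!r}")
```

With only SITE 1 updated, the model reports the write as complete in latency mode but not in redundancy mode. That is exactly the window in which a site failure leaves you with a stale copy, which is why “redundancy” is the safer choice for a Stretched Cluster.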

Scenario 1 (isolation response)

figure 4. Scenario 1 (isolation response)

Failure

LAN that carries the HA heartbeats fails on ESXi01

Outcome

Secondary heartbeat mechanism (datastore heartbeats) still functions and confirms the host is isolated
Isolation response is triggered
HA is not triggered
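
The role of the two heartbeat mechanisms in this scenario can be sketched as follows. This is a deliberate simplification with a hypothetical function name: real HA also checks the isolation address and distinguishes network partitions, which this sketch ignores.

```python
def host_state(network_heartbeat: bool, datastore_heartbeat: bool) -> str:
    """Classify a host from the two heartbeat mechanisms."""
    if network_heartbeat:
        return "healthy"
    # Network heartbeats are gone; the datastore heartbeat tells the rest
    # of the cluster whether the host is still alive.
    if datastore_heartbeat:
        return "isolated"  # alive but cut off: the isolation response applies
    return "failed"        # no sign of life: HA restarts the VMs elsewhere
```

In Scenario 1, ESXi01 loses the network heartbeat but still updates its datastore heartbeat, so it lands in the “isolated” branch rather than the “failed” one.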

Scenario 2

figure 5. Scenario 2

Failure

SVC node in SITE 1 fails

Outcome

ESXi01 accesses DS1 via SVC NODE 2
SVC NODE 2 goes into write-through mode, write performance is degraded
No isolation response or HA trigger

Scenario 3 (APD)

figure 6. Scenario 3 (APD)

Failure

Storage network in SITE 1 fails

Outcome

ESXi01 loses all storage paths, ergo APD
HA is not triggered (primary HA heartbeat mechanism still functions)
VMs on ESXi01 enter a zombie state

Scenario 4 (PDL)

Let’s say all connectivity between SITE 1 and SITE 2 is lost, and the quorum selects SITE 2 as the active site while SITE 1 is placed in stand-by. This means ESXi01 is able to communicate with SVC NODE 1, but is unable to access DS1.

To simulate this event, I unmap DS1 from ESXi01 (that is, I remove the host mapping).

figure 7. Scenario 4 (PDL)

Failure

Connectivity between SITE 1 and SITE 2 is lost (simulated by UNMAP)

Outcome

DS1 in SITE 1 triggers a PDL
VMs on ESXi01 are killed and restarted on ESXi02
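
Scenarios 3 and 4 show how differently vSphere 5.5 treats APD and PDL. The sketch below puts that into one function. The name vm_outcome is hypothetical, and the branches where a setting is disabled are my assumptions about what the settings imply. I did not test those combinations.

```python
def vm_outcome(storage_state: str, terminate_on_pdl: bool,
               mask_clean_shutdown: bool) -> str:
    """Outcome for a VM doing I/O on an affected datastore (vSphere 5.5 sketch)."""
    if storage_state == "APD":
        # Scenario 3: the host keeps waiting and nothing restarts the VM.
        return "VM keeps running without storage (zombie)"
    if storage_state == "PDL":
        if not terminate_on_pdl:
            # Assumption: without the kill, the VM stays where it is.
            return "VM keeps running without storage (zombie)"
        if mask_clean_shutdown:
            # Scenario 4: HA can tell the VM was killed, not shut down properly.
            return "VM killed and restarted by HA on another host"
        # Assumption: HA cannot tell a kill from a clean shutdown.
        return "VM killed, not restarted"
    return "VM unaffected"
```

Only the combination of PDL, terminateVMOnPDL and das.maskCleanShutdownEnabled leads to an automatic restart, which is why both settings matter in the vSphere settings section.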

vSphere 6

The most important new vSphere 6 feature for Stretched Cluster is VM Component Protection (VMCP). VMCP lets you select the appropriate response in case of APD and PDL. This alone makes vSphere 6 a must-have for Stretched Cluster deployments. You can read more about VMCP in this blogpost by Duncan Epping.

This concludes part 3. Stay tuned for part 4 in which I will take a closer look at vSphere’s site unawareness and the implications this may have on the site preference of SVC and networking components.

Thanks for reading.
