Stretched Cluster on IBM SVC (Part 3)

This is part 3 of the VMware Stretched Cluster on IBM SVC blogpost series.

PART 1     (intro, SVC cluster, I/O group, nodes)
PART 2     (split I/O group, deployment, quorum, config node)
PART 3    (HA, PDL, APD)


In part 2 I explained how an SVC Split Cluster reacts to certain failure conditions. Now that we know how the storage layer behaves, let’s take a closer look at how this ties in with the VMware layer. This is by no means a complete guide to every setting and configuration option involved; rather, it covers the ones I consider most important. This post is based on vSphere 5.5.

VMware Stretched Cluster isn’t a feature you enable by ticking a few boxes; it’s a design built around the workings of HA, DRS and a couple of other mechanisms.

First, I would like to briefly explain the concepts APD (All Paths Down) and PDL (Permanent Device Loss).



In an All Paths Down scenario, the ESXi host loses all paths to the storage device. The host is unable to communicate with the storage array. Examples of failures that can trigger APD are a failing HBA or a failing SAN.


figure 1. APD


PDL is triggered when the ESXi host is still able to communicate with the storage array, but the array tells the host the LUN is permanently unavailable by returning a specific SCSI sense code (0x5/0x25/0x0, “LOGICAL UNIT NOT SUPPORTED”). PDL can occur in certain split-brain scenarios.


figure 2. PDL



High Level Design



figure 3. High Level Design Stretched Cluster


vSphere settings

Hosts in SITE 1 and SITE 2 are configured to use the VRRP VIP as das.isolationaddress.

Disk.terminateVMOnPDLDefault needs to be set to “true”. The moment a VM issues I/O to a device that is in a PDL state, the VM is killed, which allows HA to restart it on a healthy host. In vSphere 5.5 this setting has been renamed to VMkernel.Boot.terminateVMOnPDL.

das.maskCleanShutdownEnabled is set to “true” (the default since vSphere 5.1). This setting allows HA to differentiate between VMs that have been shut down properly and VMs that have been killed.

Disk.AutoremoveOnPDL is a vSphere 5.5 setting and should be set to “0”. This prevents the PDL device from being removed, since in a Stretched Cluster environment we know the ‘permanently’ lost device will eventually return.
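The two host-level settings above can be applied from the ESXi shell on each host. The commands below are an illustrative sketch for vSphere 5.5 — verify the option names against your build before applying them. Note that das.isolationaddress and das.maskCleanShutdownEnabled are HA cluster advanced options, set on the cluster in vCenter rather than per host.

```shell
# Kill VMs that issue I/O to a device in PDL state
# (VMkernel.Boot.terminateVMOnPDL, formerly Disk.terminateVMOnPDLDefault):
esxcli system settings kernel set -s terminateVMOnPDL -v TRUE

# Don't auto-remove the PDL device; in a Stretched Cluster it will return:
esxcli system settings advanced set -o /Disk/AutoremoveOnPDL -i 0

# Verify both settings:
esxcli system settings kernel list -o terminateVMOnPDL
esxcli system settings advanced list -o /Disk/AutoremoveOnPDL
```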


SVC settings

mirrorwritepriority has two possible values, “latency” (default) and “redundancy”. In latency mode, a write cache destage is considered complete as soon as the volume copy at one site has been updated; in redundancy mode, both copies must be updated first.

The recommended setting in a Stretched Cluster deployment is “redundancy”. The setting is only configurable through the SVC CLI (chvdisk) and is set on a per-volume basis.
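On the SVC CLI (SSH to the cluster management IP), switching an existing volume to redundancy mode looks roughly like this — “myvdisk” is a placeholder for your own volume name:

```shell
# Set mirror write priority to redundancy on a per-volume basis:
svctask chvdisk -mirrorwritepriority redundancy myvdisk

# Check the current value for the volume:
svcinfo lsvdisk myvdisk | grep mirror_write_priority
```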


Scenario 1 (isolation response)


figure 4. Scenario 1 (isolation response)


  • LAN that carries the HA heartbeats fails on ESXi01


  • The secondary heartbeat mechanism (datastore heartbeats) still functions,
    so the HA master knows the host is isolated rather than failed
  • Isolation response is triggered
  • HA restarts are not triggered


Scenario 2


figure 5. Scenario 2


  • SVC node in SITE 1 fails


  • ESXi01 accesses DS1 via SVC NODE 2
  • SVC NODE 2 goes into write-through mode, so write performance is degraded
  • No isolation response or HA trigger


Scenario 3 (APD)


figure 6. Scenario 3 (APD)


  • Storage network in SITE 1 fails


  • ESXi01 loses all storage paths, ergo APD
  • HA is not triggered (the primary HA heartbeat mechanism still functions)
  • VMs on ESXi01 enter a zombie state


Scenario 4 (PDL)

Suppose all connectivity between SITE 1 and SITE 2 is lost, and the quorum selects SITE 2 as the active site, placing SITE 1 in stand-by. This means ESXi01 is able to communicate with SVC NODE 1, but is unable to access DS1.

To simulate this event, I UNMAP DS1 from ESXi01.
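On the SVC CLI, the unmap boils down to removing the host mapping for the volume. The host and volume names below are placeholders for this lab setup:

```shell
# Remove the mapping of volume DS1 to host ESXi01, triggering PDL on the host:
svctask rmvdiskhostmap -host ESXi01 DS1
```

On the ESXi host, the device state can then be checked with `esxcli storage core device list`.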

figure 7. Scenario 4 (PDL)


  • Connectivity between SITE 1 and SITE 2 is lost (simulated by UNMAP)


  • DS1 in SITE 1 triggers a PDL
  • VMs on ESXi01 are killed and restarted on ESXi02


vSphere 6

The most important new vSphere 6 feature with regard to Stretched Cluster is VM Component Protection (VMCP). VMCP lets you select the appropriate response to both APD and PDL events. This alone makes vSphere 6 a must-have for Stretched Cluster deployments. You can read more about VMCP in this blogpost by Duncan Epping.


This concludes part 3. Stay tuned for part 4 in which I will take a closer look at vSphere’s site unawareness and the implications this may have on the site preference of SVC and networking components.

Thanks for reading.
