Stretched cluster VM & datastore affinity

When using a vSphere stretched cluster solution, it is important to have your VM(s) and its VMDK(s) affinity aligned in the same datacenter. So if the storage controller in datacenter 1 serves the read/write copy of the datastore, you would like the VM to run on a vSphere host in the same datacenter. This will avoid the storage read IO’s to traverse the inter-datacenter connections, resulting in an obvious impact on performance. With the VM – datastore affinity in place, you will also mitigate the risk of potential VM outage if a datacenter partition (aka split-brain scenario) will occur.

Let me show you what I mean by using a simple logical overview of a stretched cluster infrastructure. The following example is based on an uniform storage backend. More information on uniform and non-uniform metro storage solutions is read here.

What you don’t want:

VM affinity

What you do want:

VM affinity

 

It is perfectly possible to automate the alignment upon… VM creation for example. Needless to say, you will require DRS to run. Preferably in fully automated mode.

(more…)

Read More

Stretched Cluster on IBM SVC (Part 3)

This is part 3 of the VMware Stretched Cluster on IBM SVC blogpost series.

PART 1     (intro, SVC cluster, I/O group, nodes)
PART 2     (split I/O group, deployment, quorum, config node)
PART 3    (HA, PDL, APD)

 

I explained how a SVC Split Cluster reacts to certain failure conditions in part 2. Now that we know how the storage layer behaves, let’s take a closer look at how this all ties in with the VMware layer. This is by no means a complete guide to every setting/configuration option involved, more of an excerpt of the ones I consider to be important. This post is based on vSphere 5.5.

VMware Stretched Cluster isn’t a feature you enable by ticking some boxes, it’s a design built around the workings of HA, DRS and a couple of other mechanisms.

First, I would like to briefly explain the concepts APD (All Paths Downs) and PDL (Permanent Device Loss).

 

APD

In an All Paths Down scenario, the ESXi host loses all paths to the storage device. The host is unable to communicate with the storage array. Examples of failures that can trigger APD are a failing HBA or a failing SAN.

APD All Paths Down

figure 1. APD

(more…)

Read More