Stretched cluster with NSX

At the last NLVMUG I was talking about stretched clusters. My presentation elaborated on how VMware NSX can help you deal with the challenges that arise when deploying a stretched cluster solution. In this blog post I want to take a closer look at this specific topic.

First, a quick recap of what a stretched cluster solution actually is: a vSphere cluster, configured in a single vCenter instance, containing an equal number of hosts from both sites. This allows for disaster avoidance (vMotion) and disaster recovery (vSphere HA) between two geographically separated sites. From the backend infrastructure perspective, your (synchronously replicated) storage and network solutions must span both sites.

Looking into network designs used for stretched clusters, you will typically face challenges like:

  • How do you design for VM mobility across two sites, which requires Layer-2 networks spanning both sites? (see the sketch below)
  • Stretched Layer-2 networks (VLANs) introduce a higher risk of failure (think Layer-2 loops).
  • How to properly segment applications and/or tenants (customers/business units)?
  • Network flows: what about your egress and ingress connections?
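
Challenges one and two are exactly where NSX helps: Layer-2 connectivity is delivered as VXLAN-backed logical switches running on top of a routed interconnect, so no VLAN needs to be stretched between the sites. As a rough sketch (not a verified procedure; the NSX Manager address, credentials and transport zone ID below are hypothetical), creating such a logical switch through the NSX-v REST API could look like this:

```python
# Hypothetical sketch: create a VXLAN-backed logical switch via the
# NSX-v REST API. VMs in both datacenters get the same Layer-2 segment
# without stretching a VLAN, because the transport zone spans the
# clusters in both sites. All names and IDs below are assumptions.
import requests

NSX_MGR = "https://nsxmanager.example.local"
SCOPE_ID = "vdnscope-1"  # transport zone spanning the hosts in both sites

payload = """<virtualWireCreateSpec>
  <name>stretched-app-tier-01</name>
  <description>L2 segment available in both datacenters</description>
  <tenantId>tenant-a</tenantId>
</virtualWireCreateSpec>"""

resp = requests.post(
    "%s/api/2.0/vdn/scopes/%s/virtualwires" % (NSX_MGR, SCOPE_ID),
    auth=("admin", "********"),
    headers={"Content-Type": "application/xml"},
    data=payload,
    verify=False)  # lab sketch only; use proper certificate validation
resp.raise_for_status()
print("Created logical switch: %s" % resp.text)  # API returns the virtualwire ID
```

Because the logical switch exists on every prepared host in the transport zone, a VM keeps its network when it vMotions to the other site, without any stretched VLAN and the Layer-2 loop risk that comes with it.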

Let’s begin with what a VMware NSX deployment could look like within a stretched cluster infrastructure.

Stretched cluster with NSX architecture

A stretched cluster with VMware NSX could look like the following logical overview.

Stretched cluster VM & datastore affinity

When using a vSphere stretched cluster solution, it is important to keep a VM and its VMDKs aligned in the same datacenter. So if the storage controller in datacenter 1 serves the read/write copy of the datastore, you want the VM to run on a vSphere host in the same datacenter. This prevents storage read I/Os from traversing the inter-datacenter connections, which would have an obvious impact on performance. With the VM-to-datastore affinity in place, you also mitigate the risk of a potential VM outage should a datacenter partition (a.k.a. a split-brain scenario) occur.

Let me show you what I mean using a simple logical overview of a stretched cluster infrastructure. The following example is based on a uniform storage backend. More information on uniform and non-uniform metro storage solutions can be found here.

What you don’t want:

figure: VM affinity

What you do want:

figure: VM affinity

It is perfectly possible to automate this alignment, upon VM creation for example. Needless to say, this requires DRS to be running, preferably in fully automated mode.
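
To sketch what such automation could look like: the following pyVmomi snippet (my own illustration, not an official tool) groups VMs by the site of their first datastore, based on a hypothetical "DC1-"/"DC2-" datastore naming convention, and ties each VM group to a matching, pre-existing per-site DRS host group using "should run" rules. The cluster, group and datastore names are placeholder assumptions.

```python
# A minimal pyVmomi sketch, not a drop-in tool: it creates per-site DRS
# VM groups based on each VM's first datastore and ties them to
# pre-existing per-site host groups with "should run" rules.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    clu_view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in clu_view.view if c.name == "StretchedCluster01")
    vm_view = content.viewManager.CreateContainerView(
        cluster, [vim.VirtualMachine], True)

    spec = vim.cluster.ConfigSpecEx()
    for site, host_group in (("DC1", "DC1-hosts"), ("DC2", "DC2-hosts")):
        # Hypothetical convention: a datastore name starts with the site
        # that serves its read/write copy, e.g. "DC1-vmfs-001".
        vms = [vm for vm in vm_view.view
               if vm.datastore and vm.datastore[0].name.startswith(site)]
        if not vms:
            continue
        vm_group = "%s-vms" % site
        spec.groupSpec.append(vim.cluster.GroupSpec(
            operation="add",
            info=vim.cluster.VmGroup(name=vm_group, vm=vms)))
        spec.rulesSpec.append(vim.cluster.RuleSpec(
            operation="add",
            info=vim.cluster.VmHostRuleInfo(
                name="%s-vm-affinity" % site,
                enabled=True,
                mandatory=False,  # "should run" rule, not "must run"
                vmGroupName=vm_group,
                affineHostGroupName=host_group)))

    # Push the groups and rules to the cluster in one reconfigure task.
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
finally:
    Disconnect(si)
```

Note that the rules are deliberately non-mandatory ("should run"): during a full site failure, vSphere HA can still restart the affected VMs on the surviving site.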


Stretched Cluster on IBM SVC (Part 3)

This is part 3 of the VMware Stretched Cluster on IBM SVC blogpost series.

PART 1     (intro, SVC cluster, I/O group, nodes)
PART 2     (split I/O group, deployment, quorum, config node)
PART 3     (HA, PDL, APD)

In part 2 I explained how an SVC Split Cluster reacts to certain failure conditions. Now that we know how the storage layer behaves, let’s take a closer look at how this all ties in with the VMware layer. This is by no means a complete guide to every setting and configuration option involved, but rather an overview of the ones I consider important. This post is based on vSphere 5.5.

VMware Stretched Cluster isn’t a feature you enable by ticking some boxes; it’s a design built around the workings of HA, DRS and a couple of other mechanisms.

First, I would like to briefly explain the concepts APD (All Paths Down) and PDL (Permanent Device Loss).

APD

In an All Paths Down scenario, the ESXi host loses all paths to the storage device. The host is unable to communicate with the storage array. Examples of failures that can trigger APD are a failing HBA or a failing SAN.

figure 1. APD
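
On vSphere 5.5, the host’s APD behavior is governed by the advanced settings Misc.APDHandlingEnable and Misc.APDTimeout (140 seconds by default). As a hedged sketch of how you could set these consistently across a cluster with pyVmomi (the vCenter address, credentials and cluster name are hypothetical placeholders):

```python
# Sketch: set the APD handling advanced options on every host in a
# cluster via pyVmomi. Connection details and the cluster name are
# placeholder assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim, VmomiSupport

Long = VmomiSupport.vmodlTypes["long"]  # these host options are long-typed

si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "StretchedCluster01")

    for host in cluster.host:
        # Misc.APDHandlingEnable=1 turns APD handling on; Misc.APDTimeout
        # is the number of seconds after declaring APD before the host
        # starts fast-failing new I/O to the device.
        host.configManager.advancedOption.UpdateOptions(changedValue=[
            vim.option.OptionValue(key="Misc.APDHandlingEnable", value=Long(1)),
            vim.option.OptionValue(key="Misc.APDTimeout", value=Long(140)),
        ])
        print("APD settings updated on %s" % host.name)
finally:
    Disconnect(si)
```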


Stretched Cluster on IBM SVC (Part 2)

This is part 2 of the VMware Stretched Cluster on IBM SVC blogpost series.

PART 1     (intro, SVC cluster, I/O group, nodes)
PART 2     (split I/O group, deployment, quorum, config node)
PART 3     (HA, PDL, APD)


SVC split I/O group
It’s time to split our SVC nodes between failure domains (sites). While the SVC technically supports a maximum round-trip time (RTT) of 80 ms, Metro vMotion supports an RTT of up to 10 ms (an Enterprise Plus license is required).
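
To put these RTT budgets into perspective, here is a back-of-envelope calculation (my own sketch; it ignores switching and DWDM equipment latency, so real-world distances will be shorter). Light propagates through fiber at roughly 200,000 km/s, i.e. about 0.01 ms of RTT per kilometer:

```python
# Back-of-envelope sketch: translate an RTT budget into a rough upper
# bound on site distance. Assumes ~200,000 km/s propagation speed in
# fiber and ignores equipment latency.
SPEED_IN_FIBER_KM_PER_S = 200_000.0

def max_distance_km(rtt_budget_ms: float) -> float:
    one_way_seconds = (rtt_budget_ms / 1000.0) / 2.0
    return SPEED_IN_FIBER_KM_PER_S * one_way_seconds

for label, rtt_ms in (("SVC maximum", 80.0), ("Metro vMotion", 10.0)):
    print(f"{label}: {rtt_ms:.0f} ms RTT -> at most ~{max_distance_km(rtt_ms):,.0f} km of fiber")
# SVC maximum: 80 ms RTT -> at most ~8,000 km of fiber
# Metro vMotion: 10 ms RTT -> at most ~1,000 km of fiber
```

In practice it is therefore the vMotion RTT budget, not the SVC limit, that constrains how far apart the sites can be.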

You can split the nodes in two ways: with or without the use of ISLs (Inter-Switch Links). Both deployment methods are covered in detail in this document.


Deployment without ISL
Nodes are directly connected to the FC switches in both the local and the remote site, without traversing an ISL. Passive WDM devices (red line) can be used to reduce the number of links. You’ll need to equip the nodes with “colored” long-distance SFPs.

figure: SVC deployment without ISL


Stretched Cluster on IBM SVC (Part 1)

This is part 1 of the VMware Stretched Cluster on IBM SVC blogpost series.

PART 1     (intro, SVC cluster, I/O group, nodes)
PART 2     (split I/O group, deployment, quorum, config node)
PART 3     (HA, PDL, APD)

Last year I was the primary person responsible for implementing a new storage environment based on IBM SVC and V7000, and for building a VMware Stretched Cluster (a.k.a. vSphere Metro Storage Cluster) on top of it. I would like to share some of the experience I gathered, the caveats I encountered and other points of interest. This is by no means a complete implementation guide (go read the Redbook 😉 ). I’ll discuss some of the implementation options as well as failure scenarios, advanced settings and some other things I find interesting. Based on the content, this will be a multi-part (probably three-part) blog post.

Stretched Cluster versus Site Recovery Manager
If you’re unfamiliar with the concepts of Stretched Cluster and SRM, I suggest you read the excellent whitepaper “Stretched Clusters and VMware vCenter Site Recovery Manager”, which explains which solution best suits your business needs. Another good resource is VMworld 2012 session INF-BCO2982, with the catchy title “Stretched Clusters and VMware vCenter Site Recovery Manager: How and When to Choose One, the Other, or Both”; however, you’ll only be able to access this content if you attended VMworld (or simply paid for a subscription).
