At the last NLVMUG I talked about stretched clusters. My presentation elaborated on how VMware NSX can help you deal with the challenges that arise when deploying a stretched cluster solution. In this blog post, I want to take a closer look at this specific topic.
First, a quick recap of what a stretched cluster solution actually is: a vSphere cluster, configured in one vCenter instance, containing an equal number of hosts from both sites. This allows for disaster avoidance (vMotion) and disaster recovery (vSphere HA) between two geographically separated sites. From the backend infrastructure perspective, your (synchronously replicated) storage and network solutions must span both sites.
Looking into network designs used for stretched clusters, you will typically face challenges like:
- How do you design for VM mobility across 2 sites, which requires Layer-2 networks between the 2 sites?
- Stretched Layer-2 networks (VLANs) introduce a higher risk of failure (think Layer-2 loops).
- How do you properly segment applications and/or tenants (customers/business units)?
- Network flows: what about your egress and ingress connections?
Let’s begin with what a VMware NSX deployment could look like within a stretched cluster infrastructure.
Stretched cluster with NSX architecture
A stretched cluster with VMware NSX could look like the following logical overview.
The two sites are connected via a routed (layer-3) network. This network can be used by NSX as the transport network for its VTEPs. Note: an MTU size of 1600 (minimum) is required on this network because of the VXLAN encapsulation.
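To see where that number comes from, here is a minimal sketch of the per-frame overhead that VXLAN adds on top of the inner frame (plain Python; the header sizes are the standard VXLAN framing values, assuming an untagged outer IPv4 transport):

```python
# Minimal sketch: approximate VXLAN encapsulation overhead per frame.
OUTER_ETHERNET = 14   # outer Ethernet header (untagged)
OUTER_IPV4     = 20   # outer IPv4 header
OUTER_UDP      = 8    # outer UDP header
VXLAN_HEADER   = 8    # VXLAN header carrying the 24-bit VNI

overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER  # 50 bytes

inner_mtu = 1500                              # standard VM-facing MTU
required_transport_mtu = inner_mtu + overhead # 1550
print(required_transport_mtu)                 # 1600 leaves headroom, e.g. for 802.1Q tags
```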
Looking at the placement of the NSX components, you will see that the NSX Manager appliance and the NSX controllers are all placed in datacenter 1 following the stretched cluster approach with 1 vCenter instance. In this example scenario, we assume the management tooling is also placed on a stretched cluster.
I did wonder whether having all NSX management/control components on one site is a desirable scenario. Normally I would go for a distribution of these components over both sites that is as equal as possible, but digging more into how the control plane of NSX works, it really does not add any value to do so. It makes more sense to place them on the same site to avoid unnecessary traffic between the controllers over the datacenter interconnect. Another reason to place them together is to avoid any unwanted elections among the controllers if a datacenter partition failure were to occur.
All this is because of how the control-plane roles in NSX work and how the election is done. I used this blogpost by Roie Ben Haim as a source. The following roles are divided over the available controllers (copied from Roie):
- api_provider: Handles HTTP web service requests from external clients (NSX Manager) and initiates processing by other controller tasks.
- persistence_server: Stores data from the NVP API and vDS devices that must be persisted across all controllers in case of node failures or shutdowns.
- switch_manager: Maintains management connections for one or more vDS devices.
- logical_manager: Monitors when end hosts arrive or leave vDS devices and configures the vDS forwarding states to implement logical connectivity and policies.
- directory_server: Manages VXLAN and the distributed logical routing directory of information.
For each role, a controller is elected as master. So the 5 roles are divided over the 3 NSX controllers (a maximum of 3 controllers is supported by VMware). When a controller fails, another controller is elected as master for the role(s) the failed controller was serving.
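As a rough mental model of that behavior (a simplified sketch, not the actual NSX election algorithm), each role always has exactly one master among the surviving controllers, and the roles of a failed node are simply re-assigned:

```python
# Simplified model of role masters being redistributed when a controller fails.
ROLES = ["api_provider", "persistence_server", "switch_manager",
         "logical_manager", "directory_server"]

controllers = {"controller-1", "controller-2", "controller-3"}

def elect_masters(alive):
    """Assign each role a master from the set of surviving controllers (round-robin)."""
    alive = sorted(alive)
    return {role: alive[i % len(alive)] for i, role in enumerate(ROLES)}

print(elect_masters(controllers))
# controller-1 fails: its roles are re-elected among the two survivors.
print(elect_masters(controllers - {"controller-1"}))
```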
So the interesting thing here is to determine what happens to virtual networking if the control-plane is temporarily unavailable. Think about an NSX deployment on a stretched cluster and the related failure scenarios, like a site failure or a datacenter partition, combined with our NSX Manager and controllers being placed in 1 datacenter. What is the impact on virtual machine traffic flows if those failures were to occur?
From a data-plane perspective, it looks like you’re good. Even though the control-plane is not available for a short period of time, the forwarding plane or data-plane will continue to work as it did before. The behavior depends on which control-plane mode you are operating in. Check this blogpost for a deep-dive on the NSX control-plane modes (multicast, unicast or hybrid).
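As a simplified illustration of what those modes mean in practice (a sketch only, not NSX’s actual implementation), the mode mainly determines who replicates broadcast/unknown-unicast/multicast (BUM) frames to the other VTEPs:

```python
# Sketch: how a BUM frame is replicated per control-plane mode (simplified).
def replicate_bum(mode, remote_vteps, mcast_group):
    if mode == "multicast":
        # The physical network replicates: send a single copy to the multicast group.
        return [f"send once to multicast group {mcast_group}"]
    if mode == "unicast":
        # The source host creates a unicast copy per remote VTEP (simplified).
        return [f"unicast copy to {vtep}" for vtep in remote_vteps]
    if mode == "hybrid":
        # Local replication uses layer-2 multicast, remote VTEPs get unicast copies.
        return [f"L2 multicast to local segment via {mcast_group}"] + \
               [f"unicast copy to {vtep}" for vtep in remote_vteps]
    raise ValueError(f"unknown mode: {mode}")

print(replicate_bum("hybrid", ["192.168.20.11", "192.168.20.12"], "239.1.1.10"))
```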
To elaborate a bit more on what the control-plane and data-plane components actually comprise:
Control-plane components
- NSX controllers: Store the ARP, MAC, VTEP and routing tables. They control the logical networks, but do not sit in the data path.
- UWA (User World Agent): An agent running on the vSphere host, used for communication between the host (data-plane) and the NSX controllers.
- Logical Router Control VM: The component in charge of the routing tables and other control functions.
Data-plane components
- Kernel modules: These are the VIBs installed on the vSphere hosts and include the VXLAN, distributed routing and distributed firewall modules. The VXLAN module handles the encapsulation and decapsulation of VXLAN packets (see the sketch after this list) and also sends collected VTEP network information to the controllers via the UWA.
- vSphere Distributed vSwitch: This is where all the layer-2 switching is actually performed in the virtual network(s).
- Edge appliances: All VM north/south traffic traverses the edge services appliances. Functionality like VPN and NAT resides here.
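To make the encapsulation mentioned above a bit more tangible, here is a minimal sketch of the 8-byte VXLAN header that gets prepended (together with outer Ethernet/IP/UDP headers) to every inner frame; the VNI value is an arbitrary example:

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags (I-bit set), reserved, 24-bit VNI, reserved."""
    flags = 0x08                                 # 'I' flag: a valid VNI is present
    return struct.pack("!B3xI", flags, vni << 8) # the VNI sits in the upper 24 bits

print(vxlan_header(vni=5001).hex())              # -> 0800000000138900

# The full outer encapsulation is: outer Ethernet + outer IP + outer UDP
# (IANA port 4789; an NSX deployment may use a different default) + this header,
# which is where the ~50 bytes of overhead from the MTU example come from.
```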
Returning to the overall NSX deployment in a stretched cluster: the NSX logical switches and logical routing span the entire cluster, and thus both sites.
Layer-2 challenges
Because the logical switch in NSX spans both sites, the layer-2 connectivity is taken care of. As said before, your main objective with a stretched cluster is to provide layer-2 connectivity between the sites, because that allows for VM mobility between sites without the need for IP re-numbering, for instance.
Traditional solutions for layer-2 ‘stretching’ or stretched VLANs are typically:
- Dedicated fibers between the sites, configured as a layer-2 VLAN trunk.
- VPLS instances via an MPLS backbone
- Proprietary protocols, e.g. Cisco OTV
All the options above are viewed purely from a network perspective and largely miss out on enabling automation!
The first ‘solution’ is actually widely adopted in the Netherlands because of fiber costs and availability. ‘Adopted’ might not be the right word, though, because I don’t know any network engineers who are actually keen on deploying this solution, given the risks stated earlier.
This is where the logical switch comes in: it provides the layer-2 connectivity. Because we are using a single-vCenter ‘stretched’ cluster, the logical switches automatically span both sites.
Logical switching enables and optimizes layer-2 connectivity across datacenter boundaries
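To make that statement a bit more concrete, here is a tiny sketch (made-up names and addresses, not an NSX API) of why a VM keeps its layer-2 segment, and therefore its IP address, when it is vMotioned between sites on the same logical switch:

```python
# Sketch: a VM attached to a logical switch keeps the same VNI (and therefore
# the same layer-2 segment and IP address) when it is vMotioned between sites.
hosts = {
    "esx-01": {"site": "datacenter-1", "vtep": "192.168.10.11"},
    "esx-05": {"site": "datacenter-2", "vtep": "192.168.20.11"},
}

vm = {"name": "web-vm-01", "host": "esx-01",
      "logical_switch_vni": 5001, "ip": "172.16.10.10"}

def vmotion(vm, destination_host):
    vm["host"] = destination_host   # only the host (and thus the VTEP) changes
    return vm

vmotion(vm, "esx-05")
print(hosts[vm["host"]]["site"], vm["logical_switch_vni"], vm["ip"])
# -> datacenter-2 5001 172.16.10.10  (same segment, same IP, different site)
```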
Even if you do not have a stretched cluster environment, NSX can still be perfectly leveraged for bridging layer-2 over layer-3 networks. Be sure to check out this blog by Tomas Fojta for more information about this topic!
In- and outbound connections
Looking into network traffic flows, you want to know how your customers or organization connect to the applications running on the stretched cluster. Normally, you would have one ingress point for connectivity, even though both sites are ‘active’. This remains one of the key topics when designing such an infrastructure. Be aware that you will need a setup that allows for connectivity via the remaining site when a site failure occurs. Dynamic routing to the rescue!
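As a simplified sketch of that idea (generic route selection, not any specific routing protocol or NSX Edge configuration), both sites can advertise the same application prefix and the less preferred advertisement simply takes over when the primary site disappears:

```python
# Sketch: both sites advertise the workload prefix; ingress follows the best
# advertisement that is still alive. Prefixes and metrics are made up.
advertisements = [
    {"prefix": "203.0.113.0/24", "site": "datacenter-1", "metric": 10},  # preferred
    {"prefix": "203.0.113.0/24", "site": "datacenter-2", "metric": 20},  # backup
]

def best_route(prefix, failed_sites=()):
    candidates = [a for a in advertisements
                  if a["prefix"] == prefix and a["site"] not in failed_sites]
    return min(candidates, key=lambda a: a["metric"]) if candidates else None

print(best_route("203.0.113.0/24"))                                 # via datacenter-1
print(best_route("203.0.113.0/24", failed_sites={"datacenter-1"}))  # via datacenter-2
```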
Egress traffic typically leaves your (virtual) network at one point in your infrastructure. However, there are possibilities for ‘local egress’. The following video of a VMworld 2014 breakout session by Ray Budavari elaborates on this:
To conclude…
VMware NSX is a good fit when using a stretched cluster solution. The two main use-cases for NSX are automation and (enhanced) security. But beyond those use-cases, the ability to create layer-2 connectivity over a routed network can be of massive importance when defining a business case for NSX in a multi-site (i.e. stretched cluster) environment.