Distributed Storage Network Topology

This is a short write-up on why you should consider a specific network topology when adopting scale-out storage technologies in a multi-rack environment. Without going into too much detail, I want to stress the need to design your Ethernet storage network around the scalable distributed storage model. To be honest, it is probably the other way around: networking experts have been building scalable network architectures with consistent and predictable latency for a long time now. The storage world is just catching up.

Today, we have the ability to create highly scalable distributed storage infrastructures, following Hyper-Converged Infrastructure (HCI) innovations. Because the storage layer is distributed across ESXi hosts, a large number of point-to-point Ethernet connections between ESXi hosts will be used for storage I/O. Typically, when a distributed storage solution (like VMware vSAN) is adopted, we tend to create a fairly basic layer-2 network, preferably using 10GbE or faster NICs and line-rate capable components in a non-blocking network architecture with enough ports to support our current hosts. But once we scale to a large number of ESXi hosts and racks, we face challenges in providing the required network interfaces for our ESXi hosts and in connecting the multiple Top of Rack (ToR) switches to each other. That is where the so-called spine-leaf network architecture comes into play.

Spine-Leaf

In a spine-leaf network architecture, each leaf switch connects to every spine switch in the fabric. Using this topology, the connection between two ESXi hosts always traverses the same number of network hops, even when the hosts are located in different racks. Such a network topology provides predictable latency and thus consistent performance, even as you keep scaling out your virtual datacenter. It is this consistency in performance that makes the spine-leaf network architecture so suitable for distributed storage solutions.
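If the fabric routes between leaf and spine (a layer-3 leaf-spine design), this consistency is easy to observe from the hosts themselves: a traceroute from a host in one rack to hosts in two other racks should report the same number of hops. A minimal sketch, assuming Linux hosts and purely illustrative addresses:

```
# From a host in rack 1: every inter-rack path is leaf -> spine -> leaf,
# so both traces should show an identical hop count.
traceroute -n 10.0.2.21   # host in rack 2
traceroute -n 10.0.3.31   # host in rack 3
```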

An example of a logical spine-leaf network architecture is shown in the following diagram:

(more…)


Jumbo frames and the risks involved

Even though the discussion about jumbo frames and their gain/risk trade-offs is not new, we found ourselves debating it yet again. Because we had different opinions, it seemed like a good idea to elaborate on the topic.

Let’s have a quick recap of what jumbo frames actually are. The default MTU (Maximum Transmission Unit) for an Ethernet frame is 1500 bytes. A frame with an MTU of 9000 bytes is referred to as a jumbo frame.

Jumbo frames, or 9000-byte payload frames, have the potential to reduce protocol overhead and CPU cycles.
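For reference, enabling jumbo frames on an ESXi host means raising the MTU on both the virtual switch and the VMkernel interface carrying the storage or vMotion traffic. A minimal sketch using esxcli, where vSwitch1 and vmk1 are assumptions for your storage network:

```
# Raise the MTU on a standard vSwitch (distributed switches are configured through vCenter)
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000

# Raise the MTU on the VMkernel interface used for storage/vMotion traffic
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

# Verify the new MTU values
esxcli network ip interface list
```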

Typically, jumbo frames are considered for IP storage networks or vMotion networks. A lot of performance benchmarking is already described on the web, and it is funny to see the variety of opinions on whether or not to adopt jumbo frames. Check this blogpost and this blogpost comparing jumbo frame performance to a standard MTU size. The question of whether jumbo frames provide a significant performance advantage is still up in the air.

Besides jumbo frames, there are other techniques to improve network throughput and lower CPU utilization. A modern NIC supports the Large Segment Offload (LSO) and Large Receive Offload (LRO) offloading mechanisms. Note: LSO is also referred to as TSO (TCP Segmentation Offload). Both are configurable: LSO/TSO is enabled by default if the NIC hardware supports it, and LRO is enabled by default when using VMXNET virtual machine adapters.
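On an ESXi host you can verify whether these offloads are active via the advanced settings. A short sketch; the option names below are the ones commonly documented for ESXi 5.x/6.x, so double-check them against your build:

```
# Hardware TSO (1 = enabled)
esxcli system settings advanced list -o /Net/UseHwTSO

# Hardware and software LRO for VMXNET3 adapters
esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
esxcli system settings advanced list -o /Net/Vmxnet3SwLRO

# LRO for the default TCP/IP stack
esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
```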

Risks?

Let’s put the performance aspects aside and look into the possible risks involved when implementing jumbo frames. The thing is, in order to be effective, jumbo frames must be enabled end to end in the network path. The main risk when adopting jumbo frames is that if one component in the network path is not properly configured for them, an MTU mismatch occurs.
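A quick way to catch such a mismatch is a ping with the don't-fragment bit set and a near-jumbo payload (9000 bytes minus 28 bytes of IP and ICMP headers leaves 8972 bytes). A sketch with illustrative addresses:

```
# From the ESXi shell: -d sets the don't-fragment bit, -s the payload size
vmkping -d -s 8972 192.168.10.20

# The equivalent test from a Linux host
ping -M do -s 8972 192.168.10.20
```

If any device in the path is still at a 1500-byte MTU, this ping will fail, which is exactly the mismatch described above.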
(more…)


Synology DSM6.0 VLAN support

I’ve noticed some distress on the web because, with the release of Synology DSM version 6.0, it is no longer possible to use the vconfig command. This command was used to configure VLAN tagging on your interfaces.

It is, however, still perfectly possible to create multiple sub-interfaces on a physical interface or bond without using the vconfig command. All you need to do is create an additional config file for each of your sub-interfaces; each sub-interface represents a VLAN ID. The config files are found in /etc/sysconfig/network-scripts/.

Note: shell access to your Synology is required, so enable SSH, for instance.

In the example below, you will see that my Synology has a bond using eth0 and eth1. My setup required some additional VLAN-tagged sub-interfaces on top of the physical bond interface.

(screenshot: Synology VLAN sub-interfaces)
As you can see, I have sub-interfaces for VLANs 100, 120, 130 and 20. You only need to copy a config file using the naming format ifcfg-<phy int>.<vlan-id> and adjust it to your needs. A (copied) config file looks like this:

(more…)
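Since the full listing is cut off in this excerpt, here is a rough sketch of what such an RHEL-style config file could look like, created over SSH. The bond0.100 name, the addressing and even the exact set of keys DSM expects are illustrative assumptions, not the actual file from the post:

```
# Create a config file for a tagged sub-interface (VLAN 100 on bond0); values are illustrative
cat > /etc/sysconfig/network-scripts/ifcfg-bond0.100 <<'EOF'
DEVICE=bond0.100
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.100.10
NETMASK=255.255.255.0
EOF
```

After a reboot, the sub-interface should come up with the rest of the network configuration.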


Exploring Hedvig

We had the chance to sit down and have a chat with the Hedvig EMEA guys last week. They gave us a very good presentation on what Hedvig can bring and what they are working on. Having only recently gotten to know Hedvig and their software-defined storage solution, we were pretty amazed by their view on SDS and their long list of supported platforms and enterprise storage features and services. Although it is pretty hard to explain all the goodness Hedvig brings in one post, we will give it a try! 🙂

 

Not too long ago, Hedvig Inc. came out of stealth after a development period that started in June 2012. They are opting for a slightly different approach to general availability (GA) than other SDS start-ups: when their software goes GA with version 1.0, it will be a fully developed, full-featured solution that is already running in production at several enterprise early-adopter customers! It is likely that version 1.0 will be released next week (week 23)!

Okay, so let us focus on what makes Hedvig unique. They introduce themselves using the quote below.

Put simply: Hedvig gets better and smarter as it scales. Hedvig defies conventional wisdom, transforming commodity hardware into the most advanced storage solution available today. Hedvig accelerates data to value by collapsing disparate storage systems into a single platform, creating a virtualized storage pool that provisions storage with a few clicks, scales to petabytes, and runs seamlessly in both private and public clouds.

(more…)


VMware Virtual SAN 6.0 benchmark

Last week I was going through ‘What’s New: VMware Virtual SAN 6.0’, and it seems like VSAN 6.0 is bigger, better and faster. The latest installment of VMware’s distributed storage platform promises a significant IOPS boost, up to twice the performance in hybrid mode. The new VirstoFS on-disk format is capable of high-performance snapshots and clones. Time to put it to the test.

 

Disclaimer: this benchmark was performed on a home lab setup; the components used are not listed in the VSAN HCL. My goal is to confirm an overall IOPS and snapshot performance increase by comparing VSAN 5.5 with 6.0. I did so by running a synthetic IOmeter workload.

VMware has a really nice blogpost on more advanced VSAN performance testing utilizing IOmeter.
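The actual benchmark in this post uses IOmeter, but as a quick complementary check of snapshot performance you can also time snapshot operations straight from the ESXi shell. A minimal sketch; the VM ID of 1 and the snapshot name are illustrative (look up the real ID with vim-cmd vmsvc/getallvms):

```
# Find the VM ID of the test VM
vim-cmd vmsvc/getallvms

# Time a snapshot create (no memory, no quiescing) and a removeall for VM ID 1
START=$(date +%s)
vim-cmd vmsvc/snapshot.create 1 bench-snap "benchmark snapshot" 0 0
echo "create took $(( $(date +%s) - START )) seconds"

START=$(date +%s)
vim-cmd vmsvc/snapshot.removeall 1
echo "removeall took $(( $(date +%s) - START )) seconds"
```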

 

Hardware

My lab consists of 3 Shuttle SZ87R6 nodes, connected by a Cisco SG300.

  • Chipset: Z87
  • Processor: Intel Core i5-4590S
  • Memory: 32 GB
  • NIC 1: 1 GbE (management)
  • NIC 2: 1 GbE (VSAN)
  • HDD 1: Samsung 840 EVO (120 GB)
  • HDD 2: HGST Travelstar 7K1000 (1 TB)

ESXi/VSAN versions

  • ESXi 5.5 Update 2 (build 2068190)
  • ESXi 6.0 (build 2494585)

(more…)


Stretched Cluster on IBM SVC (Part 3)

This is part 3 of the VMware Stretched Cluster on IBM SVC blogpost series.

PART 1     (intro, SVC cluster, I/O group, nodes)
PART 2     (split I/O group, deployment, quorum, config node)
PART 3    (HA, PDL, APD)

 

In part 2, I explained how an SVC Split Cluster reacts to certain failure conditions. Now that we know how the storage layer behaves, let’s take a closer look at how this all ties in with the VMware layer. This is by no means a complete guide to every setting or configuration option involved, but rather an excerpt of the ones I consider important. This post is based on vSphere 5.5.

VMware Stretched Cluster isn’t a feature you enable by ticking some boxes; it’s a design built around the workings of HA, DRS and a couple of other mechanisms.

First, I would like to briefly explain the concepts of APD (All Paths Down) and PDL (Permanent Device Loss).

 

APD

In an All Paths Down scenario, the ESXi host loses all paths to the storage device. The host is unable to communicate with the storage array. Examples of failures that can trigger APD are a failing HBA or a failing SAN.

Figure 1. APD (All Paths Down)
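On vSphere 5.5, the host’s APD behaviour is governed by a pair of advanced settings that you can inspect from the ESXi shell; a short sketch (140 seconds is the commonly cited default timeout):

```
# Is APD handling enabled? (1 = enabled, the default on 5.5)
esxcli system settings advanced list -o /Misc/APDHandlingEnable

# How long the host retries non-VM I/O before declaring APD timeout (default 140 seconds)
esxcli system settings advanced list -o /Misc/APDTimeout
```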

(more…)
