Why we chose vSAN for our business critical apps

Looking at the IT infrastructure at several production sites within my customer’s organization, we quickly noticed that some components (mainly compute and storage) were not up to par from an availability and performance perspective. The production sites all run local, business-critical ERP workloads that are vital to the business processes. After a lot of research and discussion, I proposed a new blueprint to my customer. The blueprint consists of a new compute and storage baseline for the site-local datacenters. The idea was to create a platform that offers higher availability and better performance while reducing costs.

We researched the possibility of stepping away from traditional storage arrays and moving towards a Hyper-Converged Infrastructure (HCI) solution. Because IT is not the company’s main business, we tried to keep things as simple as possible. We defined several ‘flavors’ to match each production location to its needs. For example, the small sites will be equipped with a ROBO setup, the medium sites with a single datacenter cluster, and the large factories with a stretched cluster. A stretched cluster allows the large factories to meet the stated availability SLA during a large-scale outage at the plant, even for their most important applications that do not offer in-application clustering or resiliency.

Benefits

Since my customer runs VMware solutions in all of its datacenters, VMware vSAN was the perfect fit. It allows the customer to lean on the VMware knowledge it already has in-house while needing fewer FTEs to manage the storage backend. Implementing stretched clusters on multiple sites using storage arrays can be a daunting task. Although there are prerequisites, implementing VMware vSAN is fairly easy, even if you opt for a stretched cluster configuration. This kept the time between receiving the hardware and having a fully operational vSphere and vSAN cluster very short. Because the customer is in the process of renewing its IT infrastructure for a number of sites, it really helps to be able to tell the business we can deliver within weeks rather than months.

Using VMware vSAN ready nodes allowed us to exceed the storage capacity and performance requirements while being more cost-efficient than traditional storage arrays. As management loves lower costs, both capex and opex, HCI was the way to go. From a manageability point of view, it is a big plus that all VMware datacenters and (vSAN) clusters are managed from a centralized VMware vCenter UI. Another plus was the savings in rack units, as those are scarce in some site-local datacenters.


Distributed Storage Network Topology

This is a short write-up about why you should consider a certain network topology when adopting scale-out storage technologies in a multi-rack environment. Without going into too much detail, I want to stress the need to follow the scalable distributed storage model when designing your Ethernet storage network. To be honest, it is probably the other way around. Networking experts have offered scalable network architectures with consistent and predictable latency for a long time now. The storage world is just catching up.

Today, we have the ability to create highly scalable distributed storage infrastructures, following Hyper-Converged Infrastructure (HCI) innovations. Because the storage layer is distributed across ESXi hosts, a lot of point-to-point Ethernet connections between ESXi hosts will be used for storage I/O. Typically, when a distributed storage solution (like VMware vSAN) is adopted, we tend to create a pretty basic layer-2 network, preferably using NICs of 10GbE or faster and line-rate capable components in a non-blocking network architecture with enough ports to support our current hosts. But once we scale to a large number of ESXi hosts and racks, we face two challenges: how to provide enough network interfaces to connect our ESXi hosts, and how to connect the multiple Top of Rack (ToR) switches to each other. That is where the so-called spine and leaf network architecture comes into play.

Spine-Leaf

In a spine-leaf network architecture, each leaf switch connects to every spine switch in the fabric. With this topology, traffic between two ESXi hosts in different racks always traverses the same number of network hops. Such a network topology provides predictable latency, and thus consistent performance, even as you keep scaling out your virtual datacenter. It is this consistency in performance that makes the spine-leaf network architecture so suitable for distributed storage solutions.
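
To make the constant hop count concrete, here is a minimal Python sketch. It models a hypothetical fabric (the switch names and the sizes of 2 spines and 4 leaves are assumptions for illustration only) and counts how many switches a frame passes through between hosts attached to different leaf switches.

```python
from collections import deque
from itertools import combinations


def build_fabric(num_spines, num_leaves):
    """Return (links, leaves) for a fabric where every leaf connects to every spine."""
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    links = {switch: set() for switch in spines + leaves}
    for leaf in leaves:
        for spine in spines:
            links[leaf].add(spine)
            links[spine].add(leaf)
    return links, leaves


def switches_traversed(links, src_leaf, dst_leaf):
    """Breadth-first search: number of switches on the shortest path."""
    queue = deque([(src_leaf, 1)])  # the source leaf itself counts as one switch
    seen = {src_leaf}
    while queue:
        switch, count = queue.popleft()
        if switch == dst_leaf:
            return count
        for neighbour in links[switch]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, count + 1))
    raise ValueError("leaf switches are not connected")


# Hypothetical fabric sizes, chosen only for the example.
links, leaves = build_fabric(num_spines=2, num_leaves=4)

# Collect the switch count for every pair of racks (leaf switches).
inter_rack = {switches_traversed(links, a, b) for a, b in combinations(leaves, 2)}
print(inter_rack)  # {3}: leaf -> spine -> leaf, whichever racks are involved
```

Adding more leaves (racks) to this model does not change the result, which is the property that keeps latency predictable while scaling out.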

An exemplary logical spine-leaf network architecture is shown in the following diagram:


Jumbo frames and the risks involved

Even though the discussion about jumbo frames and their possible gains and risks is not new, we found ourselves having it yet again. Because we had different opinions, it seemed like a good idea to elaborate on the topic.

Let’s have a quick recap on what jumbo frames actually are. The default MTU (Maximum Transmission Unit) for an Ethernet frame is 1500 bytes. A frame with an MTU of 9000 bytes is referred to as a jumbo frame.

Jumbo frames or 9000-byte payload frames have the potential to reduce overheads and CPU cycles.
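
As a rough illustration of where the reduction in overhead comes from, the Python sketch below compares the two MTU sizes. It assumes TCP over IPv4 without header options and without a VLAN tag, and the 1,000,000-byte transfer size is just an example value.

```python
import math

# Fixed per-frame costs in bytes (assumes no VLAN tag, no IP/TCP options).
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble + inter-frame gap
IP_TCP_HEADERS = 20 + 20             # IPv4 header + TCP header


def tcp_payload_per_frame(mtu):
    """Application bytes that fit in one frame at the given MTU."""
    return mtu - IP_TCP_HEADERS


def wire_efficiency(mtu):
    """Share of the bytes on the wire that is actual application payload."""
    return tcp_payload_per_frame(mtu) / (mtu + ETHERNET_OVERHEAD)


def frames_needed(total_bytes, mtu):
    """Number of frames required to carry total_bytes of payload."""
    return math.ceil(total_bytes / tcp_payload_per_frame(mtu))


for mtu in (1500, 9000):
    print(
        f"MTU {mtu}: {frames_needed(1_000_000, mtu)} frames, "
        f"{wire_efficiency(mtu):.1%} efficient"
    )
```

Under these assumptions the same transfer needs 685 standard frames versus 112 jumbo frames. Fewer frames means fewer headers on the wire and fewer packets for the CPU to process, although the gain in wire efficiency itself is only a few percentage points.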

Typically, jumbo frames are considered for IP storage networks or vMotion networks. A lot of performance benchmarking has already been published on the web. It is funny to see the variety of opinions on whether to adopt jumbo frames or not. Check this blogpost and this blogpost on jumbo frame performance compared to a standard MTU size. The question of whether jumbo frames provide a significant performance advantage is still up in the air.

Besides jumbo frames, there are other techniques to improve network throughput and lower CPU utilization. A modern NIC will support the Large Segment Offload (LSO) and Large Receive Offload (LRO) offloading mechanisms. Note: LSO is also referred to as TSO (TCP Segmentation Offload). Both are configurable. LSO/TSO is enabled by default if the NIC hardware supports it. LRO is enabled by default when using VMXNET virtual machine adapters.

Risks?

Let’s put the performance aspects aside and look into the possible risks involved when implementing jumbo frames. The thing is, in order to be effective, jumbo frames must be enabled end to end in the network path. The main risk when adopting jumbo frames is that if one component in the network path is not properly configured for jumbo frames, an MTU mismatch occurs.
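
The end-to-end requirement can be sketched in a few lines of Python. The component names and MTU values below are made up for illustration; in practice you would collect them from your own hosts, virtual switches and physical switch ports.

```python
def find_mtu_mismatches(path, required_mtu=9000):
    """Return the (name, mtu) pairs that cannot carry frames of required_mtu."""
    return [(name, mtu) for name, mtu in path if mtu < required_mtu]


def effective_path_mtu(path):
    """The smallest MTU along the path limits the whole path."""
    return min(mtu for _, mtu in path)


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - 20 - 8  # minus IPv4 header and ICMP header


# Hypothetical network path with one component left at the default MTU.
path = [
    ("vmkernel port", 9000),
    ("virtual switch", 9000),
    ("ToR switch port", 1500),
    ("storage target", 9000),
]

print(find_mtu_mismatches(path))  # [('ToR switch port', 1500)]
print(effective_path_mtu(path))   # 1500
print(max_ping_payload(9000))     # 8972
```

The last value is the payload size commonly used for a ping test with fragmentation disabled: if a ping of that size does not get through, at least one component in the path is not passing 9000-byte packets.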

Synology DSM6.0 VLAN support

I’ve noticed some distress on the web because, with the release of Synology DSM version 6.0, it is no longer possible to use the vconfig command. This command was used to configure VLAN tagging on your interfaces.

It is, however, still perfectly possible to create multiple sub-interfaces on a physical interface or bond without using the vconfig command. All you need to do is create an additional config-file for each of your sub-interfaces. Each sub-interface represents a VLAN ID. The config-files are found in /etc/sysconfig/network-scripts/.

Note: shell access to your Synology is required, so you should enable SSH, for instance.

In the example below, you will see my Synology has a bond using eth0 and eth1. My setup required some additional VLAN-tagged sub-interfaces on top of my physical bond interface.

[Image: synologyVLAN]
As you can see, I have sub-interfaces for VLANs 100, 120, 130 and 20. You only need to copy a config-file using the naming format ifcfg-<phy int>.<vlan-id>, and adjust it to your needs. A (copied) config-file looks like this:


Exploring Hedvig

We had the chance to sit down and have a chat with the Hedvig EMEA guys last week. They gave us a very good presentation on what Hedvig can bring and what they are working on. As we only recently got to know Hedvig and their software-defined storage solution, we were pretty amazed by their view on SDS and their long list of supported platforms and enterprise storage features and services. Although it is pretty hard to explain all the goods Hedvig brings in one post, we will give it a try! 🙂

Not too long ago, Hedvig Inc came out of stealth after developing its product since June 2012. They are taking a slightly different approach to general availability (GA) status than other SDS start-ups. When their software reaches GA with version 1.0, it will be a fully developed, full-featured solution that is already running in production at several enterprise early-adopter customers! Version 1.0 will likely be released next week (week 23)!

Okay, so let us focus on what makes Hedvig unique. They introduce themselves using the quote below.

Put simply: Hedvig gets better and smarter as it scales. Hedvig defies conventional wisdom, transforming commodity hardware into the most advanced storage solution available today. Hedvig accelerates data to value by collapsing disparate storage systems into a single platform, creating a virtualized storage pool that provisions storage with a few clicks, scales to petabytes, and runs seamlessly in both private and public clouds.


VMware Virtual SAN 6.0 benchmark

Last week I was going through ‘What’s New: VMware Virtual SAN 6.0’, and it seems like VSAN 6.0 is bigger, better and faster. The latest installment of VMware’s distributed storage platform provides a significant IOPS boost, up to twice the performance in hybrid mode. The new VirstoFS on-disk format is capable of high-performance snapshots and clones. Time to put it to the test.

Disclaimer: this benchmark was performed on a home lab setup; the components used are not listed in the VSAN HCL. My goal is to confirm an overall increase in IOPS and snapshot performance by comparing VSAN 5.5 with 6.0. I did so by running a synthetic IOmeter workload.

VMware has a really nice blogpost on more advanced VSAN performance testing utilizing IOmeter.

Hardware

My lab consists of 3 Shuttle SZ87R6 nodes, connected by a Cisco SG300.

Component   Specification
Chipset     Z87
Processor   Intel Core i5-4590S
Memory      32 GB
NIC 1       1 GbE (management)
NIC 2       1 GbE (VSAN)
HDD 1       Samsung 840 Evo (120 GB)
HDD 2       HGST Travelstar 7K1000 (1 TB)

ESXi/VSAN versions

  • ESXi 5.5 Update 2 (build 2068190)
  • ESXi 6.0 (build 2494585)
