Part 3: Testing PernixData FVP 2.0

A while ago I did a write-up about PernixData FVP and its new 2.0 release. In the blog post “Part 2: My take on PernixData FVP 2.0” I ran a couple of tests based on a Max IOPS load using I/O Analyzer.

This time around, I wanted to run some more ‘real-life’ workload tests to show the difference between a non-accelerated VM, an FVP-accelerated VM using SSD and an FVP-accelerated VM using RAM. So I’m not necessarily in search of mega-high IOPS numbers, but rather looking to give a more realistic view of what PernixData FVP can do for your daily workloads. While testing I proved to myself that it’s still pretty hard to simulate a real-life workload, but I had a go at it nonetheless…  🙂

Equipment

As stated in previous posts, it is important to understand that I ran these tests on a homelab, so the hardware does not represent decent enterprise server gear. That said, it should still be able to show the differences in performance gained by using FVP acceleration. Our so-called ‘nano-lab’ consists of:

3x Intel NUC D54250WYB (Intel Core i5-4250U / 16GB 1.35V 1600MHz RAM)
3x Intel DC S3700 SSD 100GB (one per NUC)
3x Dual-NIC Gbit mini PCIe expansion (3 GbE NICs per NUC)
1x Synology DS412+ (4x 3TB)
1x Cisco SG300-20 gigabit L3 switch


Note the 1.35V: this is low-voltage memory! While perfect for keeping power consumption down in my homelab, it comes at the cost of lower performance compared to 1.5V memory. Since we are testing FVP in combination with RAM, it’s good to keep this in mind.

Pre-build, the lab looked something like this:

[Image: the nano-lab pre-build (pernixdata3-nanolab)]

FVP version

I updated my FVP installation to the newest version of the host extension (and management server), which contains further enhancements to the new FVP 2.0 features.

[Image: FVP host extension version (pernix3-versionext)]

IO tests

It felt like I fooled around with pretty much every ICF (IOmeter Configuration File) out there. Eventually I customized an ICF based on a ‘bursty OLTP (Online Transaction Processing)’ workload. OLTP database workloads seemed like a legitimate IO test, as they are a good example of a workload that needs low latency and high data availability rather than high throughput.

So the IO test consists of 2 IO Analyzer workers using a raw VMDK residing on an iSCSI LUN, connected through the default vSphere iSCSI software adapter. The VMDK has a size of 10GB, representing the working set of my fictional application. I made sure my Synology was pretty much idle when performing the tests.

FVP is configured with the policy ‘Write Back (Local host and 1 peer)‘ in order to meet the data availability ‘requirement’. I did test with the FVP policy set to write back with zero peers and noticed an improvement, because no additional latency is incurred by replicating cache data to the network peer(s). However, I don’t believe that is a configuration which will be used when accelerating an application in an enterprise environment.
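To get a feel for why the replication peer adds write latency, here is a toy model (my own simplification, not FVP’s actual implementation): in write-back mode the write is acknowledged once the local acceleration device and all configured peers hold the data, whereas without acceleration the acknowledgement waits for the backing array. All latency figures in the sketch are purely illustrative.

```python
# Toy model of observed write latency per policy.
# All latency figures below are illustrative assumptions, not measurements.

def write_back_latency_ms(local_ms, peer_ms_list):
    """Ack returns once the local device AND every replication peer hold the write."""
    return max([local_ms] + peer_ms_list)

def no_acceleration_latency_ms(array_ms):
    """Ack waits for the backing storage array."""
    return array_ms

print(write_back_latency_ms(0.2, []))      # write back, 0 peers: bound by local flash/RAM
print(write_back_latency_ms(0.2, [0.6]))   # write back, 1 peer: bound by the network hop
print(no_acceleration_latency_ms(5.8))     # no acceleration: bound by the array
```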

The 2 IO workers are configured with the access specifications listed below and run simultaneously during the tests; a small code sketch of these specifications follows the definitions.

Write-worker: Constant Write → Bursty Write Seq → Constant Write → Bursty Write Seq → etc.
Read-worker: Bursty Read Seq → Constant Read → Bursty Read Seq → Constant Read → etc.


Constant Write = 8KB, 100% random write, 1ms transfer delay, 4 IOs burst length
Bursty Write Seq = 8KB, 100% sequential write, 0ms transfer delay, 1 IO burst length
Constant Read = 8KB, 100% random read, 1ms transfer delay, 32 IOs burst length
Bursty Read Seq = 8KB, 100% sequential read, 0ms transfer delay, 1 IO burst length
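The access specifications and the alternating schedule can be summarized in a small sketch (the field names are mine for readability; they do not mirror IOmeter’s ICF syntax):

```python
# Sketch of the IOmeter access specifications used in this test.
# Field names are descriptive only and do not follow the ICF file format.

SPECS = {
    "Constant Write":   dict(size_kb=8, pattern="random",     op="write", delay_ms=1, burst=4),
    "Bursty Write Seq": dict(size_kb=8, pattern="sequential", op="write", delay_ms=0, burst=1),
    "Constant Read":    dict(size_kb=8, pattern="random",     op="read",  delay_ms=1, burst=32),
    "Bursty Read Seq":  dict(size_kb=8, pattern="sequential", op="read",  delay_ms=0, burst=1),
}

# Both workers run simultaneously and keep cycling through their own schedule.
WRITE_WORKER = ["Constant Write", "Bursty Write Seq"]
READ_WORKER = ["Bursty Read Seq", "Constant Read"]

for worker, schedule in [("write-worker", WRITE_WORKER), ("read-worker", READ_WORKER)]:
    for name in schedule:
        print(worker, name, SPECS[name])
```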


Results

I used the numbers given by esxtop, filtered out the useful counters and did some Excel work to create these graphs. The contents of these graphs are comparable to the PernixData FVP ‘VM observed‘ numbers.
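For those who want to reproduce this: esxtop can be run in batch mode (for example `esxtop -b -d 5 -n 720 > stats.csv`) and the resulting CSV can be averaged with a few lines of pandas instead of Excel. The host name, VM name and exact counter names below are placeholders; check the header of your own CSV for the correct virtual disk counters.

```python
# Minimal sketch: average the per-sample virtual disk counters from an esxtop batch CSV.
# The column names are placeholders; copy the exact counter names from your CSV header.
import pandas as pd

df = pd.read_csv("stats.csv")  # output of esxtop in batch mode, one row per sample

COLUMNS = {
    "avg read IOPS":          r"\\esx01\Virtual Disk(ioanalyzer)\Reads/sec",
    "avg write IOPS":         r"\\esx01\Virtual Disk(ioanalyzer)\Writes/sec",
    "avg read latency (ms)":  r"\\esx01\Virtual Disk(ioanalyzer)\Average MilliSec/Read",
    "avg write latency (ms)": r"\\esx01\Virtual Disk(ioanalyzer)\Average MilliSec/Write",
}

for label, column in COLUMNS.items():
    print(f"{label}: {df[column].mean():.2f}")
```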

I could, as I did in previous FVP posts, use the much slicker-looking FVP graphs… But this time I didn’t want to take the FVP graphs for granted, and I also wanted to be able to compare the FVP modes within a single graph. The trade-off of not using the FVP graphs is losing the ability to see the network peer latency, so we’ll keep the focus on VM observed latency.

First let us have a look at the latency graphs:

[Image: read latency graph (pernixdata-readlatency)]


[Image: write latency graph (pernixdata-writelatency)]

The most important thing to notice in the graphs above is that the latency peaks are flattened out and consistent when accelerated by FVP, besides of course being dramatically lower!! Latency that is both lower and consistent is a game changer for your customers’ user experience: their applications will be more responsive and, again, consistent in performance!


Now check the IOPS graphs:

[Image: read IOPS graph (pernixdata-readiops)]

[Image: write IOPS graph (pernixdata-writeiops)]

When comparing the IO performance, a vast improvement is noticeable when accelerated by FVP. I guess I don’t have to point out that a higher number of IOPS is preferred.

Although averages don’t tell the whole story, they are useful to indicate the difference in performance between the non-accelerated and the accelerated modes; a quick calculation of the relative gains follows the table.

                          avg. read IOPS  avg. write IOPS  avg. read latency (ms)  avg. write latency (ms)
No(!) FVP acceleration               184             1520                   23.04                    5.77
FVP 2.0 SSD acceleration            1876             2028                    1.20                    1.28
FVP 2.0 RAM acceleration            4544             2262                    0.27                    0.33
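To put those averages in perspective, a quick back-of-the-envelope calculation (using only the figures from the table above) gives the relative improvement per acceleration mode:

```python
# Relative improvement versus the non-accelerated baseline, taken from the table above.
baseline = dict(read_iops=184, write_iops=1520, read_lat_ms=23.04, write_lat_ms=5.77)
modes = {
    "FVP 2.0 SSD": dict(read_iops=1876, write_iops=2028, read_lat_ms=1.20, write_lat_ms=1.28),
    "FVP 2.0 RAM": dict(read_iops=4544, write_iops=2262, read_lat_ms=0.27, write_lat_ms=0.33),
}

for name, m in modes.items():
    print(f"{name}: "
          f"{m['read_iops'] / baseline['read_iops']:.1f}x read IOPS, "
          f"{m['write_iops'] / baseline['write_iops']:.1f}x write IOPS, "
          f"read latency reduced {baseline['read_lat_ms'] / m['read_lat_ms']:.0f}x, "
          f"write latency reduced {baseline['write_lat_ms'] / m['write_lat_ms']:.0f}x")
```

Roughly a 10x read IOPS gain for SSD and about 25x for RAM, with average read latency cut by a factor of about 19 and 85 respectively.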


Conclusion

Again I’m impressed by FVP! From your customers’ point of view, they will notice a great deal of performance improvement and performance consistency while using the applications running on your VMs!

PernixData’s vision (‘decouple performance from capacity’) is a very interesting one. When adopting these kinds of technologies, we consultants/architects should rethink our current design principles and building blocks for storage performance.

As always, it fully depends on your workloads and your current experience when it comes to storage performance. If you are using an enterprise-class array with FC connections to your hosts, you are probably used to more or less acceptable latency numbers compared to using a mid-range NFS array connected over Ethernet.

But even with that enterprise array delivering acceptable latency/performance, what do you choose when your storage and/or host assets are financially depreciated or running out of support? Will you still go for a traditional storage array? Or will you rethink your design principles and building blocks by designing your array to deliver data services only, while your performance layer resides in your hosts?

Food for thought…  We can state that PernixData FVP will do a great job once you’ve chosen it as your performance layer, and I’m glad to see it being adopted by customers.


4 Comments

  1. Chris Andal

    Thank you for the excellent series of posts! We have been strongly considering putting the technology in our data center. You’ve pretty much made the decision a no-brainer for us. We have a SAN with no flash (SAS only), and it is quickly becoming a bottleneck. Unfortunately we are in the middle of our refresh cycle, so there is no chance of getting faster drives at this point in time. FVP represents a great, relatively inexpensive stop-gap for us. And we would most likely keep the product running even after we migrate to faster drives. Why not keep that extra cheap “cache” and relieve some of the pressure off the SAN?

  2. Aleks

    It seems that many tests of caching software are a little bit unrepresentative, because they last only a short period of time and the cache sizes are 50-100% of the data size. It would be more interesting to see some tests where the cache size is 10-20% of the database size with the same amount (10-20%) of working set (hot data).

    • Niels Hagoort

      A lot of testing is done using synthetic workloads, just like my test. Although I did try to really simulate a good OLTP workload, the best test you can do is a PoC using an actual ‘production’ workload.
