Hybrid storage – Silverton Consulting

(Storage QoW 15-003): SMR disks in GA enterprise storage in 12 months? Yes@.85 probability

Posted on January 9, 2016January 9, 2016 by Ray in Data density, Disk storage, Forecasting, QoW 2015, Storage drive, Storage standards

Hard Disk by Jeff Kubina (cc) (from Flickr)

(Storage QoW 15-003): Will we see SMR (shingled magnetic recording) disks in GA enterprise storage systems over the next 12 months?

Are there two vendors of SMR?

Yes, both Seagate and HGST have announced and currently shipping (?) SMR drives, HGST has a 10TB drive and Seagate has an 8TB drive on the market since last summer.

One other interesting fact is that SMR will be the common format for all future disk head technologies including HAMR, MAMR, & BPMR (see presentation).

What would storage vendors have to do to support SMR drives?

Because of the nature of SMR disks, writes overlap other tracks so they must be written, at least in part, sequentially (see our original post on Sequential only disks). Another post I did reported on recent work by Garth Gibson at CMU (Shingled Magnetic Recording disks) which showed how multiple bands or zones on an SMR disk could be used some of which could be written randomly and others which could be written sequentially but all could be read randomly. With such an approach you could have a reasonable file system on an SMR device with a metadata partition (randomly writeable) and a data partition (sequentially writeable).

In order to support SMR devices, changes have been requested for the T10 SCSI & T13 ATA command protocols. Such changes would include:

SMR devices support a new write cursor for each SMR sequential band.
SMR devices support sequential writes within SMR sequential bands at the write cursor.
SMR band write cursors can be read, statused and reset to 0. SMR sequential band LBA writes only occur at the band cursor and for each LBA written, the SMR device increments the band cursor by one.
SMR devices can report their band map layout.

The presentation refers to multiple approaches to SMR support or SMR drive modes:

Restricted SMR devices – where the device will not accept any random writes, all writes occur at a band cursor, random writes are rejected by the device. But performance would be predictable.
Host Aware SMR devices – where the host using the SMR devices is aware of SMR characteristics and actively manages the device using write cursors and band maps to write the most data to the device. However, the device will accept random writes and will perform them for the host. This will result in sub-optimal and non-predictable drive performance.
Drive managed SMR devices – where the SMR devices acts like a randomly accessed disk device but maps random writes to sequential writes internally using virtualization of the drive LBA map, not unlike SSDs do today. These devices would be backward compatible to todays disk devices, but drive performance would be bad and non-predictable.

Unclear which of these drive modes are currently shipping, but I believe Restricted SMR device modes are already available and drive manufacturers would be working on Host Aware and Drive managed to help adoption.

So assuming Restricted SMR device mode availability and prototypes of T10/T13 changes are available, then there are significant but known changes for enterprise storage systems to support SMR devices.

Nevertheless, a number of hybrid storage systems already implement Log Structured File (LSF) systems on their backends, which mostly write sequentially to backend devices, so moving to a SMR restricted device modes would be easier for these systems.

Unclear how many storage systems have such a back end, but NetApp uses it for WAFL and just about every other hybrid startup has a LSF format for their backend layout. So being conservative lets say 50% of enterprise hybrid storage vendors use LSF.

The other 60% would have more of a problem implementing SMR restricted mode devices, but it’s only a matter of time before all will need to go that way. That is assuming they still use disks. So, we are primarily talking about hybrid storage systems.

All major storage vendors support hybrid storage and about 60% of startups support hybrid storage, so adding these to together, maybe about 75% of enterprise storage vendors have hybrid.

Using analysis on QoW 15-001, about 60% of enterprise storage vendors will probably ship new hardware versions of their systems over the next 12 months. So of the 13 likely new hardware systems over the next 12 months, 75% have hybrid solutions and 50% have LSF, or ~4.9 new hardware systems will be released over the next 12 months that are hybrid and have LSF backends already.

What are the advantages of SMR?

SMR devices will have higher storage densities and lower cost. Today disk drives are running 6-8TB and the SMR devices run 8-10TB so a 25-30% step up in storage capacity is possible with SMR devices.

New drive support has in the past been relatively easy because command sets/formats haven’t changed much over the past 7 years or so, but SMR is different and will take more effort to support. The fact that all new drives will be SMR over time gives more emphasis to get on the band wagon as soon as feasible. So, I would give a storage vendor a 80% likelihood of implementing SMR, assuming they have new systems coming out, are already hybrid and are already using LSF.

So of the ~4.9 systems that are LSF/Hybrid/being released *.8, says ~3.9 systems will introduce SMR devices over the next 12 months.

For non-LSF hybrid systems, the effort seems much harder, so I would give the likelihood of implementing SMR about a 40% chance. So of the ~8.1 systems left that will be introduced in next year, 75% are hybrid or ~6.1 systems and they have a 40% likelihood of implementing SMR so ~2.4 of these non-LSF systems will probably introduce SMR devices.

There’s one other category that we need to consider and that would be startups in stealth. These could have been designing their hybrid storage for SMR from the get go. In QoW 15-001 analysis I assumed another ~1.8 startup vendors would emerge to GA over the next 12 months. And if we assume that 0.75% of these were hybrid then there’s ~1.4 startups vendors that could be using SMR technology in their hybrid storage for a (4.9+2.4+1.4(1.8*.75)= 8.7 systems have a high probability of SMR implementation over the next 12 months in GA enterprise storage products.

Forecast

So my forecast of SMR adoption by enterprise storage is Yes for .85 probability (unclear what the probability should be, but it’s highly probable).

~~~~

Comments?

VMware VSAN 6.0 all-flash & hybrid IO performance at SFD7

Posted on July 17, 2015 by Ray in Block Storage, Clustered storage, Disk storage, SSD storage, Storage performance

We visited with VMware’s VSAN team during last Storage Field Day (SFD7, session available here). The presentation was wide ranging but the last two segments dealt with recent changes to VSAN and at the end provided some performance results for both a hybrid VSAN and an all-Flash VSAN.

Some new features in VSAN 6.0 include:

More scaleability, up to 64 hosts in a cluster and up to 200VMs per host
New higher performance snapshots & clones
Rack awareness for better availability
Hardware based checksum for T10 DIF (data integrity feature)
Support for blade servers with JBODs
All-flash configurations
Higher IO performance

Even in the all-flash configuration there are two tiers of storage a write cache tier and a capacity tier of SSDs. These are configured with two different classes of SSDs (high endurance/low-capacity and low-endurance/high capacity).

At the end of the session Christos Karamanolis (@Xtosk), Principal Architect for VSAN showed us some performance charts on VSAN 6.0 hybrid and all-flash configurations.

Hybrid VSAN performance

On the chart we see two plots showing IOmeter performance as VSAN scales across multiple nodes (hosts), on the left we have a 100% Read workload and on the right a 70%Read:30%Write workload.

The hybrid VSAN configuration has 4-10Krpm disks and 1-400GB SSD on each host and ranges from 8 to 64 hosts. The bars on the chart show IOmeter IOPS and the line shows the average response time (or IO latency) for each VSAN host configuration. I am not a big fan of IOmeter, as it’s an overly simplified, but that’s what VMware used.

The results show that in a 100% read case the hybrid, 64 host VSAN 6.0 cluster was able to sustain ~3.8M IOPS or over 60K IOPS per host. or the mixed 70:30 R:W workload VSAN 6.0 was able to sustain ~992K IOPs or ~15.5K IOPS per host.

We see a pretty drastic IOPs degradation (~1/4 the 100% read performance) in the plot on the right, when they added write activity to the mix. But with VSAN’s mirrored data protection each VM write represents at least two VSAN backend writes and at a 70:30 IOmeter R:W this would be ~694K IOPS read and ~298K IOPS write frontend IOs but with mirroring this represents 595K writes to the backend storage.

Then of course, there’s destage activity (data written to SSDs need to be read off SSD and written to HDD) which also multiplies internal IO operations for every external write IOP. Lets say all that activity multiplies each external write by 6 (3 for each mirror: 1 to the write cache SSD, 1 to read it back and 1 to write to HDD) and we multiply that times the ~298K external write IOPS, it would add up to about a total of ~1.8M write derived IOPS and ~0.7M read derived IOPS or a total of ~2.5M IOPS but this is still far away from the 3.5M IOPS for 100% read activity. I am probably missing another IO or two in the write path (maybe Virtual to physical mapping structures need to be updated) or have failed to account for more inter-cluster IO activity in support of the writes.

In addition, we see the IO latency was roughly flat across the 100% Read workload at ~2.25msec. and got slightly worse over the 70:30 R:W workload, ranging from ~2.5msec. at 4 hosts to a little over 3.0msec. with 64 hosts. Not sure why this got worse but hosts are scaled up it could induce more inter-cluster overhead.

In the chart to the right, we can see similar performance data for systems with one or two disk-groups. The message here is that with two disk groups on a host (2X the disk and SSD resources per host) one can potentially double the performance of the systems, to 116K IOPS/host on 100% read and 31K IOPS/host on a 70:30 R:W workload.

All-flash VSAN performance

Here we can see performance data for an 8-host, all-flash VSAN configuration. In this case the chart on the left was a single “disk” group and the chart on the right was a dual disk group, all-flash configuration on each of the 8-hosts. The hosts were configured with 1-400GB and 3-800GB SSDs per disk group.

The various bars on the charts represent different VM working set sizes, 100, 200, 400 & 600GB for the single disk group chart and 100, 200, 400, 800 & 1200GB for dual disk group configurations. For the dual disk group, the 1200GB working set size is much bigger than a cache tier on each host.

The chart text is a bit confusing: the title of each plot says 70% read but the text under the two plots says 100% read. I must assume these were 70:30 R:W workloads. If we just look at the 8 hosts, using a 400GB VM working set size, all-flash VSAN 6.0 single disk group cluster was able to achieve ~37.5K IOPS/host and with two disk groups, the all-flash VSAN 6.0 was able to achieve ~68.75K IOPS/host at the 400GB working set size. Both doubling the hybrid performance.

Response times degrade for both the single and dual disk groups as we increase the working set sizes. It’s pretty hard to see on the two charts but it seems to range from 1.8msec to 2.2msec for the single disk group and 1.8msec to 2.5 msec for the dual disk group. The two charts are somewhat misleading because they didn’t use the exact same working group sizes for the two performance runs but just taking the 100|200|400GB working set sizes, for the single disk group it looks like the latency went from ~1.8msec. to ~2.0msec and for the dual disk group from ~1.8msec to ~2.3msec. Why the higher degradation for the dual disk group is anyone’s guess.

The other thing that doesn’t make much sense is that as you increase the working set size the number of IOPS goes down, worse for the dual disk group than the single. Once again taking just the 100|200|400GB working group sizes this ranges from ~350K IOPS to ~300K IOPS (~15% drop) for the single disk group and ~700K IOPS to ~550K IOPS (~22% drop) for the dual disk group.

Increasing working group sizes should cause additional backend IO as the cache effectivity should be proportionately less as you increase working set size. Which I think goes a long way to explain the degradation in IOPS as you increase working set size. But I would have thought the degradation would have been a proportionally similar for both the single and dual disk groups. The fact that the dual disk group did 7% worse seems to indicate more overhead associated with dual disk groups than single disk groups or perhaps, they were running up against some host controller limits (a single controller supporting both disk groups).

~~~~

At the time (3 months ago) this was the first world-wide look at all-flash VSAN 6.0 performance. The charts are a bit more visible in the video than in my photos (?) and if you want to just see and hear Christos’s performance discussion check out ~11 minutes into the final video segment.

For more information you can also read these other SFD7 blogger posts on VMware’s session:

Storage Field Day 7 – Day 2 – VMware by Dan Firth (@PenguinPunk)
VMware – Storage Field Day 7 preview by Keith Thompson (@CTOAdvisor)

SNWUSA Spring 2013 summary

Posted on April 4, 2013June 4, 2013 by Ray in Block Storage, DAS, Disk storage, Ethernet, FC, File Storage, iSCSI, Networking, R&D measures, SSD storage, Storage, Storage performance

For starters the parties were a bit more subdued this year although I heard Wayne’s suite was hopping to 4am last night (not that I would ever be there that late).

And a trend seen the past couple of years was even more evident this year, many large vendors and vendor spokespeople went missing. I heard that there were a lot more analyst presentations this SNW than prior ones although it was hard to quantify. But it did seem that the analyst community was pulling double duty in presentations.

I would say that SNW still provides a good venue for storage marketing across all verticals. But these days many large vendors find success elsewhere, leaving SNW Expo mostly to smaller vendors and niche products. Nonetheless, there were a\ a few big vendors (Dell, Oracle and HP) still in evidence. But EMC, HDS, IBM and NetApp were not showing on the floor.

I would have to say the theme for this years SNW was hybrid storage. It seemed last fall the products that impressed me were either cloud storage gateways or all flash arrays but this year there weren’t as many of these at the show but hybrid storage certainly has arrived.

Best hybrid storage array of the show

It’s hard to pick just one hybrid storage vendor as my top pick, especially since there were at least 3 others talking to me under NDA, but from my perspective the Hybrid vendor of the show had to be Tegile (pronounced I think, as te’-jile). They seemed to have a fully functional system with snapshot, thin provisioning, deduplication and pretty good VMware support (only time I have heard a vendor talk about VASA “stun” support for thin provisioned volumes).

They made the statement that SSD in their system is used as a cache, not a tier. This use is similar to NetApp’s FlashCache and has proven to be a particularly well performing approach to use of hybrid storage. (For more information on that take a look at some of my NFS and recent SPC-1 benchmark review dispatches. How well this is integrated with their home grown dedupe logic is another question.

On the negative side, they seem to be lacking a true HA/dual controller version but could use two separate systems with synch (I think) replication between them to cover this ground?? They also claimed their dedupe had no performance penalty, a pretty bold claim that cries out for some serious lab validation and/or benchmarking to prove. They also offer an all flash version of their storage (but then how can it be used as a cache?).

The marketing team seemed pretty knowledgeable about the market space and they seem to be going after mid-range storage space.

The product supports file (NFS and CIFS/SMB), iSCSI and FC with GigE, 10GbE and 8Gbps FC. They quote “effective” capacities with dedupe enabled but it can be disabled on a volume basis.

Overall, I was impressed by their marketing and the product (what little I saw).

Best storage tool of the show

Moving onto other product categories, it was hard to see anything that caught my eye. Perhaps I have just been to too many storage conferences but I did get somewhat excited when I looked at SwiftTest. Essentially they offer a application profiling, storage modeling, workload generating tool set.

The team seems to be branching out of their traditional vendor market focus and going after large service providers and F100 companies with large storage requirements.

Way back, when I was in Engineering, we were always looking for some information as to how customers actually used storage products. Well what SwiftTest has, is an appliance to instrument your application environment (through network taps/hardware port connections) to monitor your storage IO and create a statistical operational profile of your I/O environment. Then take that profile and play it against a storage configuration model to show how well it’s likely to perform. And if that’s not enough the same appliance can be used to drive a simulated version of the operational profile back onto a storage system.

It offers NFS (v2,v3, v4) CIFS/SMB (SMB1, SMB2, SMB3) FC, iSCSI, and HTTP/REST (what no FCoe?). They mentioned an $8oK price tag for the base appliance (one protocol?) but grows up pretty fast, if you want all of them. They also seem to have three levels of appliances (my guess more performance and more protocols come with the bigger boxes).

Not sure where they top out but simulating an operational profile can be quite complex especially when you have to be able to control data patterns to match deduplication potential in customer data, drive markov chains with probability representations of operational profiles, and actually execute IO operations. They said something about their ports have dedicated CPU cores to insure adequate performance or something similar but still it seems to little to hit high IO workloads.

Like I said, when I was in engineering were searching for this type of solution back in the late 90s and we would have probably bought it in a moment, if it was available.

GoDaddy.com, the domain/web site services provider was one of their customers that used the appliance to test storage configurations. They presented at SNW on some of their results but I missed their session (the case study is available on SwiftTests website).

~~~~

In short, SNW had a diverse mixture of end user customers but lacked a full complement of vendors to show off to them. The ratio of vendors to customers has definitely shifted to end-users the last couple of years and if anything has gotten more skewed to end-users, (which paradoxically should appeal to more storage vendors?!).

I talked with lots of end-users, from companies like FedEx, Northrop Grumman and AOL to name just a few big ones. But there were plenty of smaller ones as well.

The show lasted three days and had sessions scheduled all the way to the end. I was surprised at the length and the fact that it started on Tuesday rather than Monday as in years past. Apparently, SNIA and Computerworld are still tweaking the formula.

It seemed to me there were more cancelled sessions than in years past but again this was hard to quantify.

Some of the customers I talked with thought SNW should go to a once a year and can’t understand why it’s still twice a year. Many mentioned VMworld as having taken the place of SNW in being a showplace for storage vendors of all sizes and styles. That and the vendor specific shows from EMC, IBM, Dell and others.

The fall show is moving to Long Beach, CA. Probably, a further experiment to find a formula that works. Let’s hope they succeed.

Comments?

Fall SNWUSA wrap-up

Posted on October 19, 2012January 21, 2013 by Ray in Block Storage, Cloud storage, Data QoS, Disk storage, Ethernet, FC, File Storage, iSCSI, SSD storage, Storage, Storage performance, System effectiveness

Attended SNWUSA this week in San Jose, It’s hard to see the show gradually change when you attend each one but it does seem that the end-user content and attendance is increasing proportionally. This should bode well for future SNWs. Although, there was always a good number of end users at the show but the bulk of the attendees in the past were from storage vendors.

Another large storage vendor dropped their sponsorship. HDS no longer sponsors the show and the last large vendor still standing at the show is HP. Some of this is cyclical, perhaps the large vendors will come back for the spring show, next year in Orlando, Fl. But EMC, NetApp and IBM seemed to have pretty much dropped sponsorship for the last couple of shows at least.

SSD startup of the show

Skyhawk hardware (c) 2012 Skyera, all rights reserved (from their website)

The best, new SSD startup had to be Skyera. A 48TB raw flash dual controller system supporting iSCSI block protocol and using real commercial grade MLC. The team at Skyera seem to be all ex-SandForce executives and technical people.

Skyera’s team have designed a 1U box called the Skyhawk, with a phalanx of NAND chips, there own controller(s) and other logic as well. They support software compression and deduplication as well as a special designed RAID logic that claims to reduce extraneous write’s to something just over 1 for RAID 6, dual drive failure equivalent protection.

Skyera’s underlying belief is that just as consumer HDAs took over from the big monster 14″ and 11″ disk drives in the 90’s sooner or later commercial NAND will take over from eMLC and SLC. And if one elects to stay with the eMLC and SLC technology you are destined to be one to two technology nodes behind. That is, commercial MLC (in USB sticks, SD cards etc) is currently manufactured with 19nm technology. The EMLC and SLC NAND technology is back at 24 or 25nm technology. But 80-90% of the NAND market is being driven by commercial MLC NAND. Skyera came out this past August.

Coming in second place was Arkologic an all flash NAS box using SSD drives from multiple vendors. In their case a fully populated rack holds about 192TB (raw?) with an active-passive controller configuration. The main concern I have with this product is that all their metadata is held in UPS backed DRAM (??) and they have up to 128GB of DRAM in the controller.

Arkologic’s main differentiation is supporting QOS on a file system basis and having some connection with a NIC vendor that can provide end to end QOS. The other thing they have is a new RAID-AS which is special designed for Flash.

I just hope their USP is pretty hefty and they don’t sell it someplace where power is very flaky, because when that UPS gives out, kiss your data goodbye as your metadata is held nowhere else – at least that’s what they told me.

Cloud storage startup of the show

There was more cloud stuff going on at the show. Talked to at least three or four cloud gateway providers. But the cloud startup of the show had to be Egnyte. They supply storage services that span cloud storage and on premises storage with an in band or out-of-band solution and provide file synchronization services for file sharing across multiple locations. They have some hooks into NetApp and other major storage vendor products that allows them to be out-of-band for these environments but would need to be inband for other storage systems. Seems an interesting solution that if succesful may help accelerate the adoption of cloud storage in the enterprise, as it makes transparent whether storage you access is local or in the cloud. How they deal with the response time differences is another question.

Different idea startup of the show

The new technology showplace had a bunch of vendors some I had never heard of before but one that caught my eye was Actifio. They were at VMworld but I never got time to stop by. They seem to be taking another shot at storage virtualization. Only in this case rather than focusing on non-disruptive file migration they are taking on the task of doing a better job of point in time copies for iSCSI and FC attached storage.

I assume they are in the middle of the data path in order to do this and they seem to be using copy-on-write technology for point-in-time snapshots. Not sure where this fits, but I suspect SME and maybe up to mid-range.

Most enterprise vendors have solved these problems a long time ago but at the low end, it’s a little more variable. I wish them luck but although most customers use snapshots if their storage has it, those that don’t, seem unable to understand what they are missing. And then there’s the matter of being in the data path?!

~~~~

If there was a hybrid startup at the show I must have missed them. Did talk with Nimble Storage and they seem to be firing on all cylinders. Maybe someday we can do a deep dive on their technology. Tintri was there as well in the new technology showcase and we talked with them earlier this year at Storage Tech Field Day.

The big news at the show was Microsoft purchasing StorSimple a cloud storage gateway/cache. Apparently StorSimple did a majority of their business with Microsoft’s Azure cloud storage and it seemed to make sense to everyone.

The SNIA suite was hopping as usual and the venue seemed to work well. Although I would say the exhibit floor and lab area was a bit to big. But everything else seemed to work out fine.

On Wednesday, the CIO from Dish talked about what it took to completely transform their IT environment from a management and leadership perspective. Seemed like an awful big risk but they were able to pull it off.

All in all, SNW is still a great show to learn about storage technology at least from an end-user perspective. I just wish some more large vendors would return once again, but alas that seems to be a dream for now.