Will Hybrid drives conquer enterprise storage?

Toyota Hybrid Synergy Drive Decal: RAC Future Car Challenge by Dominic's pics (cc) (from Flickr)
Toyota Hybrid Synergy Drive Decal: RAC Future Car Challenge by Dominic's pics (cc) (from Flickr)

I saw where Seagate announced the next generation of their Momentus XT Hybrid (SSD & Disk) drive this week.  We haven’t discussed Hybrid drives much on this blog but it has become a viable product family.

I am not planning on describing the new drive specs here as there was an excellent review by Greg Schulz at StorageIOblog.

However, the question some in the storage industry have had is can Hybrid drives supplant data center storage.  I believe the answer to that is no and I will tell you why.

Hybrid drive secrets

The secret to Seagate’s Hybrid drive lies in its FAST technology.  It provides a sort of automated disk caching that moves frequently accessed OS or boot data to NAND/SSD providing quicker access times.

Storage subsystem caching logic has been around in storage subsystems for decade’s now, ever since the IBM 3880 Mod 11&13 storage control systems came out last century.  However, these algorithms have gotten much more sophisticated over time and today can make a significant difference in storage system performance.  This can be easily witnessed by the wide variance in storage system performance on a per disk drive basis (e.g., see my post on Latest SPC-2 results – chart of the month).

Enterprise storage use of Hybrid drives?

The problem with using Hybrid drives in enterprise storage is that caching algorithms are based on some predictability of access/reference patterns.  When you have a Hybrid drive directly connected to a server or a PC it can view a significant portion of server IO (at least to the boot/OS volume) but more importantly, that boot/OS data is statically allocated, i.e., doesn’t move around all that much.   This means that one PC session looks pretty much like the next PC session and as such, the hybrid drive can learn an awful lot about the next IO session just by remembering the last one.

However, enterprise storage IO changes significantly from one storage session (day?) to another.  Not only are the end-user generated database transactions moving around the data, but the data itself is much more dynamically allocated, i.e., moves around a lot.

Backend data movement is especially true for automated storage tiering used in subsystems that contain both SSDs and disk drives. But it’s also true in systems that map data placement using log structured file systems.  NetApp Write Anywhere File Layout (WAFL) being a prominent user of this approach but other storage systems do this as well.

In addition, any fixed, permanent mapping of a user data block to a physical disk location is becoming less useful over time as advanced storage features make dynamic or virtualized mapping a necessity.  Just consider snapshots based on copy-on-write technology, all it takes is a write to have a snapshot block be moved to a different location.

Nonetheless, the main problem is that all the smarts about what is happening to data on backend storage primarily lies at the controller level not at the drive level.  This not only applies to data mapping but also end-user/application data access, as cache hits are never even seen by a drive.  As such, Hybrid drives alone don’t make much sense in enterprise storage.

Maybe, if they were intricately tied to the subsystem

I guess one way this could all work better is if the Hybrid drive caching logic were somehow controlled by the storage subsystem.  In this way, the controller could provide hints as to which disk blocks to move into NAND.  Perhaps this is a way to distribute storage tiering activity to the backend devices, without the subsystem having to do any of the heavy lifting, i.e., the hybrid drives would do all the data movement under the guidance of the controller.

I don’t think this likely because it would take industry standardization to define any new “hint” commands and they would be specific to Hybrid drives.  Barring standards, it’s an interface between one storage vendor and one drive vendor.  Probably ok if you made both storage subsystem and hybrid drives but there aren’t any vendor’s left that does both drives and the storage controllers.


So, given the state of enterprise storage today and its continuing proclivity to move data around accross its backend storage,  I believe Hybrid drives won’t be used in enterprise storage anytime soon.



SCI’s latest SPC-2 performance results analysis – chart-of-the-month

SCISPC110822-002 (c) 2011 Silverton Consulting, All Rights Reserved
SCISPC110822-002 (c) 2011 Silverton Consulting, All Rights Reserved

There really wasn’t that many new submissions for the Storage Performance Council SPC-1 or SPC-2 benchmarks this past quarter (just the new Fujitsu DX80S2 SPC-2 run) so we thought it time to roll out a new chart.

The chart above shows a scatter plot of the number of disk drives in a submission vs. the MB/sec attained for the Large Database Query (LDQ) component of an SPC-2 benchmark.

As one who follows this blog and our twitter feed knows we continue to have an ongoing, long running discussion on how I/O benchmarks such as this are mostly just a measure of how much hardware (disks and controllers) are thrown at them.  We added a linear regression line to the above chart to evaluate the validity of that claim and as clearly shown above, disk drive count is NOT highly correlated with SPC-2 performance.

We necessarily exclude from this analysis any system results that used NAND based caching or SSD devices so as to focus specifically on disk drive count relevance.   There are not a lot of these in SPC-2 results but there are enough to make this look even worse.

We chose to only display the LDQ segment of the SPC-2 benchmark because it has the best correlation or highest R**2 at 0.41 between workload and disk count. The aggregate MBPS as well as the other components of the SPC-2 benchmark include video on demand (VOD) and large file processing (LFP) both of which had R**2’s of less than 0.36.

For instance, just look at the vertical centered around 775 disk drives.  There are two systems that show up here, one doing ~ 6000 MBPS and the other doing ~11,500 MBPS – quite a difference.  The fact that these are two different storage architectures from the same vendor is even more informative??

Why is the overall correlation so poor?

One can only speculate but there must be something about system sophistication at work in SPC-2 results.  It’s probably tied to better caching, better data layout on disk, and better IO latency but it’s only an educated guess.  For example,

  • Most of the SPC-2 workload is sequential in nature.  How a storage system detects sequentiality in a seemingly random IO mix is an art form and what a system does armed with that knowledge is probably more of a science.
  • In the old days of big, expensive CKD DASD, sequential data was all laid out in consecutively (barring lacing) around a track and up a cylinder.  These days of zoned FBA disks one can only hope that sequential data resides in laced sectors, along consecutive tracks on the media, minimizing any head seek activity.  Another approach,  popular this last decade, has been to throw more disks at the problem, resulting in many more seeking heads to handle the workload and who care where the data lies.
  • IO latency is another factor.  We have discussed this before (see Storage throughput vs IO response time and why it matters. But one key to systems throughput is how quickly data gets out of cache and into the hands of servers. Of course the other part to this, is how fast does the storage system get the data from sitting on disk into cache.

Systems that do these better will perform better on SPC-2 like benchmarks that focus on raw sequential throughput.



The full SPC performance report went out to our newsletter subscribers last month.  A copy of the full report will be up on the dispatches page of our website later next month. However, you can get this information now and subscribe to future newsletters to receive these reports even earlier by just sending us an email or using the signup form above right.

As always, we welcome any suggestions on how to improve our analysis of SPC results or any of our other storage system performance discussions.



Storage throughput vs. IO response time and why it matters

Fighter Jets at CNE by lifecreation (cc) (from Flickr)
Fighter Jets at CNE by lifecreation (cc) (from Flickr)

Lost in much of the discussions on storage system performance is the need for both throughput and response time measurements.

  • By IO throughput I generally mean data transfer speed in megabytes per second (MB/s or MBPS), however another definition of throughput is IO operations per second (IO/s or IOPS).  I prefer the MB/s designation for storage system throughput because it’s very complementary with respect to response time whereas IO/s can often be confounded with response time.  Nevertheless, both metrics qualify as storage system throughput.
  • By IO response time I mean the time it takes a storage system to perform an IO operation from start to finish, usually measured in milleseconds although lately some subsystems have dropped below the 1msec. threshold.  (See my last year’s post on SPC LRT results for information on some top response time results).

Benchmark measurements of response time and throughput

Both Standard Performance Evaluation Corporation’s SPECsfs2008 and Storage Performance Council’s SPC-1 provide response time measurements although they measure substantially different quantities.  The problem with SPECsfs2008’s measurement of ORT (overall response time) is that it’s calculated as a mean across the whole benchmark run rather than a strict measurement of least response time at low file request rates.  I believe any response time metric should measure the minimum response time achievable from a storage system although I can understand SPECsfs2008’s point of view.

On the other hand SPC-1 measurement of LRT (least response time) is just what I would like to see in a response time measurement.  SPC-1 provides the time it takes to complete an IO operation at very low request rates.

In regards to throughput, once again SPECsfs2008’s measurement of throughput leaves something to be desired as it’s strictly a measurement of NFS or CIFS operations per second.  Of course this includes a number (>40%) of non-data transfer requests as well as data transfers, so confounds any measurement of how much data can be transferred per second.  But, from their perspective a file system needs to do more than just read and write data which is why they mix these other requests in with their measurement of NAS throughput.

Storage Performance Council’s SPC-1 reports throughput results as IOPS and provide no direct measure of MB/s unless one looks to their SPC-2 benchmark results.  SPC-2 reports on a direct measure of MBPS which is an average of three different data intensive workloads including large file access, video-on-demand and a large database query workload.

Why response time and throughput matter

Historically, we used to say that OLTP (online transaction processing) activity performance was entirely dependent on response time – the better storage system response time, the better your OLTP systems performed.  Nowadays it’s a bit more complex, as some of todays database queries can depend as much on sequential database transfers (or throughput) as on individual IO response time.  Nonetheless, I feel that there is still a large component of response time critical workloads out there that perform much better with shorter response times.

On the other hand, high throughput has its growing gaggle of adherents as well.  When it comes to high sequential data transfer workloads such as data warehouse queries, video or audio editing/download or large file data transfers, throughput as measured by MB/s reigns supreme – higher MB/s can lead to much faster workloads.

The only question that remains is who needs higher throughput as measured by IO/s rather than MB/s.  I would contend that mixed workloads which contain components of random as well as sequential IOs and typically smaller data transfers can benefit from high IO/s storage systems.  The only confounding matter is that these workloads obviously benefit from better response times as well.   That’s why throughput as measured by IO/s is a much more difficult number to understand than any pure MB/s numbers.


Now there is a contingent of performance gurus today that believe that IO response times no longer matter.  In fact if one looks at SPC-1 results, it takes some effort to find its LRT measurement.  It’s not included in the summary report.

Also, in the post mentioned above there appears to be a definite bifurcation of storage subsystems with respect to response time, i.e., some subsystems are focused on response time while others are not.  I would have liked to see some more of the top enterprise storage subsystems represented in the top LRT subsystems but alas, they are missing.

1954 French Grand Prix - Those Were The Days by Nigel Smuckatelli (cc) (from Flickr)
1954 French Grand Prix - Those Were The Days by Nigel Smuckatelli (cc) (from Flickr)

Call me old fashioned but I feel that response time represents a very important and orthogonal performance measure with respect to throughput of any storage subsystem and as such, should be much more widely disseminated than it is today.

For example, there is a substantive difference a fighter jet’s or race car’s top speed vs. their maneuverability.  I would compare top speed to storage throughput and its maneuverability to IO response time.  Perhaps this doesn’t matter as much for a jet liner or family car but it can matter a lot in the right domain.

Now do you want your storage subsystem to be a jet fighter or a jet liner – you decide.

Latest SPC-2 results – chart of the month

SPC-2* benchmark results, spider chart for LFP, LDQ and VOD throughput
SPC-2* benchmark results, spider chart for LFP, LDQ and VOD throughput

Latest SPC-2 (Storage Performance Council-2) benchmark resultschart displaying the top ten in aggregate MBPS(TM) broken down into Large File Processing (LFP), Large Database Query (LDQ) and Video On Demand (VOD) throughput results. One problem with this chart is that it really only shows 4 subsystems: HDS and their OEM partner HP; IBM DS5300 and Sun 6780 w/8GFC at RAID 5&6 appear to be the same OEMed subsystem; IBM DS5300 and Sun 6780 w/ 4GFC at RAID 5&6 also appear to be the same OEMed subsystem; and IBM SVC4.2 (with IBM 4700’s behind it).

What’s interesting about this chart is what’s going on at the top end. Both the HDS (#1&2) and IBM SVC (#3) seem to have found some secret sauce for performing better on the LDQ workload or conversely some dumbing down of the other two workloads (LFP and VOD). According to the SPC-2 specification

  • LDQ is a workload consisting of 1024KiB and 64KiB transfers whereas the LFP consists of 1024KiB and 256KiB transfers and the VOD consists of only 256KiB, so transfer size doesn’t tell the whole story.
  • LDQ seems to have a lower write proportion (1%) while attempting to look like joining two tables into one, or scanning data warehouse to create output whereas, LFP processing has a read rate of 50% (R:W of 1:1) while executing a write-only phase, read-write phase and a read-only phase, and apparently VOD has a 100% read only workload mimicking streaming video.
  • 50% of the LDQ workload uses 4 I/Os outstanding and the remainder 1 I/O outstanding. The LFP uses only 1 I/O outstanding and VOD uses only 8 I/Os outstanding.

These seem to be the major differences between the three workloads. I would have to say that some sort of caching sophistication is evident in the HDS and SVC systems that is less present in the remaining systems. And I was hoping to provide some sort of guidance as to what that sophistication looked like but

  • I was going to say they must have a better sequential detection algorithm but the VOD, LDQ and LFP workloads have 100%, 99% and 50% read ratios respectively and sequential detection should perform better with VOD and LDQ than LFP. So thats not all of it.
  • Next I was going to say it had something to do with I/O outstanding counts. But VOD has 8 I/Os outstanding and the LFP only has 1, so the if this were true VOD should perform better than LFP. While LDQ having two sets of phases with 1 and 4 I/Os outstanding should have results somewhere in between these two. So thats not all of it.
  • Next I was going to say stream (or file) size is an important differentiator but “Segment Stream Size” for all workloads is 0.5GiB. So that doesn’t help.

So now I am a complete loss as to understand why the LDQ workloads are so much better than the LFP and VOD workload throughputs for HDS and SVC.

I can only conclude that the little write activity (1%) thrown into the LDQ mix is enough to give the backend storage a breather and allow the subsystem to respond better to the other (99%) read activity. Why this would be so much better for the top performers than the remaining results is not entirely evident. But I would add that, being able to handle lots of writes or lots of reads is relatively straight forward, but handling a un-ballanced mixture is harder to do well.

To validate this conjecture would take some effort. I thought it would be easy to understand what’s happening but as with most performance conundrums the deeper you look the more confounding the results often seem to be.

The full report on the latest SPC results will be up on my website later this year but if you want to get this information earlier and receive your own copy of our newsletter – email me at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.

I will be taking the rest of the week off so Happy Holidays to all my readers and a special thanks to all my commenters. See you next week.