Latest SPC-2 MBPS vs drive count results – chart-of-the-month

SCISPC120529-001 (c) 2012 Silverton Consulting, All Rights Reserved

The above chart comes from our August performance analysis [yes, I am a bit behind] and is a scatter plot of Storage Performance Council SPC-2 submissions. It plots MBPS™ on the vertical axis against the number of disk drives in each submission on the horizontal. We have also added a linear regression line fit to the data, with the regression formula listed.

Unlike the SPC-1 performance results and IOPS™ vs. drives documented in an earlier post, SPC-2 MBPS results have a much wider variance, and the coefficient of determination (R**2) of ~0.42 shows it.  In the earlier SPC-1 post, the IOPS-drive count linear regression had an R**2 of ~0.96.
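For readers who want to reproduce this kind of analysis on their own, here is a minimal sketch of how the regression line and R**2 are computed. The (drives, MBPS) pairs are made-up placeholders, not actual SPC-2 submissions.

```python
# Sketch of the regression behind the chart: fit MBPS against drive count
# and report R**2.  The data points are illustrative placeholders only.
import numpy as np

drives = np.array([48, 96, 128, 192, 256, 384, 512, 768, 1024], dtype=float)
mbps   = np.array([900, 1500, 2100, 2800, 3900, 5200, 6800, 8100, 9500], dtype=float)

slope, intercept = np.polyfit(drives, mbps, 1)   # least-squares linear fit
predicted = slope * drives + intercept
ss_res = np.sum((mbps - predicted) ** 2)         # residual sum of squares
ss_tot = np.sum((mbps - mbps.mean()) ** 2)       # total sum of squares
r_squared = 1.0 - ss_res / ss_tot                # coefficient of determination

print(f"MBPS ~= {slope:.1f} * drives + {intercept:.1f}, R**2 = {r_squared:.2f}")
```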

Why would SPC-1 IOPS be more driven by drive counts than SPC-2 MBPS?  We can only speculate of course, but it seems to me that SPC-2 MBPS is more a function of system caching effectiveness than of pure IO transaction speed.

All the SPC-2 workloads (VOD, LFP, and LDQ) are sequential in nature and as such, sophisticated sequential look-ahead caching can make more effective use of fewer spindles. In contrast, SPC-1 IOPS workloads are inherently random in nature and as such are poor cache candidates, which means they depend on high spindle counts to perform well.
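To make that concrete, here is a toy Python model (purely illustrative, not any vendor's actual caching algorithm) of a simple read-ahead cache: a sequential stream ends up almost entirely served from cache, while random IO across a large address space almost never hits.

```python
# Toy read-ahead cache: prefetch the next few blocks whenever access looks
# sequential.  Sequential streams hit cache, random IO mostly misses.
import random
from collections import OrderedDict

def hit_rate(trace, read_ahead=8, cache_size=1024):
    cache = OrderedDict()                  # insertion-ordered, evict oldest first
    last, hits = None, 0
    for block in trace:
        if block in cache:
            hits += 1
        if last is not None and block == last + 1:           # looks sequential
            for b in range(block + 1, block + 1 + read_ahead):
                cache.setdefault(b, True)                     # prefetch ahead
        cache.setdefault(block, True)
        while len(cache) > cache_size:
            cache.popitem(last=False)                         # evict oldest entry
        last = block
    return hits / len(trace)

sequential = list(range(100_000))                             # VOD/LFP/LDQ-like stream
rand = [random.randrange(10_000_000) for _ in range(100_000)] # SPC-1-like random IO
print(f"sequential hit rate ~ {hit_rate(sequential):.1%}")
print(f"random hit rate     ~ {hit_rate(rand):.1%}")
```

Running this shows the sequential trace hitting cache nearly 100% of the time with only 1,024 cached blocks, while the random trace hits almost never, which is roughly why sequential MBPS depends less on spindle count.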

In addition, SPC-2 has never been as popular as SPC-1 and as a result, doesn’t have as many submissions.  It’s never been clear to me why this is the case, as not all enterprise class workloads are random; good sequential performance is a necessary requirement for many enterprise storage systems.

Comments?

~~~~

The complete SPC-1 & SPC-2 performance report with more top 10 charts went out in SCI’s August newsletter.  A copy of the report will be posted on our dispatches page sometime this month (if all goes well).  However, you can get the latest SPC performance analysis now and subscribe to future free newsletters by just using the signup form above right.

For a more extensive discussion of current SAN block system storage performance covering SPC (Top 30) results as well as ESRP results with our new ChampionsChart™ for SAN storage systems, please see SCI’s SAN Storage Buying Guide available from our website.

As always, we welcome any suggestions or comments on how to improve our analysis of SPC results or any of our other storage performance analyses.


 

Will Hybrid drives conquer enterprise storage?

Toyota Hybrid Synergy Drive Decal: RAC Future Car Challenge by Dominic's pics (cc) (from Flickr)

I saw where Seagate announced the next generation of their Momentus XT Hybrid (SSD & Disk) drive this week.  We haven’t discussed Hybrid drives much on this blog, but they have become a viable product family.

I am not planning on describing the new drive specs here as there was an excellent review by Greg Schulz at StorageIOblog.

However, the question some in the storage industry have had is whether Hybrid drives can supplant data center storage.  I believe the answer to that is no, and I will tell you why.

Hybrid drive secrets

The secret to Seagate’s Hybrid drive lies in its FAST technology.  It provides a sort of automated disk caching that moves frequently accessed OS or boot data to NAND/SSD, providing quicker access times.
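As a rough illustration of the general idea (simple frequency-based promotion; this is NOT Seagate's actual FAST algorithm), a hybrid drive can count accesses per LBA and keep the hottest blocks in its NAND:

```python
# Frequency-based promotion sketch: illustrative only, not Seagate FAST.
from collections import Counter

NAND_BLOCKS = 4                  # tiny NAND tier, just for illustration
access_counts = Counter()
nand = set()

def read(lba):
    served_from = "NAND hit" if lba in nand else "disk read"
    access_counts[lba] += 1
    # Re-rank after each access: the most frequently read LBAs get promoted.
    nand.clear()
    nand.update(l for l, _ in access_counts.most_common(NAND_BLOCKS))
    return served_from

# Boot/OS blocks get read over and over, so they end up served from NAND.
for lba in [10, 11, 12, 10, 11, 12, 10, 500, 11, 12]:
    print(lba, read(lba))
```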

Caching logic has been around in storage subsystems for decades now, ever since the IBM 3880 Mod 11 & 13 storage control systems came out last century.  However, these algorithms have gotten much more sophisticated over time and today can make a significant difference in storage system performance.  This can be easily witnessed in the wide variance in storage system performance on a per disk drive basis (e.g., see my post on Latest SPC-2 results – chart of the month).

Enterprise storage use of Hybrid drives?

The problem with using Hybrid drives in enterprise storage is that caching algorithms are based on some predictability of access/reference patterns.  When a Hybrid drive is directly connected to a server or a PC, it can view a significant portion of the server's IO (at least to the boot/OS volume), but more importantly, that boot/OS data is statically allocated, i.e., it doesn’t move around all that much.  This means that one PC session looks pretty much like the next, and as such, the hybrid drive can learn an awful lot about the next IO session just by remembering the last one.

However, enterprise storage IO changes significantly from one storage session (day?) to another.  Not only are the end-user generated database transactions moving around the data, but the data itself is much more dynamically allocated, i.e., moves around a lot.
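A toy comparison makes the difference plain: a drive cache warmed by yesterday's IO helps a lot when the hot data stays put, and almost not at all when the data has moved overnight. The numbers below are purely illustrative.

```python
# Drive-level NAND cache warmed from the prior "session": great for static
# PC boot/OS data, nearly useless once backend data has been relocated.
import random

CACHE_BLOCKS = 1_000

def session_hit_rate(yesterday_hot, today_trace):
    nand = set(yesterday_hot[:CACHE_BLOCKS])    # NAND warmed from the prior session
    hits = sum(1 for blk in today_trace if blk in nand)
    return hits / len(today_trace)

static_hot = list(range(1_000))                 # boot/OS blocks, same every day
pc_today = [random.choice(static_hot) for _ in range(10_000)]

moved_hot = list(range(50_000, 51_000))         # tiering/reallocation moved the hot data
enterprise_today = [random.choice(moved_hot) for _ in range(10_000)]

print(f"PC-style static data:      {session_hit_rate(static_hot, pc_today):.0%} NAND hits")
print(f"Enterprise relocated data: {session_hit_rate(static_hot, enterprise_today):.0%} NAND hits")
```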

Backend data movement is especially prevalent with the automated storage tiering used in subsystems that contain both SSDs and disk drives. But it’s also true in systems that map data placement using log structured file systems.  NetApp’s Write Anywhere File Layout (WAFL) is a prominent example of this approach, but other storage systems do this as well.

In addition, any fixed, permanent mapping of a user data block to a physical disk location is becoming less useful over time as advanced storage features make dynamic or virtualized mapping a necessity.  Just consider snapshots based on copy-on-write technology: all it takes is a write for a snapshot block to be moved to a different location.
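A minimal sketch of the bookkeeping shows why: with copy-on-write, the first write to a shared block copies the old data to a new physical location so the snapshot can keep it, breaking any fixed block-to-location mapping. This is illustrative only; real systems track this in far richer metadata.

```python
# Copy-on-write snapshot sketch: a single write relocates the snapshot's
# copy of a block, so physical locations cannot be assumed to be permanent.
volume = {0: "P100", 1: "P101", 2: "P102"}   # logical block -> physical location
snapshot = dict(volume)                       # snapshot initially shares every block
next_free = 200

def write(lba):
    """Before overwriting a shared block, copy the old data for the snapshot."""
    global next_free
    if snapshot.get(lba) == volume[lba]:      # still shared with the snapshot?
        snapshot[lba] = f"P{next_free}"       # old data copied to a new location
        next_free += 1
    # ...new data is then written in place at volume[lba]

write(1)
print("live volume:", volume)                 # block 1 still at P101
print("snapshot   :", snapshot)               # snapshot's block 1 now lives at P200
```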

Nonetheless, the main problem is that all the smarts about what is happening to data on backend storage lie primarily at the controller level, not at the drive level.  This applies not only to data mapping but also to end-user/application data access, as controller cache hits are never even seen by a drive.  As such, Hybrid drives alone don’t make much sense in enterprise storage.

Maybe, if they were intricately tied to the subsystem

I guess one way this could all work better is if the Hybrid drive caching logic were somehow controlled by the storage subsystem.  In this way, the controller could provide hints as to which disk blocks to move into NAND.  Perhaps this is a way to distribute storage tiering activity to the backend devices, without the subsystem having to do any of the heavy lifting, i.e., the hybrid drives would do all the data movement under the guidance of the controller.
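Purely as a thought experiment, such a hint interface might look something like the sketch below. None of this exists today; the class and method names (HybridDrive.hint_pin, Controller.retier, etc.) are invented for illustration.

```python
# Hypothetical controller-to-drive hinting: no such standard command exists.
class HybridDrive:
    """Stand-in for a hybrid drive that accepts caching hints."""
    def __init__(self):
        self.nand_pinned = set()

    def hint_pin(self, lbas):
        """Hypothetical command: pin these LBAs into the drive's NAND."""
        self.nand_pinned.update(lbas)

    def hint_unpin(self, lbas):
        self.nand_pinned.difference_update(lbas)

class Controller:
    """Subsystem keeps the tiering smarts, drives do the data movement."""
    def __init__(self, drives):
        self.drives = drives

    def retier(self, hot_extents):
        # hot_extents maps drive index -> LBAs the controller knows are hot
        for idx, lbas in hot_extents.items():
            self.drives[idx].hint_pin(lbas)

drives = [HybridDrive() for _ in range(4)]
ctrl = Controller(drives)
ctrl.retier({0: [1024, 1025, 1026], 2: [8192]})
print(drives[0].nand_pinned, drives[2].nand_pinned)
```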

I don’t think this is likely because it would take industry standardization to define any new “hint” commands and they would be specific to Hybrid drives.  Barring standards, it’s an interface between one storage vendor and one drive vendor.  That would probably be OK if you made both the storage subsystem and the hybrid drives, but there aren’t any vendors left that do both drives and storage controllers.

~~~~

So, given the state of enterprise storage today and its continuing proclivity to move data around across its backend storage, I believe Hybrid drives won’t be used in enterprise storage anytime soon.

Comments?

 

Enterprise data storage defined and why 3PAR?

More SNW hall servers and storage

Recent press reports about a bidding war for 3PAR bring into focus the expanding need for enterprise class data storage subsystems.  What exactly is enterprise storage?

Defining enterprise storage is fraught with problems but I will take a shot.  Enterprise class data storage has:

  • Enhanced reliability, high availability and serviceability – meaning it hardly ever fails, it keeps operating (on redundant components) when it does fail, and repairing the storage when the rare failure occurs can be accomplished without disrupting ongoing storage services
  • Extreme data integrity – meaning these systems go beyond just RAID protection: they lose data very infrequently, return the latest data written to a location when it is read, and will tell you when data cannot be accessed
  • Automated I/O performance – meaning sophisticated caching algorithms that try to keep ahead of sequential I/O streams, buffer actively read data, and buffer write data in non-volatile cache before destaging to disk or other media.
  • Multiple types of storage – meaning the system supports SATA, SAS and/or FC disk drives and SSDs or Flash storage
  • PBs of storage – meaning behind one enterprise class storage (sub-)system one can support over 1PB of storage
  • Sophisticated functionality – meaning the system supports multiple forms of offsite replication, thin provisioning, storage tiering, point-in-time copies, data cloning, administration GUIs/CLIs, etc.
  • Compatibility with all enterprise O/Ss – meaning the storage has been tested and is on hardware compatibility lists for every major operating system in use by the enterprise today.

As for storage protocol, it seems best to leave this off the list.  I wanted to just add block storage, but enterprises today probably have as much if not more external file storage (CIFS or NFS) as they have block storage (FC or iSCSI).  And the proportion in file systems seems to be growing (see IDC report referenced below).

In addition, while I don’t like the non-determinism of iSCSI or file access protocols, this doesn’t seem to stop such storage from putting up pretty impressive performance numbers (see our performance dispatches).  Anything that can crack 100K I/O or file operations per second probably deserves to be called enterprise storage, as long as it meets the other requirements.  So, maybe I should add high-performance storage to the list above.

Why the sudden interest in enterprise storage?

Enterprise storage has been around arguably since the 2nd half of last century (for mainframe systems) but lately has become even more interesting as applications deploy to the cloud and server virtualization (from VMware, Microsoft Hyper-V and others) takes over the data center.

Cloud storage and cloud computing services are lowering the entry points for storage and processing, enabling application deployments which were heretofore unaffordable.  These new cloud applications consume storage at increasing rates and don’t seem to be slowing down any time soon.  Arguably, some cloud storage is not enterprise storage but as service levels go up for these applications, providers must ultimately turn to enterprise storage.

In addition, server virtualization transforms the enterprise data center from a single application per server to easily 5 or more applications per physical server.  This trend is raising server utilization, driving more I/O, and requiring higher capacity.  Such “multi-application” storage almost always requires high availability, reliability and performance to work well, generating even more demand for enterprise data storage systems.

Despite all the demand, worldwide external storage revenues dropped 12% last year according to IDC.  Now the economy had a lot to do with this decline, but another factor reducing external storage revenue is the ongoing drop in the price of storage on a $/GB basis.  To this point, that same IDC report stated that external storage capacity increased 33% last year.
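Doing the rough math on those two IDC numbers: if revenue fell ~12% while capacity shipped grew ~33%, then revenue per GB dropped to roughly 0.88/1.33 ≈ 0.66 of the prior year's level, i.e., something like a one-third decline in $/GB in a single year.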

Why do Dell & HP want 3PAR storage?

Margins on enterprise storage are good, some would say very good.  While raw disk storage can be had for under $0.50/GB, enterprise class storage is often 10 or more times that price.  Now that price has to cover redundant hardware, software/firmware engineering and other costs, but it still leaves pretty good margins.

In my mind, Dell would see enterprise storage as a natural extension of their current enterprise server business.  They already sell to and support these customers; adding enterprise class storage just adds another product to the mix.  Developing enterprise storage from scratch is probably a 4-7 year journey with the right people, whereas buying 3PAR puts them in the market today with a competitive product.

HP is already in the enterprise storage market today, with their XP and EVA storage subsystems.  However, having their own 3PAR enterprise class storage may get them better margins than their current XP storage, OEMed from HDS.  But I think Chuck Hollis’s post on HP’s counter bid for 3PAR may have revealed another side to this discussion – sometimes M&A is as much about constraining your competition as it is about adding new capabilities to a company.

——

What do you think?