Will Hybrid drives conquer enterprise storage?

Toyota Hybrid Synergy Drive Decal: RAC Future Car Challenge by Dominic's pics (cc) (from Flickr)

I saw that Seagate announced the next generation of their Momentus XT Hybrid (SSD & Disk) drive this week.  We haven’t discussed Hybrid drives much on this blog but they have become a viable product family.

I am not planning on describing the new drive specs here as there was an excellent review by Greg Schulz at StorageIOblog.

However, the question some in the storage industry have had is whether Hybrid drives can supplant the drives used in data center storage.  I believe the answer to that is no and I will tell you why.

Hybrid drive secrets

The secret to Seagate’s Hybrid drive lies in its FAST technology.  It provides a sort of automated disk caching that moves frequently accessed OS or boot data to NAND/SSD providing quicker access times.

Caching logic has been around in storage subsystems for decades now, ever since the IBM 3880 Mod 11&13 storage control systems came out last century.  However, these algorithms have gotten much more sophisticated over time and today can make a significant difference in storage system performance.  This can be easily witnessed by the wide variance in storage system performance on a per disk drive basis (e.g., see my post on Latest SPC-2 results – chart of the month).

Enterprise storage use of Hybrid drives?

The problem with using Hybrid drives in enterprise storage is that caching algorithms are based on some predictability of access/reference patterns.  When you have a Hybrid drive directly connected to a server or a PC it can view a significant portion of server IO (at least to the boot/OS volume) but more importantly, that boot/OS data is statically allocated, i.e., doesn’t move around all that much.   This means that one PC session looks pretty much like the next PC session and as such, the hybrid drive can learn an awful lot about the next IO session just by remembering the last one.

However, enterprise storage IO changes significantly from one storage session (day?) to another.  Not only are the end-user generated database transactions moving around the data, but the data itself is much more dynamically allocated, i.e., moves around a lot.
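To make the contrast concrete, here is a minimal sketch of the "remember the last session" idea. It is not Seagate's algorithm, just a toy frequency-based cache under my own assumptions: the block addresses, the trace, the NAND capacity, and the helper names `hottest_blocks` and `hit_rate` are all made up for illustration.

```python
from collections import Counter

def hottest_blocks(trace, nand_capacity):
    """Return the set of the most frequently accessed block addresses."""
    counts = Counter(trace)
    return {block for block, _ in counts.most_common(nand_capacity)}

def hit_rate(trace, pinned):
    """Fraction of accesses in `trace` that land on a pinned (NAND) block."""
    if not trace:
        return 0.0
    hits = sum(1 for block in trace if block in pinned)
    return hits / len(trace)

# Hypothetical boot trace: blocks 0-9 are read ten times each,
# blocks 10-99 are read once each.
boot_session = [b for b in range(10) for _ in range(10)] + list(range(10, 100))

# The drive "learns" from the last session by pinning the ten hottest blocks.
pinned = hottest_blocks(boot_session, nand_capacity=10)

# PC case: the next session touches the same physical blocks.
static_rate = hit_rate(boot_session, pinned)

# Enterprise case (assumed): same logical pattern, but the controller has
# since moved every block to a new physical address.
remapped_session = [b + 1000 for b in boot_session]
moved_rate = hit_rate(remapped_session, pinned)

print(f"static layout hit rate:   {static_rate:.2f}")
print(f"remapped layout hit rate: {moved_rate:.2f}")
```

In this toy setup the pinned set serves a bit more than half of the accesses when the layout stays put, and none of them once the data has moved. Real workloads sit somewhere in between, but the direction of the effect is the point.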

This backend data movement is especially pronounced with the automated storage tiering used in subsystems that contain both SSDs and disk drives. But it’s also true in systems that map data placement using log structured file systems.  NetApp’s Write Anywhere File Layout (WAFL) is a prominent user of this approach, but other storage systems do this as well.

In addition, any fixed, permanent mapping of a user data block to a physical disk location is becoming less useful over time as advanced storage features make dynamic or virtualized mapping a necessity.  Just consider snapshots based on copy-on-write technology: all it takes is a write to have a snapshot block be moved to a different location.

Nonetheless, the main problem is that the smarts about what is happening to data on backend storage lie primarily at the controller level, not at the drive level.  This applies not only to data mapping but also to end-user/application data access, as controller cache hits are never even seen by a drive.  As such, Hybrid drives alone don’t make much sense in enterprise storage.
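The cache hit point can be shown with a small sketch. It assumes a plain LRU controller cache, which is far simpler than what real subsystems use, and the function name `drive_visible_trace` and the trace itself are invented for the example. It returns only the accesses that miss the controller cache, i.e., the only ones a backend drive ever gets to see.

```python
from collections import OrderedDict

def drive_visible_trace(host_trace, cache_blocks):
    """Return the accesses that miss an LRU controller cache and reach the drive."""
    cache = OrderedDict()  # block address -> True, least recently used first
    misses = []
    for block in host_trace:
        if block in cache:
            # Cache hit: served by the controller, the drive never sees it.
            cache.move_to_end(block)
        else:
            # Cache miss: the request goes to the backend drive.
            misses.append(block)
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses

# Hypothetical host trace: block 1 is by far the hottest block.
host_trace = [1, 1, 1, 1, 2, 3, 1, 1, 4, 1]
seen_by_drive = drive_visible_trace(host_trace, cache_blocks=3)

print("host accesses to block 1: ", host_trace.count(1))
print("drive accesses to block 1:", seen_by_drive.count(1))
```

Here the host hits block 1 seven times, yet the drive sees it once, exactly as often as the cold blocks. Any drive-level algorithm that ranks blocks by how often it sees them has nothing to work with.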

Maybe, if they were intricately tied to the subsystem

I guess one way this could all work better is if the Hybrid drive caching logic were somehow controlled by the storage subsystem.  In this way, the controller could provide hints as to which disk blocks to move into NAND.  Perhaps this is a way to distribute storage tiering activity to the backend devices, without the subsystem having to do any of the heavy lifting, i.e., the hybrid drives would do all the data movement under the guidance of the controller.
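As a thought experiment, such a hint interface might look something like the sketch below. To be clear, no such commands exist as far as I know. The class `HintedHybridDrive` and its `hint_promote` and `hint_demote` methods are purely hypothetical, and the oldest-hint-first eviction is just an assumption to keep the toy model small.

```python
class HintedHybridDrive:
    """Toy model of a hybrid drive whose NAND contents are chosen by controller hints."""

    def __init__(self, nand_capacity):
        self.nand_capacity = nand_capacity
        self.nand = []  # block addresses held in NAND, oldest hint first

    def hint_promote(self, block):
        """Hypothetical hint: ask the drive to copy a block into NAND."""
        if block in self.nand:
            return
        if len(self.nand) >= self.nand_capacity:
            self.nand.pop(0)  # assumed policy: evict the oldest hinted block
        self.nand.append(block)

    def hint_demote(self, block):
        """Hypothetical hint: tell the drive a block no longer needs NAND."""
        if block in self.nand:
            self.nand.remove(block)

    def read(self, block):
        """Report which medium would serve the read."""
        return "nand" if block in self.nand else "disk"


# The controller, which sees the real access pattern, drives the placement.
drive = HintedHybridDrive(nand_capacity=2)
drive.hint_promote(7)
drive.hint_promote(9)
print(drive.read(7), drive.read(3))

drive.hint_promote(11)  # NAND is full, so the oldest hint (block 7) is evicted
print(drive.read(7), drive.read(11))
```

The drive does the data movement itself, while the decision about what is hot stays with the controller.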

I don’t think this is likely, because it would take industry standardization to define any new “hint” commands, and they would be specific to Hybrid drives.  Barring standards, it’s a proprietary interface between one storage vendor and one drive vendor.  That is probably OK if you make both the storage subsystem and the hybrid drives, but there aren’t any vendors left that make both drives and storage controllers.

~~~~

So, given the state of enterprise storage today and its continuing proclivity to move data around across its backend storage, I believe Hybrid drives won’t be used in enterprise storage anytime soon.

Comments?