Will Hybrid drives conquer enterprise storage?

Toyota Hybrid Synergy Drive Decal: RAC Future Car Challenge by Dominic's pics (cc) (from Flickr)

I saw that Seagate announced the next generation of their Momentus XT Hybrid (SSD & disk) drive this week.  We haven’t discussed Hybrid drives much on this blog, but they have become a viable product family.

I am not planning to describe the new drive’s specs here, as there is an excellent review by Greg Schulz at StorageIOblog.

However, the question some in the storage industry have been asking is whether Hybrid drives can supplant data center storage.  I believe the answer to that is no, and I will tell you why.

Hybrid drive secrets

The secret to Seagate’s Hybrid drive lies in its FAST technology.  It provides a sort of automated disk caching that moves frequently accessed OS or boot data to NAND/SSD, providing quicker access times.
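The details of FAST are proprietary, but the general idea of frequency-based block promotion can be sketched in a few lines of Python. Everything here, including the simple evict-the-coldest policy, is an assumption for illustration, not Seagate’s actual algorithm:

```python
from collections import Counter


class HybridCacheSketch:
    """Toy model of promoting frequently read blocks into NAND."""

    def __init__(self, nand_capacity_blocks):
        self.nand_capacity = nand_capacity_blocks
        self.access_counts = Counter()   # LBA -> read count
        self.nand_resident = set()       # LBAs currently held in flash

    def record_read(self, lba):
        self.access_counts[lba] += 1
        if lba in self.nand_resident:
            return "nand-hit"            # fast path: served from flash
        self._maybe_promote(lba)
        return "disk-read"               # slow path: served from the platters

    def _maybe_promote(self, lba):
        if len(self.nand_resident) < self.nand_capacity:
            self.nand_resident.add(lba)
            return
        # NAND is full: evict the coldest resident block, but only
        # if the new block is hotter than it.
        coldest = min(self.nand_resident, key=self.access_counts.__getitem__)
        if self.access_counts[lba] > self.access_counts[coldest]:
            self.nand_resident.remove(coldest)
            self.nand_resident.add(lba)


cache = HybridCacheSketch(nand_capacity_blocks=2)
for _ in range(5):
    cache.record_read(100)   # a hot boot block: promoted, then hit repeatedly
```

This is exactly the kind of scheme that works well when, as with a boot volume, the same blocks stay hot from one session to the next.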

Caching logic has been around in storage subsystems for decades now, ever since the IBM 3880 Mod 11 & 13 storage control systems came out last century.  However, these algorithms have gotten much more sophisticated over time and today can make a significant difference in storage system performance.  This can easily be seen in the wide variance in storage system performance on a per-disk-drive basis (e.g., see my post on Latest SPC-2 results – chart of the month).

Enterprise storage use of Hybrid drives?

The problem with using Hybrid drives in enterprise storage is that caching algorithms are based on some predictability of access/reference patterns.  When you have a Hybrid drive directly connected to a server or a PC it can view a significant portion of server IO (at least to the boot/OS volume) but more importantly, that boot/OS data is statically allocated, i.e., doesn’t move around all that much.   This means that one PC session looks pretty much like the next PC session and as such, the hybrid drive can learn an awful lot about the next IO session just by remembering the last one.

However, enterprise storage IO changes significantly from one storage session (day?) to another.  Not only are the end-user generated database transactions moving around the data, but the data itself is much more dynamically allocated, i.e., moves around a lot.

Backend data movement is especially prevalent with the automated storage tiering used in subsystems that contain both SSDs and disk drives.  But it’s also true in systems that map data placement using log structured file systems.  NetApp’s Write Anywhere File Layout (WAFL) is a prominent example of this approach, but other storage systems do this as well.
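To see why a drive can’t learn anything durable from such a layout, consider this toy write-anywhere allocator (a deliberate oversimplification; WAFL’s real allocation is far more involved). Every overwrite of the same logical block lands at a fresh physical location, so yesterday’s hot physical blocks tell the drive little about today’s:

```python
class WriteAnywhereSketch:
    """Toy log-structured layout: writes always go to fresh space."""

    def __init__(self):
        self.next_free = 0
        self.logical_to_physical = {}   # logical block -> physical block

    def write(self, logical_block):
        # Never overwrite in place; allocate the next free physical block
        # and remap the logical block to point at it.
        self.next_free += 1
        self.logical_to_physical[logical_block] = self.next_free
        return self.next_free


layout = WriteAnywhereSketch()
first = layout.write(42)
second = layout.write(42)   # same logical block, new physical home
```

A drive-level cache keyed on physical addresses would keep caching `first` long after the live data had moved to `second`.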

In addition, any fixed, permanent mapping of a user data block to a physical disk location is becoming less useful over time, as advanced storage features make dynamic or virtualized mapping a necessity.  Just consider snapshots based on copy-on-write technology: all it takes is a single write for a snapshot block to be moved to a different location.
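A minimal sketch of that copy-on-write behavior, assuming a toy block map rather than any real array’s metadata format:

```python
class CowVolumeSketch:
    """Toy copy-on-write volume: a write relocates the live block
    whenever a snapshot still references the current physical copy."""

    def __init__(self, block_map):
        self.live = dict(block_map)      # logical block -> physical block
        self.snapshots = []              # frozen copies of the live map
        self.next_free = max(block_map.values()) + 1

    def snapshot(self):
        # Snapshots initially share every physical block with the live
        # volume, which is what makes them cheap to take.
        self.snapshots.append(dict(self.live))

    def write(self, logical):
        phys = self.live[logical]
        if any(s.get(logical) == phys for s in self.snapshots):
            # Preserve the old block for the snapshot; redirect the
            # live copy to a fresh physical location.
            self.live[logical] = self.next_free
            self.next_free += 1
        return self.live[logical]


vol = CowVolumeSketch({0: 10})
vol.snapshot()
new_home = vol.write(0)   # block 0's live copy moves; snapshot keeps 10
```

One write, and the block’s physical address has changed underneath any drive-level mapping.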

Nonetheless, the main problem is that all the smarts about what is happening to data on backend storage lie primarily at the controller level, not at the drive level.  This applies not only to data mapping but also to end-user/application data access, as cache hits are never even seen by a drive.  As such, Hybrid drives alone don’t make much sense in enterprise storage.

Maybe, if they were intricately tied to the subsystem

I guess one way this could all work better is if the Hybrid drive caching logic were somehow controlled by the storage subsystem.  In this way, the controller could provide hints as to which disk blocks to move into NAND.  Perhaps this is a way to distribute storage tiering activity to the backend devices, without the subsystem having to do any of the heavy lifting, i.e., the hybrid drives would do all the data movement under the guidance of the controller.
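Such a hint interface might look something like the sketch below. To be clear, no such command set exists for hybrid drives; every name and method here is invented for illustration:

```python
class HintingControllerSketch:
    """Hypothetical controller-to-drive hint interface: the controller,
    which sees the IO patterns, advises each drive which LBAs to keep
    in its NAND cache."""

    def __init__(self, drive_ids):
        # Per-drive set of LBAs the drive is advised to hold in NAND.
        self.pinned = {d: set() for d in drive_ids}

    def hint_promote(self, drive_id, lbas):
        # Controller has seen these blocks go hot; ask the drive to
        # move them into flash on its own.
        self.pinned[drive_id].update(lbas)

    def hint_demote(self, drive_id, lbas):
        # Blocks have gone cold or been remapped; release the flash.
        self.pinned[drive_id].difference_update(lbas)


ctrl = HintingControllerSketch(drive_ids=["drive0", "drive1"])
ctrl.hint_promote("drive0", {1024, 1025})   # e.g., hot database index blocks
ctrl.hint_demote("drive0", {1024})
```

The attraction is that the controller supplies only the knowledge while the drives do all the data movement; the catch, as noted below, is that nothing like this was ever standardized.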

I don’t think this is likely, because it would take industry standardization to define any new “hint” commands, and they would be specific to Hybrid drives.  Barring standards, it’s an interface between one storage vendor and one drive vendor.  That would probably be OK if you made both the storage subsystem and the hybrid drives, but there aren’t any vendors left that make both drives and storage controllers.


So, given the state of enterprise storage today and its continuing proclivity to move data around across its backend storage, I believe Hybrid drives won’t be used in enterprise storage anytime soon.



9 thoughts on “Will Hybrid drives conquer enterprise storage?”

  1. Interesting blog and take on the whole hybrid drive conundrum. I would add two other reasons why hybrid drives don’t make it in the enterprise. First, most of these drives write data to disk first, then cache, so there is no improvement in data write performance as there would be with a pure SSD solution, such as Kaminario’s K2. Second, the controller is geared to hard disk performance rather than SSD, so it becomes a bottleneck that prevents the SSD from reaching its full performance potential. If you want to get the full performance of SSD, you need to chuck the disk and disk controller and get a system that’s built from the ground up for SSD. And of course in the enterprise you also want a system that provides enterprise level scalability and high availability.

    1. Gareth, Thanks for your comment. Not sure I agree with your conclusions though. Nothing says that Hybrid drive controllers have to be built around disk rather than NAND memory performance. How writes are buffered is also a bit more complex. Most disk drives today have a DRAM buffer which is written to before the data is put on disk. I could conceive of a Hybrid drive that would write the data to DRAM at the same time as writing it to NAND. At that point the drive has “recorded” the write data and could respond that the write is “finished”. In any case, I think the main issue with Hybrid drives behind enterprise storage controllers is where the knowledge of IO patterns lies. Ray

  2. My only comment is I think you are spot on. I think we're much more likely to see continued emphasis on the arrays handling this task in the enterprise storage world, as we see with EMC FAST Cache and NetApp Flash Cache.

  3. The other interesting "feature" of these drives is that data movement only occurs during a power cycle. There's no way this type of operation could work in an enterprise storage environment. Great technology for PCs, not so much for enterprise storage.

      1. Ray, at least with the Seagate Momentus XT HHDD your belief is correct in that they do actively cache while power is on, which can be demonstrated quite easily. For example, after the first couple of boots you will see things subsequently get faster, almost the same as the speed of booting off an SSD.

        Likewise, when working on documents that are frequently accessed, you will also see the active caching that occurs with persistency, meaning that when power is removed, no data that was in cache is lost.

        How do I know this?

        Simple, I have tried it to see what would happen…

        I recently installed yet another HHDD, this one a 750GB Momentus XT (newest generation), and was able to do some performance testing vs. the previous generation; one of these days I will get around to posting results/findings in my ongoing momentus moments blog post series.

        Oh, and as for these (or at least the Momentus HHDDs) writing data to disk before the cache, lol, that’s a good one I will add to the best-of-FUD list

        Cheers gs

  4. No worries Ray, you should get one for your Mac and try it out for yourself; check out the various forums for how to install/configure one for the Mac.

    OTOH, if you have the money, need the speed, and need the capacity, go for one of the 250GB or larger SSDs.

    Cheers gs

Comments are closed.