When will disks become extinct?

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

Yesterday, it was announced that Hitachi Global Storage Technologies (HGST) is being sold to Western Digital for $4.3B, and afterwards there was much discussion in the twitterverse about the end of enterprise disk as we know it.  Also, last week I was at a dinner at an analyst meeting with Hitachi, where the conversation turned to when disks will no longer be available. This discussion was between Mr. Takashi Oeda of Hitachi RSD, Mr. John Webster of Evaluator Group, and myself.

Why SSDs will replace disks

John was of the opinion that disks would stop being economically viable in about 5 years' time and would no longer ship in volume, mainly due to energy costs.  Oeda-san said that Hitachi had predicted that NAND pricing on a $/GB basis would cross over (become less expensive than) 15Krpm disk pricing sometime around 2013.  Later he said that NAND pricing had not come down as fast as projected and that the crossover was going to take longer than anticipated.  Note that Oeda-san mentioned a price crossover only for 15Krpm disk, not 7200rpm disk.  In all honesty, he said SATA disk would take longer, but he did not predict when.

I think both arguments are flawed:

  • Energy costs for disk drives drop on a Watts/GB basis every time disk density increases. So the energy it takes to run a 600GB drive today will likely run a 1.2TB drive tomorrow.  I don’t think energy costs are going to be the main factor driving disks out of the enterprise.
  • Density costs for NAND storage are certainly declining, but cost/GB is not the only factor in technology adoption. Disk storage has cost more per GB than tape since the ’50s, yet the two continue to coexist in the enterprise. I contend that disks will remain viable against SSDs for at least the next 15-20 years, primarily because disks have unique functional advantages which are vital to enterprise storage. (For how sensitive any $/GB crossover is to rate assumptions, see the sketch after this list.)
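
To make the crossover argument concrete, here is a back-of-the-envelope sketch in Python. The starting prices and annual decline rates are assumptions picked purely for illustration, not quoted figures; the point is how much the crossover year moves as those rates change.

```python
# Back-of-the-envelope $/GB crossover (all numbers invented for illustration).
# Assumed: NAND $/GB falls ~40%/year, enterprise disk $/GB falls ~25%/year.

nand_cost, disk_cost = 2.00, 0.50        # assumed starting $/GB
nand_decline, disk_decline = 0.40, 0.25  # assumed annual price declines

year = 2011
while nand_cost > disk_cost:
    nand_cost *= 1 - nand_decline
    disk_cost *= 1 - disk_decline
    year += 1

print(f"crossover in {year}: NAND ${nand_cost:.2f}/GB, disk ${disk_cost:.2f}/GB")
# Shave a few points off the NAND decline rate and the crossover slips
# by years -- which is exactly what Oeda-san described happening.
```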

Most analysts would say I am wrong, but I believe disks will continue to play an important role in the storage hierarchy of future enterprise data centers.

NAND/SSD flaws from an enterprise storage perspective

All costs aside, NAND based SSDs have serious disadvantages when it comes to:

  • Write endurance – the problem with NAND data cells is that they can only be written so many times before they fail, and as NAND cells shrink, this rate is going the wrong way: today’s NAND technology can support 100K writes before failure, but tomorrow’s NAND technology may only support 15K writes before failure.  This is not a beneficial trend if one is going to depend on NAND technology for the storage of tomorrow.
  • Sequential access – although NAND SSDs perform much better than disk for random reads, and to a lesser extent random writes, their advantage for sequential access is not as dramatic.  NAND sequential access can be sped up by deploying multiple parallel channels, but that starts to look like an internal form of the wide striping already done across multiple disk drives.
  • Unbalanced performance – with NAND technology, reads complete faster than writes, sometimes 10X faster.  Such unbalanced performance can make this technology more difficult to deal with and less advantageous than today’s disk drives, which have much more balanced read/write performance.

None of these problems will halt SSD use in the enterprise. They can all be dealt with through more complexity in the SSD or in the storage controller managing the SSDs, e.g., wear leveling to prolong endurance, multiple data channels for sequential access, etc. But all this additional complexity increases SSD cost and time to market.
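
For the curious, here is a minimal sketch of the idea behind wear leveling: always program the least-worn free block so that no cell wears out far ahead of the rest. This is a toy model, not any vendor's actual flash translation layer; the 100K cycle limit is the SLC figure assumed above.

```python
# Toy wear-leveling sketch: always program the least-worn free block.
# Real FTLs also do static wear leveling, garbage collection, and
# bad-block management; this shows only the core idea.

import heapq

class WearLeveler:
    def __init__(self, num_blocks, max_cycles=100_000):
        self.max_cycles = max_cycles
        # min-heap of (erase_count, block_id): least-worn block pops first
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        cycles, block = heapq.heappop(self.free)
        if cycles >= self.max_cycles:
            raise RuntimeError(f"block {block} has worn out")
        return block, cycles

    def release(self, block, cycles):
        # erasing the block costs one more cycle before reuse
        heapq.heappush(self.free, (cycles + 1, block))

wl = WearLeveler(num_blocks=4)
for _ in range(8):
    blk, worn = wl.allocate()   # write lands on the coolest block
    wl.release(blk, worn)       # later erased and returned to the pool
```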

SSD vendors would respond that yes, it’s more complex, but such complexity is a one-time charge and mostly a one-time delay; once done, incremental costs are minimal. And when you come down to it, today’s disk drives are not that simple either, what with defect skipping, fault handling, etc.

So why won’t disk drives go away soon?  I think the other major concern in NAND/SSD ascendancy is the fact that the bulk NAND market is moving away from SLC (single level cell, or one bit/cell) NAND to MLC (multi-level cell) NAND due to its cost advantage.  When SLC NAND is no longer the main technology being manufactured, its price will not drop as fast and its availability will become more limited.

Some vendors also counter this trend by incorporating MLC technology into enterprise SSDs. However, all the problems discussed earlier become an order of magnitude more severe with MLC NAND. For example, rather than 100K write operations to failure with today’s SLC NAND, it’s more like 10K write operations to failure on current MLC NAND.  The fact that you get 2 to 3 times more storage per cell with MLC doesn’t help much when one gets 10X fewer writes per cell, and the next generation of MLC is 10X worse still, perhaps on the order of 1000 writes/cell before failure.  Similar issues occur for write performance: MLC writes are much slower than SLC writes.
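
The arithmetic behind that claim is simple enough to show. Treating a cell's lifetime capacity as bits per cell times write cycles (a simplification that ignores write amplification and ECC):

```python
# Why extra bits/cell don't offset lost endurance (illustrative arithmetic).
# Lifetime bit-writes per cell ~= bits_per_cell * erase/program cycles.

slc      = 1 * 100_000  # SLC: 1 bit/cell, ~100K cycles -> 100,000
mlc      = 2 * 10_000   # MLC: 2 bits/cell, ~10K cycles ->  20,000
next_mlc = 3 * 1_000    # next-gen MLC: 3 bits/cell, ~1K ->    3,000

print(slc, mlc, next_mlc)  # each generation loses lifetime capacity per cell
```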

So yes, raw NAND may someday become cheaper than 15Krpm disks on a $/GB basis, but the complexity needed to deal with the technology is also going up at an alarming rate.

Why disks will persist

Now something similar can be said for disk density, what with the transition to thermally assisted recording heads/media and the rise of bit-patterned media, all of which are making disk drives more complex with each new generation.  So what allows disks to persist long after NAND is cheaper on a $/GB basis:

  • Current infrastructure supports disk technology well in enterprise storage. Disks have been around so long that storage controllers and server applications have all been designed around them.  This legacy provides an advantage that will be difficult and time consuming to overcome, and it will delay NAND/SSD adoption in the enterprise for some time, at least until this infrastructural bias towards disk is neutralized.
  • Disk technology is not standing still.  It’s essentially a race to see who will win the next generation’s storage.  There is enough of an ecosystem around disk to keep pushing media, heads, and mechanisms ever forward to higher densities, better throughput, and more economical storage.

However, any infrastructural advantage can be overcome in time.  What will make it go away even quicker is the existence of a significant advantage over current disk technology in one or more dimensions.  Storage that is both cheaper and faster could make this a reality.

Moreover, as for the ecosystem discussion, arguably the NAND ecosystem is even larger than disk’s.  I don’t have the figures, but if one includes SSD producers as well as NAND semiconductor manufacturers, the capital investment in R&D is at least the size of disk technology’s, if not orders of magnitude larger.

Disks will go extinct someday

So will disks become extinct? Yes, someday, undoubtedly, but when is harder to nail down. Earlier in my career there was talk of the superparamagnetic effect limiting how much data could be stored on a disk; advances in heads and media moved that limit out of the way. However, there will come a time when it becomes impossible (or, more likely, too expensive) to increase magnetic recording density further.

I was at a meeting a few years back where a magnetic head researcher predicted that such an end point to density increase would come in 25 years’ time for disk and 30 years for tape.  When that occurs, disk density will stand still, and then it’s a certainty that some other technology will take over, because as we all know, data storage requirements will never stop increasing.

I think the other major unknown is the set of non-NAND semiconductor storage technologies still under research.  They have the potential for effectively unlimited write endurance, balanced performance, and sequential performance orders of magnitude faster than disk, and could become a much more functional equivalent of disk storage.  Such technologies are not commercially available today in sufficient density, or at sufficiently low cost, to threaten NAND, let alone disk devices.

—-

So when do disks go extinct?  I would say in 15 to 20 years’ time we may see the last disks in enterprise storage.  That would give disks an almost 80-year dominance of storage technology.

But in any event I don’t see disks going away anytime soon in enterprise storage.

Comments?

What eMLC and eSLC do for SSD longevity

Enterprise NAND from Micron.com (c) 2010 Micron Technology, Inc.

I talked last week with some folks from Nimbus Data who were discussing their new storage subsystem.  Apparently it uses eMLC (enterprise Multi-Level Cell) NAND SSDs for its storage and has no SLC (Single Level Cell) NAND at all.

Nimbus believes that with eMLC they can keep the price/GB down and still supply the reliability required for data center storage applications.  I had never heard of eMLC before, but later that week I was scheduled to meet with Texas Memory Systems and Micron Technology, who helped get me up to speed on this new technology.

eMLC/eSLC defined

eMLC and its cousin, eSLC, are high-durability NAND parts which supply more erase/program cycles than generally available from MLC and SLC respectively.  If today’s NAND technology can supply 10K erase/program cycles for MLC and, similarly, 100K erase/program cycles for SLC, then eMLC can supply 30K.  I have never heard a quote for eSLC, but 300K erase/program cycles before failure might be a good working assumption.

The underlying problem is that NAND wears out and can only sustain so many erase/program cycles before it fails.  With more durable parts, one can either use the same class of part longer (moving from MLC to eMLC) or move cheaper parts into new applications (replacing SLC with eMLC).
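
To see what those cycle counts mean in practice, here is a rough drive-life estimate. All the inputs (write rate, write amplification factor, drive size) are assumptions of mine for illustration, not figures from Nimbus, TMS, or Micron.

```python
# Rough SSD life from erase/program endurance (all inputs are assumptions).
# life_years = (capacity * cycles / write_amplification) / yearly writes

def drive_life_years(capacity_gb, cycles, gb_per_day, waf=2.0):
    total_writable_gb = capacity_gb * cycles / waf
    return total_writable_gb / (gb_per_day * 365)

# A 200GB drive absorbing 1TB/day of writes, with an assumed WAF of 2:
for name, cycles in [("MLC 10K", 10_000), ("eMLC 30K", 30_000),
                     ("SLC 100K", 100_000)]:
    print(f"{name}: ~{drive_life_years(200, cycles, 1000):.1f} years")
# Prints roughly 2.7, 8.2, and 27.4 years under these assumptions.
```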

This is what Nimbus Data has done with eMLC.  Most data center class SSD or NAND cache storage these days is based on SLC. But SLC, with only one bit per cell, is very expensive storage.  MLC stores two (or three) bits per cell and can easily halve the cost of SLC NAND storage.

Moreover, the consumer market, which currently drives NAND manufacturing, depends on MLC technology for cameras, video recorders, USB sticks, etc.  As such, MLC volumes are significantly higher than SLC’s and hence MLC parts are considerably cheaper to manufacture.

But the historic problem with MLC NAND is its reduced durability.  eMLC addresses that problem by lengthening the page programming (tProg) cycle, which creates a better, more lasting data write but slows write performance.

The fact that NAND technology already has ~5X faster random write performance than rotating media (hard disk drives) makes this slightly slower write rate less of an issue; if eMLC took this down to only ~2.5X disk write performance, it would still be significantly faster.  Also, there are a number of architectural techniques for speeding up drive writes that can easily be incorporated into any eMLC SSD.

How long will SLC be around?

The industry view is that SLC will eventually go away, replaced by some form of MLC technology, because the consumer market uses MLC and drives NAND manufacturing.  The volumes for SLC technology will just be too low to entice manufacturers to support it, driving the price up and volumes even lower – a vicious cycle that kills off SLC technology.  I’m not sure how much I believe this, but that’s conventional wisdom.

The problem with this prognosis is that, by all accounts, the next generation of MLC will be even less durable than today’s (I’m not sure I understand why, but as feature geometries shrink, cells don’t hold charge as well).  So if today’s generation (25nm) MLC supports 10K erase/program cycles, most assume the next generation (~18nm) will only support 3K.  If eMLC can still supply 30K, or even 10K, erase/program cycles at that point, that will be a significant differentiator.

—-

Technology marches on.  Something will replace hard disk drives over the next quarter century or so, and that something is bound to be based on transistorized logic of some kind, not the magnetized media used in disks today. Given today’s technology trends, it’s unlikely that this will still be NAND, but something else will most certainly crop up – stay tuned.

Anything I missed in this analysis?

Seagate launches their Pulsar SSD

Seagate's Pulsar SSD (seagate.com)

Today Seagate announced their new SSD offering, named the Pulsar SSD.  It uses SLC NAND technology and comes in a 2.5″ form factor at 50, 100 or 200GB capacities.  The fact that it uses a 3Gb/s SATA interface seems to indicate that Seagate is going after the server market rather than the high-end storage marketplace, but different interfaces can be added over time.

Pulsar SSD performance

The main fact that makes the Pulsar interesting is its peak write rate of 25,000 4KB aligned writes per second versus a peak read rate of 30,000 per second.  That 30:25 ratio of peak reads to peak writes represents a significant advance over prior SSDs, presumably achieved through the magic of buffering.  But once we get beyond peak IO buffering, sustained 128KB writes drop to 2,600, 5,300, or 10,500 ops/sec for the 50, 100, and 200GB drives respectively.  It’s interesting that sustained write performance falls with capacity, implying that adding capacity also adds parallelism (see the sketch below).  Sustained 4KB reads for the Pulsar are spec’d at 30,000 per second.
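
Dividing the sustained-write specs by an assumed channel count shows why parallelism is the likely explanation. The one-channel-per-50GB mapping below is my guess, not Seagate's published design:

```python
# Pulsar sustained-write specs vs capacity (specs from the announcement).
# The one-channel-per-50GB mapping is my assumption, not Seagate's design.

specs = {50: 2600, 100: 5300, 200: 10500}  # capacity GB -> sustained writes/sec

for gb, ops in specs.items():
    channels = gb // 50                    # assumed parallel NAND channels
    print(f"{gb}GB: {ops} ops/s / {channels} channel(s) "
          f"= {ops / channels:.0f} ops/s per channel")
# Per-channel throughput comes out nearly constant (~2600 ops/s),
# consistent with each capacity step adding parallelism.
```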

In contrast, STEC’s Zeus drive is spec’d at 45,000 sustained random reads and 15,000 sustained random writes, with 80,000 peak reads and 40,000 peak writes.  So, performance wise, the Seagate Pulsar (200GB) SSD has about ~37% of the peak read, ~63% of the peak write, ~67% of the sustained read, and ~70% of the sustained write performance of the Zeus drive.

Pulsar reliability

The other item of interest is that Seagate states a 0.44% annual failure rate (AFR), so in a 100-drive Pulsar storage subsystem, one drive would be expected to fail every 2.27 years.  Also, the Pulsar bit error rate (BER) is specified at less than 1 error in 10E16 bits when new and 1 in 10E15 at end of life.  As far as I can tell, both of these specifications are better than STEC’s specs for the Zeus drive.
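
The 2.27-year figure is just expectation arithmetic on the stated AFR:

```python
# Expected failure spacing from an annual failure rate.
afr = 0.0044      # Seagate's stated 0.44% AFR for the Pulsar
drives = 100      # drives in the hypothetical subsystem

failures_per_year = afr * drives                   # 0.44 expected failures/year
print(f"one failure every {1 / failures_per_year:.2f} years")  # ~2.27
```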

Both the Zeus and Pulsar drives carry a 5-year limited warranty.  But if the Pulsar is indeed the more reliable drive, as their respective specifications indicate, vendors may prefer the Pulsar as it would require less service.

All this seems to say that reliability may become a more important factor in vendor SSD selection. I suppose once you get beyond 10K read or write IOPS per drive, performance differences just don’t matter that much. But a BER of 10E14 vs 10E16 may make a significant difference to product service cost and, as such, may justify changing SSD vendors much more easily. This seems to open up a new front in the SSD wars – drive reliability.

Now if they only offered 6Gb/s SAS or 4GFC interfaces…

Toshiba’s New MLC NAND Flash SSDs

Toshiba has recently announced a new series of SSDs based on MLC NAND (Yahoo Biz story). This is only the latest in a series of MLC SSDs that Toshiba has released.

Historically, MLC (multi-level cell) NAND has supported higher capacity but has been slower and less reliable than SLC (single-level cell) NAND. The capacity points supplied for the new drives (64, 128, 256, and 512GB) reflect the higher density NAND. Toshiba’s performance numbers for the new drives also look appealing, but are probably overkill for most desktop/notebook/netbook users.

Toshiba’s reliability specifications were not listed in the Yahoo story and would probably be hard to find elsewhere (I looked on the Toshiba America website and couldn’t locate any). However, the duty cycle for a desktop/notebook data drive is not that severe, so the fact that MLC can only endure ~1/10th the writes that SLC can is probably not much of an issue.

SNIA is working on SSD (or SSS, as SNIA calls it; see the SNIA SSSI forum website) reliability but has yet to publish anything externally. I’m unsure whether they will break out MLC vs. SLC drives, but it’s certainly worthy of discussion.

But the advantage of MLC NAND SSDs is that they should be 2 to 4X cheaper than SLC SSDs, depending on the number (2, 3 or 4) of bits/cell, and as such, more affordable. This advantage can be reduced by the need to over-provision the device and add more parallelism in order to improve MLC reliability and performance. But both of these facilities are becoming more commonplace and so should be relatively straightforward to support in an SSD.
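
A quick sketch of how over-provisioning erodes that price advantage. The raw $/GB figures and spare-area fractions below are invented for illustration only:

```python
# Over-provisioning raises effective $/GB because spare capacity isn't usable.
def effective_cost(raw_cost_per_gb, spare_fraction):
    return raw_cost_per_gb / (1 - spare_fraction)

slc = effective_cost(8.00, 0.10)  # assumed: SLC $8/GB raw, 10% spare area
mlc = effective_cost(3.00, 0.40)  # assumed: MLC $3/GB raw, 40% spare area
print(f"SLC ${slc:.2f}/GB vs MLC ${mlc:.2f}/GB; "
      f"raw 2.7X advantage shrinks to {slc / mlc:.1f}X usable")
```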

The question remains, given the reliability differences, whether and when MLC NAND will become reliable enough for enterprise-class SSDs. Although many vendors make MLC NAND SSDs for the notebook/desktop market (Intel, SanDisk, Samsung, etc.), FusionIO is probably one of the few using a combination of SLC and MLC NAND for enterprise-class storage (see the FusionIO press release), although calling the FusionIO device an SSD is probably a misnomer. What FusionIO does to moderate MLC endurance issues is not clear, but buffering write data to SLC NAND must certainly play some part.