What eMLC and eSLC do for SSD longevity

Enterprise NAND from Micron.com (c) 2010 Micron Technology, Inc.

I talked last week with some folks from Nimbus Data who were discussing their new storage subsystem.  Apparently it uses eMLC (enterprise Multi-Level Cell) NAND SSDs for its storage and has no SLC (Single Level Cell) NAND at all.

Nimbus believes that with eMLC they can keep the price/GB down and still supply the reliability required for data center storage applications.  I had never heard of eMLC before, but later that week I was scheduled to meet with Texas Memory Systems and Micron Technology, who helped get me up to speed on this new technology.

eMLC/eSLC defined

eMLC and its cousin, eSLC, are high-durability NAND parts that supply more erase/program cycles than are generally available from MLC and SLC respectively.  If today’s NAND technology supplies 10K erase/program cycles for MLC and, similarly, 100K erase/program cycles for SLC, then eMLC can supply 30K.  I have never heard a quote for eSLC, but 300K erase/program cycles before failure might be a good working assumption.
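To put those cycle counts in perspective, here’s a back-of-the-envelope sketch of my own that converts erase/program cycles into drive lifetime.  It assumes ideal wear leveling, and the write amplification factor is a guess on my part, so treat the output as rough orders of magnitude:

```python
# Rough endurance-lifetime estimate.  Assumes ideal wear leveling; the
# write_amplification value is my own guess, not a vendor figure.
def drive_lifetime_years(capacity_gb, pe_cycles, gb_written_per_day,
                         write_amplification=2.0):
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / gb_written_per_day / 365

# A 200GB drive absorbing ~10 full drive writes per day (a heavy data
# center workload, i.e., ~2TB of host writes per day)
for name, cycles in [("MLC", 10_000), ("eMLC", 30_000),
                     ("SLC", 100_000), ("eSLC (assumed)", 300_000)]:
    print(f"{name:>15}: ~{drive_lifetime_years(200, cycles, 2_000):.1f} years")
```

At those write rates plain MLC wears out in barely more than a year, while eMLC stretches that to several years – which is the whole point of the technology.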

The problem is that NAND wears out and can only sustain so many erase/program cycles before it fails.  With more durable parts, one can either stay with the same technology (moving from MLC to eMLC) and use the parts longer, or move to cheaper parts (from SLC to eMLC) and use them in new applications.

This is what Nimbus Data has done with eMLC.  Most data center class SSD or cache NAND storage these days is based on SLC. But SLC, with only one bit per cell, is very expensive storage.  MLC has two (or three) bits per cell and can easily halve the cost of SLC NAND storage.

Moreover, the consumer market, which currently drives NAND manufacturing, depends on MLC technology for cameras, video recorders, USB sticks, etc.  As such, MLC volumes are significantly higher than SLC’s and hence MLC parts are considerably cheaper to manufacture.

But the historic problem with MLC NAND is its reduced durability.  eMLC addresses that problem by lengthening the page programming (tProg) cycle, which creates a better, more lasting data write but slows write performance.

The fact that NAND technology already has ~5X faster random write performance than rotating media (hard disk drives) makes this slightly slower write rate less of an issue. If eMLC took this down to only ~2.5X disk write speed, it would still be significantly faster.  Also, there are a number of architectural techniques that can speed up drive writes and that could easily be incorporated into any eMLC SSD.

How long will SLC be around?

The industry view is that SLC will go away eventually and be replaced with some form of MLC technology because the consumer market uses MLC and drives NAND manufacturing.  The volumes for SLC technology will just be too low to entice manufacturers to support it, driving the price up and volumes even lower – creating a vicious cycle which kills off SLC technology.  Not sure how much I believe this, but that’s conventional wisdom.

The problem with this prognosis is that, by all accounts, the next generation of MLC will be even less durable than today’s generation (I am not sure I understand why, but as feature geometry shrinks, the cells don’t hold charge as well).  So if today’s generation (25nm) MLC supports 10K erase/program cycles, most assume the next generation (~18nm) will only support 3K erase/program cycles. If eMLC can still support 30K or even 10K erase/program cycles at that point, that will be a significant differentiator.

—-

Technology marches on.  Something will replace hard disk drives over the next quarter century or so, and that something is bound to be based on transistorized logic of some kind, not the magnetized media used in disks today. Given today’s technology trends, it’s unlikely that this will continue to be NAND, but something else will most certainly crop up – stay tuned.

Anything I missed in this analysis?

Micron’s new P300 SSD and SSD longevity

Micron P300 (c) 2010 Micron Technology

Micron just announced a new SSD drive based on their 34nm SLC NAND technology with some pretty impressive performance numbers.  They used an independent organization, Calypso SSD testing, to supply the performance numbers:

  • Random Read 44,000 IO/sec
  • Random Writes 16,000 IO/sec
  • Sequential Read 360MB/sec
  • Sequential Write 255MB/sec

Even more impressive, this performance was generated using SATA 6Gb/s and measured after reaching “steady state” per the SNIA test specification (see my post on SNIA’s new SSD performance test specification).

The new SATA 6Gb/s interface is a bit of a gamble, but one can always use an interposer to support FC or SAS interfaces.  In addition, many storage subsystems today already support SATA drives, so its interface may not even be an issue.  The P300 can easily fall back to 3Gb/s SATA if that’s what’s available; sequential performance suffers, but random IOPS won’t be much impacted by interface speed.

The advantage of SATA 6Gb/s is that it’s a simple interface and costs less to implement than SAS or FC.  The downside is the loss of performance until 6Gb/s SATA takes over enterprise storage.

P300’s SSD longevity

I have done many posts discussing SSDs and their longevity or write endurance but this is the first time I have heard any vendor describe drive longevity using “total bytes written” to a drive. Presumably this is a new SSD write endurance standard coming out of JEDEC but I was unable to find any reference to the standard definition.

In any case, the P300 comes in 50GB, 100GB and 200GB capacities, and the 200GB drive has a “total bytes written” capability of 3.5PB, with the smaller versions having proportionally lower longevity specs. For the 200GB drive, that’s almost 5 years of 10 complete full drive writes a day, every day of the year.  This seems enough from my perspective to put any SSD longevity considerations to rest – although at 255MB/sec sequential writes, the P300 could actually sustain ~10X that write rate per day, assuming you never read any data back.
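For anyone who wants to check my arithmetic, here’s the quick calculation (my numbers derived from the specs above, not Micron’s math):

```python
# Back-of-the-envelope check of the P300 endurance numbers above.
tbw_pb = 3.5                      # "total bytes written" spec, in PB
capacity_gb = 200
seq_write_mb_s = 255

full_drive_writes = tbw_pb * 1_000_000 / capacity_gb        # ~17,500 writes
years_at_10_per_day = full_drive_writes / 10 / 365           # ~4.8 years

max_gb_per_day = seq_write_mb_s * 86_400 / 1_000             # ~22,000 GB/day
max_drive_writes_per_day = max_gb_per_day / capacity_gb      # ~110 per day

print(f"{full_drive_writes:,.0f} full drive writes in total, "
      f"~{years_at_10_per_day:.1f} years at 10 drive writes/day, "
      f"up to ~{max_drive_writes_per_day:.0f} drive writes/day at full speed")
```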

I am sure over provisioning, wear leveling and other techniques were used to attain this longevity. Nonetheless, whatever they did, the SSD market could use more of it.  At this level of SSD longevity the P300 could almost be used in a backup dedupe appliance, if there was need for the performance.

You may recall that Micron and Intel have a joint venture to produce NAND chips.  But the joint venture doesn’t include applications of their NAND technology.  This is why Intel has their own SSD products and why Micron has started to introduce their own products as well.

—–

So which would you rather see for an SSD longevity specification:

  • Drive MTBF,
  • Total bytes written to the drive,
  • Total number of program/erase cycles, or
  • Total drive lifetime, based on some (undefined) predicted write rate per day?

Personally I like total bytes written because it defines the drive reliability in terms everyone can readily understand but what do you think?

SNIA’s new SSD performance test specification

Western Digital's Silicon Edge Blue SSD SATA drive (from their website)

A couple of weeks ago SNIA released a new version of their SSSI (SSD) performance test specification for public comment. I am not sure if this is the first version out for public comment or not, but I discussed a prior version in a presentation I did for SNW last October and I have blogged before about some of the mystery of measuring SSD performance.  The current version looks a lot more polished than what I had to deal with last year, but the essence of the performance testing remains the same:

  • Purge test – using a vendor-approved process, purge (erase) all the data on the drive.
  • Preconditioning test – write 2X the capacity of the drive using 128KiB blocksizes, sequentially writing through the whole device’s usable address space.
  • Steady state testing – varying blocksizes, varying read-write ratios and varying block number ranges, looped until steady state is achieved in device performance.

The steady state testing runs a random I/O mix for a minute’s duration at whatever the currently specified blocksize, R:W ratio and block number range happen to be.  Also, according to the specification, the measurements for steady state are taken once performance at a 4KiB blocksize with 100% writes settles down.  This steady state determinant testing must execute over a number of rounds (4?) before the other performance test runs are considered to be at “steady state”.
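To make the idea concrete, here’s a rough sketch of what such a settling check might look like – the window size and tolerance below are placeholders of my own, not values taken from the SNIA spec:

```python
# Hypothetical steady-state check: window size and tolerance are my own
# placeholders, not numbers from the SNIA specification.
def is_steady_state(iops_per_round, window=4, tolerance=0.20):
    """True when the last `window` rounds of 4KiB/100%-write IOPS stay
    within +/- tolerance of their average."""
    if len(iops_per_round) < window:
        return False
    recent = iops_per_round[-window:]
    avg = sum(recent) / window
    return all(abs(x - avg) <= tolerance * avg for x in recent)

# e.g., keep running measurement rounds, appending each round's IOPS result,
# until is_steady_state(results) returns True, then record the reported numbers.
```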

SNIA’s SSSI performance test benefits

Let’s start by saying that no performance test is perfect.  I can always find fault in any performance test, even my own.  Nevertheless, SNIA’s new SSSI performance test goes a long way towards fixing some intrinsic problems with SSD performance measurement.  Specifically,

  • The need to discriminate between fresh out of the box (FOB) performance and ongoing drive performance.  The preconditioning test is obviously a compromise in attempting to do this: writing double the full capacity of a drive takes a long time but should cause every NAND cell in the user space to be overwritten.  Writing the capacity once is not enough to overwrite all the device’s write buffers, while even three times the device’s capacity may still show some variance in performance and would take correspondingly longer (see the quick calculation after this list).
  • The need to show steady state SSD performance versus some peak value.  SSDs are notorious for showing differing performance over time. Partially this is due to FOB performance (see above) but mostly this is due to the complexity of managing NAND erasure and programming overhead.
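Here’s the quick preconditioning-time calculation promised above – the capacity and write rate are just example numbers (borrowed from the P300 specs earlier in this post), not anything mandated by the spec:

```python
# Rough preconditioning-time estimate.  The capacity and write rate below are
# example figures (the P300 specs discussed above), not part of the SNIA spec.
capacity_gb = 200
seq_write_mb_s = 255
passes = 2                                    # spec calls for 2X the capacity

seconds = passes * capacity_gb * 1_000 / seq_write_mb_s
print(f"~{seconds / 60:.0f} minutes of 128KiB sequential writes")    # ~26 min
```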

The steady state performance problem is not nearly as much of an issue with hard disk drives, but even there, with defect skipping, drive performance will degrade over time (though over a much longer time than for SSDs).  My main quibble with the test specification is how they elect to determine steady state – 4KiB blocks with 100% writes seems a bit oversimplified.

Is some proportion of read IO needed to define SSD “steady state” performance?

[Most of the original version of this post centered on the need for some write component in steady state determination.  This was all due to my misreading the SNIA spec.  I now realize that the current spec calls for a 100% WRITE workload with 4KiB blocksizes to settle down to determine steady state.   While this may be overkill, it certainly is consistent with my original feelings that some proportion of write activity needs to be a prime determinant of SSD steady state.]

One concern is the lack of read activity in determining steady state. My other worry with this approach is that the blocksize seems a bit too small, however that is minor in comparison.

Let’s start with the fact that SSDs are by nature asymmetrical devices.  By that I mean their write performance differs substantially from their read performance due to the underlying nature of the NAND technology.  But much of what distinguishes an enterprise SSD from a commercial drive is the sophistication of its write processing.  If steady state were measured with a 100% read rate, we would be undervaluing this sophistication.

But using 100% writes to test for steady state may be too much.

In addition, it is hard for me to imagine any commercial or enterprise class device in service not having some high portion of ongoing read IO activity.  I can easily be convinced that a normal R:W activity for an SSD device is somewhere between 90:10 and 50:50.  But I have a difficult time seeing an SSD R:W ratio of 0:100 as realistic.  And I feel any viable interpretation of device steady state performance needs to be based on realistic workloads.

In SNIA’s defense, they had to pick some reproducible way to measure steady state.  Some devices may have had difficulty reaching steady state with 100% write activity.  However, most other benchmarks have some sort of cut-off that can be used to invalidate results.  Reaching steady state is one current criterion for SNIA’s SSSI performance test.  I just think a mix of read and write activity would be a better measure of SSD stability.

As for the 4KiB block size, it’s purely a question of what’s the most probable blocksize in SSD usage, and that may vary between enterprise and consumer applications.  But 4KiB seems a bit behind the times, especially with today’s 128GB and larger drives…

What do you think – should SSD steady state determination use a mix of read and write activity or not?

[Thanks to Eden Kim and his team at SSSI for pointing out my spec reading error.]

WD’s new SiliconEdge Blue SSD data write spec

Western Digital's SiliconEdge Blue SSD SATA drive (from their website)

Western Digital (WD) announced their first SSD drive for the desktop/laptop market space today.  The drive offers the typical 256, 128, and 64GB capacity points over a SATA interface.  Performance looks OK at 5K random read or write IO/s, with sustained transfers at 250 and 140MB/s for read and write respectively.  But what caught my eye was a new specification I hadn’t seen before, indicating maximum GB written per day of 17.5, 35 and 70GB/d for their drives, using WD’s Operational Lifespan – LifeEST(tm) definition.

I couldn’t find anywhere that said which NAND technology was used in the device, but it likely uses MLC NAND.  In a prior posting we discussed a Toshiba study that said a “typical” laptop user writes about 2.4GB/d and a “heavy” laptop user writes about 9.2GB/d.  This data would indicate that WD’s new 64GB drive can handle almost 2X the defined “heavy” user workload for laptops, and their other drives would handle it just fine.  A data write rate for desktop work, as far as I can tell, has not been published, but presumably it would be greater than for laptop users.
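Putting WD’s LifeEST limits next to the Toshiba study numbers makes the headroom obvious (my pairing of capacities to daily limits is inferred from the ordering in WD’s announcement):

```python
# WD's LifeEST daily-write limits vs. the Toshiba laptop study figures above.
# The capacity-to-limit pairing is my assumption from the announcement ordering.
lifeest_gb_per_day = {64: 17.5, 128: 35, 256: 70}
typical, heavy = 2.4, 9.2          # GB/day from the Toshiba laptop study

for cap, limit in lifeest_gb_per_day.items():
    print(f"{cap:3}GB drive: {limit / heavy:.1f}X the 'heavy' workload, "
          f"{limit / typical:.1f}X the 'typical' workload")
```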

From my perspective, more information on the drive’s underlying NAND technology, on what a LifeEST specification actually means, and a specification as to how much NAND storage is actually present would be nice, but these are all personal nits.  All that aside, I applaud WD for standing up and saying what data write rate their drives can support.  This needs to be a standard part of any SSD specification sheet and I look forward to seeing more information like this coming from other vendors as well.

Intel-Micron new 25nm/8GB MLC NAND chip

Intel and Micron 25nm NAND technology

Intel-Micron Flash Technologies just announced another increase in NAND density. This one manages to put 8GB on a single chip with MLC(2) technology in a 167mm² package, or roughly a half inch per side.

You may recall that Intel-Micron Flash Technologies (IMFT) is a joint venture between Intel and Micron to develop NAND technology chips. IMFT chips can be used by any vendor and typically show up in Intel SSDs as well as other vendors’ systems. MLC technology is more suitable for use in consumer applications, but at these densities it’s starting to make sense for use by data centers as well. We have written before about MLC NAND used in enterprise disks by STEC and about Toshiba’s MLC SSDs. But in essence, MLC NAND reliability and endurance will ultimately determine its place in the enterprise.

But at these densities, you can just throw more capacity at the problem to mask MLC endurance concerns. For example, with this latest chip, one could conceivably have a single-layer 2.5″ configuration with almost 200GB of MLC NAND. If you wanted to configure this as a 128GB SSD, you could use the additional 72GB of NAND to replace failing pages. Doing this could conceivably add more than 50% to the life of the SSD.
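Here’s a rough sketch of that over-provisioning argument – it assumes ideal wear leveling, and the 10K cycle figure is a generic MLC number of mine, not an IMFT spec:

```python
# Rough sketch of how over-provisioning stretches endurance.  Assumes ideal
# wear leveling; 10K cycles is a generic MLC figure, not an IMFT specification.
PE_CYCLES = 10_000

def lifetime_host_writes_tb(raw_nand_gb, pe_cycles=PE_CYCLES):
    # each raw GB of NAND can absorb pe_cycles program/erase passes
    return raw_nand_gb * pe_cycles / 1_000

base  = lifetime_host_writes_tb(128)   # 128GB SSD built with no spare NAND
extra = lifetime_host_writes_tb(200)   # same 128GB SSD built on 200GB of NAND
print(f"~{(extra / base - 1) * 100:.0f}% more total writes")       # ~56% more
```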

SLC still has better (~10X) endurance but being able to ship 2X the capacity in the same footprint can help.  Of course, MLC and SLC NAND can be combined in a hybrid device to give some approximation of SLC reliability at MLC costs.

IMFT made no mention of SLC NAND chips at the 25nm technology node, but presumably these will be forthcoming shortly.  If we assume the technology can support a 4GB SLC NAND chip in 167mm², it should be of significant interest to most enterprise SSD vendors.

A couple of things were missing from yesterday’s IMFT press release, namely:

  • read/write performance specifications for the NAND chip
  • write endurance specifications for the NAND chip

SSD performance is normally a function of all the technology that surrounds the NAND chip, but it all starts with the chip.  Also, MLC used to be capable of 10,000 write/erase cycles and SLC of 100,000 w/e cycles, but the most recent technology from Toshiba (presumably 34nm) shows an MLC NAND write/erase endurance of only 1,400 cycles.  This seems to imply that as NAND density increases, write endurance degrades. How much is subject to much debate, and with the lack of any standardized w/e endurance specifications and reporting, it’s hard to see how bad it gets.

The bottom line: capacity is great, but we need to know w/e endurance to really see where this new technology fits.  Ultimately, if endurance degrades significantly, such NAND technology will only be suitable for consumer products.  Of course, at ~10X (just guessing) the size of the enterprise market, maybe that’s ok.

Toshiba studies laptop write rates confirming SSD longevity

Toshiba's New 2.5" SSD from SSD.Toshiba.com

Today Toshiba announced a new series of SSD drives based on their 32nm MLC NAND technology. The new technology is interesting, but what caught my eye was another part of their website, i.e., their SSD FAQs. We have talked about MLC NAND technology before and have discussed its inherent reliability limitations, but this is the first time I have seen a company discuss their reliability estimates so publicly. This was documented more fully in an IDC white paper on their site, but the summary on the FAQ web page speaks to most of it.

Toshiba’s answer to the MLC write endurance question revolves around how much data a laptop user writes per day, which their study makes clear.  Essentially, Toshiba assumes MLC NAND write endurance is 1,400 write/erase cycles, and for their 64GB drive a user would have to write, on average, 22GB/day for 5 years before they would exceed the manufacturer’s warranty based on write endurance cycles alone.

Let’s see:

  • 5 years is ~1825 days
  • 22GB/day over 5 years would be over 40,000GB of data written
  • If we divide this by the 1,400 MLC W/E cycle limit given above, that gives us something like 28.7 – i.e., roughly 28.7 NAND pages could fail and the drive could still support write reliability.

Not sure what Toshiba’s MLC SSD supports for page size, but it’s not unusual for SSDs to ship an additional 20% of capacity to over-provision for write endurance and ECC. Given that 20% of 64GB is ~12.8GB, and it has to sustain at least ~28.7 NAND page failures, this puts Toshiba’s MLC NAND page at something like 512MB or ~4Gb, which makes sense.
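For completeness, here’s the back-of-the-envelope arithmetic above in one place (these are my figures and assumptions, not Toshiba’s):

```python
# My back-of-the-envelope figures from above -- not Toshiba's published math.
days            = 5 * 365                  # ~1,825 days of warranty
gb_written      = 22 * days                # ~40,150 GB written over 5 years
we_cycle_limit  = 1_400                    # assumed MLC write/erase endurance

units           = gb_written / we_cycle_limit      # ~28.7
overprov_gb     = 0.20 * 64                        # ~12.8 GB of spare capacity
implied_page_gb = overprov_gb / units              # ~0.45 GB, i.e., ~512MB/4Gb

print(f"{gb_written:,.0f} GB written, {units:.1f} units, "
      f"implied page size ~{implied_page_gb * 1024:.0f} MB")
```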

MLC vs. SLC write endurance from SSD.Toshiba.com

The not-so-surprising thing about this analysis is that as drive capacity goes up, write endurance concerns diminish, because the amount of data that would need to be written daily to wear the drive out goes up linearly with the capacity of the SSD. Toshiba’s latest drive announcements offer 64/128/256GB MLC SSDs for the mobile market.

Toshiba studies mobile users write activity

To come at their SSD reliability estimate from another direction, Toshiba’s laptop usage modeling study of over 237 mobile users showed the “typical” laptop user wrote an average of 2.4GB/day (with auto-save & hibernate on) and a “heavy” laptop user wrote 9.2GB/day under similar conditions. Now averages are well and good, but to really put this into perspective one needs to know the workload variability. Nonetheless, their published results do put a rational upper bound on how much data typical laptop users write during a year, which can then be used to compute (MLC) SSD drive reliability.

I must applaud Toshiba for publishing some of their mobile user study information to help us all better understand SSD reliability for this environment. It would have been better to see the complete study, including all the statistics, when it was done and how users were selected, and it would have been really nice to see this study done by a standards body (say SNIA) rather than a manufacturer, but these are all personal nits.

Now, I can’t wait to see a study on write activity for the “heavy” enterprise data center environment, …

Seagate launches their Pulsar SSD

Seagate's Pulsar SSD (seagate.com)

Today Seagate announced their new SSD offering, named the Pulsar SSD.  It uses SLC NAND technology and comes in a 2.5″ form factor at 50, 100 or 200GB capacity.  The fact that it uses a 3Gb/s SATA interface seems to indicate that Seagate is going after the server market rather than the high-end storage marketplace, but different interfaces can be added over time.

Pulsar SSD performance

The main fact that makes the Pulsar interesting is its peak write rate of 25,000 4KB aligned writes per second versus a peak read rate of 30,000.  The ratio of peak reads to peak writes, 30:25, represents a significant advance over prior SSDs, and presumably this is through the magic of buffering.  But once we get beyond peak IO buffering, sustained 128KB write performance drops to 2,600, 5,300, or 10,500 ops/sec for the 50, 100, and 200GB drives respectively.  It’s kind of interesting that this drops as capacity drops, which implies that adding capacity also adds parallelism. Sustained 4KB reads for the Pulsar are speced at 30,000.

In contrast, STEC’s Zeus drive is speced at 45,000 random reads and 15,000 random writes sustained, with 80,000 peak reads and 40,000 peak writes.  So performance-wise, the Seagate Pulsar (200GB) SSD has about ~37% of the peak read, ~63% of the peak write, ~67% of the sustained read and ~70% of the sustained write performance of the Zeus drive.

Pulsar reliability

The other item of interest is that Seagate states a 0.44% annual failure rate (AFR), so for a 100-drive Pulsar storage subsystem one Pulsar drive will fail, on average, every 2.27 years.  Also, the Pulsar bit error rate (BER) is specified at 1 error in 10E16 bits when new and 1 in 10E15 bits at end of life.  As far as I can tell, both of these specifications are better than STEC’s specs for the Zeus drive.
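The fleet-failure arithmetic is easy to check (my calculation, using the published AFR figure):

```python
# Quick check of the fleet-failure claim above, using the published 0.44% AFR.
afr = 0.0044                     # annual failure rate per drive
drives = 100

failures_per_year = afr * drives                 # ~0.44 failures/year in fleet
years_between_failures = 1 / failures_per_year
print(f"~{years_between_failures:.2f} years between drive failures")   # ~2.27
```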

Both the Zeus and Pulsar drives support a 5 year limited warranty.  But if the Pulsar is indeed a more reliable drive as indicated by their respective specifications, vendors may prefer the Pulsar as it would require less service.

All this seems to say that reliability may become a more important factor in vendor SSD selection. I suppose once you get beyond 10K read or write IOPS per drive, performance differences just don’t matter that much. But a BER of 10E14 vs 10E16 may make a significant difference to product service cost and, as such, may make changing SSD vendors much easier to justify. This seems to be opening up a new front in the SSD wars – drive reliability.

Now if they only offered 6Gb/s SAS or 4GFC interfaces…

What’s happening with MRAM?

16Mb MRAM chips from Everspin

At the recent Flash Memory Summit there were a few announcements that show continued development of MRAM technology which can substitute for NAND or DRAM, has unlimited write cycles and is magnetism based. My interest in MRAM stems from its potential use as a substitute storage technology for today’s SSDs that use SLC and MLC NAND flash memory with much more limited write cycles.

MRAM has the potential to replace NAND in SSDs because of its write speed (current prototypes write at 400MHz, or a few nanoseconds) and its potential to go up to 1GHz. At 400MHz, MRAM is already much, much faster than today’s NAND. And with no write limits, MRAM technology should be very appealing to most SSD vendors.

The problem with MRAM

The only problem is that current MRAM chips use 150nm chip design technology whereas today’s NAND ICs use 32nm chip design technology. All this means that current MRAM chips hold about 1/1000th the memory capacity of today’s NAND chips (16Mb MRAM from Everspin vs 16Gb NAND from multiple vendors). MRAM has to get on the same (chip) design node as NAND to make a significant play for storage intensive applications.

It’s encouraging that somebody, at least, is starting to manufacture MRAM chips rather than just building lab prototypes of this technology. From my perspective, it can only get better from here…