Why SSD performance is a mystery?

SSDs! :) by gimpbully (cc) (from flickr)
SSDs! 🙂 by gimpbully (cc) (from flickr)

SSD and/or SSS (solid state storage) performance is a mystery to most end-users. The technology is inherently asymmetrical, i.e., it reads much faster than it writes. I have written on some of these topics before (STEC’s new MLC drive, Toshiba’s MLC flash, Tape V Disk V SSD V RAM) but the issue is much more complex when you put these devices behind storage subsystems or in client servers.

Some items that need to be considered when measuring SSD/SSS performance include:

  • Is this a new or used SSD?
  • What R:W ratio will we use?
  • What blocksize should be used?
  • Do we use sequential or random I/O?
  • What block inter-reference interval should be used?

This list is necessarily incomplete but it’s representative of the sort of things that should be considered to measure SSD/SSS performance.

New device or pre-conditioned

Hard drives show little performance difference whether new or pre-owned, defect skips notwithstanding. In contrast, SSDs/SSSs can perform very differently when they are new versus when they have been used for a short period depending on their internal architecture. A new SSD can write without erasure throughout it’s entire memory address space but sooner or later wear leveling must kick in to equalize the use of the device’s NAND memory blocks. Wear leveling causes both reads and rewrites of data during it’s processing. Such activity takes bandwidth and controller processing away from normal IO. If you have a new device it may take days or weeks of activity (depending on how fast you write) to attain the device’s steady state where each write causes some sort of wear leveling activity.

R:W Ratio

Historically, hard drives have had slightly slower write seeks than reads, due to the need to be more accurately positioned to write data than to read it. As such, it might take .5msec longer to write than to read 4K bytes. But for SSDs the problem is much more acute, e.g. read times can be in microseconds while write times can almost be in milliseconds for some SSDs/SSSs. This is due to the nature of NAND flash, having to erase a block before it can be programmed (written) and the programming process taking a lot’s longer than a read.

So the question for measuring SSD performance is what read to write (R:W) ratio to use. Historically a R:W of 2:1 was used to simulate enterprise environments but most devices are starting to see more like 1:1 for enterprise applications due to the caching and buffering provided by controllers and host memory. I can’t speak as well for desktop environments but it wouldn’t surprise me to see 2:1 used to simulate desktop workloads as well.

SSDs operate a lot faster if their workload is 1000:1 than for 1:1 workloads. Most SSD data sheets tout a significant read I/O rate but only for 100% read workloads. This is like a subsystem vendor quoting a 100% read cache hit rate (which some do) but is unusual in the real world of storage.

Blocksize to use

Hard drives are not insensitive to blocksizes, as blocks can potentially span tracks which will require track-to-track seeks to be read or written. However, SSDs can also have some adverse interaction with varying blocksizes. This is dependent on the internal SSD architecture and is due to over optimizing write performance.

With an SSD, you erase a block of NAND and write a page or sector of NAND at a time. As writes takes much longer than reads, many SSD vendors add parallelism to improve write throughput. Parallelism writes or programs multiple sectors at the same time. Thus, if your blocksize is an integral multiple of the multi-sector size written performance is great, if not, performance can suffer.

In all honesty, similar issues exist with hard drive sector sizes. If your blocksize is an integral multiple of the drive sector size then performance is great, if not too bad. In contrast to SSDs, drive sector size is often configurable at the device level.

Sequential vs. random IO

Hard drives perform sequential IO much better than random IO. For SSDs this is not much of a problem, as once wear leveling kicks in, it’s all random to the NAND flash. So when comparing hard drives to SSDs the level of sequentiality is a critical parameter to control.

Cache hit rate

The block inter-reference interval is simply measures how often the same block is re-referenced. This is important for caching devices and systems because it ultimately determines the cache hit rate (reading data directly from cache instead of the device storage). Hard drives have onboard cache of 8 to 32MB today. SSD drives also have a DRAM cache for data buffering and other uses. SSDs typically publicize their cache size so in order to insure 0 cache hits one needs an block inter-reference interval close to the device’s capacity. Not a problem today with 146GB devices but as they move to 300GB and larger it becomes more of a problem to completely characterize device performance.

The future

So how do we get a handle on SSD performance? SNIA and others are working on a specification on how to measure SSD performance that will one day become a standard. When the standard is available we will have benchmarks and service groups that can run these benchmarks to validate SSD vendor performance claims. Until then – caveat emptor.

Of course most end users would claim that device performance is not as important as (sub)system performance which is another matter entirely…

What’s happening with MRAM?

16Mb MRAM chips from Everspin
16Mb MRAM chips from Everspin

At the recent Flash Memory Summit there were a few announcements that show continued development of MRAM technology which can substitute for NAND or DRAM, has unlimited write cycles and is magnetism based. My interest in MRAM stems from its potential use as a substitute storage technology for today’s SSDs that use SLC and MLC NAND flash memory with much more limited write cycles.

MRAM has the potential to replace NAND SSD technology because of the speed of write (current prototypes write at 400Mhz or a few nanoseconds) and with the potential to go up to 1Ghz. At 400Mhz, MRAM is already much much faster than today’s NAND. And with no write limits, MRAM technology should be very appealing to most SSD vendors.

The problem with MRAM

The only problem is that current MRAM chips use 150nm chip design technology whereas today’s NAND ICs use 32nm chip design technology. All this means that current MRAM chips hold about 1/1000th the memory capacity of today’s NAND chips (16Mb MRAM from Everspin vs 16Gb NAND from multiple vendors). MRAM has to get on the same (chip) design node as NAND to make a significant play for storage intensive applications.

It’s encouraging that somebody at least is starting to manufacture MRAM chips rather than just being lab prototypes with this technology. From my perspective, it can only get better from here…

Testing storage systems – Intel’s SSD fix

Intel’s latest (35nm NAND) SSD shipments were halted today because a problem was identified when modifying BIOS passwords (see IT PRO story). At least they gave a timeframe for a fix – a couple of weeks.

The real question is can products be tested sufficiently these days to insure they work in the field. Many companies today will ship product to end-user beta testers to work out the bugs before the product reaches the field. But beta-testing has got to be complemented with active product testing and validation. As such, unless you plan to get 100s or perhaps 1000s of beta testers you could have a serious problem with field deployment.

And therein lies the problem, software products are relatively cheap and easy to beta test, just set up a download site and have at it. But with hardware products beta testing actually involves sending product to end-users which costs quite a bit more $’s to support. So I understand why Intel might be having problems with field deployment.

So if you can’t beta test hardware products as easily as software – then you have to have a much more effective test process. Functional testing and validation is more of an art than a science and can cost significant $’s and more importantly, time. All of which brings us back to some form of beta testing.

Perhaps Intel could use their own employees as beta testers rotating new hardware products from one organization to another, over time to get some variability in the use of a new product. Many companies use their new product hardware extensively in their own data centers to validate functionality prior to shipment. In the case of Intel’s SSD drives these could be placed in the in-numberable servers/desktops that Intel no-doubt has throughout it’s corporation.

One can argue whether beta testing takes longer than extensive functional testing. However given today’s diverse deployments, I believe beta testing can be a more cost effective process when done well.

Intel is probably trying to figure out just what went wrong in their overall testing process today. I am sure, given their history, they will do better next time.