Today Seagate announced their new SSD offering, named the Pulsar SSD. It uses SLC NAND technology and comes in a 2.5″ form factor at 50, 100 or 200GB capacity. The fact that it uses a 3GB/s SATA interface seems to indicate that Seagate is going after the server market rather than the highend storage market place but different interfaces can be added over time.
Pulsar SSD performance
The main fact that makes the Pulsar interesting is the peak write rates at 25,000 4KB aligned writes per second versus a peak read rate of 30,000. The ratio of peak reads to peak writes 30:25 represents a significant advance over prior SSDs and presumably this is through the magic of buffering. But once we get beyond peak IO buffering sustained 128KB writes drops to 2600, 5300, or 10,500 ops/sec for the 50, 100, and 200GB drives respectively. Kind of interesting that this drops as capacity drops and implies that adding capacity also adds parallelism. Sustained 4KB reads for the Pulsar is speced at 30,000.
In contrast, STEC’s Zeus drive is speced at 45,000 random reads and 15,000 random writes sustained and 80,000 peak reads and 40,000 peak writes. So performance wise the Seagate Pulsar (200GB) SSD has about ~37% the peak read and ~63% the peak write performance with ~67% the sustained read with ~70% the sustained write performance of the Zeus drive.
The other items of interest is that Seagate states a 0.44% annual failure rate (AFR), so for a 100 Pulsar drive storage subsystem one Pulsar drive will fail every 2.27 years. Also the Pulsar bit error rate (BER) is specified at <10E16 new and <10E15 at end of life. As far as I can tell both of these specifications are better than STEC’s specs for the Zeus drive.
Both the Zeus and Pulsar drives support a 5 year limited warranty. But if the Pulsar is indeed a more reliable drive as indicated by their respective specifications, vendors may prefer the Pulsar as it would require less service.
All this seems to say that reliability may become a more important factor in vendor SSD selection. I suppose once you get beyond 10K read or write IOPs per drive, performance differences just don’t matter that much. But a BER of 10E14 vs 10E16 may make a significant difference to product service cost and as such, may justify changing SSD vendors much easier. Seems to be opening up a new front in the SSD wars – drive reliability
Now if they only offered 6GB/s SAS or 4GFC interfaces…
SSD and/or SSS (solid state storage) performance is a mystery to most end-users. The technology is inherently asymmetrical, i.e., it reads much faster than it writes. I have written on some of these topics before (STEC’s new MLC drive, Toshiba’s MLC flash, Tape V Disk V SSD V RAM) but the issue is much more complex when you put these devices behind storage subsystems or in client servers.
Some items that need to be considered when measuring SSD/SSS performance include:
Is this a new or used SSD?
What R:W ratio will we use?
What blocksize should be used?
Do we use sequential or random I/O?
What block inter-reference interval should be used?
This list is necessarily incomplete but it’s representative of the sort of things that should be considered to measure SSD/SSS performance.
New device or pre-conditioned
Hard drives show little performance difference whether new or pre-owned, defect skips notwithstanding. In contrast, SSDs/SSSs can perform very differently when they are new versus when they have been used for a short period depending on their internal architecture. A new SSD can write without erasure throughout it’s entire memory address space but sooner or later wear leveling must kick in to equalize the use of the device’s NAND memory blocks. Wear leveling causes both reads and rewrites of data during it’s processing. Such activity takes bandwidth and controller processing away from normal IO. If you have a new device it may take days or weeks of activity (depending on how fast you write) to attain the device’s steady state where each write causes some sort of wear leveling activity.
Historically, hard drives have had slightly slower write seeks than reads, due to the need to be more accurately positioned to write data than to read it. As such, it might take .5msec longer to write than to read 4K bytes. But for SSDs the problem is much more acute, e.g. read times can be in microseconds while write times can almost be in milliseconds for some SSDs/SSSs. This is due to the nature of NAND flash, having to erase a block before it can be programmed (written) and the programming process taking a lot’s longer than a read.
So the question for measuring SSD performance is what read to write (R:W) ratio to use. Historically a R:W of 2:1 was used to simulate enterprise environments but most devices are starting to see more like 1:1 for enterprise applications due to the caching and buffering provided by controllers and host memory. I can’t speak as well for desktop environments but it wouldn’t surprise me to see 2:1 used to simulate desktop workloads as well.
SSDs operate a lot faster if their workload is 1000:1 than for 1:1 workloads. Most SSD data sheets tout a significant read I/O rate but only for 100% read workloads. This is like a subsystem vendor quoting a 100% read cache hit rate (which some do) but is unusual in the real world of storage.
Blocksize to use
Hard drives are not insensitive to blocksizes, as blocks can potentially span tracks which will require track-to-track seeks to be read or written. However, SSDs can also have some adverse interaction with varying blocksizes. This is dependent on the internal SSD architecture and is due to over optimizing write performance.
With an SSD, you erase a block of NAND and write a page or sector of NAND at a time. As writes takes much longer than reads, many SSD vendors add parallelism to improve write throughput. Parallelism writes or programs multiple sectors at the same time. Thus, if your blocksize is an integral multiple of the multi-sector size written performance is great, if not, performance can suffer.
In all honesty, similar issues exist with hard drive sector sizes. If your blocksize is an integral multiple of the drive sector size then performance is great, if not too bad. In contrast to SSDs, drive sector size is often configurable at the device level.
Sequential vs. random IO
Hard drives perform sequential IO much better than random IO. For SSDs this is not much of a problem, as once wear leveling kicks in, it’s all random to the NAND flash. So when comparing hard drives to SSDs the level of sequentiality is a critical parameter to control.
Cache hit rate
The block inter-reference interval is simply measures how often the same block is re-referenced. This is important for caching devices and systems because it ultimately determines the cache hit rate (reading data directly from cache instead of the device storage). Hard drives have onboard cache of 8 to 32MB today. SSD drives also have a DRAM cache for data buffering and other uses. SSDs typically publicize their cache size so in order to insure 0 cache hits one needs an block inter-reference interval close to the device’s capacity. Not a problem today with 146GB devices but as they move to 300GB and larger it becomes more of a problem to completely characterize device performance.
So how do we get a handle on SSD performance? SNIA and others are working on a specification on how to measure SSD performance that will one day become a standard. When the standard is available we will have benchmarks and service groups that can run these benchmarks to validate SSD vendor performance claims. Until then – caveat emptor.
Of course most end users would claim that device performance is not as important as (sub)system performance which is another matter entirely…
I haven’t seen much of a specification on STEC’s new enterprise MLC SSD but it should be interesting. So far everything I have seen seems to indicate that it’s a pure MLC drive with no SLC NAND. This is difficult for me to believe but could easily be cleared up by STEC or their specifications. Most likely it’s a hybrid SLC-MLC drive similar, at least from the NAND technology perspective, to FusionIO’s SSD drive.
MLC write endurance issue
My difficulty with a pure MLC enterprise drive is the write endurance factor. MLC NAND can only endure around 10,000 erase/program passes before it starts losing data. With a hybrid SLC-MLC design one could have the heavy write data go to SLC NAND which has a 100,000 erase/program pass lifecycle and have the less heavy write data go to MLC. Sort of like a storage subsystem “fast write” which writes to cache first and then destages to disk but in this case the destage may never happen if the data is written often enough.
The only flaw in this argument is that as the SSD drives get bigger (STEC’s drive is available supporting up to 800GB) this becomes less of an issue. Because with more raw storage the fact that a small portion of the data is very actively written gets swamped by the fact that there is plenty of storage to hold this data. As such, when one NAND cell gets close to its lifetime another, younger cell can be used. This process is called wear leveling. STEC’s current SLC Zeus drive already has sophisticated wear leveling to deal with this sort of problem with SLC SSDs and doing this for MLCs just means having larger tables to work with.
I guess at some point, with multi-TB per drives, the fact that MLC cannot sustain more than 10,000 erase/write passes becomes moot. Because there just isn’t that much actively written data out there in an enterprise shop. When you amortize the portion of highly written data as a percentage of a drive, the more drive capacity, the smaller the active data percentages become. As such, as SSD drive capacities gets larger this becomes less of an issue. I figure with 800GB drives, active data proportion might still be high enough to cause a problem but it might not be an issue at all.
Of course with MLC it’s also cheaper to over provision NAND storage to also help with write endurance. For an 800GB MLC SSD, you could easily add another 160GB (20% over provisioning) fairly cheaply. As such, over provisioning will also allow you to sustain an overall drive write endurance that is much higher than the individual NAND write endurance.
Another solution to the write endurance problem is to increase the power of ECC to handle write failures. This would probably take some additional engineering and may or may not be in the latest STEC MLC drive but it would make sense.
The other issue about MLC NAND is that it has slower read and erase/program cycle times. Now these are still order’s of magnitude faster than standard disk but slower than SLC NAND. For enterprise applications SLC SSDs are blistering fast and are often performance limited by the subsystem they are attached to. So, the fact that MLC SSDs are somewhat slower than SLC SSDs may not even be percieved by enterprise shops.
MLC performance is slower because it takes longer to read a cell with multiple bits in it than it takes with just one. MLC, in one technology I am aware of, encodes 2-bits in the voltage that is programmed in or read out from a cell, e.g., VoltageA = “00”, VoltageB=”01″, VoltageC=”10″, and VoltageD=”11″. This gets more complex with 3 or more bits per cell but the logic holds. With multiple voltages, determining which voltage level is present is more complex for MLC and hence, takes longer to perform.
In the end I would expect STEC’s latest drive to be some sort of SLC-MLC hybrid but I could be wrong. It’s certainly possible that STEC have gone with just an MLC drive and beefed up the capacity, over provisioning, ECC, and wear leveling algorithms to handle its lack of write endurance
MLC takes over the world
But the major issue with using MLC in SSDs is that MLC technology is driving the NAND market. All those items in the photo above are most probably using MLC NAND, if not today then certainly tomorrow. As such, the consumer market will be driving MLC NAND manufacturing volumes way above anything the SLC market requires. Such volumes will ultimately make it unaffordable to manufacture/use any other type of NAND, namely SLC in most applications, including SSDs.
So sooner or later all SSDs will be using only MLC NAND technology. I guess the sooner we all learn to live with that the better for all of us.