
A couple of weeks ago SNIA released a new version of their SSSI (SSD) performance test specification for public comment. I'm not sure if this is the first version out for public comment, but I discussed a prior version in a presentation I did for SNW last October, and I have blogged before about some of the mystery of measuring SSD performance. The current version looks a lot more polished than what I had to deal with last year, but the essence of the performance testing remains the same:
- Purge test – using a vendor-approved process, purge (erase) all the data on the drive.
- Preconditioning test – Write 2X the capacity of the drive using 128KiB blocksizes and sequentially writing through the whole device’s usable address space.
- Steady state testing – varying blocksizes, varying read-write ratios, varying block number ranges, looped until steady state is achieved in device performance.
The steady state testing runs a random I/O mix for a minute's duration at whatever the currently specified blocksize, R:W ratio and block number range happen to be. Also, according to the specification, the measurements for steady state are done once performance with 4KiB blocksizes and 100% writes settles down. This steady state determinant testing must execute over a number of rounds (4?), after which the other performance test runs are considered at “steady state”.
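To make the three phases concrete, here is a toy sketch of the flow against a simulated drive. Everything here is invented for illustration (the drive model, the IOPS numbers, the `run_pts` helper and its crude settle test), not taken from the spec:

```python
class SimulatedSSD:
    """Toy drive model: write IOPS decays from a fresh-out-of-box
    (FOB) peak toward a steady-state floor as bytes are written.
    Purely illustrative; the numbers are made up."""

    def __init__(self, capacity_gib=1):
        self.capacity = capacity_gib * 2**30
        self.bytes_written = 0

    def purge(self):
        self.bytes_written = 0  # pretend a vendor purge resets drive state

    def write(self, nbytes):
        self.bytes_written += nbytes

    def measured_iops(self):
        # FOB peak of ~50k IOPS decaying toward ~10k at steady state.
        wear = min(self.bytes_written / (2 * self.capacity), 1.0)
        return 50_000 - 40_000 * wear


def run_pts(drive):
    # 1. Purge: vendor-approved erase of the whole device.
    drive.purge()
    # 2. Precondition: write 2X capacity as sequential 128 KiB blocks.
    block = 128 * 1024
    for _ in range(2 * drive.capacity // block):
        drive.write(block)
    # 3. Rounds of a random-I/O mix until performance settles.
    history = []
    while True:
        drive.write(4096 * 1000)  # stand-in for a round of 4 KiB writes
        history.append(drive.measured_iops())
        if len(history) >= 5:
            window = history[-5:]
            avg = sum(window) / 5
            if max(window) - min(window) <= 0.2 * avg:  # crude settle test
                return avg
```

On this toy model the reported steady state lands at the decayed floor, well below the FOB peak, which is exactly the effect the purge/precondition/settle sequence is designed to expose.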
SNIA’s SSSI performance test benefits
Let's start by saying no performance test is perfect. I can always find fault in any performance test, even my own. Nevertheless, the new SSSI performance test goes a long way toward fixing some intrinsic problems with SSD performance measurement. Specifically:
- The need to discriminate between fresh out of the box (FOB) performance and ongoing drive performance. The preconditioning test is obviously a compromise in attempting to do this: writing double the full capacity of a drive will take a long time but should cause every NAND cell in the user space to be overwritten, and writing the space just once is not enough to get past all the device's write buffers. Even at three times the device's capacity some variance in performance may remain, and it would take correspondingly longer.
- The need to show steady state SSD performance versus some peak value. SSDs are notorious for showing differing performance over time. Partly this is due to FOB performance (see above), but mostly it is due to the complexity of managing NAND erasure and programming overhead.
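To put "a long time" in perspective, assume (my numbers, not the spec's) a 256GB drive that sustains 200MB/s of sequential 128KiB writes; the 2X preconditioning pass alone works out to:

```python
capacity_gb = 256      # assumed drive capacity, not from the spec
seq_write_mb_s = 200   # assumed sustained sequential write rate
passes = 2             # the spec's 2X preconditioning

seconds = passes * capacity_gb * 1000 / seq_write_mb_s
minutes = round(seconds / 60)
print(minutes)  # roughly 43 minutes for this hypothetical drive
```

And that is before a single steady-state round has run, so the full test is a multi-hour affair on larger drives.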
The steady state performance problem is not nearly as much of an issue with hard disk drives, though even there, with defect skipping, drive performance will degrade over time (over a much longer time than for SSDs). My main quibble with the test specification is how they elect to determine steady state: 4KiB blocks with 100% writes seems a bit oversimplified.
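For what it's worth, "settles down" can be made precise with a windowed check: track the per-round result and declare steady state only when recent rounds neither scatter nor trend. The window size and percentage thresholds below are my assumptions for illustration, not the spec's numbers:

```python
def is_steady_state(iops_history, window=5,
                    max_excursion=0.20, max_slope=0.10):
    """Windowed steady-state check over per-round IOPS results.

    Steady state holds when, across the last `window` rounds,
    (a) no measurement strays more than max_excursion from the
    window average, and (b) a least-squares linear fit drifts
    less than max_slope of the average across the window.
    Thresholds and window size are assumptions, not the spec's.
    """
    if len(iops_history) < window:
        return False
    w = iops_history[-window:]
    avg = sum(w) / window
    # (a) Data excursion: every point within +/- 20% of the average.
    if any(abs(y - avg) > max_excursion * avg for y in w):
        return False
    # (b) Slope excursion: fitted drift over the whole window
    # must stay within +/- 10% of the average.
    n = window
    x_mean = (n - 1) / 2
    slope = (sum((x - x_mean) * (y - avg) for x, y in enumerate(w))
             / sum((x - x_mean) ** 2 for x in range(n)))
    return abs(slope * (n - 1)) <= max_slope * avg
```

A flat run like `[400, 390, 405, 395, 400]` passes, while a still-decaying run like `[1000, 800, 640, 512, 410]` does not; the same shape of check works regardless of which R:W mix drives the rounds.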
Is some proportion of read IO needed to define SSD “steady state” performance?
[Most of the original version of this post centered on the need for some write component in steady state determination. This was all due to my misreading the SNIA spec. I now realize that the current spec calls for a 100% WRITE workload with 4KiB blocksizes to settle down to determine steady state. While this may be overkill, it certainly is consistent with my original feelings that some proportion of write activity needs to be a prime determinant of SSD steady state.]
My main concern with how the test determines SSD steady state performance is the lack of read activity. My other worry with this approach is that the blocksize seems a bit too small, but that is minor in comparison.
Let’s start with the fact that SSDs are by nature asymmetrical devices. By that I mean their write performance differs substantially from their read performance due to the underlying nature of the NAND technology. And much of what distinguishes an enterprise SSD from a commercial drive is the sophistication of its write processing.
But using 100% writes to test for steady state may be too much.
In addition, it is hard for me to imagine any commercial or enterprise class device in service not having some high portion of ongoing read IO activity. I can easily be convinced that normal R:W activity for an SSD device is somewhere between 90:10 and 50:50. But I have a difficult time seeing an SSD R:W ratio of 0:100 as realistic. And I feel any viable interpretation of device steady state performance needs to be based on realistic workloads.
In SNIA’s defense, they had to pick some reproducible way to measure steady state, and some devices may have had difficulty reaching steady state under 100% write activity. However, most other benchmarks have some sort of cut-off that can be used to invalidate results, and reaching steady state is one current criterion for SNIA’s SSSI performance test. I just think a mix of read and write activity would be a better measure of SSD stability.
As for the 4KiB block size, it’s purely a question of what’s the most probable blocksize in the use of SSDs, which may vary between enterprise and consumer applications. But 4KiB seems a bit behind the times, especially with today’s 128GB and higher drives…
What do you think: does SSD steady state determination need a mix of read and write activity or not?
[Thanks to Eden Kim and his team at SSSI for pointing out my spec reading error.]