All flash storage performance testing

There are some serious problems with measuring the IO performance of all-flash arrays using the same benchmarks we use on disk storage systems. Mostly, these stem from the inherent differences between current flash- and disk-based storage.

NAND garbage collection

First off, any SSD or NAND storage needs garbage collection to be able to keep writing data, because NAND pages can’t be overwritten in place and must be erased a whole block at a time. Garbage collection coalesces free space by moving still-valid data to new pages/blocks and freeing up the space held by old, no-longer-current data.

The problem is that NAND garbage collection kicks in only after a suitable amount of write activity, so measuring all-flash array performance without taking garbage collection into account is misleading at best and dishonest at worst.

The only way to control for garbage collection is to write lots of data to an all-flash storage system and measure its performance over a protracted period of time. How long this takes depends on the amount of storage in the all-flash array, but filling it to 75% of its capacity and then measuring IO performance as you fill another 10-15% of its capacity with new data should suffice. Of course, this would all have to be done consecutively, without any time off between runs (idle time would let garbage collection sneak in and catch up).
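To make the fill-then-measure arithmetic concrete, here is a minimal Python sketch. The function and field names are made up for illustration, and the 100 TiB capacity in the usage lines is a hypothetical figure; only the 75% fill and the 10-15% measurement window come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreconditionPlan:
    fill_bytes: int     # written before any measurement starts
    measure_bytes: int  # written while IO performance is being recorded


def plan_precondition(capacity_bytes: int,
                      fill_pct: int = 75,
                      measure_pct: int = 10) -> PreconditionPlan:
    """Size the two write phases for a garbage-collection-aware test run.

    Percentages are integers so the byte counts are exact.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if not 0 < fill_pct < 100:
        raise ValueError("fill_pct must be between 1 and 99")
    if measure_pct <= 0 or fill_pct + measure_pct > 100:
        raise ValueError("measurement window must fit in the remaining capacity")
    return PreconditionPlan(
        fill_bytes=capacity_bytes * fill_pct // 100,
        measure_bytes=capacity_bytes * measure_pct // 100,
    )


if __name__ == "__main__":
    TIB = 2 ** 40
    # Hypothetical 100 TiB array, measured over the upper end of the 10-15% range.
    plan = plan_precondition(100 * TIB, measure_pct=15)
    print(plan.fill_bytes // TIB, "TiB fill,", plan.measure_bytes // TIB, "TiB measured")
```

The plan only sizes the two phases. The benchmark driver still has to run them back to back, with no idle gap in between.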

Flash data reduction

Second, many all flash arrays offer data reduction like data compression or deduplication. Standard IO benchmarks today don’t control for data reduction.

What we need is a standard corpus of reducible data for an IO workload. Such data would need to be both compressible and deduplicable. It’s unclear where such a data corpus could be found, but one is needed to properly measure all-flash system performance. What would help is some real-world data reduction statistics from a large number of customer installations, showing what real-world dedup and compression ratios look like. We could then use these statistics to construct a suitable data load that can be scaled and tailored to the required performance needs.
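As a rough illustration of what a synthetic, reducible data load might look like, the Python sketch below builds blocks with a chosen dedup ratio (by repeating a smaller set of unique blocks) and a chosen compression ratio (by padding random bytes with zeros). The 2:1 defaults are placeholders, not real-world statistics, and all the names are hypothetical. A real corpus would be shaped by the customer data described above.

```python
import random
import zlib


def make_reducible_blocks(n_blocks, block_size=4096,
                          dedup_ratio=2, compress_ratio=2, seed=0):
    """Return n_blocks blocks aiming at the given dedup and compression ratios.

    The ratios are assumed targets, not measured real-world values.
    """
    rng = random.Random(seed)
    n_unique = max(1, n_blocks // dedup_ratio)
    random_len = max(1, block_size // compress_ratio)
    unique = []
    for _ in range(n_unique):
        # Incompressible random prefix followed by highly compressible zeros.
        noise = rng.getrandbits(8 * random_len).to_bytes(random_len, "little")
        unique.append(noise + bytes(block_size - random_len))
    # Cycle through the unique blocks so each one appears, then shuffle.
    blocks = [unique[i % n_unique] for i in range(n_blocks)]
    rng.shuffle(blocks)
    return blocks


def measured_ratios(blocks):
    """Estimate (dedup ratio, compression ratio), using zlib as a stand-in."""
    distinct = set(blocks)
    dedup = len(blocks) / len(distinct)
    raw = sum(len(b) for b in distinct)
    compressed = sum(len(zlib.compress(b)) for b in distinct)
    return dedup, raw / compressed


if __name__ == "__main__":
    dedup, compress = measured_ratios(make_reducible_blocks(64))
    print(f"dedup {dedup:.2f}:1, compression {compress:.2f}:1")
```

zlib is only a convenient stand-in here. An array’s own compression and dedup engines will reduce the same data differently, which is part of why a shared, agreed-upon corpus matters.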

Perhaps SNIA or maybe a big (government) customer could support the creation of this data corpus that can be used for “standard” performance testing. With real world statistics and a suitable data corpus, standard IO benchmarks could control for data reduction on flash arrays and better measure system performance.

Block IO differences

Third, block heat maps (access patterns) need to become much more realistic. For disk-based systems it was important to randomize the IO stream to minimize the advantage of DRAM caching. But with all-flash storage arrays, cache is less useful, and because flash can’t be rewritten in place, IO to the same block (especially overwrites) causes NAND page fragmentation and more NAND write overhead.
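One simple way to move from a uniformly random IO stream toward a skewed heat map is a hot-spot generator, sketched below in Python. The 20% hot region taking 80% of the IOs is an illustrative assumption rather than a measured access pattern, and placing the hot region at the start of the address space is a simplification. The function name is hypothetical.

```python
import random


def hot_spot_blocks(n_ios, n_blocks, hot_fraction=0.2, hot_share=0.8, seed=0):
    """Yield block numbers where a small hot region receives most of the IOs.

    hot_fraction: share of the address space that is "hot" (assumed value).
    hot_share:    share of the IOs aimed at that hot region (assumed value).
    """
    rng = random.Random(seed)
    hot_blocks = min(n_blocks, max(1, int(n_blocks * hot_fraction)))
    for _ in range(n_ios):
        if hot_blocks == n_blocks or rng.random() < hot_share:
            # Repeated hits on the same few blocks: the overwrite-heavy case.
            yield rng.randrange(hot_blocks)
        else:
            yield rng.randrange(hot_blocks, n_blocks)


if __name__ == "__main__":
    ios = list(hot_spot_blocks(10_000, 1_000))
    hot_hits = sum(1 for b in ios if b < 200)
    print(f"{hot_hits / len(ios):.0%} of IOs landed in the hot 20% of blocks")
```

Feeding these block numbers to the write side of a benchmark produces the repeated overwrites that stress NAND, which a uniformly random stream mostly avoids.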


Only by controlling for garbage collection, using a standard, reducible data load and returning to a cache-friendly (or at least write-cache-friendly) workload will we truly understand all-flash storage performance.


Thanks to Larry Freeman (@Larry_Freeman) for the idea for today’s post.

Photo Credit(s): Race Faces by Jerome Rauckman