The session was led off by Doc D’Errico (@docderrico), but CTO Brian Carmody (@initzero) did most of the talking, and Erik Kaulberg (@ekaulberg) discussed some of their new products. I have known Doc, Erik, and Brian for years; all of them are industry heavyweights, having worked at major and startup storage companies for decades.
The challenge Infinidat faces is how to perform as well as (or better than) an all-flash array when you have hybrid flash-disk storage.
The advantage of spinning disk is that it’s relatively cheap ($/GB) storage with good throughput, great reliability, and reasonable volumetric density (GB/mm³). However, its random read access time is roughly two orders of magnitude worse than flash (~10 msec vs. ~100 µsec).
Fortunately, DRAM has a random access time of ~100 nsec, which is three orders of magnitude better than flash. A manager I once had said that everyone wants their data stored on tape but accessed out of memory.
So the problem comes down to ensuring that application data is sitting in DRAM (cache) when requested. This is called a read (cache) hit. [Write hits are easier because writes are typically placed directly into memory and destaged later, so essentially all writes are cache hits.]
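A minimal sketch of that read-hit/write-hit accounting, assuming a simple write-back design (the class and its fields are my own illustration, not any vendor’s implementation):

```python
class WriteBackCache:
    """Toy write-back cache: writes always land in DRAM (so they
    always 'hit') and are destaged to backing storage later; only
    reads can miss and fall through to disk/flash."""

    def __init__(self):
        self.dram = {}        # record id -> data currently in DRAM cache
        self.backing = {}     # record id -> data on disk/flash
        self.dirty = set()    # written but not yet destaged
        self.read_hits = 0
        self.read_misses = 0

    def write(self, rid, data):
        self.dram[rid] = data      # a write hit, by construction
        self.dirty.add(rid)

    def destage(self):
        # Later, in the background, push dirty data down to backing storage
        for rid in self.dirty:
            self.backing[rid] = self.dram[rid]
        self.dirty.clear()

    def read(self, rid):
        if rid in self.dram:
            self.read_hits += 1
            return self.dram[rid]
        self.read_misses += 1
        data = self.backing[rid]   # the slow path: fetch from disk/flash
        self.dram[rid] = data
        return data
```

The whole game, then, is making sure the data a read wants is already in `dram` so the slow path is rarely taken.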
Mainframes vs. open systems
In the old days with MVS on mainframes, read hit rates of ~70%+ were considered good and doable. Over time, MVS and z/OS, its follow-on, started providing hints about which IOs were coming next. With mainframe hints, storage systems started hitting ~90% read hit rates.
Open systems never seemed to come close to those hit rates. A typical open systems hit rate was ~50% for well-behaved applications. For VMware and its IO mixer, a 50% read hit rate was aspirational and hardly ever achieved in reality. This has improved over time, but nothing comes close to 90% read hit rates over extended periods.
So, for a storage system in an open systems environment to average a 90% DRAM read cache hit rate is unheard of; it’s seen only for brief intervals at best, with especially well-behaved applications, and not under virtualization. For a customer to see an average DRAM hit rate exceeding 90% over the course of multiple days was inconceivable.
And yet that’s exactly what Brian’s showing in the photo above: an average DRAM read hit rate of over 95%, sustained over a whole week, for a major retail/e-commerce/ERP customer during one of the heaviest, if not the heaviest, activity weeks of the year. The data set size was 1.5PB.
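To get a feel for what hit rates like that buy you, here’s a back-of-envelope effective-latency calculation using the rough figures quoted earlier (~100 nsec DRAM, ~100 µsec flash, ~10 msec disk); the numbers are illustrative, not measurements from any system:

```python
# Rough per-read latencies quoted in the post (illustrative only)
T_DRAM = 100e-9   # ~100 nsec DRAM access
T_FLASH = 100e-6  # ~100 µsec flash random read
T_DISK = 10e-3    # ~10 msec disk random read

def effective_latency(hit_rate, miss_latency):
    """Average read latency given a DRAM hit rate and a miss penalty."""
    return hit_rate * T_DRAM + (1 - hit_rate) * miss_latency

for h in (0.50, 0.90, 0.95, 0.99):
    eff = effective_latency(h, T_DISK)
    print(f"{h:.0%} DRAM hit rate over disk: {eff * 1e6:8.1f} µsec average")
```

As the hit rate climbs toward 99%, the hybrid’s average read latency approaches a raw flash read (~100 µsec) even though every miss goes all the way to spinning disk, which is why sustained hit rates in the 90s are the whole ballgame for a hybrid array.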
How is this possible? What does Infinidat do differently to predict which data applications need, moment to moment, over the course of the heaviest retail week of the year?
Infinidat’s Neural caching algorithm
It all starts with writes. When data is written to an Infinidat system, it records a terrain map of all the other data that has been written recently. I suppose one could think of this as a two-dimensional map, with spots on the map corresponding to data in the storage system that has recently been written. This map changes over time, so it’s more like a movie: a stream of frames showing, from frame to frame, all the recently written data in the system at any point in time. Of course, the frame rate for this stream is the IO rate.
When an IO request comes in for a specific record, Infinidat uses an index to locate the last time (frame) the record was written and, using this snapshot of all the data written at the same time, it reads into cache all the other data written at that time.
This additional data could be read from disk or SSD. Read throughput is slightly better for flash over disk but not orders of magnitude.
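A toy sketch of this style of write-time-locality prefetching, as I understand it from the description above (the class, the fixed frame granularity, and all names are my own invention for illustration; Infinidat’s actual implementation is far more sophisticated):

```python
class FramePrefetchCache:
    """Toy sketch: group writes into 'frames'; on a read, prefetch
    everything written in the same frame as the requested record,
    on the theory that data written together is read together."""

    def __init__(self, frame_size=4):
        self.frame_size = frame_size   # writes per frame (arbitrary here)
        self.frames = {}               # frame id -> record ids written in it
        self.last_frame_of = {}        # record id -> frame it was last written in
        self.current = []              # the frame currently being filled
        self.next_frame_id = 0
        self.cache = set()             # record ids currently in DRAM cache

    def write(self, record_id):
        self.current.append(record_id)
        self.last_frame_of[record_id] = self.next_frame_id
        self.cache.add(record_id)      # writes land in cache
        if len(self.current) >= self.frame_size:
            self.frames[self.next_frame_id] = self.current
            self.current = []
            self.next_frame_id += 1

    def read(self, record_id):
        hit = record_id in self.cache
        # Look up the frame this record was last written in, and pull
        # everything else from that frame into cache as hit candidates.
        frame_id = self.last_frame_of.get(record_id)
        if frame_id is not None and frame_id in self.frames:
            self.cache.update(self.frames[frame_id])
        self.cache.add(record_id)
        return hit
```

For example, if records a, b, c, and d were written in the same frame and later evicted, a (miss) read of a prefetches b, c, and d, so subsequent reads of those records hit in DRAM.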
In any case, any system could implement this sort of caching algorithm, if it had the processing power needed, a metadata layout that made recording the IO stream frame by frame space efficient, metadata indexing that could locate the last frame a record was written in, AND the IO parallelism required to do a whole lot of IO all the time to keep that DRAM cache filled with hit candidates. Did I mention that Infinidat uses a three-controller storage system, unlike the rest of the industry, which uses two-controller systems? This gives them 50% more horsepower and data paths to get data into cache.
Brian goes into some depth on Neural caching implementation but there’s plenty of secret sauce behind it.
Somewhere in his presentation, Brian stated that across their entire customer base (~3.4EB at the end of last quarter), they average a 90% read (DRAM) cache hit rate, inconceivable in the old days and nigh impossible today. Of course, it only gets better from here.
If a hybrid system can continuously average a 95% read cache hit rate for a customer over weeks of IO, there’s no reason that system couldn’t outperform an AFA, even an NVMe-oF AFA storage system.
I suppose cache hit rates like these could be application dependent, but they didn’t seem to say anything about specific verticals they were targeting. And at 3+ EB, it doesn’t appear to be application specific.
For more information, you may want to see these other SFD16 participants’ write-ups on Infinidat: