Coho Data, the packet processing squeeze and working set exploits

Was at Coho Data this week with Storage Field Day 8 (SFD8) (see the videos here)  and we met with Andy Warfield (@andywarfield), CTO and Co-founder Coho Data. Last time we met (at SFD6) Andy talked at length about some enhancements they were working on and gave us a tutorial on HyperLogLog (HLL) data structures that can be used to identify application working sets.

Packet processing time is getting squeezed

 

IMG_5551

Andy’s always a joy to talk with and this time was no exception. Andy started out talking about the speed of networking and what it meant for network packet processing time. He showed a chart with network speeds on the horizontal axis and packet processing time (in nsec) on the vertical access. It was a log-log chart but it showed an exponential decay such that at 10GbE a system had 67.2nsec to process a packet, at 40GbE, it had 16.8nsec to process a packet and at 100GbE the system had 6.7nsec to process a single packet. He was leading up to explaining why “storage datapaths are like network datapaths in hell”.

Similar performance dynamics are impacting storage device processing. In this case, NVMe PCIe flash devices are becoming processing bound.

Andy showed a chart for 4K random reads, plotting the number of cores on the bottom against K-IOPS on the vertical axis. At about 4 cores with one P3700 Intel PCIe NVMe card, the IOP performance of the storage system (as measured for IMG_5552 (1)NIC throughput) flattened out, from that point on, even after doubling the number of cores. It turns out with just one Intel P3700 NVMe PCIe flash card and 4 core Xeon processors one can quickly max out IOs across a 40GbE network, even though there’s plenty of networking bandwidth still available. Of course, this situation becomes much worse  with the new XPoint NVM which is 1000X faster than NAND, coming out next year from Micron-Intel (subject for a future post as Intel was another SFD8 presenter).

Andy also made the point that as a component of a system increases in cost, software usually tries to improve its utilization. This dynamic is now occurring for PCIe flash cards, which generally make up about 50% of the cost of a storage controller complex.

Location, location, location, …

Net Net, (network forwarding decision) storage data transfer time is shrinking as data placement times are becoming longer. By that I think he means that determining where to place data in the storage hierarchy is becoming more complex, taking more processing cycles just when we have less time to make those decisions.

So the crux of the question is how do we make those decisions better. Coho Data has attacked this problem by implementing HLLs to better identify application working sets.

IMG_5557Last year (See prior post for more info on HLLs) Coho Data had just started working with HLL technology and hadn’t fully implemented their working set analytics. But this year, Andy displayed an On-Stream reporting service treemap chart (where rectangle size indicates relative size of a parameter) that indicated an application’s cache working set size.

Using working set history to improve IO

By using a time series of properly implemented HLLs together with snapshots of working set block information, Coho Data can tell how the working set changes over time for an application or VM. Andy showed an example of application working set size changes over the course of multiple days and each day in the evening there was a giant spike in working set size. This turned out to be backup scans.

So Coho Data could then go back and snapshot the working set information before and after the spike to see if it was different. Once it was determined to be different, they then could go further and re-apply the working set cache data prior to the spike (not sure if this is implemented just yet) after the backup scan to re-warm the workload data cache. Of course, this meant that the system would have  read all this data back into cache. But doing so would leave the application’s data location optimized for upcoming IO activity.

This was just one example of what Coho Data could do to make a better data placement decision and improve the applications IO performance. Neat stuff, if you ask me.

Can’t wait until next year to see what Coho Data is working on next.

Comments?

12 Replies to “Coho Data, the packet processing squeeze and working set exploits”

Comments are closed.