83: GreyBeards talk NVMeoF/TCP with Muli Ben-Yehuda, Co-founder & CTO and Kam Eshghi, VP Strategy & Bus. Dev., Lightbits Labs

This is the first time we’ve talked with Muli Ben-Yehuda (@Muliby), Co-founder & CTO and Kam Eshghi (@KamEshghi), VP of Strategy & Business Development, Lightbits Labs. Keith and I first saw them at Dell Tech World 2019, in Vegas as they are a Dell Ventures funded organization. The company has 70 (mostly engineering) employees and is based in Israel, with offices in NY and the Valley as well as elsewhere around the world. Kam was previously with (Dell) EMC DSSD and Muli’s spent years as a Master Inventor with IBM Research.

[This was Keith Townsend’s (@CTOAdvisor & The CTO Advisor), first time as a GreyBeard co-host and we had a great time with him on the show.]

I would have to say it was a far ranging discussion but focused on their software defined, NVMeoF/TCP storage. As you may recall we talked with Solarflare Communications last year who were also working on a NVMeoF/TCP, only in their case it was an accelerator board. After the recording, Muli said the hardware accelerator they have is their own design.

Why NVMeoF/TCP?

Most NVMeoF today, that uses Ethernet, requires RoCE or iWARP compatible NICs and switches. Lightbits Labs has long been active in the NVMeoF/RoCE-iWARP market place. Early on they noticed that enterprise and cloud service providers were reluctant to adopt NVMeoF technology because of the need to change out all their networking equipment to use it. This is what brought about their focus on NVMeoF/TCP.

The advantage of NVMeoF/TCP is that it can be run on any Ethernet NIC and switch available today. From Muli’s perspective, NVMeoF/TCP is going to become the next SAN of choice for the data center. They were active, early on, in the standards committee to push for NVMeoF/TCP adoption.

How does it work?

Their software defined solution runs LightOS® storage software, a Linux based package, and uses off the shelf, server hardware with persistent storage (Optane DC PM/SSDs, NV DIMMs, V-NAND, etc.). They use persistent memory for a FAST write buffer and a place where they can “mold” the written data into something that can be better written to backend NVMe SSDs.

One surprise about Lightbits solution is that it offers a decent set of data services. These include erasure coding, thin provisioning, wire-speed inline compression, QoS and wide striping. It seems like any of these can be disabled by a customers want. But they only add very little overhead. I think Muli mentioned one Lightbits customer with encrypted data that disabled compression.

Lightbits also offers a global FTL (flash translation layer), which means they control SSD addressing which maps data to physical/raw NAND locations at the storage system level. If done well, a global FTL can help improve flash endurance and may offer better write performance (through increased parallelism).

Lightbits claim to inline, wire speed data compression is premised on the use of more current CPUs with high (>=28) core counts in a storage server. If the storage server has older CPUs (<28 cores), they suggest you install their LightField™ hardware accelerator add in card. LightField offers a number of hardware based, performance accelerations in addition to compression speedups.

LightOS requires no host (client) software. Muli’s a long time Linux kernel contributor and indicated that the only thing LightOS needs is a current Linux Kernel (5.0 or later) which has the NVMeoF/TCP driver software (and persistent memory). Lightbits believes that it’s only a matter of time until other OSs also implement NVMeoF/TCP drivers.

Lightbits business considerations

Long term, Lightbits sees a need for compute-storage disaggregation in hyper scalar and enterprise cloud environments. Early on it was relatively easy to replicate servers with DAS storage but as NVMe SSDs came out the expense to do this throughout their >>1000 server environment starts to become exorbitant. If they only had an easy way to disaggregate their storage from compute and still enjoy all the performance advantages of DAS NVMe SSDS. With LightOS they can do that.

Lightbits can be sold today through Dell, as a partner solution, which means that Dell can integrate, test and validate their servers with LightField accelerator card and deliver that package to your data center. I believe you still need to purchase and install their LightOS software yourself.

Lightbits charges for LightOS software on a per storage node basis, but they have different charges based on the maximum number of NVMe SSD slots available is in a server. There is no capacity charge. They also offer worldwide service and support for LightOS software and LightField hardware.

It’s all about performance

From a performance perspective, one Fortune 500 hyper-scalar benchmarked their storage solution against a DAS NVMe server and found it added about 30 µsec to the IO latency as compare to DAS NVMe SSDs. From their perspective, the added data services, better endurance, and disaggregated compute-storage environment provided by LightOS more than made up for the additional overhead.

Finally, I asked about whether multiple LightOS storage servers could be clustered together. Muli intervened, after stating some legal stuff, said they were working on the next generation LightOS and it will support clustered storage servers, local data replication as well as distributed (across storage servers) erasure coding.

The podcast is a long one and runs over ~47 minutes. There was a lot to talk about and Kam and Muli seem to know it all. It was interesting to hear the history of their pivot to TCP. They seem to have the right technology to address the market. Listen to the podcast to learn more.

Muli Ben-Yehuda, Co-founder and CTO, Lightbits Labs

Muli Ben-Yehuda is the CTO and Co-Founder of Lightbits Labs, where he leads technological developments.

Prior to founding Lightbits, he was chief scientist at Stratoscale and a researcher and Master Inventor at IBM Research.

He holds an M.Sc. in Computer Science (summa cum laude) from the Technion — Israel Institute of Technology and a B.A. (cum laude) from the Open University of Israel.

He is a long time Linux kernel contributor and his code and ideas are most likely included in an operating system or hypervisor running near you. He is also one of the authors of the NVMe/TCP standard and technology. 

Kam Eshghi, VP Strategy & Business Development, Lightbits Labs

Kam joined Lightbits Labs from Dell EMC and has over 20yrs of experience in strategic marketing and business development with startups and public companies.

Most recently as VP of strategic alliances at startup DSSD, Kam led business development with technology partners and developed DSSD’s partnership with EMC, leading to EMC’s acquisition of DSSD.

Previously as Sr. Director of Marketing & Business Development at IDT, Kam built their NVMe Controller business from scratch. Previous to that, Kam worked in data center storage, compute and networking markets at HP, Intel, and Crosslayer Networks. 

Kam is a U.C. Berkeley and MIT graduate with a BS and MS in Electrical Engineering and Computer Science and an MBA.

Leave a Reply

Your email address will not be published. Required fields are marked *