83: GreyBeards talk NVMeoF/TCP with Muli Ben-Yehuda, Co-founder & CTO and Kam Eshghi, VP Strategy & Bus. Dev., Lightbits Labs

This is the first time we’ve talked with Muli Ben-Yehuda (@Muliby), Co-founder & CTO and Kam Eshghi (@KamEshghi), VP of Strategy & Business Development, Lightbits Labs. Keith and I first saw them at Dell Tech World 2019, in Vegas as they are a Dell Ventures funded organization. The company has 70 (mostly engineering) employees and is based in Israel, with offices in NY and the Valley as well as elsewhere around the world. Kam was previously with (Dell) EMC DSSD and Muli’s spent years as a Master Inventor with IBM Research.

[This was Keith Townsend’s (@CTOAdvisor & The CTO Advisor), first time as a GreyBeard co-host and we had a great time with him on the show.]

I would have to say it was a far ranging discussion but focused on their software defined, NVMeoF/TCP storage. As you may recall we talked with Solarflare Communications last year who were also working on a NVMeoF/TCP, only in their case it was an accelerator board. After the recording, Muli said the hardware accelerator they have is their own design.

Why NVMeoF/TCP?

Most NVMeoF today, that uses Ethernet, requires RoCE or iWARP compatible NICs and switches. Lightbits Labs has long been active in the NVMeoF/RoCE-iWARP market place. Early on they noticed that enterprise and cloud service providers were reluctant to adopt NVMeoF technology because of the need to change out all their networking equipment to use it. This is what brought about their focus on NVMeoF/TCP.

The advantage of NVMeoF/TCP is that it can be run on any Ethernet NIC and switch available today. From Muli’s perspective, NVMeoF/TCP is going to become the next SAN of choice for the data center. They were active, early on, in the standards committee to push for NVMeoF/TCP adoption.

How does it work?

Their software defined solution runs LightOS® storage software, a Linux based package, and uses off the shelf, server hardware with persistent storage (Optane DC PM/SSDs, NV DIMMs, V-NAND, etc.). They use persistent memory for a FAST write buffer and a place where they can “mold” the written data into something that can be better written to backend NVMe SSDs.

One surprise about Lightbits solution is that it offers a decent set of data services. These include erasure coding, thin provisioning, wire-speed inline compression, QoS and wide striping. It seems like any of these can be disabled by a customers want. But they only add very little overhead. I think Muli mentioned one Lightbits customer with encrypted data that disabled compression.

Lightbits also offers a global FTL (flash translation layer), which means they control SSD addressing which maps data to physical/raw NAND locations at the storage system level. If done well, a global FTL can help improve flash endurance and may offer better write performance (through increased parallelism).

Lightbits claim to inline, wire speed data compression is premised on the use of more current CPUs with high (>=28) core counts in a storage server. If the storage server has older CPUs (<28 cores), they suggest you install their LightField™ hardware accelerator add in card. LightField offers a number of hardware based, performance accelerations in addition to compression speedups.

LightOS requires no host (client) software. Muli’s a long time Linux kernel contributor and indicated that the only thing LightOS needs is a current Linux Kernel (5.0 or later) which has the NVMeoF/TCP driver software (and persistent memory). Lightbits believes that it’s only a matter of time until other OSs also implement NVMeoF/TCP drivers.

Lightbits business considerations

Long term, Lightbits sees a need for compute-storage disaggregation in hyper scalar and enterprise cloud environments. Early on it was relatively easy to replicate servers with DAS storage but as NVMe SSDs came out the expense to do this throughout their >>1000 server environment starts to become exorbitant. If they only had an easy way to disaggregate their storage from compute and still enjoy all the performance advantages of DAS NVMe SSDS. With LightOS they can do that.

Lightbits can be sold today through Dell, as a partner solution, which means that Dell can integrate, test and validate their servers with LightField accelerator card and deliver that package to your data center. I believe you still need to purchase and install their LightOS software yourself.

Lightbits charges for LightOS software on a per storage node basis, but they have different charges based on the maximum number of NVMe SSD slots available is in a server. There is no capacity charge. They also offer worldwide service and support for LightOS software and LightField hardware.

It’s all about performance

From a performance perspective, one Fortune 500 hyper-scalar benchmarked their storage solution against a DAS NVMe server and found it added about 30 µsec to the IO latency as compare to DAS NVMe SSDs. From their perspective, the added data services, better endurance, and disaggregated compute-storage environment provided by LightOS more than made up for the additional overhead.

Finally, I asked about whether multiple LightOS storage servers could be clustered together. Muli intervened, after stating some legal stuff, said they were working on the next generation LightOS and it will support clustered storage servers, local data replication as well as distributed (across storage servers) erasure coding.

The podcast is a long one and runs over ~47 minutes. There was a lot to talk about and Kam and Muli seem to know it all. It was interesting to hear the history of their pivot to TCP. They seem to have the right technology to address the market. Listen to the podcast to learn more.

Muli Ben-Yehuda, Co-founder and CTO, Lightbits Labs

Muli Ben-Yehuda is the CTO and Co-Founder of Lightbits Labs, where he leads technological developments.

Prior to founding Lightbits, he was chief scientist at Stratoscale and a researcher and Master Inventor at IBM Research.

He holds an M.Sc. in Computer Science (summa cum laude) from the Technion — Israel Institute of Technology and a B.A. (cum laude) from the Open University of Israel.

He is a long time Linux kernel contributor and his code and ideas are most likely included in an operating system or hypervisor running near you. He is also one of the authors of the NVMe/TCP standard and technology. 

Kam Eshghi, VP Strategy & Business Development, Lightbits Labs

Kam joined Lightbits Labs from Dell EMC and has over 20yrs of experience in strategic marketing and business development with startups and public companies.

Most recently as VP of strategic alliances at startup DSSD, Kam led business development with technology partners and developed DSSD’s partnership with EMC, leading to EMC’s acquisition of DSSD.

Previously as Sr. Director of Marketing & Business Development at IDT, Kam built their NVMe Controller business from scratch. Previous to that, Kam worked in data center storage, compute and networking markets at HP, Intel, and Crosslayer Networks. 

Kam is a U.C. Berkeley and MIT graduate with a BS and MS in Electrical Engineering and Computer Science and an MBA.

78: GreyBeards YE2018 IT industry wrap-up podcast

In this, our yearend industry wrap up episode, we discuss trends and technology impacting the IT industry in 2018 and what we can see ahead for 2019 and first up is NVMeoF

NVMeoF has matured

In the prior years, NVMeoF was coming from startups, but last year it’s major vendors like IBM FlashSystem, Dell EMC PowerMAX and NetApp AFF releasing new NVMeoF storage systems. Pure Storage was arguably earliest with their NVMeoF JBOF.

Dell EMC, IBM and NetApp were not far behind this curve and no doubt see it as an easy way to reduce response time without having to rip and replace enterprise fabric infrastructure.

In addition, NVMeoFstandards have finally started to stabilize. With the gang of startups, standards weren’t as much of an issue as they were more than willing to lead, ahead of standards. But major storage vendors prefer to follow behind standards committees.

As another example, VMware showed off an NVMeoF JBOF for vSAN. A JBoF like this improves vSAN storage efficiency for small clusters. Howard described how this works but with vSAN having direct access to shared storage, it can reduce data and server protection requirements for storage. Especially, when dealing with small clusters of servers becoming more popular these days to host application clusters.

The other thing about NVMeoF storage is that NVMe SSDs have also become very popular. We are seeing them come out in everyone’s servers and storage systems. Servers (and storage systems) hosting 24 NVMe SSDs is just not that unusual anymore. For the price of a PCIe switch, one can have blazingly fast, direct access to a TBs of NVMe SSD storage.

HCI reaches critical mass

HCI has also moved out of the shadows. We recently heard news thet HCI is outselling CI. Howard and I attribute this to the advances made in VMware’s vSAN 6.2 and the appliance-ification of HCI. That and we suppose NVMe SSDs (see above).

HCI makes an awful lot of sense for application clusters that VMware is touting these days. CI was easy but an HCI appliance cluster is much, simpler to deploy and manage

For VMware HCI, vSAN Ready Nodes are available from just about any server vendor in existence. With ready nodes, VARs and distributors can offer an HCI appliance in the channel, just like the majors. Yes, it’s not the same as a vendor supplied appliance, doesn’t have the same level of software or service integration, but it’s enough.

[If you want to learn more, Howard’s is doing a series of deep dive webinars/classes on HCI as part of his friend’s Ivan’s ipSpace.net. The 1st 2hr session was recorded 11 December, part 2 goes live 22 January, and the final installment on 5 February. The 1st session is available on demand to subscribers. Sign up here]

Computional storage finally makes sense

Howard and I 1st saw computational storage at FMS18 and we did a podcast with Scott Shadley of NGD systems. Computational storage is an SSD with spare ARM cores and DRAM that can be used to run any storage intensive, Linux application or Docker container.

Because it’s running in the SSD, it has (even faster than NVMe) lightening fast access to all the data on the SSD. Indeed, And the with 10s to 1000s of computational storage SSDs in a rack, each with multiple ARM cores, means you can have many 1000s of cores available to perform your data intensive processing. Almost like GPUs only for IO access to storage (SPUs?).

We tried this at one vendor in the 90s, executing some database and backup services outboard but it never took off. Then in the last couple of years (Dell) EMC had some VM services that you could run on their midrange systems. But that didn’t seem to take off either.

The computational storage we’ve seen all run Linux. And with todays data intensive applications coming from everywhere these days, and all the spare processing power in SSDs, it might finally make sense.

Futures

Finally, we turned to what we see coming in 2019. Howard was at an Intel Analyst event where they discussed Optane DIMMs. Our last podcast of 2018 was with Brian Bulkowski of Aerospike who discussed what Optane DIMMs will mean for high performance database systems and just about any memory intensive server application. For example, affordable, 6TB memory servers will be coming out shortly. What you can do with 6TB of memory is another question….

Howard Marks, Founder and Chief Scientist, DeepStorage

Howard Marks is the Founder and Chief Scientist of DeepStorage, a prominent blogger at Deep Storage Blog and can be found on twitter @DeepStorageNet.

Raymond Lucchesi, Founder and President, Silverton Consulting

Ray Lucchesi is the President and Founder of Silverton Consulting, a prominent blogger at RayOnStorage.com, and can be found on twitter @RayLucchesi. Signup for SCI’s free, monthly e-newsletter here.