61: GreyBeards talk composable storage infrastructure with Taufik Ma, CEO, Attala Systems

In this episode, we talk with Taufik Ma, CEO, Attala Systems (@AttalaSystems). Howard had met Taufik at last year’s Flash Memory Summit (FMS17) and was intrigued by their architecture, which he thought was a harbinger of future trends in storage. From my perspective, the fact that Attala Systems was innovating with new, proprietary hardware made for an interesting discussion in its own right.

Taufik has worked at startups and major hardware vendors in the past and seems to have always been at the intersection of hardware technology and breakthrough solutions.

Attala Systems is based in San Jose, CA. Taufik has an A-class team of executives, engineers and advisors making history again, this time in storage with JBoFs and NVMeoF.

Ray has written about JBoF (just a bunch of flash) before (see the Facebook moving to JBoF post). A JBoF is essentially a hardware box filled with lots of flash storage and drive interfaces that connects directly to servers. Attala Systems storage is JBoF on steroids.

Composable Storage Infrastructure™

Essentially, their composable storage infrastructure JBoF uses NVMeoF (NVMe over Fabrics) on Ethernet to provide direct host access to NVMe SSDs. They have implemented special-purpose, proprietary hardware in the form of an FPGA, which they use in a proprietary host network adapter (HNA) to support their NVMeoF storage.

Their HNA comes in a host-side and a storage-side version, both using Attala Systems' proprietary FPGA(s). With Attala HNAs, they have implemented their own NVMeoF over UDP stack in hardware. It supports multi-path IO and highly available, dual- or single-ported NVMe SSDs in a storage shelf. They use standard RDMA-capable 25/50/100GbE Ethernet switches (read Mellanox) to connect hosts to storage JBoFs.

They also support RDMA over Converged Ethernet (RoCE) NICs for additional host access. However, I believe this requires host software (their NVMeoF over UDP stack) to connect to their storage.

From the host, Attala Systems storage on HNAs looks like directly attached NVMe SSDs, except that they are hot-pluggable and physically located across an Ethernet network. In fact, Taufik mentioned that they already support VMware vSphere servers accessing Attala Systems composable storage infrastructure.

Okay, on to the good stuff. Taufik said they measured their overhead and found that an IO adds only about 5 µsec over native NVMe SSD latency. Current NVMe SSDs operate with response times of 90 to 100 µsec, so with Attala Systems Composable Storage Infrastructure you should see 95 to 105 µsec response times from JBoF(s) full of NVMe SSDs! Taufik said that with Intel Optane SSDs' 10 µsec response times, they see response times of ~16 µsec (the extra 1 µsec seems to be network switch delay)!!
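The latency arithmetic above can be written out as a tiny model. This is a sketch using the figures quoted in the podcast: ~5 µsec of fabric overhead, plus roughly 1 µsec of switch delay for the Optane case, which is inferred from the measured ~16 µsec rather than stated outright. The function and parameter names are hypothetical.

```python
def remote_latency_us(native_us: float, fabric_overhead_us: float = 5.0,
                      switch_delay_us: float = 0.0) -> float:
    """Estimate host-visible latency for an NVMe SSD reached over the fabric.

    native_us: the SSD's own response time when directly attached.
    fabric_overhead_us: added HNA/NVMeoF overhead (~5 usec per Taufik).
    switch_delay_us: extra Ethernet switch delay (an assumption, ~1 usec).
    """
    return native_us + fabric_overhead_us + switch_delay_us


# NAND NVMe SSDs at 90-100 usec native response time.
nand = [remote_latency_us(n) for n in (90, 100)]
# Optane SSD at ~10 usec native, with ~1 usec of assumed switch delay.
optane = remote_latency_us(10, switch_delay_us=1.0)

print(nand)    # [95.0, 105.0]
print(optane)  # 16.0
```

The fixed overhead matters more as media gets faster: 5 µsec is about 5% of a 100 µsec NAND SSD's response time but about 50% of a 10 µsec Optane SSD's.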

Managing composable storage infrastructure

They also use a management “entity” (running on a server or as a VM) to manage their JBoF storage and configure NVMe namespaces (like a SCSI LUN/volume). Hosts use NVMe namespaces to access and split up the JBoF NVMe storage space. That is, multiple Attala Systems namespaces can be configured over a single NVMe SSD, each one appearing to a host as a single (virtual) NVMe SSD.
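To make the namespace idea concrete, here is a toy model of carving several namespaces out of one physical SSD, each handed to a different host. It is a sketch of the concept only; the class names, fields and the simple "next free ID" rule are assumptions, not how Attala's management entity actually tracks namespaces.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    nsid: int      # namespace ID presented to the host
    size_gb: int   # capacity carved from the physical SSD
    host: str      # host that sees this namespace as its own NVMe SSD


@dataclass
class PhysicalSSD:
    serial: str
    capacity_gb: int
    namespaces: list[Namespace] = field(default_factory=list)

    def free_gb(self) -> int:
        """Capacity not yet assigned to any namespace."""
        return self.capacity_gb - sum(ns.size_gb for ns in self.namespaces)

    def carve(self, size_gb: int, host: str) -> Namespace:
        """Create a namespace for one host out of the remaining capacity."""
        if size_gb <= 0 or size_gb > self.free_gb():
            raise ValueError(f"cannot carve {size_gb} GB from {self.serial}")
        nsid = max((ns.nsid for ns in self.namespaces), default=0) + 1
        ns = Namespace(nsid=nsid, size_gb=size_gb, host=host)
        self.namespaces.append(ns)
        return ns

    def release(self, nsid: int) -> None:
        """Tear a namespace down and return its capacity to the pool."""
        self.namespaces = [ns for ns in self.namespaces if ns.nsid != nsid]


ssd = PhysicalSSD(serial="SSD-01", capacity_gb=16000)
a = ssd.carve(4000, "host-a")
b = ssd.carve(8000, "host-b")
print(ssd.free_gb())  # 4000
```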

The management entity has a GUI, but the GUI just uses their RESTful APIs. They also support QoS for namespaces, limiting either IOPS or bandwidth, to manage noisy neighbors.
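Per-namespace IOPS or bandwidth limits are commonly built on a token bucket. Taufik didn't say how Attala enforces QoS, so the following is a generic token-bucket sketch, assuming one bucket per namespace: charge 1 token per IO for an IOPS limit, or the IO's byte count for a bandwidth limit. Timestamps are passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Generic token-bucket limiter; one instance per namespace (assumption)."""

    def __init__(self, rate_per_s: float, burst: float) -> None:
        self.rate = rate_per_s   # sustained limit, e.g. IOPS or bytes/s
        self.capacity = burst    # how far a namespace may burst above the rate
        self.tokens = burst      # start full
        self.last = 0.0          # time of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Admit an IO costing `cost` tokens if enough have accumulated."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# A noisy neighbor fires 15 IOs at once against a 1000 IOPS, burst-10 limit.
bucket = TokenBucket(rate_per_s=1000, burst=10)
admitted = sum(bucket.allow(now=0.0) for _ in range(15))
print(admitted)  # 10
```

The IOs that are refused would be queued or retried by the caller; 5 ms later the bucket has refilled enough for 5 more.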

Attala Systems architected their management system to support scale-out storage. This means they could support many JBoFs in a rack, and possibly multiple racks of JBoFs, connected to swarms of servers. Nothing was said that would limit the number of Attala Systems JBoFs attached to a single server or under a single (dual for HA) management entity. I thought host software might have a problem with this (e.g., 256 NVMe namespaces showing up as PCIe-connected SSDs on the same server), but Taufik said this isn’t a problem for modern OSes.

Taufik mentioned that with their RESTful APIs, namespaces can be created and torn down quickly, on the fly. They see their composable storage infrastructure as a great complement to cloud compute and container execution environments.
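As an illustration of on-the-fly provisioning, a container orchestrator could create a namespace when a container starts and delete it when the container exits. The sketch below only builds the HTTP requests with Python's standard library; the host name, URL paths, JSON fields and IDs are hypothetical placeholders, not Attala's documented REST API.

```python
import json
import urllib.request

# Hypothetical management endpoint: host name, paths and JSON fields are
# placeholders for illustration, not Attala's documented REST API.
MGMT_URL = "https://mgmt.example.com/api/v1"


def create_namespace_request(ssd: str, size_gb: int, host: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking for a new namespace."""
    body = json.dumps({"ssd": ssd, "size_gb": size_gb, "host": host}).encode("utf-8")
    return urllib.request.Request(
        f"{MGMT_URL}/namespaces",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_namespace_request(namespace_id: str) -> urllib.request.Request:
    """Build a DELETE that tears a namespace down when its container exits."""
    return urllib.request.Request(f"{MGMT_URL}/namespaces/{namespace_id}", method="DELETE")


create = create_namespace_request("SSD-01", 500, "container-host-7")
delete = delete_namespace_request("ns-42")
print(create.get_method(), create.full_url)
print(delete.get_method(), delete.full_url)
# Against a real endpoint, urllib.request.urlopen(create) would send the request.
```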

For storage hardware, they use storage shelves from OEM vendors. One recent configuration from Supermicro has 32 hot-pluggable, dual-ported NVMe slots in a 1U chassis, which at today’s ~16TB capacities is ~1/2PB of raw flash. Taufik mentioned that 32TB NVMe SSDs are being worked on as we speak. Imagine that: 1PB of NVMe flash storage in 1U!!
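The density numbers are simple multiplication. This sketch assumes decimal units (1PB = 1000TB) and raw capacity before any RAID or formatting overhead.

```python
def raw_capacity_pb(slots: int, ssd_tb: float) -> float:
    """Raw (pre-RAID, pre-formatting) flash per chassis, in decimal PB."""
    return slots * ssd_tb / 1000


print(raw_capacity_pb(32, 16))  # 0.512  (today's ~16TB SSDs)
print(raw_capacity_pb(32, 32))  # 1.024  (32TB SSDs in development)
```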

The podcast runs ~47 minutes. Taufik took a while to get warmed up, but once he got going, my jaw dropped. Listen to the podcast to learn more.

Taufik Ma, CEO Attala Systems

Tech-savvy business executive with a track record of commercializing disruptive data center technologies. After a short stint as an engineer at Intel after college, Taufik jumped to the business side, where he led a team to define Intel’s crown jewels – CPUs & chipsets – during the ascendancy of the x86 server platform.

He honed his business skills as Co-GM of Intel’s Server System BU before leaving for a storage/networking startup. The acquisition of this startup put him on the executive team of Emulex, where, as SVP of product management, he grew their networking business from scratch to deliver the industry’s first million units of 10Gb Ethernet product.

These accomplishments draw on his ability to engage and acquire customers, including partners when necessary, at all stages of product maturity.

42: GreyBeards talk next gen, tier 0 flash storage with Zivan Ori, CEO & Co-founder E8 Storage.

In this episode, we talk with Zivan Ori (@ZivanOri), CEO and Co-founder of E8 Storage, a new storage startup out of Israel. E8 Storage provides a Tier 0, next-generation all-flash array storage solution for HPC and high-end environments that need extremely high IO performance, with high availability and modest data services. We first saw E8 Storage at last year’s Flash Memory Summit (FMS 2016) and have wanted to talk with them ever since.

Tier 0 storage

The GreyBeards discussed new Tier 0 solutions in our annual year-end industry review podcast. As we saw it then, Tier 0 provides lightning-fast (~100s of µsec) read and write IO operations and millions of IO/sec. Not many applications need this level of speed and IO volume, but for those that do, Tier 0 storage is their only solution.

In the past, Tier 0 was essentially SSDs sitting on a PCIe bus, isolated to a single server. But today, with the emergence of NVMe protocols and SSDs, 40/50/100GbE NICs and switches, and RDMA protocols, this sort of solution can be shared across racks of servers.

There were a few shared Tier 0 solutions available in the past, but their challenge was that they all used proprietary hardware. With today’s new hardware and protocols, these new Tier 0 systems often perform as well as or much better than the old generation, using off-the-shelf hardware.

E8 came to market (emerging out of stealth and going GA in September 2016) after NVMe protocols, SSDs and RDMA were available in commodity hardware, and has taken advantage of all these new capabilities.

E8 Storage system hardware & software

E8 Storage offers a 2U HA appliance with 24 hot-pluggable NVMe SSDs and support for 8 client (host) ports. The hardware appliance has two controllers, two power supplies and two batteries. On a power failure, the batteries hold up a DRAM write cache until it can be flushed to internal storage. They don’t do any DRAM read caching because the performance of the NVMe SSDs is more than fast enough.
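The write path described above (acknowledge once data is in battery-protected DRAM, destage to flash if power fails, no separate read cache) can be modeled in a few lines. This is a toy sketch of the general technique, not E8's implementation; the class and method names are hypothetical, and it ignores details such as mirroring the cache across the two controllers.

```python
class BatteryBackedWriteCache:
    """Toy model: writes are acknowledged once in DRAM; on power failure the
    battery keeps DRAM alive long enough to flush dirty data to flash."""

    def __init__(self) -> None:
        self.dram: dict[int, bytes] = {}   # LBA -> data awaiting destage
        self.flash: dict[int, bytes] = {}  # stands in for the NVMe SSDs

    def write(self, lba: int, data: bytes) -> str:
        self.dram[lba] = data  # ack as soon as data is in protected DRAM
        return "ack"

    def read(self, lba: int) -> bytes:
        # No read cache: only not-yet-flushed writes come from DRAM;
        # everything else is read straight from flash.
        return self.dram.get(lba, self.flash.get(lba, b""))

    def on_power_failure(self) -> int:
        """Running on battery: destage all dirty blocks, return the count."""
        flushed = len(self.dram)
        self.flash.update(self.dram)
        self.dram.clear()
        return flushed


cache = BatteryBackedWriteCache()
cache.write(1, b"alpha")
cache.write(2, b"beta")
print(cache.on_power_failure())  # 2
```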

The 24 NVMe SSDs are all dual-ported for fault tolerance and hot-pluggable for easier servicing in the field. One E8 Storage system can supply up to 180TB of usable, shared NVMe flash storage.

E8 Storage uses RDMA (RoCE) NICs, supporting 40GbE, 50GbE or 100GbE networking, between client servers and their storage system.

E8 does not do data reduction (thin provisioning, data deduplication or data compression) on their storage, so usable capacity = effective capacity. They believe these services consume a lot of compute and IO, limiting IO/sec and increasing response times, and that as the price of NVMe SSD capacity comes down over time, these services become less useful.

They also have client software that provides a fault-tolerant initiator for their E8 storage. This client software supports MPIO and failover across controllers in the event of a controller outage. It currently runs on just about any flavor of Linux available today, and E8 is working to port it to other OSs based on customer requests.
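Controller failover in a multipath initiator boils down to path selection over a set of health-tracked paths. Zivan didn't describe E8's path policy, so this is a generic active/standby sketch with hypothetical names: IO goes down the first healthy path and moves to the other controller when that path is marked down.

```python
class MultipathInitiator:
    """Toy active/standby multipath: use the first healthy path in order."""

    def __init__(self, paths: list[str]) -> None:
        self.paths = paths                       # preferred order, e.g. per controller
        self.healthy = {p: True for p in paths}  # updated by path monitoring

    def mark_down(self, path: str) -> None:
        self.healthy[path] = False

    def mark_up(self, path: str) -> None:
        self.healthy[path] = True

    def select_path(self) -> str:
        """Return the path the next IO should use."""
        for p in self.paths:
            if self.healthy[p]:
                return p
        raise RuntimeError("no healthy path to storage")


mp = MultipathInitiator(["controller-a", "controller-b"])
print(mp.select_path())  # controller-a
mp.mark_down("controller-a")  # controller outage detected
print(mp.select_path())  # controller-b
```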

Storage provisioning and management is through a RESTful API, CLI or web based GUI management portal. Hardware support is supplied by E8 Storage and they offer a 3 year warranty on their system with the ability to extend this to 5 years, if needed.

One problem with today’s standard NVMe over Fabrics solutions is that they lack any failover capabilities and have essentially no support for data protection. By developing their own client software, E8 provides fault tolerance and data protection for Tier 0 storage. They currently support RAID 0 and 5, and RAID 6 is in development.
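RAID 5's protection rests on XOR parity: the parity strip is the XOR of a stripe's data strips, so any one lost strip can be rebuilt from the survivors. The sketch below shows only that principle; E8's actual stripe layout and parity rotation weren't discussed. RAID 6 adds a second, independently computed parity so that two lost SSDs can be tolerated.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe: three data strips (one per SSD) plus a parity strip.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# The SSD holding strip 1 fails: rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```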

Performance

Everyone wants native DAS NVMe SSD storage, but unlike server-based Tier 0 solutions, E8 Storage’s 180TB of NVMe capacity can be shared across up to 100 servers (one customer currently has 96 servers talking to a single E8 Storage appliance). By moving this capacity out to a shared storage device, it can be made more fault tolerant and more serviceable, and be amortized over more servers. However, the problem with doing this has always been the lack of DAS-like performance.

Zivan revealed that a single E8 Storage system is capable of 5M IO/sec, with an average response time of 300µsec at that rate, and that at a more reasonable 4M IO/sec, the system delivers ~120µsec response times. He said they can saturate a 100GbE network by operating at 10M IO/sec. He didn’t say what the response time was at 10M IO/sec, but with the network saturated, response times probably climbed steeply.
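Little's Law (IOs in flight = throughput × response time) turns those figures into an implied concurrency, which helps explain why latency rises near the top end. The second function asks what average IO size would fill a link at 10M IO/sec; it assumes "a 100GbE network" means a single 100Gb/s link and ignores protocol overhead, neither of which was confirmed in the podcast.

```python
def outstanding_ios(iops: float, latency_us: float) -> float:
    """Little's Law: average IOs in flight = throughput x response time."""
    return iops * latency_us / 1e6


def bytes_per_io_at_line_rate(link_gbps: float, iops: float) -> float:
    """Average IO size that would fill a link at the given IO rate."""
    return link_gbps * 1e9 / 8 / iops


print(outstanding_ios(4e6, 120))             # 480.0
print(outstanding_ios(5e6, 300))             # 1500.0
print(bytes_per_io_at_line_rate(100, 10e6))  # 1250.0
```

Going from 4M to 5M IO/sec is a 25% throughput gain, but the implied number of IOs in flight more than triples, the classic sign of queues building as a system nears saturation.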

The other thing Zivan mentioned was that the system delivered these response times with very small variance (standard deviation). I believe he mentioned standard deviations of 1.5 to 3%, which at 120µsec is 1.8 to 3.6µsec, and even at 300µsec is only 4.5 to 9µsec. We have never seen this combination of response time, response time variance and IO/sec in a single shared storage system before.
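Converting a standard deviation quoted as a percentage of the mean into microseconds is just mean × percent / 100. This sketch assumes the 1.5 to 3% figures are relative to the average response time; a 15 to 30% spread would give values ten times larger.

```python
def stddev_us(mean_us: float, pct_of_mean: float) -> float:
    """Convert a standard deviation quoted as % of the mean into microseconds."""
    return mean_us * pct_of_mean / 100


for mean in (120, 300):
    print(mean, [stddev_us(mean, pct) for pct in (1.5, 3.0)])
# 120 [1.8, 3.6]
# 300 [4.5, 9.0]
```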

E8 Storage

Zivan and many of his team previously came from IBM XIV storage. As such, they have been involved in developing and supporting enterprise-class storage systems for quite a while now. So E8 Storage knows what it takes to create products that can survive in 24x7, high-end, highly active and demanding environments.

E8 Storage currently has customers in production in the US. They are seeing primary interest in their system from the HPC, FinServ and Retail industries, but any large customer could need something like this. They sell their storage for $2 to $3/GB.
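To put the pricing in perspective, here is the arithmetic for a fully populated appliance and its cost per server when shared. This sketch assumes the $/GB figure applies to the 180TB of usable capacity (raw vs. usable wasn't specified), decimal units, and an even split across the 96 servers at the customer mentioned earlier.

```python
def system_price_usd(usable_tb: float, usd_per_gb: float) -> float:
    """Appliance price at a given $/GB, using decimal units (1TB = 1000GB)."""
    return usable_tb * 1000 * usd_per_gb


low, high = system_price_usd(180, 2), system_price_usd(180, 3)
print(low, high)            # 360000 540000
print(low / 96, high / 96)  # 3750.0 5625.0  (per server, 96 servers sharing)
```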

The podcast runs ~42 minutes. Zivan was easy to talk with and has a good grasp of storage industry technologies. Listen to the podcast to learn more.

Zivan Ori CEO & Co-Founder, E8 Storage

Mr. Zivan Ori is the co-founder and CEO of E8 Storage. Before founding E8 Storage, Mr. Ori held the position of IBM XIV R&D Manager, responsible for developing the IBM XIV high-end, grid-scale storage system, and served as Chief Architect at Stratoscale, a provider of hyper-converged infrastructure.

Prior to IBM XIV, Mr. Ori headed Software Development at Envara (acquired by Intel) and served as VP R&D at Onigma (acquired by McAfee).