107: GreyBeards talk MinIO’s support of VMware’s new Data Persistence Platform with AB Periasamy, CEO MinIO

Sponsored by:

The GreyBeards have talked with Anand Babu (AB) Periasamy (@ABPeriasamy), CEO MinIO, before (see 097: GreyBeards talk open source S3… episode). And we also saw him earlier this year, at their headquarters for Storage Field Day 19 (SFD19) where AB gave a great discussion of what they were doing and how it worked (see MinIO’s SFD18 presentation videos).

The podcast runs ~26 minutes. AB is very technically astute and always a delight to talk with. He’s extremely knowledgeable about the cloud, containerized applications and high performing S3 compatible object storage. And now with MinIO and vSAN Data Persistence under VCF Tanzu, very knowledgeable about the virtualized IT environment as well. Listen to the podcast to learn more. [We’re trying out a new format placing the podcast up front. Let us know what you think; The Eds.]


VMware VCF vSAN Data Persistence Platform with MinIO

Earlier this month VMware announced a new capability available with the next updates of vSAN, vSphere & VCF called the vSAN Data Persistence Platform. The Data Persistence Platform is a VMware framework designed to integrate stateful, independent vendor software defined storage services in vSphere. By doing so, VCF can provide API access to persistent storage services for containerized applications running under Tanzu Kubernetes (k8s) Grid service clusters.

At the announcement, VMware identified three object storage and one (Cassandra) database technical partners that had been integrated with the solution.  MinIO was an object storage, open source partner.

VMware’s VCF vSAN Data Persistence framework allows vCenter administrators to use vSphere cluster infrastructure to configure and deploy these new stateful storage services, like MinIO, into namespaces and enables app developers direct k8s API access to these storage namespaces to provide persistent, stateful object storage for applications. 

With VCF Tanzu and the vSAN Data Persistence Platform using MinIO, dev can have full support for their CiCd pipeline using native k8s tools to deploy and scale containerized apps on prem, in the public cloud and in hybrid cloud, all using VCF vSphere.

MinIO on the Data Persistence Platform

AB said MinIO with Data Persistence takes advantage of a new capability called vSAN Direct which gives vSAN almost JBOF types of IO control and performance. With MinIO vSAN Direct, storage and k8s cluster applications can co-reside on the same ESX node hardware so that IO activity doesn’t have to hop off host to be performed. In addition, can now populate ESX server nodes with lots (100s to 1000s?) of storage devices and be assured the storage will be used by applications running on that host.

As a result, MinIO’s object storage IO performance on VCF Tanzu is very good due to its use of vSAN Direct and MinIO’s inherent superior IO performance for S3 compatible object storage.

With MinIO on the VCF vSAN Data Persistence Platform, VMware takes over all the work of deploying MinIO software services on the VCF cluster. This way customers can take advantage of MiniO’s fully compatible S3 object storage system operating in their VCF cluster. For app developers they get the best of all worlds, infrastructure configured, deployed and managed by admins but completely controllable, scaleable and accessible through k8s API services.

If developers want to take advantage of MinIO specialized services such as data security or replication, they can do so directly using MinIOs APIs, just like they would when operating bare metal or in the cloud.

AB said the VMware development team was very responsive during development of Data Persistence. AB was surprised to see such a big company, like VMware, operate with almost startup like responsiveness. Keith mentioned he’s seen this in action as vSAN has matured very rapidly to a point of almost feature parity, with just about any storage system out there today .

With MinIO object storage, container applications that need PB of data, now have a home on VCF Tanzu. And it’s as easily usable as any public cloud storage. And with VCF Tanzu configuring and deploying the storage over its own infrastructure, and then having it all managed and administered by vCenter admins, its simple to create and use PB of object storage.

MinIO is already the most popular S3 compatible object storage provider for applications running in the cloud and on prem. And VMware is easily the most popular virtualization platform on the planet. Now with the two together on VCF Tanzu, there seems to be nothing in the way of conquering containerized applications running in IT as well.

With that, MinIO is available everywhere containers want to run, natively available in the cloud, on prem and hybrid cloud or running with VCF Tanzu everywhere as well.


AB Periasamy, CEO MinIO

AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement,

AB was a co-founder and CTO of GlusterFS which was acquired by RedHat in 2011. Following the acquisition, he served in the office of the CTO at RedHat prior to founding MinIO in late 2015.

AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India.

He earned his BE in Computer Science and Engineering from Annamalai University.


This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png
This image has an empty alt attribute; its file name is Spotify_Logo_CMYK_Black-1024x307.png


103: GreyBeards talk scale-out file and cloud data with Molly Presley & Ben Gitenstein, Qumulo

Sponsored by:

Ray has known Molly Presley (@Molly_J_Presley), Head of Global Product Marketing for just about a decade now and we both just met Ben Gitenstein (@Qumulo_Product), VP of Products & Solutions, Qumulo on this podcast. Both Molly and Ben were very knowledgeable about the problems customers have with massive data troves.

Molly has been on our podcast before (with another company, see: GreyBeards talk HPC storage with Molly Rector, CMO & EVP, DDN ). And we have talked with Qumulo before as well (see: GreyBeards talk data-aware, scale-out file systems with Peter Godman, Co-founder & CEO, Qumulo ).

Qumulo has a long history of dealing with customer issues with data center application access to data, usually large data repositories, with billions of small or large files, they have accumulated over time. But recently Qumulo has taken on similar problems in the cloud as well.

Qumulo’s secret has always been to allow researchers to run their applications wherever their data resides. This has led Qumulo’s software defined storage to offer multiple protocol access as well as a completely native, AWS and GCP cloud version of their solution.

That way customers can run Qumulo in their data center or in the cloud and have the same great access to data. Molly mentioned one customer that creates and gathers data using SMB protocol on prem and then, after replication, processes it in the cloud.

Qumulo Shift

Ben mentioned that many competitive storage systems are business model focused. That is they are all about keeping customer data within their solutions so they can charge for capacity. Although Qumulo also charges for capacity, with the new Qumulo Shift service, customer can easily move data off Qumulo and into native cloud storage. Using Shift, customers can free up Qumulo storage space (and cost) for any data that only needs to be accessed as objects.

With Shift, customers can replicate or move on prem or in the cloud Qumulo file data to AWS S3 objects. Once in S3, customers can access it with AWS native applications, other applications that make use of AWS S3 data, or can have that data be accessible around the world.

Qumulo customers can select directories to Shift to an AWS S3 bucket. The Qumulo directory name will be mapped to a S3 bucket name and each file in that directory will be copied to an S3 object in that bucket with the same file name.

At the moment, Qumulo Shift only supports AWS S3. Over time, Qumulo plans to offer support for other public cloud storage targets for Shift.

Shift is based on Qumulo replication services. Qumulo has a number of patents on replication technology that provides for sophisticated monitoring, control and high performance for moving vast amounts of data.

How customers use Shift

One large customer uses Qumulo cloud file services to process seismic data but then makes the results of that analysis available to other clients as S3 objects.

Customers can also take advantage of AWS and other applications that support objects only. For example, AWS SageMaker Machine Learning (ML) processes S3 object data. Qumulo customers could gather training data as files and Shift it to S3 objects for ML training.

Moreover, customers can use Shift to create AWS S3 object backups, archives and DR repositories of Qumulo file data. Ben mentioned DevOps could also use Qumulo Shift via APIs to move file data to S3 objects as part of new application deployment.

Finally, using Shift to copy or move file data to AWS S3, makes it ideal for collaboration by researchers, analysts and just about other entity that needs access to data.

The podcast ran ~26 minutes. Molly has always been easy to talk with and Ben turned out also to be easy to talk with and knew an awful lot about the product and how customers can use it. Keith and I enjoyed our time with Molly and Ben discussing Qumulo and their new Shift service. Listen to the podcast to learn more.

This image has an empty alt attribute; its file name is Spotify_Logo_CMYK_Black-1024x307.png
This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png

Ben Gitenstein, VP of Products and Solutions, Qumulo

Ben Gitenstein runs Product at Qumulo. He and his team of product managers and data scientists have conducted nearly 1,000 interviews with storage users and analyzed millions of data points to understand customer needs and the direction of the storage market.

Prior to working at Qumulo, Ben spent five years at Microsoft, where he split his time between Corporate Strategy and Product Planning.

Molly Presley, Head of Global Product Marketing, Qumulo

Molly Presley joined Qumulo in 2018 and leads worldwide product marketing. Molly brings over 15 years of file system and archive technology leadership experience to the role.

Prior to Qumulo, Molly held executive product and marketing leadership roles at Quantum, DataDirect Networks (DDN) and Spectra Logic.

Presley also created the term “Active Archive”, founded the Active Archive Alliance and has served on the Board of the Storage Networking Industry Association (SNIA).

097: GreyBeards talk open source S3 object store with AB Periasamy, CEO MinIO

Ray was at SFD19 a few weeks ago and the last session of the week (usually dead) was with MinIO and they just blew us away (see videos of MinIO’s session here). Ray thought Anand Babu (AB) Periasamy (@ABPeriasamy), CEO MinIO, who was the main presenter at the session, would be a great invite for our GreyBeards podcast. Keith and I had a ball talking with AB.

Why object store

There’s something afoot in object storage space over the last year or so. It seems everybody is looking to deploy object store whether that be on prem, in CoLo facilities and in the cloud. It could be just the mass of data coming online but that trend has remained the same for years no. No it’s something else.

It all starts with AWS and S3. Over the last couple of years AWS has been rolling out new functionality that only works with S3 and this has been driving even more adoption of S3 as well as other object storage solutions.

S3 compatible object stores are available in just about every cloud service, available from major (and minor) storage vendors and in open source from MinIO.

Why S3 is so popular

Because object store is accessed via RestFUL interfaces, traditionally most implementations used their own API to access it. But when AWS created S3 (simple storage service) with their own API/SDK to access it, it somehow became the de-facto standard interface for all other object stores. S3 compatibility became a significant feature that all object stores had to support.

Sometime after that MinIO came into existence. MinIO provides a 100% open source, fully AWS S3 compatible object store that you can run anywhere on prem, in CoLo facilities and indeed in the cloud. In fact, there exist customers that run MinIO in AWS AB says this is probably just customers using a packaged software solution which happens to include MinIO but it’s nonetheless more expensive than AWS S3 as it uses EC2 instances and EBS storage to create an object store

Customers can access MinIO object stores with the AWS S3 SDK or the MinIO SDK. and you can access AWS S3 storage with AWS S3 SDK or use MinIO SDK. Occosionally, AWS S3 updates have broken MinIO’s SDK but these have been later fixed by AWS. It seems AWS and MinIO are on good terms.

AB mentioned that as customers get up to a few PBs of AWS S3 storage they often find the costs to be too high. It’s at this point that they start looking at other object storage solutions. But because MinIO is 100% S3 compatible and it’s open source many of these customers deploy it in their own data center facilities or in colo environments.

For those customers that want it, MinIO also offers an S3 gateway. With the gateway on prem customers can use S3 or standard file services to access S3 object storage located in the cloud. The gateway also works in the public cloud and can support both AWS s3 as well as Microsoft Blob storage as a backend.

MinIO matches AWS S3 features

AWS S3 has a number of great features and MinIO has matched or exceeded them all, step by step. AWS S3 has cross region replication options where customers can replicate S3 data from one region to another. MinIO supports both asynchronous replication of S3 data and synchronous replication (using RADIO).

But MinIO adds support for erasure coding within a fault domain. Default is Nx2 erasure coding which duplicates all your data so as long as 1/2 of your servers and storage are available you continue to have access to all your data. But this can be configured down like 12+4 where data is split accross 16 servers any four of which can fail and you can still access data.

AWS customers can use a Snowball (standalone storage device) to transfer data to or from S3 storage. AWS Snowball implements a subset of S3 API and requires a NAS staging area of equivalent size to migrate data out of S3. MinIO has support for Snowball’s limited S3 API and as such, Snowball’s can be used to migrate data into or out of MinIO. MinIO has a blog post which describes their support for AWS Snowball.

AWS also offers S3 Lambda services or server less computing services where compute services can be invoked when data is loaded in a bucket and then turned off when no longer needed. AWS Lambda depends on AWS messaging and other services to work properly. But MinIO supports Lambda like functionality using other open source services. AB mentions MQTT and Kafka services. MinIO has another blog post discussing their Lambda like services based on Kafka.

AWS recently implemented Snowflake a SQL database server for unstructured data that uses S3 storage to hold data. Ray and Keith almost choked on that statement as unstructured data and databases never used to be uttered in the same breath. But what AWS has shown was that you can use object store for database data as long as you are willing to load the table into memory and process it there and then unload any modified table data back into the object store. Indexing of the object data seems to be done as the data is being loaded and is also being done in a (random IO) cache or in memory and once done can also be unloaded into the object store.

Now Snowflake uses S3 but it’s not available on prem. MinIO has a number of data base partners that make use of their object store as a backend to host a Snowflake like service onprem. AB mentioned Spark and Splunk but there are others as well.

We ended up the discussion with what does it mean to have 20K stars on GitHub. AB said if you did a java script getting 20K stars would be easy but you just don’t see this sort of open source popularity for storage systems. He said the number is interesting but the growth rate is even more interesting.

The podcast runs ~47 minutes. AB was a great to talk tech with. Keith and I could have talked all afternoon with AB. It was very hard to stop the recording as we could have talked with him for another hour or more. AB said he doesn’t like to do podcasts or videos but he had no problem with us firing away questions. Listen to the podcast to learn more.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png

Anand Babu Periasamy, CEO MinIO

AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement, AB was a co-founder and CTO of GlusterFS which was acquired by RedHat in 2011. Following the acquisition, he served in the office of the CTO at RedHat prior to founding MinIO in late 2015. AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India.

He earned his BE in Computer Science and Engineering from Annamalai University.

90: GreyBeards talk K8s containers storage with Michael Ferranti, VP Product Marketing, Portworx

At VMworld2019 USA there was a lot of talk about integrating Kubernetes (K8s) into vSphere’s execution stack and operational model. We had heard that Portworx was a leader in K8s storage services or persistent volume support and thought it might be instructive to hear from Michael Ferranti (@ferrantiM), VP of Product Marketing at Portworx about just what they do for K8s container apps and their need for state information.

Early on Michael worked for RackSpace in their SaaS team and over time saw how developers and system engineers just loved container apps. But they had great difficulty using them for mission critical applications and containers of the time had a complete lack of support for storage. Michael joined Portworx to help address these and other limitations in using containers for mission critical workloads.

Portworx is essentially a SAN, specifically designed for containers. It’s a software defined storage system that creates a cluster of storage nodes across K8s clusters and provides standard storage services on a container level granularity.

As a software defined storage system, Portworx is right in the middle of the data path, storage they must provide high availability, RAID protection and other standard storage system capabilities. But we talked only a little about basic storage functionality on the podcast.

Portworx was designed from the start to work for containers, so it can easily handle provisioning and de-provisioning, 100s to 1000s of volumes without breaking a sweat. Not many storage systems, software defined or not, can handle this level of operations and not impact storage services.

Portworx supports both synchronous and asynchronous (snapshot based) replication solutions. As all synchronous replication, system write performance is dependent on how far apart the storage nodes are, but it can provide RPO=0 (recovery point objective) for mission critical container applications.

Portworx takes this another step beyond just data replication. They also replicate container configuration (YAML) files. We’re no experts but YAML files contain an encapsulation of everything needed to understand how to run containers and container apps in a K8s cluster. When one combines replicated container YAML files, replicated persistent volume data AND an appropriate external registry, one can start running your mission critical container apps at a disaster site in minutes.

Their asynchronous replication for container data and configuration files, uses Portworx snapshots , which are sent to an alternate site. But they also support asynch replication to any S3 compatible storage via CloudSnap.

Portworx also supports KubeMotion, which replicates/copies name spaces, container app volume data and container configuration YAML files from one K8s cluster to another. This way customers can move their K8s namespaces and container apps to any other Portworx K8s cluster site. This works across on prem K8s clusters, cloud K8s clusters, between public cloud provider K8s clusters s or between on prem and cloud K8s clusters.

Michael also mentioned that data at rest encryption, for Portworx, is merely a tick box on a storage class specification in the container’s YAML file. They make use use of KMIP services to provide customer generated keys for encryption.

This is all offered as part of their Data Security/Disaster Recovery (DSDR) service. that supports any K8s cluster service whether they be AWS, Azure, GCP, OpenShift, bare metal, or VMware vSphere running K8s VMs.

Like any software defined storage system, customers needing more performance can add nodes to the Portworx (and K8s) cluster or more/faster storage to speed up IO

It appears they have most if not all the standard storage system capabilities covered but their main differentiator, besides container app DR, is that they support volumes on a container by container basis. Unlike other storage systems that tend to use a VM or higher level of granularity to contain container state information, with Portworx, each persistent volume in use by a container is mapped to a provisioned volume.

Michael said their focus from the start was to provide high performing, resilient and secure storage for container apps. They ended up with a K8s native storage and backup/DR solution to support mission critical container apps running at scale. Licensing for Portworx is on a per host (K8s node basis).

The podcast ran long, ~48 minutes. Michael was easy to talk with, knew K8s and their technology/market very well. Matt and I had a good time discussing K8s and Portworx’s unique features made for K8s container apps. Listen to the podcast to learn more.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png

Michael Ferranti, VP of Product Marketing, Portworx

Michael (@ferrantiM) is VP of Product Marketing at Portworx, where he is responsible for communicating the value of containerization and digital transformation to global architects and CIOs.

Prior to joining Portworx, Michael was VP of Marketing at ClusterHQ, an early leader in the container storage market and spent five years at Rackspace in a variety of product and marketing roles

83: GreyBeards talk NVMeoF/TCP with Muli Ben-Yehuda, Co-founder & CTO and Kam Eshghi, VP Strategy & Bus. Dev., Lightbits Labs

This is the first time we’ve talked with Muli Ben-Yehuda (@Muliby), Co-founder & CTO and Kam Eshghi (@KamEshghi), VP of Strategy & Business Development, Lightbits Labs. Keith and I first saw them at Dell Tech World 2019, in Vegas as they are a Dell Ventures funded organization. The company has 70 (mostly engineering) employees and is based in Israel, with offices in NY and the Valley as well as elsewhere around the world. Kam was previously with (Dell) EMC DSSD and Muli’s spent years as a Master Inventor with IBM Research.

[This was Keith Townsend’s (@CTOAdvisor & The CTO Advisor), first time as a GreyBeard co-host and we had a great time with him on the show.]

I would have to say it was a far ranging discussion but focused on their software defined, NVMeoF/TCP storage. As you may recall we talked with Solarflare Communications last year who were also working on a NVMeoF/TCP, only in their case it was an accelerator board. After the recording, Muli said the hardware accelerator they have is their own design.

Why NVMeoF/TCP?

Most NVMeoF today, that uses Ethernet, requires RoCE or iWARP compatible NICs and switches. Lightbits Labs has long been active in the NVMeoF/RoCE-iWARP market place. Early on they noticed that enterprise and cloud service providers were reluctant to adopt NVMeoF technology because of the need to change out all their networking equipment to use it. This is what brought about their focus on NVMeoF/TCP.

The advantage of NVMeoF/TCP is that it can be run on any Ethernet NIC and switch available today. From Muli’s perspective, NVMeoF/TCP is going to become the next SAN of choice for the data center. They were active, early on, in the standards committee to push for NVMeoF/TCP adoption.

How does it work?

Their software defined solution runs LightOS® storage software, a Linux based package, and uses off the shelf, server hardware with persistent storage (Optane DC PM/SSDs, NV DIMMs, V-NAND, etc.). They use persistent memory for a FAST write buffer and a place where they can “mold” the written data into something that can be better written to backend NVMe SSDs.

One surprise about Lightbits solution is that it offers a decent set of data services. These include erasure coding, thin provisioning, wire-speed inline compression, QoS and wide striping. It seems like any of these can be disabled by a customers want. But they only add very little overhead. I think Muli mentioned one Lightbits customer with encrypted data that disabled compression.

Lightbits also offers a global FTL (flash translation layer), which means they control SSD addressing which maps data to physical/raw NAND locations at the storage system level. If done well, a global FTL can help improve flash endurance and may offer better write performance (through increased parallelism).

Lightbits claim to inline, wire speed data compression is premised on the use of more current CPUs with high (>=28) core counts in a storage server. If the storage server has older CPUs (<28 cores), they suggest you install their LightField™ hardware accelerator add in card. LightField offers a number of hardware based, performance accelerations in addition to compression speedups.

LightOS requires no host (client) software. Muli’s a long time Linux kernel contributor and indicated that the only thing LightOS needs is a current Linux Kernel (5.0 or later) which has the NVMeoF/TCP driver software (and persistent memory). Lightbits believes that it’s only a matter of time until other OSs also implement NVMeoF/TCP drivers.

Lightbits business considerations

Long term, Lightbits sees a need for compute-storage disaggregation in hyper scalar and enterprise cloud environments. Early on it was relatively easy to replicate servers with DAS storage but as NVMe SSDs came out the expense to do this throughout their >>1000 server environment starts to become exorbitant. If they only had an easy way to disaggregate their storage from compute and still enjoy all the performance advantages of DAS NVMe SSDS. With LightOS they can do that.

Lightbits can be sold today through Dell, as a partner solution, which means that Dell can integrate, test and validate their servers with LightField accelerator card and deliver that package to your data center. I believe you still need to purchase and install their LightOS software yourself.

Lightbits charges for LightOS software on a per storage node basis, but they have different charges based on the maximum number of NVMe SSD slots available is in a server. There is no capacity charge. They also offer worldwide service and support for LightOS software and LightField hardware.

It’s all about performance

From a performance perspective, one Fortune 500 hyper-scalar benchmarked their storage solution against a DAS NVMe server and found it added about 30 µsec to the IO latency as compare to DAS NVMe SSDs. From their perspective, the added data services, better endurance, and disaggregated compute-storage environment provided by LightOS more than made up for the additional overhead.

Finally, I asked about whether multiple LightOS storage servers could be clustered together. Muli intervened, after stating some legal stuff, said they were working on the next generation LightOS and it will support clustered storage servers, local data replication as well as distributed (across storage servers) erasure coding.

The podcast is a long one and runs over ~47 minutes. There was a lot to talk about and Kam and Muli seem to know it all. It was interesting to hear the history of their pivot to TCP. They seem to have the right technology to address the market. Listen to the podcast to learn more.

Muli Ben-Yehuda, Co-founder and CTO, Lightbits Labs

Muli Ben-Yehuda is the CTO and Co-Founder of Lightbits Labs, where he leads technological developments.

Prior to founding Lightbits, he was chief scientist at Stratoscale and a researcher and Master Inventor at IBM Research.

He holds an M.Sc. in Computer Science (summa cum laude) from the Technion — Israel Institute of Technology and a B.A. (cum laude) from the Open University of Israel.

He is a long time Linux kernel contributor and his code and ideas are most likely included in an operating system or hypervisor running near you. He is also one of the authors of the NVMe/TCP standard and technology. 

Kam Eshghi, VP Strategy & Business Development, Lightbits Labs

Kam joined Lightbits Labs from Dell EMC and has over 20yrs of experience in strategic marketing and business development with startups and public companies.

Most recently as VP of strategic alliances at startup DSSD, Kam led business development with technology partners and developed DSSD’s partnership with EMC, leading to EMC’s acquisition of DSSD.

Previously as Sr. Director of Marketing & Business Development at IDT, Kam built their NVMe Controller business from scratch. Previous to that, Kam worked in data center storage, compute and networking markets at HP, Intel, and Crosslayer Networks. 

Kam is a U.C. Berkeley and MIT graduate with a BS and MS in Electrical Engineering and Computer Science and an MBA.