097: GreyBeards talk open source S3 object store with AB Periasamy, CEO MinIO

Ray was at SFD19 a few weeks ago, and the last session of the week (usually a dead slot) was with MinIO, who just blew us away (see videos of MinIO’s session here). Ray thought Anand Babu (AB) Periasamy (@ABPeriasamy), CEO of MinIO and the main presenter at the session, would be a great invite for our GreyBeards podcast. Keith and I had a ball talking with AB.

Why object store

There’s something afoot in the object storage space over the last year or so. It seems everybody is looking to deploy object stores, whether on prem, in colo facilities, or in the cloud. It could be just the mass of data coming online, but that trend has remained the same for years now. No, it’s something else.

It all starts with AWS and S3. Over the last couple of years AWS has been rolling out new functionality that only works with S3 and this has been driving even more adoption of S3 as well as other object storage solutions.

S3 compatible object stores are available in just about every cloud service, available from major (and minor) storage vendors and in open source from MinIO.

Why S3 is so popular

Object stores are accessed via RESTful interfaces, and traditionally most implementations used their own API. But when AWS created S3 (Simple Storage Service) with its own API/SDK, that API somehow became the de facto standard interface for all other object stores. S3 compatibility became a significant feature that every object store had to support.

Sometime after that, MinIO came into existence. MinIO provides a 100% open source, fully AWS S3 compatible object store that you can run anywhere: on prem, in colo facilities, and indeed in the cloud. In fact, there are customers that run MinIO in AWS. AB says these are probably customers using a packaged software solution that happens to include MinIO, but it’s nonetheless more expensive than AWS S3 because it uses EC2 instances and EBS storage to create the object store.

Customers can access MinIO object stores with the AWS S3 SDK or the MinIO SDK, and likewise can access AWS S3 storage with either the AWS S3 SDK or the MinIO SDK. Occasionally, AWS S3 updates have broken MinIO’s SDK, but these breaks have later been fixed by AWS. It seems AWS and MinIO are on good terms.
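
To make that interchangeability concrete, here’s a minimal Python sketch, assuming a hypothetical MinIO endpoint and credentials, that uploads the same object first with the AWS boto3 SDK pointed at MinIO and then with MinIO’s own Python SDK:

```python
import boto3
from minio import Minio

# AWS S3 SDK (boto3) pointed at a MinIO endpoint -- only the endpoint URL
# and credentials differ from talking to AWS S3 itself.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.com:9000",  # hypothetical on-prem MinIO
    aws_access_key_id="MINIO_ACCESS_KEY",
    aws_secret_access_key="MINIO_SECRET_KEY",
)
s3.upload_file("report.pdf", "my-bucket", "reports/report.pdf")

# The same upload using MinIO's own Python SDK, which likewise works against
# AWS S3 if you point it at s3.amazonaws.com instead.
mc = Minio(
    "minio.example.com:9000",
    access_key="MINIO_ACCESS_KEY",
    secret_key="MINIO_SECRET_KEY",
)
mc.fput_object("my-bucket", "reports/report.pdf", "report.pdf")
```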

AB mentioned that as customers get up to a few PBs of AWS S3 storage, they often find the costs to be too high. It’s at this point that they start looking at other object storage solutions. And because MinIO is 100% S3 compatible and open source, many of these customers deploy it in their own data centers or in colo environments.

For those customers that want it, MinIO also offers an S3 gateway. With the gateway, on prem customers can use S3 or standard file services to access S3 object storage located in the cloud. The gateway also works in the public cloud and can support both AWS S3 and Microsoft Azure Blob storage as a backend.

MinIO matches AWS S3 features

AWS S3 has a number of great features and MinIO has matched or exceeded them all, step by step. AWS S3 has cross region replication options where customers can replicate S3 data from one region to another. MinIO supports both asynchronous replication of S3 data and synchronous replication (using RADIO).

But MinIO adds support for erasure coding within a fault domain. The default is a half data/half parity layout, which effectively duplicates all your data, so as long as half of your servers and storage are available you continue to have access to all your data. But this can be configured down to something like 12+4, where data is split across 16 servers, any four of which can fail while you still have access to your data.
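
Here’s a small, purely illustrative Python sketch (not MinIO code) of the arithmetic behind those layouts, comparing the half data/half parity default with a 12+4 configuration across 16 servers:

```python
def erasure_summary(data_shards: int, parity_shards: int) -> dict:
    """Summarize an erasure-code layout that spreads data + parity shards
    across (data_shards + parity_shards) servers or drives."""
    total = data_shards + parity_shards
    return {
        "servers": total,
        "tolerated_failures": parity_shards,      # up to this many can fail
        "storage_overhead": total / data_shards,  # raw bytes per usable byte
    }

# Half data / half parity on 16 servers (8+8): survives 8 failures, 2.0x overhead
print(erasure_summary(8, 8))
# A 12+4 layout on the same 16 servers: survives 4 failures, ~1.33x overhead
print(erasure_summary(12, 4))
```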

AWS customers can use a Snowball (standalone storage device) to transfer data to or from S3 storage. AWS Snowball implements a subset of the S3 API and requires a NAS staging area of equivalent size to migrate data out of S3. MinIO supports Snowball’s limited S3 API and, as such, Snowballs can be used to migrate data into or out of MinIO. MinIO has a blog post which describes their support for AWS Snowball.

AWS also offers S3 Lambda services, or serverless computing services, where compute can be invoked when data is loaded into a bucket and then turned off when no longer needed. AWS Lambda depends on AWS messaging and other services to work properly. But MinIO supports Lambda-like functionality using other open source services; AB mentioned MQTT and Kafka. MinIO has another blog post discussing their Lambda-like services based on Kafka.
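
MinIO does this with bucket event notifications, which the server can publish to targets such as Kafka or MQTT. As a rough sketch (hypothetical endpoint, credentials and bucket, and assuming a MinIO server with notifications enabled), an application can also subscribe to those events directly with the MinIO Python SDK and kick off processing as objects arrive:

```python
from minio import Minio

mc = Minio("minio.example.com:9000",          # hypothetical MinIO endpoint
           access_key="MINIO_ACCESS_KEY",
           secret_key="MINIO_SECRET_KEY")

# Block waiting for object-created events on a bucket and hand each new
# object off to processing -- the "compute on data arrival" pattern above.
with mc.listen_bucket_notification("uploads",
                                   events=["s3:ObjectCreated:*"]) as events:
    for event in events:
        for record in event.get("Records", []):
            key = record["s3"]["object"]["key"]
            print(f"new object landed: {key}")  # invoke your function here
```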

Snowflake, a SQL database service that runs on AWS and uses S3 storage to hold its data, also came up. Ray and Keith almost choked on that, as object (unstructured) data stores and databases never used to be uttered in the same breath. But what Snowflake has shown is that you can use an object store for database data as long as you are willing to load a table into memory, process it there, and then unload any modified table data back into the object store. Indexing of the object data seems to be done as the data is being loaded, also in a (random IO) cache or in memory, and once done it too can be unloaded into the object store.
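
The load/process/unload pattern itself is easy to picture. Here’s a toy Python sketch of it (this is not Snowflake’s implementation; the bucket, key and column names are hypothetical, and the parquet I/O assumes pyarrow is installed):

```python
import io
import boto3
import pandas as pd

s3 = boto3.client("s3")  # or point endpoint_url at any S3-compatible store

# 1) Load: pull the table object out of the bucket into memory.
obj = s3.get_object(Bucket="warehouse", Key="tables/orders.parquet")
orders = pd.read_parquet(io.BytesIO(obj["Body"].read()))

# 2) Process: query or modify the in-memory table.
orders["total"] = orders["qty"] * orders["unit_price"]

# 3) Unload: write the modified table back to the object store.
buf = io.BytesIO()
orders.to_parquet(buf, index=False)
s3.put_object(Bucket="warehouse", Key="tables/orders.parquet", Body=buf.getvalue())
```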

Now Snowflake uses S3, but it’s not available on prem. MinIO has a number of database partners that make use of its object store as a backend to host a Snowflake-like service on prem. AB mentioned Spark and Splunk, but there are others as well.

We ended the discussion with what it means to have 20K stars on GitHub. AB said that for a JavaScript project getting 20K stars would be easy, but you just don’t see this sort of open source popularity for storage systems. He said the number is interesting but the growth rate is even more interesting.

The podcast runs ~47 minutes. AB was great to talk tech with. Keith and I could have talked all afternoon with AB; it was very hard to stop the recording as we could have gone on for another hour or more. AB said he doesn’t like to do podcasts or videos but he had no problem with us firing away questions. Listen to the podcast to learn more.


Anand Babu Periasamy, CEO MinIO

AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement, AB was a co-founder and CTO of GlusterFS, which was acquired by Red Hat in 2011. Following the acquisition, he served in the office of the CTO at Red Hat prior to founding MinIO in late 2015. AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India.

He earned his BE in Computer Science and Engineering from Annamalai University.

096: GreyBeards YE2019 IT Industry Trends podcast

In this, our year-end industry wrap-up episode, the GreyBeards discuss trends and technologies impacting the IT industry in 2019 and what’s ahead for 2020. This year we have Matt and Keith on the podcast along with Ray. Just like last year, we start off with NVMeoF.

NVMeoF unleashed

This year just about every major storage vendor announced new systems that support NVMeoF or added NVMeoF support to their current storage systems. Most offer FC-based NVMeoF, a few offer NVMeoF/Ethernet, and fewer still offer both.

All of the NVMeoF/Ethernet implementations seem to be using RoCE or iWARP. It’s unclear whether one is more often used than the other, so for now both continue to be used in the market. Some storage vendors are offering NVMeoF as an internal back-end fabric while hosts still access the data via iSCSI or FC/SCSI. This works better than SAS but won’t provide all the performance you can get from end-to-end NVMeoF.

NVMeoF is all about increasing IOPS and reducing response times. That, and getting ready for SCM SSDs. In the meantime, the SSD industry has introduced some very attractive NVMe (NAND) SSDs that, in an NVMeoF storage system, can increase IOPS and reduce latencies.

We talked last year about NVMeoF standards finally stabilizing and this year the rollout across enterprise storage systems is testament to that.

SCM hits the enterprise

Most of us attended an Intel Data Center Event earlier this past year, where Optane DC PM was introduced. Optane DC PM is the memory version of Optane SCM (3D XPoint) technology. Intel offers two distinct modes of accessing Optane DC PM as memory: 1) App Direct mode, where data in Optane DC PM persists across power cycles but requires one to use a special API; and 2) Memory mode, where Optane DC PM is cleared during a power cycle (see our RayOnStorage post Need memory, Intel’s Optane DC PM…).

Vendors seem to be using Optane memory and Optane SCM technology differently. Pure is using Optane SSDs plugged into their FlashArray as sort of a read cache for customer IO. They suggest that for well behaved applications this can reduce IO response times considerably.

Dell EMC introduced SCM as a storage tier and are using their automated storage tiering to move the hottest data to SCM. Oracle’s latest Exadata appliance uses Optane DC PM as both a read and write caching layer.

It won’t be long before every enterprise vendor offers SCM drives in their storage systems with a few offering Optane DC PM as in memory caching technology.

Of course, the big news for Optane DC PM is its use in in-memory databases, specifically SAP HANA. HANA can take advantage of the (6) TB of memory to handle larger databases. Keith mentioned that even Microsoft SQL Server can take advantage of the additional memory to provide faster responses to queries.

Keith also mentioned that there are some systems out there that can be configured to share Optane memory (or storage). When SAP or other databases use this solution they are able to amortize the cost of the technology over more use cases.

Of course, Optane DC PM is only available on the latest generation of Intel processors. None of us have heard anything from AMD (or Micron) on providing a second source for Optane DC PM support (or the memory technology itself). Presumably most customers would want a second source for Optane DC PM processor support (as well as the technology).

Cloud enterprise storage hits mainstream

The other thing we saw more of this year is enterprise vendors offering versions of storage in public cloud environments. NetApp was an early proponent of doing this.

We saw at Pure that they have a new Cloud Block Store, which is a re-architected version of FlashArray//X storage using AWS hardware and networking services. We were very impressed with what they have accomplished and it was the subject of more than one late night discussion. Listen to the Keith & Ray show at Pure//Accelerate 2019 podcast to learn more.

Matt mentioned Nimble’s cloud volume storage, which is cloud adjacent. Most enterprise vendors offer something similar today. They differentiate on how easy it is to configure and use, and where (which regions) it’s available.

NetApp has arguably been at this the longest and has the deepest offerings available from cloud adjacent file and block storage, to offering native enterprise file services for all public cloud environments, to supplying a suite of dedicated data services to surround all of their storage technology operating in public clouds and on premises.

While Dell EMC may have missed the turn to the cloud, they are quickly trying to catch up. Keith mentioned Faction, a Dell partner that offers cloud storage services using VMware with VMC. With Faction and vSAN customers have access to software defined storage that uses cloud hardware to support data services.

What’s driving data growth

There seems to be no end to the need for storage to hold data. The GreyBeards point to three trends driving data growth today.

  1. IoT seems to have no bounds. A recent RayOnStorage post, Internet of Tires, discussed how tire companies are connecting their tires to the internet. And that’s just the start: pretty soon every artifact, every device, every manufactured item will have a number of sensors attached, all of which will be creating massive amounts of data.
  2. AI/ML/DL has an insatiable appetite for data. IoT is being used largely to optimize products and services. But it’s DL, with a large dollop of data, that is behind much of that optimization.
  3. SaaS is a relatively new application delivery approach that’s being rolled out to more arenas, and since it’s online and user oriented, it seems to generate lots of data.

Containers storage debate

We closed the podcast with a heavy debate on whether container applications have a need for storage. Keith was adamant that containers by their very nature are stateless and that Kubernetes’ ability to stop and start container applications at will almost requires stateless operation.

Ray was a bit more theoretical on the topic and believed that most container applications today take advantage of some sort of database or other services to store state and that state is just another word for storage.

Keith mentioned encoding as a typical container app. Encoding containers can be fired up and taken down at will without hurting anything but throughput. Yes, but those encoder container apps must access some database or other state information to find out what work is left to do, and as they complete their work they update this data as well as store their newly encoded segments. All of this involves the use of state information.
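
To illustrate the point (this is a toy Python sketch, not anyone’s product; it assumes a Redis list as the shared work queue and an S3-compatible object store, with all names hypothetical), an encoder worker like that keeps its state entirely in external services:

```python
import boto3
import redis  # hypothetical external work queue holding the shared state

queue = redis.Redis(host="state-store.example.com")
s3 = boto3.client("s3", endpoint_url="https://objects.example.com")

def encode(segment: bytes) -> bytes:
    """Stand-in for the real transcoding work."""
    return segment[::-1]

# The container itself holds no state: it can be killed and restarted at will,
# because the work list and the results both live in external services.
while True:
    key = queue.lpop("segments-to-encode")   # find out what work is left
    if key is None:
        break                                # nothing left to do; exit cleanly
    raw = s3.get_object(Bucket="video", Key=key.decode())["Body"].read()
    s3.put_object(Bucket="video-encoded", Key=key.decode(), Body=encode(raw))
    queue.sadd("segments-done", key)         # update the shared state
```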

In the end, I think we were talking about the same thing but using different terminology. Keith believes that persistent state information is needed and Ray says that this is just another word for (containers) storage. Matt said we probably need Nigel (@NigelPoulton) on the podcast to straighten us both out.

The podcast ran a bit long and could have run longer. Keith and Matt bring a systems level perspective to what’s happening in the storage market, but they come at it from different sides. Ray seems to frame everything from a storage perspective. Diverse perspectives lead to a fuller and more interesting discussion. Listen to the podcast to learn more.



Ray Lucchesi (@RayLucchesi) is the host of GreyBeardsOnStorage, President/Founder of Silverton Consulting, and a prominent blogger at RayOnStorage.com.

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor and blogs at Virtualized Geek.

Matt Leib (@MBLeib), one of our co-hosts, has been blogging in the storage space for over 10 years, with work experience in both engineering and presales/product marketing. His blog is at Virtually Tied to My Desktop.


094: GreyBeards talk shedding light on data with Scott Baker, Dir. Content & Data Intelligence at Hitachi Vantara

Sponsored By:

At the Hitachi NEXT 2019 conference last month, there was a lot of talk about new data services from Hitachi. Keith and I thought it would be a good time to sit down and talk with Scott Baker (@Kraken-Scuba), Director of Content and Data Intelligence at Hitachi Vantara, about what’s going on with data operations these days and how customers are shedding more light on their data.

Information supply chain

Something Scott said in his opening remarks caught my attention when he mentioned customer information supply chains. The information supply chain is similar to a manufacturing supply chain, but it’s all about data. Just like manufacturing supply chains, where parts and services come from anywhere and are used to create products/services for customers, information supply chains are about the data used in an organization’s operations. Information supply chain data is A) being sourced from many places (or applications); B) being added to by supply chain processing (or other applications); and C) ultimately used by the organization to supply a product/service to customers.

But after the product/service is supplied the similarity between manufacturing and information supply chains breaks down. With the information supply chain, data is effectively indestructible, is infinitely re-useable and can live forever. Who throws data away anymore?

The problem most organizations have with information supply chains is once the product/service is supplied, data is often put away never to be seen again or as Scott puts it, goes dark.

This is where Hitachi Content Intelligence (HCI) comes in. HCI is designed to take (unstructured or structured) data and analyze it (using natural language and other processing tools) to surround it with information and other metadata, so that it can become more visible and useful to the organization for the rest of its life.

Customers can also use HCI to extract and blend data streams together, automating the creation of an information rich, data repository. The data repository can readily be searched to re-discover or uncover attributes about the data not visible before.

Scott also mentioned the Hitachi Pentaho Platform, which can be used to make real-time decisions from structured data. Pentaho information can also be fed into HCI to provide more intelligence for your structured data.

But HCI can also be used to analyze other database data as well. For instance, database blob and text elements can be fed to and analyzed by HCI. HCI analysis can include natural language processing and other functionality to tag the data by adding key:value information, all of which can be supplied back to the database or Pentaho to add further value to structured data.

Customers can also use HCI to read and transform database tables into XML files. XML files can be stored in object stores as objects or in file systems. XML data can easily be textually indexed and searched by various tools to better understand the structured data.
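
As a generic illustration of that kind of table-to-XML transform (this is not Hitachi’s tooling; the table and column names are hypothetical), a short Python sketch might look like this:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Read a (hypothetical) customers table and emit one XML document, which
# could then be stored as an object or file and textually indexed.
conn = sqlite3.connect("crm.db")
cur = conn.execute("SELECT id, name, email FROM customers")
cols = [d[0] for d in cur.description]

root = ET.Element("customers")
for row in cur:
    rec = ET.SubElement(root, "customer")
    for col, val in zip(cols, row):
        ET.SubElement(rec, col).text = str(val)

ET.ElementTree(root).write("customers.xml", encoding="utf-8",
                           xml_declaration=True)
```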

We also talked about Hadoop data that can be offloaded to Hitachi Content Platform (HCP) object storage with a stub left behind. Once data is in HCP, HCI can be triggered to index and add more metadata, which can then later be used to decide when to move data back to Hadoop for further analysis.

Finally, Keith mentioned that he just got back from KubeCon and there was an increasing cry for data being used with containerized applications. Scott mentioned HCP for Cloud Scale, the newest member of the HCP object store family, focused on scale out capabilities to provide highly consistent, object storage performance for customers that need it. Customers running containerized workloads use scale-out capabilities to respond to user demand and now they have on premises object storage that can scale with them, as needs change.

The podcast ran ~24 minutes. Scott was very knowledgeable about data workflows, pipelines and the need for better discovery tools. We had a great time discussing information supply chains and how Hitachi can help customers optimize their data pipelines. Listen to the podcast to learn more.


Scott Baker, Director of Content and Data Intelligence at Hitachi Vantara

Scott Baker is, and has been, an active member of the information technology, data analytics, data management, and data protection disciplines for longer than he is willing to admit.

In his present role at Hitachi, Scott is the Senior Director of the Content and Data Intelligence organization focused on Hitachi’s Digital Transformation, Data Management, Data Governance, Data Mobility, Data Protection and Data Analytics solutions which includes Hitachi Content Platform (HCP), HCP Anywhere, HCP Gateway, Hitachi Content Intelligence, and Hitachi Data Protection Solutions.

Scott is a VMware Certified Professional, recognized as a subject matter expert, industry speaker, and author. Scott has been a panelist on topics related to storage, cloud, information governance, data security, infrastructure standardization, and social media topics. His educational background includes an MBA, Master’s & Bachelor’s in Computer Science.

When he’s not working, Scott is an avid scuba diver, underwater photographer, and PADI Scuba Instructor. He has a passion for public speaking, whiteboarding, teaching, and traveling the world.

91: Keith and Ray show at CommvaultGO 2019

There was a lot of news at CommvaultGO this year and it was our first chance to talk with their new CEO, Sanjay Mirchandani. Just prior to the show, Commvault introduced a new SaaS backup offering for the mid market, Metallic™, and about a month or so before the show Commvault had acquired Hedvig, a software defined storage solution. Keith and I also participated in a TechFieldDay Exclusive (TFDx) for Commvault, the day before the show began.

First up is Metallic, a Commvault Venture. When Sanjay arrived he took a worldwide tour of Commvault offices and customers and came back saying they needed a Software-as-a-Service backup offering to go after the mid market. That was about 6 months ago and since then, they have spun up a development and marketing team and today delivered their first product.

Metallic has three offerings all based on Commvault technology but re-implemented to be simpler to use and operate in the cloud.

  1. Metallic Core Backup & Recovery, which is targeted at virtualized server environments whether on premises or in the cloud. It covers backup and recovery for VMware vSphere, Microsoft Hyper-V & KVM VMs, SQL Server, and file servers running on Windows or Linux.
  2. Metallic Office 365 Backup & Recovery, which is targeted at Office 365 solutions and provides backup and recovery solutions for these customer environments.
  3. Metallic Endpoint Backup & Recovery, which is focused on desktop and laptop users and provides backup and recovery for those end-user environments.

Metallic operates in its own cloud environment (believed to be Microsoft Azure), and it’s a bring-your-own-cloud secondary storage solution, with an option to use Metallic cloud storage as secondary storage.

At the moment, Metallic is only offered to US based organizations and purchased through Commvault channel partners. However, the free (we believe 45-day) trial can be downloaded and purchased without going through the channel.

Pricing for Core Backup & Recovery is based on TB/month, and pricing for the other two Metallic offerings is based on user seats/month. There doesn’t seem to be any retention limit for the Office 365 and Endpoint products. The Core Backup product’s data retention is only limited by the TBs that are licensed.

Next up is Commvault Activate™. This product was announced at last year’s GO conference, but neither Keith nor I took note. Activate is a data management solution that uses Commvault backup storage and provides three capabilities: File Storage Optimization, which identifies files that are suitable for archive; Sensitive Data Governance, which profiles and identifies sensitive data in files and provides governance; and Compliance Search & eDiscovery, which can be used to put legal holds in place and create review sets for legal and other compliance activities.

And then there’s Hedvig, a Commvault Venture. At the show there was much talk about the Data Brain having two sides: one for the management of data protection and the other for the management of storage. What Commvault plans to do over the next few years is deliver a unified storage and protection Data Brain that supports both of these sides. During the TFD sessions there was quite a lot of chatter, on Twitter and otherwise, about whether customers would ever be willing to have both primary and secondary storage on the same system, or have both controlled by the same data plane. Commvault isn’t the only vendor to have gone down this path. We will need to wait and see how customers react.

The podcast is ~23 minutes. As mentioned previously, Keith is a long time friend and co-host of our GreyBeards On Storage podcast. He always has an interesting perspective on how new technology can benefit the data center today. Listen to the podcast to learn more.


Keith Townsend, The CTO Advisor

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations.

Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.

89: Keith & Ray show at Pure//Accelerate 2019

There were plenty of announcements at Pure//Accelerate in Austin this past week and we were given a preview of them at a StorageFieldDay Exclusive (SFDx), the day before the announcement.

First up is Pure’s DirectMemory. They have added Optane SSDs to FlashArray//X to be used as a read cache for customer data. As you may know, Pure already has an NVRAM write cache. With DirectMemory, customers can have 3TB or 6TB of Optane storage in a FlashArray//X70 or //X90. It almost looks plug and play: you take out one or two flash modules, plug in the Optane SSD(s), and off it goes. DirectMemory went GA at the show.

Pure also announced FlashArray//C at Accelerate. This is a new capacity-optimized storage solution. They have re-designed their flash module to support higher capacity flash and supply higher capacity storage (targeted for QLC flash but originally shipping with TLC). FlashArray//C supplies ~5PB of effective (~1.4PB raw) capacity in 9U. Although FlashArray//C offers cheaper storage on a $/GB basis, it is also much slower (RT latency on the order of 2-4msec) than FlashArray//X storage. Pure, like other vendors we have talked with, is trying to drive disk technology out of the enterprise. We had some interesting discussions with Pure (and others) on this topic at the reception. Just remember, tape is still alive and well in the enterprise AND the cloud, 52 years after being pronounced dead.

Pure had announced Cloud Block Store (CBS) previously, but it is now GA through partners or on the AWS marketplace. Give them kudos, as they have taken a different approach to Pure storage in the cloud. With CBS, they have effectively re-architected and re-implemented Pure FlashArray using AWS EC2, IO1, EBS and S3 storage and ended up with highly available (iSCSI) block software defined storage. It will be interesting to see how well it’s adopted. The picture is from me explaining the CBS architecture to @DVellante.

For Pure’s FlashBlade storage, they have doubled the number of blades in a cluster (or namespace), from 75 to 150 FlashBlades. Each FlashBlade contains storage and compute (almost computational storage), so one should see an increase in bandwidth with the added blades. No one at Pure would go on record with specific numbers on any performance improvement because it’s still undergoing testing.

Finally, FlashArray//X will offer full NFS and SMB file support. This comes from a recent acquisition (Compuverde). Pure plans to differentiate between file on FlashArray//X and FlashBlade by positioning FlashArray//X file for those customers with mostly block storage requirements who also need a small amount of file storage, and FlashBlade for everyone else that needs file.

The podcast is ~23 minutes. Keith is a long time friend and co-host of our GreyBeards On Storage podcast. He’s always got an interesting perspective on how new technology can benefit the data center today. Listen to the podcast to learn more.


Keith Townsend, The CTO Advisor

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.