097: GreyBeards talk open source S3 object store with AB Periasamy, CEO MinIO

Ray was at SFD19 a few weeks ago, and the last session of the week (usually a dead slot) was with MinIO, who just blew us away (see videos of MinIO's session here). Ray thought Anand Babu (AB) Periasamy (@ABPeriasamy), CEO of MinIO and the main presenter at the session, would be a great invite for our GreyBeards podcast. Keith and I had a ball talking with AB.

Why object store

There's something afoot in the object storage space. Over the last year or so, it seems everybody is looking to deploy object stores, whether on prem, in CoLo facilities, or in the cloud. It could be just the mass of data coming online, but that trend has been underway for years now. No, it's something else.

It all starts with AWS and S3. Over the last couple of years AWS has been rolling out new functionality that only works with S3 and this has been driving even more adoption of S3 as well as other object storage solutions.

S3-compatible object stores are available in just about every cloud service, from major (and minor) storage vendors, and in open source from MinIO.

Why S3 is so popular

Object stores are accessed via RESTful interfaces, and traditionally most implementations used their own proprietary API. But when AWS created S3 (Simple Storage Service) with its own API/SDK, that API somehow became the de facto standard interface for all other object stores. S3 compatibility became a significant feature that every object store had to support.

Sometime after that, MinIO came into existence. MinIO provides a 100% open source, fully AWS S3-compatible object store that you can run anywhere: on prem, in CoLo facilities, and indeed in the cloud. In fact, some customers run MinIO in AWS. AB says these are probably customers using a packaged software solution that happens to include MinIO, and it's nonetheless more expensive than AWS S3 since it uses EC2 instances and EBS storage to create the object store.

Customers can access MinIO object stores with either the AWS S3 SDK or the MinIO SDK, and the same two SDKs can be used to access AWS S3 storage (a minimal sketch appears below). Occasionally, AWS S3 updates have broken MinIO's SDK, but these breaks were later fixed by AWS. It seems AWS and MinIO are on good terms.
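
As a minimal sketch of how the two SDKs interchange, here's a Python example pointing both the AWS S3 SDK (boto3) and the MinIO Python SDK at the same MinIO server. The endpoint address and credentials are placeholders, not anything from the podcast:

```python
import boto3
from minio import Minio

# Point the AWS S3 SDK (boto3) at a MinIO server instead of AWS.
# Endpoint, access key, and secret key below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.com:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
print(s3.list_buckets()["Buckets"])

# The same server, via MinIO's own Python SDK.
mc = Minio(
    "minio.example.com:9000",
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
    secure=False,
)
print(list(mc.list_buckets()))
```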

AB mentioned that as customers get up to a few PBs of AWS S3 storage they often find the costs to be too high. It’s at this point that they start looking at other object storage solutions. But because MinIO is 100% S3 compatible and it’s open source many of these customers deploy it in their own data center facilities or in colo environments.

For those customers that want it, MinIO also offers an S3 gateway. With the gateway, on prem customers can use S3 or standard file services to access S3 object storage located in the cloud. The gateway also works in the public cloud and can support both AWS S3 and Microsoft Azure Blob storage as a backend.

MinIO matches AWS S3 features

AWS S3 has a number of great features and MinIO has matched or exceeded them all, step by step. AWS S3 has cross region replication options where customers can replicate S3 data from one region to another. MinIO supports both asynchronous replication of S3 data and synchronous replication (using RADIO).

But MinIO adds support for erasure coding within a fault domain. The default is Nx2 erasure coding, which effectively duplicates all your data, so as long as half of your servers and storage are available you still have access to all of it. But this can be configured down to something like 12+4, where data is split across 16 servers, any four of which can fail with data remaining accessible (see the arithmetic sketched below).
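
To make the trade-off concrete, here's a tiny Python helper (our own illustrative arithmetic, not MinIO code) comparing the two erasure coding profiles:

```python
def ec_profile(data_shards: int, parity_shards: int) -> dict:
    """Fault tolerance and storage efficiency of a data+parity erasure code."""
    total = data_shards + parity_shards
    return {
        "total_shards": total,
        "tolerates_failures": parity_shards,      # any `parity_shards` shards can be lost
        "storage_efficiency": data_shards / total,
    }

# Nx2-style mirroring across 16 servers: 8 data + 8 parity, 50% efficient
print(ec_profile(8, 8))
# 12+4: tolerates any 4 failures, 75% efficient
print(ec_profile(12, 4))
```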

AWS customers can use a Snowball (a standalone storage device) to transfer data to or from S3 storage. AWS Snowball implements a subset of the S3 API and requires a NAS staging area of equivalent size to migrate data out of S3. MinIO supports Snowball's limited S3 API and, as such, Snowballs can be used to migrate data into or out of MinIO. MinIO has a blog post which describes their support for AWS Snowball.

AWS also offers S3 Lambda services, i.e., serverless computing, where compute can be invoked when data is loaded into a bucket and then shut down when no longer needed. AWS Lambda depends on AWS messaging and other services to work properly. But MinIO supports Lambda-like functionality using other open source services; AB mentioned MQTT and Kafka (a sketch of registering for bucket events follows). MinIO has another blog post discussing their Lambda-like services based on Kafka.
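
As an illustration, MinIO exposes bucket event notifications through the standard S3 notification API. This hedged Python sketch uses boto3 to register for object-created events; the endpoint, credentials, and the MinIO-style Kafka target ARN are placeholders and assume a Kafka notification target was already configured on the server:

```python
import boto3

# Placeholder endpoint/credentials; assumes the MinIO server was started
# with a Kafka notification target configured (the ARN below is illustrative).
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.com:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.put_bucket_notification_configuration(
    Bucket="images",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:minio:sqs::1:kafka",  # illustrative MinIO target ARN
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```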

Snowflake, a SQL data warehouse service that runs on AWS, uses S3 object storage to hold its data. Ray and Keith almost choked on that statement, as object storage and databases never used to be uttered in the same breath. But what Snowflake has shown is that you can use an object store for database data, as long as you're willing to load tables into memory, process them there, and then unload any modified table data back into the object store. Indexing of the object data seems to be done as the data is loaded, also in a (random IO) cache or in memory, and once done it too can be unloaded into the object store.

Now, Snowflake uses S3 but isn't available on prem. MinIO has a number of database partners that use its object store as a backend to host a Snowflake-like service on prem. AB mentioned Spark and Splunk, but there are others as well.

We ended the discussion with what it means to have 20K stars on GitHub. AB said that if you wrote a JavaScript project, getting 20K stars would be easy, but you just don't see this sort of open source popularity for storage systems. He said the number is interesting, but the growth rate is even more interesting.

The podcast runs ~47 minutes. AB was great to talk tech with. Keith and I could have talked all afternoon with him; it was very hard to stop the recording. AB said he doesn't like to do podcasts or videos, but he had no problem with us firing away questions. Listen to the podcast to learn more.


Anand Babu Periasamy, CEO MinIO

AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement, AB was a co-founder and CTO of GlusterFS, which was acquired by Red Hat in 2011. Following the acquisition, he served in the office of the CTO at Red Hat prior to founding MinIO in late 2015. AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India.

He earned his BE in Computer Science and Engineering from Annamalai University.

095: GreyBeards talk file sync&share with S. Azam Ali, VP Customer Success at CentreStack

We haven't talked with a file sync and share vendor in a while now, and Matt was interested in the technology. He had been talking with CentreStack and found that they had been making some inroads in the enterprise. So we contacted S. Azam Ali, VP of Customer Success at CentreStack, and asked if he wanted to talk about their product on our podcast.

File sync and share is part collaboration tool, part productivity tool. With file sync & share, many users share the same files across many different environments and endpoint devices. It's especially popular with road warriors who need access on the road to the same files that reside in corporate data centers. With this technology, files updated anywhere become available everywhere.

Most file sync & share systems require you to use their storage. But CentreStack just provides sync and share access to NFS and SMB storage that's already in the data center.

CentreStack doesn't use VPNs to access data, as many other vendors do. With CentreStack, one just logs into a website (with AD credentials) and has immediate browser access to files.

CentreStack uses a gateway VM that runs in the corporate data center and is configured to share files, directories, and shares. We asked whether they were in the data path and Azam said no. However, the gateway does register for file system notifications (e.g., when files are updated outside CentreStack, it gets notified; a conceptual sketch of this mechanism follows).
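
For a feel of the general mechanism (this is a conceptual sketch, not CentreStack's implementation), here's how a process can register for file system change notifications in Python using the watchdog library; the share path is a placeholder:

```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class ShareWatcher(FileSystemEventHandler):
    """React to out-of-band changes on a shared directory."""
    def on_modified(self, event):
        if not event.is_directory:
            print(f"metadata refresh needed for {event.src_path}")

# "/exports/share" is a placeholder path for an NFS/SMB-backed share.
observer = Observer()
observer.schedule(ShareWatcher(), "/exports/share", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```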

CentreStack does maintain metadata on the files, directories, and shares under its control. Presumably, once an admin sets it up, it goes out, accesses the file systems holding shared files, and populates its metadata for those files.

CentreStack works with any NFS and SMB file system, as well as NAS servers that support those two protocols. It's unclear whether customers can have more than one gateway server in a data center supporting sync and share, but Azam did say it isn't unusual for customers with multiple data centers to have a gateway in each, to support the sync & share requirements of each data center.

They use client software on endpoint devices, which presents the shared files as an external drive (on Mac), presumably a cloud drive on Windows PCs, and similar services (in an app) on other systems (iOS and Android phones, iPads, etc.). We believe Azam said Linux support was coming soon.

The client software can be configured in cache mode or offline mode:

  • Cache mode – the admin can configure how much space to use on the endpoint device and the software will cache the most recently used files in that space for faster access
  • Offline mode – the software moves all files that the endpoint login can access to the device.

In cache mode, when users open a file not already in the cache, there will be some delay as the system retrieves data over the internet and copies it to the endpoint device. It's unclear what the delay might be, but it's probably a function of internet speed and load on the gateway, with possibly some overhead for the NFS/SMB/NAS system to supply the data. If there's not enough space to hold the file, the oldest non-open file is erased from the cache (a minimal sketch of this eviction policy appears below).
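
Here's a minimal Python sketch of that eviction policy, an LRU cache that skips currently open files. It's purely illustrative, based on our reading of Azam's description, not CentreStack code:

```python
from collections import OrderedDict

class FileCache:
    """LRU cache of file sizes; evicts oldest non-open files first."""
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()   # path -> size, oldest first
        self.open_files = set()

    def admit(self, path: str, size: int) -> None:
        # Evict the oldest files that aren't currently open until it fits.
        for victim in list(self.files):
            if self.used + size <= self.capacity:
                break
            if victim not in self.open_files:
                self.used -= self.files.pop(victim)
        if self.used + size <= self.capacity:
            self.files[path] = size
            self.used += size

    def touch(self, path: str) -> None:
        # Mark as most recently used.
        if path in self.files:
            self.files.move_to_end(path)
```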

In both modes, CentreStack supports cross-domain locking. That is, if one client has a file open (for update), all other systems/endpoints may only access the file in read-only mode. After the file is closed, it can then be opened for update by other users.

When CentreStack clients are used to update files, the data is stored back in the original file systems with versioning. This way, if the data is corrupted, admins can easily roll back to a known good version.

CentreStack also offers a cloud backup and DR service. Gateway admins can request that sync & share files be backed up to cloud storage (AWS S3, Azure Blob, and Wasabi). When CentreStack backs up file data to the cloud, it also includes metadata about the files so they can be reconstituted anywhere.

A CentreStack cloud gateway VM can be activated in the cloud to supply access to backed up files. It's unclear whether the CentreStack cloud backup has to be restored to block or file storage first, or whether it accesses the data on cloud storage directly. But customers using CentreStack cloud DR would need to run the client software on any systems whose applications access these files.

Wasabi seemed an odd solution to have on their list of supported cloud storage providers, but Azam said that for their market, the economics of Wasabi storage were hard to ignore. See our previous podcast with David Friend, Co-Founder & CEO, Wasabi, to learn more about Wasabi.

CentreStack is licensed on a per-user basis, not storage capacity, bucking industry trends. But since they don't actually own the storage, that makes sense. For CentreStack cloud backup, customers also have to supply the cloud storage.

They also offer a 30-day free trial on their website with unlimited users. We assume this uses CentreStack's cloud gateway and that customers bring their own cloud storage to support it.

The podcast runs about 35 minutes. Azam was a bit more marketing than we are used to, but he warmed up once we started asking questions. Listen to the podcast to learn more.


S. Azam Ali, VP of Customer Success, CentreStack

S. Azam Ali is VP of Customer Success at CentreStack and an executive with extensive experience in managing global teams, including sales, support, and consulting services.

Azam's channel experience includes on-boarding new partners, with creation of marketing and training collateral for those partners. He is an executive with a passion for customer success and for establishing long-term relationships and partnerships.

Azam is also an advisor to startups as well as established technology companies.

92: Ray talks AI with Mike McNamara, Sr. Manager, AI Solution Mkt., NetApp

Sponsored By: NetApp

NetApp's been working in the AI DL (deep learning) space for a long time now and announced their partnership with NVIDIA DGX systems back in August of 2018. At NetApp Insight this week, they were showing off their new NVIDIA DGX reference architectures. These architectures use NetApp AFF A800 storage (for more info on AI DL, check out Ray's Learning Machine (deep) Learning posts: part 1, part 2 and part 3).

Besides the ONTAP AI systems, NetApp also offers:

  • FlexPod AI solution, based on their partnership with Cisco, using UCS C480 ML M5 rack servers which include 8 NVIDIA Tesla V100 GPUs and also feature NetApp AFF A800 storage, for use in core AI DL.
  • NetApp HCI, which has two configurations with two or three NVIDIA GPUs that come in 1U or 2U rack servers and run VMware vSphere or Red Hat OpenStack/OpenShift software, suitable for edge or core AI DL.
  • E-series reference architecture, which uses the BeeGFS parallel file system and offers InfiniBand data access for HPC or core AI DL.

On the conference floor, NetApp showed AI DL demos for the automotive, financial services, public sector, and healthcare verticals. They also had a facial recognition application running that could estimate your age and emotional state (I didn't try it, but Mike said they were hedging the model so it predicted a lower age).

Mike said one healthcare solution was focused on radiological image scans, to identify pathologies from X-ray, MRI, or CAT scan images. Mike mentioned there has been a lot of radiological technologist burn-out due to the volume of work caused by the medical imaging explosion over the last decade or so. He said image analysis is something AI DL can perform very effectively, and doing so would improve accuracy and reduce the volume of work for technologists.

He also mentioned another healthcare application that uses an AI DL app to count TB cells in blood samples and estimate the extent of TB infection. Historically, this has been time consuming, error prone, and hard to do in the field. The app uses a microscope with a smartphone and can be deployed and run anywhere in the world.

Mike mentioned a genomics AI DL application that examined DNA sequences and tried to determine their functionality. He also mentioned a retail AI DL facial recognition application that would help women "see" what they would look like with different makeup on.

There was a lot of discussion of NetApp Cloud services at the show, such as Cloud Volumes Service and Azure NetApp Files (ANF). Either of these could easily be used to implement an AI DL application, or be part of an edge-to-core-to-cloud data flow for an AI DL application deployment using NetApp Data Fabric.

NetApp also announced a new, all-flash StorageGRID appliance targeted at IO-intensive uses of object stores, like AI DL model training and data analytics.

Finally, Mike mentioned NetApp’s ecosystem of partners working in the AI space to help customers deploy AI DL algorithms in their industries. Some of these include:

  1. Flexential, Try and Buy AI: customers can bring Flexential in to supply AI DL expertise, generate an AI DL application using customer data, and deploy it on customer cloud or on prem infrastructure.
  2. Core Scientific, AI-as-a-Service: customers can purchase a service to implement an AI DL application using customer data, running on Core Scientific infrastructure.
  3. Scale Matrix, Mobile data center AI: customers can create an AI DL application and run it on Scale Matrix infrastructure that is transported to wherever the customer wants it run.

We recorded the podcast on the show floor, in a glass booth, so there's some background noise (sorry about that, but it couldn't be helped). The podcast is ~27 minutes. Mike is a long time friend and NetApp product expert, recently working on AI DL solutions at NetApp. When I saw Mike at Insight, I just had to ask him what NetApp's been doing in the AI DL space. Listen to the podcast to learn more.


Mike McNamara, Senior Manager AI Solution Marketing, NetApp

With over 25 years of data management product and solution marketing experience, Mike’s background includes roles of increasing responsibility at NetApp (10+ years), Adaptec, EMC and Digital Equipment Corporation. 

In addition to his past role as marketing chairperson for the Fibre Channel Industry Association, he was a member of the Ethernet Technology Summit Conference Advisory Board, a member of the Ethernet Alliance, a regular contributor to industry journals, and a frequent speaker at events.

84: GreyBeards talk ultra-secure NAS with Eric Bednash, CEO & Co-founder, RackTop Systems

We were at a recent vendor conference where Steve Foskett (@SFoskett) introduced us to Eric Bednash (@ericbednash), CEO & Co-Founder, RackTop Systems. They have taken ZFS and made it run as an ultra-secure NAS system. Matt Leib, my co-host for this episode, has on-the-job experience with ZFS, which made him a natural for this discussion.

It turns out that Eric and his CTO (and perhaps other RackTop employees) have extensive experience with intelligence and other government agencies that depend on data security. These agencies deal with cyber security threats an order of magnitude larger than what corporations see.

All that time in intelligence gave Eric a unique perspective on what it takes to build secure, bulletproof NAS systems. Nine years or so ago, he and his CTO took OpenZFS (and OpenSolaris) and used them as the foundation for their new highly available and ultra-secure NAS system.

Most storage systems protect user access to data based on authorization: if a user is authorized to see/write data, they have unrestricted access to it. If an organization is paranoid, it might also use data-at-rest encryption. But RackTop takes all this to a whole other level.

Data security to the Nth degree

RackTop offers dual encryption for data at rest. Most organizations would say single encryption is enough: the data's encrypted, so how would another level of encryption make it more secure?

It all depends on how one secures keys (and, just my thoughts here, maybe on how easily quantum computing could decrypt singly encrypted data). So RackTop uses self-encrypting drives (the first level of encryption) as well as software encryption (the second level). Each has its own unique keys, which RackTop can maintain either in its own system or in a KMIP service provided by the data center (a sketch of the software layer appears below).
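
To show what a software encryption layer might look like in principle, here's a hedged Python sketch using the cryptography library's AES-GCM. This is purely illustrative, not RackTop's implementation; in their system the software key would come from RackTop's own key store or a KMIP service, layered on top of the self-encrypting drive's transparent hardware encryption:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Software-layer key; in practice this would come from a KMIP service,
# separate from the self-encrypting drive's hardware key.
key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

def encrypt_block(plaintext: bytes) -> bytes:
    nonce = os.urandom(12)                      # unique per write
    return nonce + aead.encrypt(nonce, plaintext, None)

def decrypt_block(blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return aead.decrypt(nonce, ciphertext, None)

assert decrypt_block(encrypt_block(b"file data")) == b"file data"
```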

They also supply user profiling. User data access can be profiled with a dataset heat map and other statistical/logging information. When users go outside their usual access profiles, it may signal a security breach. At the moment, when this happens, RackTop notifies security administrators, but Eric mentioned that a future release will have the option to automatically shut that user down.

And with all the focus on GDPR and similar regulations coming to a state near you, having user access profiles and access logs can easily satisfy any regulatory auditing requirements.

Eric said that any effective security has to be multi-layered. With RackTop, the multi-layer approach goes way beyond data-at-rest encryption and user access authentication. RackTop also offers appliance hardware sourced from secure supply chains and manufactured inside secured facilities. They have also modified OpenSolaris to be more secure and hardened the OS against cyber threats.

RackTop even supports cloud tiering with an internally developed secure data mover. Their data mover can securely migrate data (retaining meta-data on their system) to any S3 compatible object storage.

As proof of the security available from a RackTop NAS system, an unnamed US government agency had a "red team" attack their storage. Although Eric shared only a few details on what the red team attempted, he did say the RackTop NAS survived the assault without a security breach.

He also mentioned that they are trying to create a Zero Trust storage environment. Zero Trust implies constant verification and authentication: rather than relying on login credentials entered once, users re-authenticate every time they access data. Eric didn't say when, if ever, they'd reach this level of security, but it's a clear indication of the direction for their products.

ZFS based NAS system

A RackTop NAS supplies a ZFS-based file system. As such, it inherits all the features and advanced functionality of OpenZFS, but within a more secure, hardened, and highly available storage system.

ZFS has historically had issues with usability and its multiplicity of tuning knobs. RackTop has worked hard to make ZFS easier to operate and removed much of the manual tuning required to make it perform well.

The podcast is long, running ~44 minutes. We spent most of our time talking about security and less on the storage functionality of RackTop NAS. The security of RackTop systems takes some getting used to, but the need exists today and not many storage systems implement security quite to their level. Much of what RackTop does to improve data security blew Matt and me away. Eric is a very smart security expert in addition to being a storage vendor CEO. Listen to the podcast to learn more.

Eric Bednash, CEO & Co-founder, RackTop Systems

Eric Bednash is the co-founder and CEO of RackTop Systems, the pioneer of CyberConverged™ data security, a new market that fuses data storage with advanced security and compliance into a single platform.

A serial entrepreneur and innovator, Bednash has more than 20 years of experience in solving the most complex and challenging data problems through designing products and solutions for the U.S. Intelligence Community and commercial enterprises.

Bednash co-founded RackTop in 2010 with partner and current CTO Jonathan Halstuch. Prior to co-founding RackTop, he served as co-founder and CTO of a mid-sized consulting firm, focused on developing mission data systems within the Department of Defense and U.S. intelligence communities.

Bednash started his professional career in data center systems at Time-Warner, and spent the better part of the dot-com boom in the Washington, D.C. area connecting businesses to the internet. His career path began while still in high school, where Bednash contracted with small businesses and individuals to write software and build computers.

Bednash attended Rochester Institute of Technology and Penn State University, and completed both undergrad and graduate coursework in Business and Technology Management at Stevenson University. A Forbes Technology Council member, he regularly hosts thought leadership & technology video blogs, and is a technology writer and speaker. He is a multi-instrument musician, recreational athlete and a die-hard Pittsburgh Steelers fan. He currently resides in Fulton, Md. with his wife Laura and two children.

67: GreyBeards talk infrastructure monitoring with James Holden, Sr. Prod. Mgr. NetApp

Sponsored By: NetApp

Howard and I first talked with James Holden, NetApp Senior Product Manager for OnCommand Insight and Cloud Insights, last month at Storage Field Day 16 (SFD16) in Waltham, MA. At the time, we thought it would be great to also have him on the show.

James has been with the NetApp OnCommand Insight (OCI) team for quite a while now and is very knowledgeable about the product and its technology. NetApp Cloud Insights is a new SaaS offering that provides some of the same services as OCI without the footprint; it's focused on newer, non-traditional applications and is available on a pay-as-you-go model.

NetApp OnCommand Insight (OCI)

NetApp OCI is sort of a stripped down, souped up enterprise SRM tool, without storage and server configuration/provisioning (see James's introduction video from SFD15 for more info). It supports NetApp and just about anyone's storage, including Dell EMC, IBM, Hitachi Vantara (HDS), HPE, Infinidat, and Pure Storage, as well as most major OS environments such as VMware vSphere, Microsoft Hyper-V, RHEL, etc. Other storage can easily be added to OCI through a patch/minor update, which is typically done by customer request.

NetApp OCI currently runs in some of the biggest enterprises in the world today, including top F500 companies and one of the world's largest banks. OCI is agentless, but does use a data collector server/VM, on prem or in the cloud, that takes advantage of storage and system APIs to gather data.

OCI provides extensive end-to-end infrastructure monitoring and troubleshooting (see James's SFD16 OCI monitoring & troubleshooting session). OCI monitors application workloads from the VMs down to the storage supporting them.

OCI also supplies extensive chargeback capabilities (see his SFD16 OCI cost control/chargeback session). In times like these, when IT competes with public cloud offerings every day, chargebacks can be very illuminating.

Also, OCI has extensive integration with ServiceNow and similar offerings (see the SFD16 OCI ecosystem session). With this level of integration, OCI can provide seamless tracking of service requests from initiation through completion and verification.

In addition, OCI can monitor public cloud infrastructure as well as on prem. For example, with Amazon Web Services (AWS), customers can use OCI to monitor EC2 instances' EBS IO activity. OCI reports on AWS IOPS rates by EC2-EBS connection. Customers paying for EBS IOPS can use OCI to monitor and tailor their EBS costs (the sketch below shows the sort of raw metrics involved). OCI also supports Microsoft Azure environments.
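
For context, the raw numbers behind this kind of reporting are available directly from AWS CloudWatch. This hedged boto3 sketch (the volume ID is a placeholder, and OCI of course uses its own collectors rather than this exact call) pulls per-volume EBS read IOPS:

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# "vol-0123456789abcdef0" is a placeholder EBS volume ID.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EBS",
    MetricName="VolumeReadOps",
    Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
for point in resp["Datapoints"]:
    iops = point["Sum"] / 300            # ops per 5-minute period -> ops/sec
    print(point["Timestamp"], f"{iops:.1f} read IOPS")
```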

NetApp Cloud Insights

NetApp Cloud Insights is a new SaaS offering, currently in public preview, that is expected to release in October 2018 (check out his SFD16 Cloud Insights session video).

Customers can currently register to use the preview version at Cloud.netapp.com/Cloud Insights. There's a registration wall, but that's all it takes to get started.

The minimum Cloud Insights instance is a single server and 5TB of storage. Unlike OCI, Cloud Insights is tailored to support smaller shops without significant infrastructure. However, Cloud Insights also offers standard on prem enterprise infrastructure monitoring.

Cloud Insights is also focused on modern, cloud-native applications, whether they operate on prem or in the cloud. The problem with cloud-native, container apps is that they come and go in seconds, and there are thousands of them. Cloud Insights was designed specifically for container and other cloud-native applications and, as such, should provide more accurate monitoring of these systems' operations.

We talked about Cloud Insights' development cadence. James said that because it's a SaaS offering, new Cloud Insights functionality can be released daily, if not more frequently. Contrast that with OCI, where they schedule 3-4 releases a year.

Cloud Insights currently supports the Kubernetes container ecosystem, but more are on the way. Again, customers will determine which container or other cloud-native ecosystems are supported next.

The podcast runs ~22 minutes. James was very knowledgeable about OCI, Cloud Insights, and infrastructure monitoring in general, and he was easy to talk with. Howard and I had a great time at SFD16 and enjoyed our time talking with him again on the podcast. Listen to the podcast to learn more.

James Holden, Senior Product Manager, NetApp OCI and Cloud Insights

James Holden is a Senior Manager of Product Management at NetApp and, for the last 5 years, has been building the infrastructure monitoring and reporting tool OnCommand Insight.

Today he is working across NetApp’s Cloud Analytics portfolio, including Cloud Insights, a new SaaS offering currently in preview.

Prior to NetApp, James worked for 14 years at CSC in both the US and the UK on their storage, compute and automation solutions.