126: GreyBeards talk k8s storage with Alex Chircop, CEO, Ondat

Keith and I had an interesting discussion with Alex Chircop (@chira001), CEO of Ondat, a kubernetes storage provider. They have a high-performing system, laser focused on providing storage for k8s stateful container applications. Their storage is entirely containerized and has a number of advanced features for data availability, performance and security that developers need to run stateful container apps. Listen to the podcast to learn more.

We started by asking Alex how Ondat differs from all the other k8s storage solutions out there today (which we’ve been talking with lately). He mentioned three crucial capabilities:

  • Ondat was developed from the ground up to run as k8s containers. Doing this allows any k8s distribution to run their storage to support stateful container apps.
  • Ondat was designed to allow developers to run any possible container app. Ondat supports both block as well as file storage volumes.
  • Ondat provides consistent, superior performance, at scale, with no compromises. Sophisticated data placement ensures that data is located where it is consumed, and their highly optimized data path provides low-latency access to that data.

Ondat creates a data mesh (storage pool) out of all storage cluster nodes. Container volumes are carved out of this data mesh and at creation time, data and the apps that use them are co-located on the same cluster nodes.

At volume creation, developers can specify the number of replicas (mirrors) to be maintained by the system. Alex mentioned that Ondat uses synchronous replication between replica cluster nodes to make sure that all active replicas are up to date with the last IO that occurred to primary storage.

Ondat compresses all data that goes over the network as well as encrypts data in flight. Developers can also easily specify that data-at-rest be compressed and/or encrypted. Compressing data in flight helps supply consistent performance where networks are shared.
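
As a rough sketch of how these per-volume policies are typically expressed in k8s terms, the storage class below assumes Ondat's CSI driver name (csi.storageos.com) and feature labels such as storageos.com/replicas and storageos.com/encryption; treat the exact parameter keys as illustrative and check Ondat's documentation before relying on them:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ondat-replicated
    provisioner: csi.storageos.com           # Ondat (formerly StorageOS) CSI driver
    parameters:
      csi.storage.k8s.io/fstype: ext4
      storageos.com/replicas: "2"            # two synchronous replicas besides the primary
      storageos.com/encryption: "true"       # encrypt data at rest as well as in flight
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-data
    spec:
      storageClassName: ondat-replicated
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi

Any pod that mounts the postgres-data claim gets a volume whose replication and encryption are handled by Ondat underneath, with no change to the application itself.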

Alex also mentioned that they support both single reader/writer (RWO) k8s block storage volumes as well as multi-reader/multi-writer (RWX) k8s file storage volumes for containers.
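
In PVC terms, those map to the standard k8s access modes and volume modes; a minimal sketch (the storage class names here are illustrative, not Ondat defaults):

    # Block volume: a single pod reads/writes at a time
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: db-block-vol
    spec:
      storageClassName: ondat-replicated    # illustrative
      accessModes: ["ReadWriteOnce"]
      volumeMode: Block
      resources:
        requests:
          storage: 50Gi
    ---
    # Shared file volume: many pods read/write concurrently
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-file-vol
    spec:
      storageClassName: ondat-shared        # illustrative
      accessModes: ["ReadWriteMany"]
      volumeMode: Filesystem
      resources:
        requests:
          storage: 100Gi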

In Ondat each storage volume includes a mini-brain used to determine primary and replica data placement. Ondat also uses disaggregated consensus to decide what happens to primary and replica data after a k8s cluster split occurs. After a split, isolated replicas are invalidated and replicas are recreated, where possible, on the surviving nodes of the cluster partition that holds the primary copy of the data.

Replicas can also optionally be located across AZs, if available in your k8s cluster. Ondat doesn’t currently support replication across k8s clusters.

Ondat storage works on any hyperscaler k8s solution as well as any on-prem k8s system. I asked if Ondat supports VMware TKG and Alex said yes, but when pushed mentioned that they have not tested it yet.

Keith asked what happens when things go south, i.e., an application starts to suffer worse performance. Alex said that Ondat supplies system telemetry to k8s logging systems, which can be used to understand what’s going on. But he also mentioned they are working on a cloud-based, management-as-a-service offering to provide multi-cluster operational views of Ondat storage in operation, to help understand, isolate and fix problems like this.

Keith mentioned he had attended a talk by the Google engineers who developed kubernetes, in which they said stateful containers don’t belong under kubernetes. So why are stateful containers becoming so ubiquitous now?

Alex said that may have been the case originally, but k8s has come a long way since then. Nowadays, as many enterprises shift enterprise applications from their old system environments to run as containers, those apps all require state for processing. Having that stateful information or storage volumes accessible directly under k8s makes application re-implementation much easier.

What’s a typical Ondat configuration? Alex said there doesn’t appear to be one. Current Ondat deployments range from a few hundred to thousands of k8s cluster nodes and from tens to hundreds of TB of usable data storage.

Ondat has a simple pricing model: licensing costs are determined by the number of nodes in your k8s cluster. There’s different node pricing depending on deployment options, but other than that it’s pretty straightforward.

Alex Chircop, CEO Ondat

Alex Chircop is the founder and CEO of Ondat (formerly StorageOS), which makes it possible to easily deploy and manage stateful Kubernetes applications with persistent data volumes. He also serves as co-chair of the CNCF (Cloud Native Computing Foundation) Storage Technical Advisory Group.

Alex comes from a technical background working in IT that includes more than 10 years with Nomura and Goldman Sachs.

125: GreyBeards talk K8s storage with Tad Lebeck, US CTO for ionir

We had some technical difficulties with Matt getting on the podcast, so Ray had to fly solo. This month we continue our investigations into K8s storage with a discussion with Tad Lebeck (@TadLebeck), US CTO of ionir, a software-defined storage system that only runs under K8s. The ionir Kubernetes Data Services platform is an outgrowth of Reduxio, a “tin-wrapped” software-defined storage system, which pivoted to K8s as the environment to target and left the tin behind.

ionir offers a deduplicating, continuous data protection storage system for PVs (persistent volumes) under K8s that uses 3-way mirroring across data nodes for data protection. Their solution offers a number of unique services that we haven’t seen in other K8s storage systems. Listen to the podcast to learn more.

Tad opened with a long spiel on what ionir is and we spent the next 40 minutes unpacking that to understand what exactly they were doing.

Let’s start with why stateful containers are all the rage these days. Tad had a slightly different rationale than we’ve heard before. From his perspective, it all comes from current enterprise applications that used database servers/machines. As these apps are re-factored to run as K8s containerized microservices, developers need and want their data to be containerized right along with the application.

ionir constructs a block storage system across K8s data nodes or K8s worker nodes with direct-attached storage. In the cloud, this storage can be ephemeral (storage that only exists as long as the compute instance operates) or normal block storage (e.g., EBS in AWS). It’s unclear how ephemeral storage works on-prem. But in any case, they cluster together a set of data nodes into one massive block store and map PVs onto that. K8s data nodes can be added to the ionir cluster while it’s operating.

As mentioned earlier, they use 3-way mirroring for data protection, and ionir ensures the 3 copies are stored on different data nodes. As such, when one data node goes down, copies of PV data are available from the other 2 nodes, and the data can then be rewritten elsewhere to ensure 3-way mirroring continues. We suppose this means a minimum configuration requires at least 3 data nodes.

ionir also provides deduplicating block storage, which should theoretically reduce the physical storage footprint for any PV. Data blocks are deduplicated across the cluster. ionir also has a metadata service (also 3-way replicated, to separate data spaces) that records the manifest for all blocks associated with a PV, their hashes and (logical/physical) locations.

There was no mention of data compression or encryption, so those are probably not present. We find deduplication very effective for backup storage but less effective for primary storage. Any deduplication ratio for ionir primary storage is likely specific to the data being stored, i.e., columnar database, row database, text, office files, etc. Each of these would likely have a different dedupe ratio for primary storage.

Furthermore, ionir supplies continuous data protection (CDP) for PV data. PV data written to ionir is immutable, i.e., never modified, AND they keep previous versions of PV blocks in storage until they age out. This allows ionir to provide any prior version (well, most recent ones) of a PV. ionir uses a timestamp to distinguish different PV versions. So, if ransomware attacked your site, users could ask for a PV version just prior to the time of the attack and you’d have that version of the PV to restart operations. Customers can limit how far back ionir saves prior versions of blocks for PVs.

Having CDP for PVs makes DevOps qualification and testing significantly faster. Normally DevOps would need to copy production data to test environments in order to validate new app code. But ionir can easily instantiate a separate copy of any PV (at any time in their saved set) in a matter of seconds. This can take DevOps deployment testing down from days to minutes or less.
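
The standard k8s way to materialize such a copy is through the CSI snapshot and clone APIs; here's a sketch of what that could look like. All the object and class names below are hypothetical, since we didn't cover ionir's actual CSI object names on the podcast, and ionir's own tooling may let you pick an arbitrary CDP timestamp rather than a named snapshot:

    # Capture a point-in-time reference to a production PV
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: orders-db-before-upgrade
    spec:
      volumeSnapshotClassName: ionir-snapclass    # hypothetical
      source:
        persistentVolumeClaimName: orders-db-data # hypothetical production claim
    ---
    # Instantiate a writable test copy from that point in time
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: orders-db-test-copy
    spec:
      storageClassName: ionir-block               # hypothetical
      dataSource:
        name: orders-db-before-upgrade
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi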

In addition, ionir can teleport PV data to other, remote K8s clusters running ionir. Essentially, this copies PV metadata and its “hot” blocks over to any remote ionir cluster. During teleportation, the remote cluster can access PV data as soon as all PV metadata has been copied. The remote site accesses this PV data from the originating cluster (albeit much slower than accesses within the cluster) while “hot” blocks are being copied. Any writes to PV data at the remote site would be considered new data, deduplicated at the remote site, and only available at the remote site. Somewhat surprisingly, all of the PV’s data is never copied to the remote system, leaving the PV in a permanent teleported-access mode.

We're not sure we like the implications of teleporting PVs from a data integrity perspective. While it does make for near-instant access to PV data from other clusters and offers a solution to data gravity (it takes forever to move TBs of data across the web), it’s incomplete, as the data is never fully copied to the remote site. Once hot blocks have been copied, remote cluster PV access should run faster. But if, say, 20% of the requested blocks are not in the heat map, those IOs will take 100s of msec longer to perform, depending on the wire distance between the sites. And the writes at the remote site cause the two copies (one at the source site and one at the remote site) of the PV to diverge.

Their storage system is priced on a per data node basis, which makes it easy to price out their various deployment options. And it works in any standard K8s environment, although Tad admits they haven’t tested VMware Tanzu yet, but they have tested it on GCP, Microsoft Azure, AWS, and Red Hat OpenShift.

They offer a fully functional free trial of ionir storage, capped only at the number of data nodes in use. So, if you only need a small amount of storage (ok, 3 data nodes with 24 14TB SSDs each make for a large amount of storage) for your K8s environment, you can probably run forever on the free version.

Tad Lebeck, US CTO, ionir

Tad Lebeck is a global technology executive with over two decades of experience in startups and large vendors. Prior to ionir, he founded and led Nuvoloso, an innovator in Kubernetes data services. Earlier, Lebeck served as CTO at Huawei Symantec Technologies, Vice President at Symantec/Veritas, co-founder/CTO at Invio, and CTO at Legato Systems, where he helped create the modern enterprise data-protection market.

Tad was a founding member of the SNIA Technical Council. He earned an MS/CS from the University of Wisconsin, and a combined MBA from the Columbia, London, and HKU Schools of Business.

124: GreyBeards talk k8s storage orchestration using CNCF Rook Project with Sébastien Han & Travis Nielsen, Red Hat

Stateful containers are becoming a hot topic these days, so we thought it a good time to talk to the CNCF (Cloud Native Computing Foundation) Rook team about what they are doing to make storage easier to use for k8s container apps. CNCF put us into contact with Sébastien Han (@leseb_), Ceph Storage Architect, and Travis Nielsen (@STravisNielsen), both Principal Software Engineers at Red Hat and active on the Rook project. Rook is a CNCF “graduated” open source project, just like Kubernetes, Prometheus, ContainerD, etc., which means it’s mature enough to run production workloads.

Rook is used to configure, deploy and manage a Red Hat Ceph(r) Storage cluster under k8s. Rook creates all the k8s deployment scripts to set up a Ceph Storage cluster as containers, start it and monitor its activities. Rook monitoring of Ceph operations can restart any Ceph service container or scale any Ceph services up/down as needed by container apps using its storage. Rook is not in the Ceph data path, but rather provides a k8s based Ceph control or management plane for running Ceph storage under k8s.
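
To give a flavor of what "creates all the k8s deployment scripts" means in practice: you declare the Ceph cluster as a custom resource and the Rook operator builds and babysits everything else. Here's a trimmed-down sketch patterned on Rook's upstream examples (the Ceph image tag and device selection will vary by deployment):

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: quay.io/ceph/ceph:v16      # Ceph container image the operator deploys
      dataDirHostPath: /var/lib/rook      # where monitors persist state on each host
      mon:
        count: 3                          # 3 monitors for quorum
        allowMultiplePerNode: false
      dashboard:
        enabled: true
      storage:
        useAllNodes: true                 # run OSDs on every cluster node...
        useAllDevices: true               # ...consuming all raw block devices found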

Readers may recall we talked with SoftIron, an appliance provider for Ceph storage in the enterprise, in our 120th episode. Rook has another take on using Ceph storage, only this time running it under k8s. Listen to the podcast to learn more.

The main problem Rook is solving is how to easily incorporate storage services for stateful container apps within k8s control. Containerized apps can scale up or down based on activity, and the storage these apps use needs the same capabilities. The other option is to have storage that stands apart from, or outside of, the k8s cluster and its control. But then the container apps and their storage have 2 (maybe more) different control environments. Better to have everything under k8s control or nothing at all.

Red Hat Ceph storage has been available as a standalone storage solution for a long time now and has quite the extensive customer list, many with multiple PB of storage. Rook-Ceph and all of its components run as containers underneath k8s.

Ceph supports replication (mirroring) of data 1 to N ways, typically 3-way, or erasure coding for data protection, and also supports file, block and object protocols or access methods. Ceph normally consumes raw block DAS for its backend, but Ceph can also support a file gateway to NFS storage behind it. Similarly, Ceph offers an object storage gateway option. But with either of these approaches, the (NFS or object) storage exists outside k8s scaling and resiliency capabilities and Rook management.

Ceph uses storage pools that can be defined using storage performance levels, storage data protection levels, system affinity, or any combination of the above. Ceph storage pools are mapped to k8s storage classes using the Ceph CSI. Container apps that want to use storage would issue a persistent volume claim (PVC) request specifying a Ceph storage class which would allocate the Ceph storage from the pool to the container.  
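
Concretely, a replicated Ceph pool and its matching k8s storage class look roughly like the following, patterned on Rook's example manifests (the clusterID and secret names assume a default rook-ceph install):

    apiVersion: ceph.rook.io/v1
    kind: CephBlockPool
    metadata:
      name: replicapool
      namespace: rook-ceph
    spec:
      failureDomain: host        # place replicas on different hosts
      replicated:
        size: 3                  # 3-way replication (erasure coding is the other option)
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rook-ceph-block
    provisioner: rook-ceph.rbd.csi.ceph.com   # Ceph CSI RBD driver deployed by Rook
    parameters:
      clusterID: rook-ceph
      pool: replicapool
      imageFormat: "2"
      imageFeatures: layering
      csi.storage.k8s.io/fstype: ext4
      csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
      csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
      csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
      csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    reclaimPolicy: Delete

A container app then issues a PVC naming rook-ceph-block, and Ceph carves an RBD image out of replicapool to satisfy it.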

Besides configuring, deploying and monitoring/managing your Ceph storage cluster, Rook can also automatically upgrade your Ceph cluster for you. 

We discussed the difference between running Rook-Ceph within k8s and running Ceph outside k8s. Both approaches depend on Ceph CSI but with Rook, Ceph and all its software is all running under k8s control as containers and Rook manages the Ceph cluster for you. When it’s run outside 1) you manage the Ceph cluster and 2) Ceph storage scaling and resilience are not automatic. 

Sébastien Han, Principal Software Engineer, Ceph Architect, Red Hat

Sebastien Han currently serves as a Senior Principal Software Engineer, Storage Architect for Red Hat. He has been involved with Ceph Storage since 2011 and has built strong expertise around it.

Curious and passionate, he loves working on bleeding edge technologies and identifying opportunities where Ceph can enhance the user experience. He has done that with various technologies, such as OpenStack and Docker.

Now on a daily basis, he rotates between Ceph, Kubernetes, and Rook in an effort to strengthen the integration between all three. He is one of the maintainers of Rook-Ceph.

Travis Nielsen, Principal Software Engineer, Red Hat

Travis Nielsen is a Senior Principal Software Engineer at Red Hat with the Ceph distributed storage system team. Travis leads the Rook project and is one of the original maintainers, integrating Ceph storage with Kubernetes.

Prior to Rook, Travis was the storage platform tech lead at Symform, a P2P storage startup, and an engineering lead for the Windows Server group at Microsoft.

122: GreyBeards talk big data archive with Floyd Christofferson, CEO StrongBox Data Solutions

The GreyBeards had a great discussion with Floyd Christofferson, CEO, StrongBox Data Solutions on their big data/HPC file and archive solution. Floyd is very knowledgeable about the problems of extremely large data repositories and has been around the HPC and other data intensive industries for decades.

StrongBox’s StrongLink solution offers a global namespace file system that virtualizes NFS, SMB, S3 and POSIX file environments and maps these to a software-only, multi-tier, multi-site data repository that can span onsite flash, disk, S3-compatible or Azure object and LTFS tape library storage as well as offsite versions of all the above tiers.

Typical StrongLink customers range from 10s to 100s of PB, ingesting or processing PBs a day. 200TB is a minimum StrongLink configuration, but Floyd said any shop with over 500TB has problems with data silos and other issues, though they may not understand it yet. StrongLink manages data placement and movement throughout this hierarchy to better support data access and economical storage. In the process, StrongLink eliminates any data silos due to limitations of NAS systems while providing the most economical placement of data to meet user performance requirements.

Floyd said that StrongLink first installs in the customer's environment and then operates in the background to discover and ingest metadata from the customer's primary file storage environment. At some point later, the customer reconfigures their end-user share and mount points to the StrongLink servers and it's up and running.

The minimal StrongLink HA environment consists of 3 nodes. They use a NoSQL metadata database which is replicated and sharded across the nodes. It's sharded for performance load balancing and fully replicated (2-way or 3-way) across all the StrongLink server nodes for HA.

The StrongLink nodes create a cluster, called a star in StrongBox vernacular. Multiple clusters onsite can be grouped together to form a StrongLink constellation. And multiple data center sites, can be grouped together to form a StrongLink galaxy. Presumably if you have a constellation or a galaxy, the same metadata is available to all the star clusters across all the sites.

They support any tape library and any NFS, SMB, S3 or Azure compatible object or file storage. StrongLink can move or copy data from one tier/cluster to another based on policies AND end-users never see any difference in their workflow or mount/share points.

One challenge with typical tape archives is that they can make use of proprietary tape data formats which are not accessible outside those systems. StrongLink has gone with the completely open LTFS file format on tape, which is well documented and available to anyone.

Floyd also made it a point of saying they don’t use any stubs, or soft links to provide their data placement magic. They only use standard file metadata.

File data moves across the hierarchy based on policies or by request. One of the secrets to StrongLink's success is all the work they have done to ensure that any data movement can occur at line-rate speeds. They heavily parallelize any data movement required to support data placement, across as many servers as the customer wants to throw at it. StrongBox services will help right-size the customer deployment to support any data movement performance that is required.

StrongLink supports up to 3-way replication of a customer’s data archives. This supports a primary archive and 3 more replicas of data.

Floyd mentioned a couple of big customers:

  • One autonomous automobile supplier was downloading 2PB of data from cars in the field, processing this data and then moving it off their servers to get ready for the next day's data load.
  • Another, a weather science research organization, had 150PB of data in an old tape archive and brought in StrongLink to migrate all this data off of it and onto LTFS tape format, as well as to support their research activities, which entail staging a significant chunk of file data on research servers to do a climate run/simulation.

NASA, another StrongLink customer, operates slightly differently than the above, in that they have integrated StrongLink functionality directly into their applications by making use of StrongBox’s API.

StrongLink can work in three ways.

  • Using normal file access services, where StrongLink virtualizes your NFS, SMB, S3 or POSIX file environment. For this service StrongLink is in the data path and you can use policy-based management to have data moved or staged as the need arises.
  • Using StrongLink CLI to move or copy data from one tier to another. Many HPC customers use this approach through SLURM scripts or other orchestration solutions.
  • Using StrongLink API to move or copy data from one tier to another. This requires application changes to take advantage of data placement.

StrongBox customers can, of course, use all three modes of operation at the same time for their StrongLink data galaxy. StrongLink is billed by CPU/vCPU level and not by the amount of data customers throw into the archive. This has the effect of customers gaining a flat expense cost, once StrongLink is deployed, at least until they decide to modify their server configuration.

Floyd Christofferson, CEO StrongBox Data Solutions

As a professional involved in content management and storage workflows for over 25 years, Floyd has focused on methods and technologies needed to manage massive volumes of data across many different storage types and use cases.

Prior to joining SBDS, Floyd worked with software and hardware companies in this space, including over 10 years at SGI, where he managed storage and data management products. In that role, he was part of the team that provided solutions used in some of the largest data environments around the world.

Floyd’s background includes work at CBS Television Distribution, where he helped implement file-based content management and syndicated content distribution strategies, and Pathfire (now ExtremeReach), where he led the team that developed and implemented a satellite-based IP-multicast content distribution platform that manages delivery of syndicated content to nearly 1,000 TV stations throughout the US.

Earlier in his career, he ran Potomac Television, a news syndication and production service in Washington DC, and Manhattan Center Studios, an audio, video, graphics, and performance facility in New York.

120: GreyBeards talk CEPH storage with Phil Straw, Co-Founder & CEO, SoftIron

GreyBeards talk universal CEPH storage solutions with Phil Straw (@SoftIronCEO), CEO of SoftIron. Phil's been around IT and electronics technology for a long time and has gone from scuba diving electronics, to DARPA/DOD researcher, to networking, and is now doing storage. He's a co-founder of the company and its former CTO. SoftIron makes hardware storage appliances for CEPH, an open source, software defined storage system.

CEPH storage includes file (CEPHFS, POSIX), object (S3) and block (RBD, RADOS block device, Kernel/librbd) services and has been out since 2006. CEPH storage also offers redundancy, mirroring, encryption, thin provisioning, snapshots, and a host of other storage options. CEPH is available as an open source solution, downloadable at CEPH.io, but it’s also offered as a licensed option from RedHat, SUSE and others. For SoftIron, it’s bundled into their HyperDrive storage appliances. Listen to the podcast to learn more.

SoftIron uses the open source version of CEPH and incorporates this into their own, HyperDrive storage appliances, purpose built to support CEPH storage.

There are two challenges to using open source solutions:

  • Support is generally non-existent. Yes, the open source community behind the (CEPH) project supplies bug fixes and can possibly answer some questions, but this is not considered enterprise support, where customers require 7x24x365 support for a product.
  • Usability is typically abysmal. Yes, open source systems can do anything that anyone could possibly want (if not, code it yourself), but trying to figure out how to use any of that often requires a PhD or two.

SoftIron has taken both of these on to offer a commercial CEPH product.

Take support: SoftIron offers enterprise-level support that customers can contract for on their own, even if they don't use SoftIron hardware. Phil said they would often get kudos for their expert support of CEPH and have often been requested to offer this as a standalone CEPH service. Needless to say, their support of SoftIron appliances is also excellent.

As for ease of operations, SoftIron makes the HyperDrive Storage Manager appliance, which offers a standalone GUI that takes the PhD out of managing CEPH. Anything one can do with the CEPH CLI can be done with SoftIron's Storage Manager. It's also a very popular offering with SoftIron customers. Similar to SoftIron's CEPH support above, customers are requesting that the Storage Manager be offered as a standalone solution for CEPH users as well.

HyperDrive hardware appliances are storage media boxes that offer extremely low-power storage for CEPH. Their appliances range from high density (120TB/1U) to high performance NVMe SSDs (26TB/1U) to just about everything in between. On their website, I count 8 different storage appliance offerings with various spinning disk, hybrid (disk-SSD), SATA and NVMe SSDs (SSD only) systems.

SoftIron designs, develops and manufactures all their own appliance hardware. Manufacturing is entirely in the US, and design and development take place in the US and Europe only. This provides a secure provenance for HyperDrive appliances that other storage companies can only dream about. Defense, intelligence and other security-conscious organizations/industries are increasingly concerned about where electronic systems come from and want assurances that there are no security compromises inside them. SoftIron puts this concern to rest.

Yes, they use CPUs, DRAMs and other standardized chips, as well as storage media manufactured by others, but SoftIron has gone out of their way to source all of these other parts and media from secure, trusted suppliers.

All other major storage companies use storage servers, shelves and media that come from anywhere, usually sourced from manufacturers anywhere in the world.

Moreover, such off-the-shelf hardware usually comes with added components that increase cost and complexity, such as graphics memory/interfaces, cables, over-configured power supplies, etc., that aren't required for storage. Phil mentioned that each HyperDrive appliance has been reduced to just what's required to support CEPH storage.

Each appliance has a 6Tbps network that connects all the components, which means no cabling in the box. Also, each storage appliance has CPUs matched to its performance requirements: for low-performance appliances, ARM cores; for high-performance appliances, AMD EPYC CPUs. All HyperDrive appliances support wire-speed IO, i.e., if a box is configured to support 1GbE or 100GbE, it transfers data at that speed, across all ports connected to it.

Because of their minimalist hardware design approach, HyperDrive appliances run much cooler and use less power than other storage appliances. They consume only 100W, or 200W for high-performance storage, per appliance, where most other storage systems come in at around 1500W or more.

In fact, SoftIron HyperDrive boxes run so cold that they don't need fans for CPUs; they just redirect air flow from the storage media over the CPUs. And running colder improves the reliability of disk and SSD drives. Phil said they are seeing field results showing 2X better reliability than the drives normally see in the field.

They also offer a HyperDrive Storage Router that provides an NFS/SMB/iSCSI gateway to CEPH. With their Storage Router, customers using VMware, Hyper-V and other systems that depend on NFS/SMB/iSCSI for storage can just plug and play with SoftIron CEPH storage. With the Storage Router, the only storage interface HyperDrive appliances can't support is FC.

Although we didn’t discuss this on the podcast, in addition to HyperDrive CEPH storage appliances, SoftIron also provides HyperCast, transcoding hardware designed for real time transcoding of one or more video streams and HyperSwitch networking hardware, which supplies a secure provenance, SONiC (Software for Open Networking in [the Azure] Cloud) SDN switch for 1GbE up to 100GbE networks.

Standing up PB of (CEPH) storage should always be this easy.

Phil Straw, Co-founder & CEO SoftIron

The technical visionary co-founder behind SoftIron, Phil Straw initially served as the company's CTO before stepping into the role of CEO.

Previously Phil served as CEO of Heliox Technologies, co-founder and CTO of dotFX, VP of Engineering at Securify and worked in both technical and product roles at both Cisco and 3Com.

Phil holds a degree in Computer Science from UMIST.