131: GreyBeards talk native K8s data protection using Veritas NetBackup with Reneé Carlisle

The GreyBeards have been discussing K8s storage services a lot over the last year or so and it was time to understand how container apps and data could be protected. Recently, we saw an article about a Veritas-funded survey discussing the need for data protection in K8s. As such, it seemed a good time to have a talk with Reneé Carlisle (@VeritasTechLLC), Staff Product Manager for NetBackup (K8S), Veritas.

It turns out that Veritas NetBackup (NBU) has just released the 2nd version of its K8s data protection. It’s gone completely (K8s) native. That is, Veritas has re-implemented all 3 tiers of NBU as K8s micro services. Moreover, the new release still supports all other NBU infrastructure implementations, such as bare metal or VM NBU primary server/media server services. It’s almost as if all the data protection NBU has offered the enterprise over the years is now also available for K8s container apps. Listen to the podcast to learn more.

To make use of NBU K8s, backup admins establish named backup policies (e.g., gold, silver, bronze) that select backup frequency, retention periods, backup storage, etc. DevOps then tags a namespace, pod, container, or PV with one of those data protection policy names. Once this is done, NBU K8S will start protecting that namespace, pod, container, or PV.
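A minimal sketch of how tag-driven policy selection could work, assuming a resource’s labels carry the policy name. The `backup-policy` label key, the policy fields and the storage target names are our own illustration, not NBU’s actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class BackupPolicy:
    """A named protection policy, as a backup admin might define it (illustrative fields)."""
    name: str
    frequency: timedelta   # how often a backup runs
    retention: timedelta   # how long each backup image is kept
    storage_target: str    # where backup images are written (hypothetical names)


# Hypothetical policy catalog following the podcast's gold/silver/bronze example.
POLICIES = {
    "gold": BackupPolicy("gold", timedelta(hours=4), timedelta(days=90), "dedupe-appliance"),
    "silver": BackupPolicy("silver", timedelta(hours=12), timedelta(days=30), "object-store"),
    "bronze": BackupPolicy("bronze", timedelta(days=1), timedelta(days=7), "object-store"),
}

# Hypothetical label key DevOps would set on a namespace, pod, container or PV.
POLICY_LABEL = "backup-policy"


def policy_for(labels: dict[str, str]) -> Optional[BackupPolicy]:
    """Return the policy a resource is tagged with, or None if it isn't protected."""
    return POLICIES.get(labels.get(POLICY_LABEL, ""))


def schedule(policy: BackupPolicy, last_backup: datetime) -> tuple[datetime, datetime]:
    """Return (next backup time, expiry time of the last backup image)."""
    return last_backup + policy.frequency, last_backup + policy.retention
```

The point of the split is that admins own the policy catalog while DevOps only touches labels; an unknown or missing label simply leaves the resource unprotected.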

In addition, backup admins can include or exclude specific K8s namespace(s), pod(s), container(s), labels (tags), or PVs to be backed up with a specific policy. When that policy is triggered, it will go out into the cluster to see if those K8s elements are active and start protecting them, or exclude them from protection, as requested.
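When a policy fires, its include/exclude rules have to be evaluated against whatever is running at that moment. The sketch below assumes exclusions override inclusions and that an empty include rule means “everything”; both rules, and the `Selector` and `Resource` types, are hypothetical, not NBU’s documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Selector:
    """Include/exclude rules attached to a policy (hypothetical structure)."""
    include_namespaces: set[str] = field(default_factory=set)
    exclude_namespaces: set[str] = field(default_factory=set)
    include_labels: dict[str, str] = field(default_factory=dict)
    exclude_labels: dict[str, str] = field(default_factory=dict)


@dataclass
class Resource:
    kind: str        # e.g. "Pod", "PersistentVolumeClaim"
    name: str
    namespace: str
    labels: dict[str, str] = field(default_factory=dict)


def matches(labels: dict[str, str], wanted: dict[str, str]) -> bool:
    """True if every key/value pair in `wanted` is present in `labels`."""
    return all(labels.get(k) == v for k, v in wanted.items())


def select(resources: list[Resource], sel: Selector) -> list[Resource]:
    """Resources the policy should protect right now; exclusions win over inclusions."""
    chosen = []
    for r in resources:
        if r.namespace in sel.exclude_namespaces:
            continue
        if sel.exclude_labels and matches(r.labels, sel.exclude_labels):
            continue
        in_ns = not sel.include_namespaces or r.namespace in sel.include_namespaces
        in_lbl = not sel.include_labels or matches(r.labels, sel.include_labels)
        if in_ns and in_lbl:
            chosen.append(r)
    return chosen
```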

NBU K8s has an Operator service, Data Mover services and other micro services that execute in the cluster. At least one Operator service must be deployed in the cluster (recommended to be in a separate namespace, but this is optional). The Operator service is the control plane for NBU K8S services. It spins up data movers when needed and spins them down when done.

The Operator service supports a CLI but, more importantly for DevOps, a completely implemented RESTful API service. It turns out the CLI is implemented on top of the NBU (Operator) API. With the NBU API, DevOps CI/CD tools or other automation can perform all the data protection services needed to protect K8s.

One historical issue with backup processing is that it can consume every ounce of network/storage and sometimes compute power in an environment. The enterprise-class data movers (or maybe the Operator control plane) have various mechanisms to constrain or limit NBU K8S resource consumption so that this doesn’t become a problem.
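The podcast didn’t detail NBU’s throttling mechanisms. A token bucket is one common, generic way a data mover could cap its bandwidth: tokens accrue at a fixed rate, and each chunk sent spends tokens equal to its size. This illustrates the technique only, not Veritas code; `send` is any callable that ships a chunk.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate` bytes/sec and allows bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full
        self.clock = clock          # injectable clock, handy for testing
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: float) -> bool:
        """Spend tokens for one chunk if available; otherwise the caller waits and retries."""
        if nbytes > self.capacity:
            raise ValueError("a chunk larger than the bucket capacity can never be sent")
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_time(self, nbytes: float) -> float:
        """Seconds until `nbytes` tokens are available (0.0 if available now)."""
        self._refill()
        return max(0.0, (nbytes - self.tokens) / self.rate)


def send_throttled(chunks: list[bytes], bucket: TokenBucket, send) -> None:
    """Push chunks through `send`, sleeping whenever the bandwidth budget is spent."""
    for chunk in chunks:
        while not bucket.try_consume(len(chunk)):
            time.sleep(bucket.wait_time(len(chunk)))
        send(chunk)
```

The `capacity` bounds how bursty a mover can be, while `rate` sets its sustained share of the network, which is the knob an admin would turn to keep backups from starving production traffic.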

But as the Operator and its Data Movers are just micro services, if there’s a need for more throughput, more can be spun up, or if there’s a need to reduce bandwidth, some can be spun down, all with no manual intervention whatsoever.
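Scaling data movers up or down comes down to simple arithmetic: the required throughput is the backlog divided by the backup window, and the mover count is that divided by per-mover throughput, rounded up. The parameter names, the clamping rule and the example numbers are our own assumptions, not NBU’s autoscaling logic.

```python
import math


def data_movers_needed(backlog_bytes: float, window_seconds: float,
                       per_mover_bps: float, max_movers: int, min_movers: int = 0) -> int:
    """How many data movers to run so the backlog finishes within the window.

    Required throughput = backlog / window; divide by per-mover throughput and
    round up, then clamp to [min_movers, max_movers] so a resource cap always
    wins over the backup window.
    """
    if window_seconds <= 0 or per_mover_bps <= 0:
        raise ValueError("window and per-mover throughput must be positive")
    if backlog_bytes <= 0:
        return min_movers
    needed = math.ceil(backlog_bytes / window_seconds / per_mover_bps)
    return max(min_movers, min(max_movers, needed))
```

For example, a 10 TB backlog in a 4-hour window at 200 MB/s per mover needs 4 movers. Capping that at 2 movers stretches the job to roughly 7 hours, which is the trade-off being made whenever bandwidth is limited.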

Furthermore, NBU K8s can be used to restore/recover PVs, containers, applications or namespaces to other CNCF-compliant K8s infrastructure. So, if you wanted to, say, move your K8s namespace from AKS to GKE or from on-prem to Red Hat OpenShift, it becomes a simple matter of moving the last NBU backup to the target environment, deploying NBU K8s in that environment and restoring the namespace.

NBU K8s can also operate in the cloud just as well as on-prem and works in any CNCF-compatible K8s environment, including AKS, EKS, GKE, VMware Tanzu and OpenShift.

In the latest NBU K8s release, Veritas implemented new, enterprise-class Data Movers as micro services to more efficiently protect and recover K8S resources. Enterprise-class Data Movers can perform virus scanning/ransomware detection, encryption, data compression, and other services that enterprise customers have come to expect from NBU data protection.
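The podcast didn’t say how NBU detects ransomware. One widely used generic heuristic is that encrypted data looks random, so a backup whose blocks suddenly approach 8 bits of entropy per byte is suspicious. The thresholds below are arbitrary assumptions, and this is an illustration of the heuristic, not Veritas’s detector.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, 8.0 for uniformly random data."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def looks_encrypted(blocks: list[bytes], threshold: float = 7.5, fraction: float = 0.8) -> bool:
    """Flag a backup if most of its blocks are near-random.

    Already-compressed files (JPEG, zip) also score high, so a practical detector
    would compare against the previous backup of the same data rather than rely
    on a fixed cutoff.
    """
    if not blocks:
        return False
    high = sum(1 for b in blocks if shannon_entropy(b) >= threshold)
    return high / len(blocks) >= fraction
```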

NBU K8S accesses PV, container, pod and namespace data and metadata using standard CSI storage providers and normal K8s API services.
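We don’t know exactly which CSI operations NBU uses. As an illustration of the standard interface any backup tool can rely on, this builds a CSI `VolumeSnapshot` object (`snapshot.storage.k8s.io/v1`) for one PVC; the snapshot, namespace, PVC and snapshot class names are placeholders.

```python
import json


def volume_snapshot(name: str, namespace: str, pvc: str, snapshot_class: str) -> dict:
    """Build a CSI VolumeSnapshot request for a single PersistentVolumeClaim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,          # placeholder class name
            "source": {"persistentVolumeClaimName": pvc},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g.: kubectl apply -f snap.json
    print(json.dumps(volume_snapshot("db-data-snap", "shop", "db-data", "csi-snapclass"), indent=2))
```

A backup tool can then read data from a volume provisioned from that snapshot, leaving the live PVC undisturbed.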

As mentioned earlier, in the latest iteration of NBU K8s, Veritas has implemented the complete NBU infrastructure natively as containers. This adds K8s auto-scaling and full CI/CD automation via APIs to the rest of the NBU infrastructure, all operating within the K8s cluster.

So, now backup admins can run NBU completely in K8s or run just the Operator and its data mover services connecting to other NBU infrastructure (primary server and media servers) executing elsewhere in the data center.

NBU K8s supports all the various disk, dedicated backup appliance, object/cloud storage or other backup media options that NBU uses. That means you can store your K8s backup data in the cloud, on secondary storage appliances, or anyplace else that’s supported by NBU.

Licensing for NBU K8s follows currently available Veritas licensing: front-end TB protected, subscription and term licensing options are all available.

Reneé Carlisle, Staff Product Manager, Veritas NetBackup (K8S)

Reneé (LinkedIn) has been with Veritas Technologies for eleven years in various focus areas within the NetBackup Product Management Team. In her current role, she is the Product Manager responsible for the NetBackup strategic direction of Modern Platforms, including Kubernetes and OpenStack. She has a significant technical background in many NetBackup features, including Kubernetes, virtualization, Accelerator, and cloud.

Prior to working for Veritas, she was a customer running a large-scale NetBackup operation as well as a partner implementing, designing, and integrating NetBackup in many different companies.

128: GreyBeards talk containers, K8s, and object storage with AB Periasamy, Co-Founder & CEO, MinIO

Sponsored by:

Once again Keith and I were talking K8s storage, only this time it was object storage. Anand Babu (AB) Periasamy, Co-founder and CEO of MinIO, has been on our show a couple of times now and it’s always an insightful discussion. He’s got an uncommon perspective on IT today and what needs to change.

Although MinIO is an open source, uber-compatible, S3 object store, AB more often talks like a revolutionary, touting the benefits of containerization, scale and automation with K8s. Object storage is just one of the vehicles to help get there. Listen to the podcast to learn more.

We started our discussion with the changing role of object storage in applications. Object storage started out as an archive solution. But then, over time, something happened: modern database startups adopted object storage to hold primary data, then analytics moved over to objects in a big way, and finally AI/ML came along with an unquenchable thirst for data, and object storage was its only salvation.

Keith questioned the use of objects in analytics. Both AB and I pointed out that Splunk (and Spark) fully supported objects. But Keith said R (and Python) data scientists prefer the file formats they learned in school, such as CSV, JPEG and JSON files. AB said what usually happens is this data is stored in object storage and then downloaded onto local disk as files to be processed. That’s not to say that R or Python can’t process objects directly, but even when they don’t, the ultimate source of data truth is object storage.

Somehow, we got onto the multi-cloud question. AB said multi-cloud is really all about containers and K8s. When customers talk multi-cloud, what they really mean is they want applications that can run anywhere: in any cloud, on premises, or anyplace else for that matter.

I thought multi-cloud was a DR solution. But AB reiterated it’s more a solution to vendor lock-in. What containerization gives IT is the option (ability) to run applications anywhere, but IT is not obligated to exercise that option unless it makes sense.

AB said developers today don’t develop apps in the cloud anymore. They develop locally using minikube; once it’s working there, they add CI/CD tool chains and then move it to its final resting place (the cloud or wherever it ultimately needs to run). It turns out containers, YAML files, scripts, etc. are small and trivial to upload, migrate, or move to any internet location. And with ubiquitous K8s support available everywhere, they can move anywhere unchanged.

But where’s the data? AB said it’s wherever the app executes. Data is never moved; it takes too much time and effort to move that much data. Instead, as applications move, any data they generate grows in that location over time.
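A back-of-the-envelope calculation shows why data tends to stay put: transfer time is size × 8 divided by link rate. The 1 PB size, 10 Gb/s link and 70% efficiency figures are our own example, not numbers from the podcast.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Days needed to push `size_bytes` over a link running at a fraction of line rate."""
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 86_400


print(f"{transfer_days(1e15, 10e9):.1f} days")       # 1 PB over 10 Gb/s: prints 9.3 days
print(f"{transfer_days(1e15, 10e9, 0.7):.1f} days")  # same at 70% efficiency: prints 13.2 days
```

By contrast, the containers, YAML files and scripts that make up the app itself move in seconds, which is why the app goes to the data rather than the other way around.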

We next turned to how MinIO is supported in K8s. AB mentioned they have a DirectPV CSI driver, a distributed persistent volume manager that provisions PVs on local (direct-attached) disks to support MinIO services. In this way, containers needing access to MinIO S3 object storage can have their data allocated directly on local drives.
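From a workload’s point of view, consuming DirectPV-managed drives should just be a standard PersistentVolumeClaim against a DirectPV storage class. The sketch builds such a claim; the class name `directpv-min-io` is our assumption of the DirectPV default and may differ in your install, and the claim name and size are placeholders.

```python
def directpv_claim(name: str, namespace: str, size: str,
                   storage_class: str = "directpv-min-io") -> dict:
    """A standard PersistentVolumeClaim asking a DirectPV storage class for a local-drive volume.

    The default storage_class value is an assumption; check the classes your
    DirectPV installation actually created.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],   # a local drive is attached to one node
            "resources": {"requests": {"storage": size}},
        },
    }
```

In a MinIO deployment, claims like this would typically be generated per pod, so each MinIO server lands on the node that owns its drives.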

Then we asked about opinionated stacks. AB said most customers don’t want these. They may have some value in preserving an infrastructure environment, but customers are better off transitioning to containerization and building any stack within those containers and the K8s cluster services.

On the other hand, MinIO object storage is available with the same S3 API on bare metal, VMware, OpenShift, K8s, every public cloud and most private clouds as well. The advantage of a single storage interface available everywhere can’t be beat.

MinIO recently closed a new funding round of $103M. AB mentioned they had new investments from Intel and Softbank, but I was more interested in plans he had for the new cash. And Keith asked where the new funding left MinIO with respect to its competitors in this space.

AB said it was never about the money; it was more about what you did with your team that mattered in the long run. AB’s imperative was to enter an existing market with a better product and succeed with that. Creating a new market plus a new product always costs more, takes longer and is riskier.

As for the new funds, there are really two ways to go: 1) improve the current product or 2) create a new one. My sense is that AB leans towards improving the current product.

For instance, MinIO is often asked to support a different object storage API. But AB’s perspective is that S3 was an early bet that paid off well by becoming the de facto standard for object storage. Supporting another API would divide his resources and probably make their current product worse, not better. AB mentioned they are getting 1.1M downloads of their Docker container version, so they seem to be succeeding with the current product.

Anand Babu (AB) Periasamy, Co-founder and CEO

AB Periasamy is the co-founder and CEO of MinIO, an open-source provider of high performance object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu, where he serves on the board, to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).

AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat’s Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to scaling commodity cluster computing to supercomputing-class performance. His work there resulted in the development of Lawrence Livermore National Laboratory’s “Thunder” supercomputer, which, at the time, was the second fastest in the world.

AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.

127: Annual year-end wrap-up podcast with Keith, Matt & Ray

[Ray’s sorry about his audio; it will be better next time, he promises. The Eds] This was supposed to be the year we killed off COVID for good. Alas, it was not to be, and it’s going to be with us for some time to come. However, this didn’t stop that technical juggernaut we call the GreyBeards on Storage podcast.

Once again we got Keith, Matt and Ray together to discuss the past year’s top 3 technology trends most likely to impact the year(s) ahead. Given our recent podcasts, Kubernetes (K8s) storage was top of the list. To this we added AI-MLops in the enterprise and continued our discussion from last year on how COVID & WFH are remaking the world, including offices, data centers and downtowns around the world. Listen to the podcast to learn more.

K8s rulz

For some reason, we spent many of this year’s podcasts discussing K8s storage. K8s was never meant to provide (storage) state AND, as a result, any K8s data storage has had to be shoehorned in.

Moreover, why would any IT group even consider containerizing enterprise applications, let alone deploying them onto K8s? The most common answers seem to be automatic scalability, cloud-like automation and run-anywhere portability.

Keith chimed in that enterprise applications aren’t going anywhere, and we were off. Just like the mainframe, client-server and OpenStack applications before them, enterprise apps will likely outlive most developers, continuing to run on their current platforms forever.

But any new apps will likely be born, live a long life and eventually fade away on the latest runtime environment, which is K8s.

Matt mentioned hybrid and multi-cloud as becoming the raison d’être for enterprise apps to migrate to containers and K8s. Further, enterprises have a pressing need to move their apps to the hybrid- & multi-cloud model. AWS’s recent hiccups notwithstanding, multi-cloud’s time has come.

Ray and Keith then discussed which is bigger, K8s container apps or enterprise “normal” (meaning virtualized/bare metal) apps. But it all comes down to how you define bigger. By sheer number of unique applications, enterprise wins. By compute power devoted to running those apps, it’s a much more difficult race to call. But even Keith had to agree that, based on compute power, containerized apps are inching ahead.

AI-MLops coming on strong

AI-MLops in the enterprise was up next. For me, the most significant indicator of heightened interest in AI-ML was VMware’s announcement of native support for NVIDIA AI-MLops management and orchestration technologies.

Just like VMware’s earlier move to K8s with Tanzu and its predecessors, its move to natively support NVIDIA AI tools signals that the enterprise is starting to seriously consider adding AI to their apps.

We think VMware’s crystal ball is based on:

  • Clouds rolling out more and more AI and MLops technologies for enterprises to use on their infrastructure
  • GPUs becoming more and more pervasive in enterprise AND cloud infrastructure
  • Data to drive training and inferencing coming out of the woodwork like never before

We had some discussion as to where AMD and Intel will end up in this AI trend. Consensus is that there’s still space for CPU inferencing and “some” specialized training, which is unlikely to go away. And of course, AMD has its own GPUs and Intel is coming out with its own shortly.

COVID & WFH impacts the world (again)

And then there was COVID and WFH. COVID will be here for some time to come. As a result, WFH is not going away, at least not totally, any time soon. It’s just becoming another way to do business.

WFH works well for some things (like IT office work) and not so well for others (K-12 education). If the GreyBeards were into (non-crypto) investing, we’d be shorting office real estate. What could move into those millions of square feet (or meters) of downtown office space is anyone’s guess. But just like the factories of old, cities, and downtowns in particular, can take anything and make it usable for other purposes.

That’s about it; 2021 was another “interesting” year for infrastructure technology. It just goes to show you that “May you live in interesting times” is actually an old (Chinese) curse.

Keith Townsend, (@TheCTOadvisor)

Keith is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.

Matt Leib, (@MBLeib)

Matt Leib has been blogging in the storage space for over 10 years, with work experience in both engineering and presales/product marketing. His blog is at Virtually Tied to My Desktop and he’s on LinkedIn.

Ray Lucchesi, (@RayLucchesi)

Ray is the host and co-founder of GreyBeardsOnStorage and is President/Founder of Silverton Consulting, and a prominent (AI/storage/systems technology) blogger at RayOnStorage.com. Sign up for SCI’s free, monthly industry e-newsletter here, published continuously since 2007. Ray can also be found on LinkedIn.

126: GreyBeards talk k8s storage with Alex Chircop, CEO, Ondat

Keith and I had an interesting discussion with Alex Chircop (@chira001), CEO of Ondat, a kubernetes storage provider. They have a high performing system, laser focused on providing storage for k8s stateful container applications. Their storage is entirely containerized and has a number of advanced features for data availability, performance and security that developers need to run stateful container apps. Listen to the podcast to learn more.

We started by asking Alex how Ondat is different from all the other k8s storage solutions out there today (which we’ve been talking with lately). He mentioned three crucial capabilities:

  • Ondat was developed from the ground up to run as k8s containers. This allows any k8s distribution to run Ondat storage to support stateful container apps.
  • Ondat was designed to allow developers to run any possible container app. Ondat supports both block as well as file storage volumes.
  • Ondat provides consistent, superior performance, at scale, with no compromises. Sophisticated data placement ensures that data is located where it is consumed, and their highly optimized data path provides low-latency access to that data.

Ondat creates a data mesh (storage pool) out of all storage cluster nodes. Container volumes are carved out of this data mesh and at creation time, data and the apps that use them are co-located on the same cluster nodes.

At volume creation, Dev can specify the number of replicas (mirrors) to be maintained by the system. Alex mentioned that Ondat uses synchronous replication between replica cluster nodes to make sure that all active replicas are up to date with the last IO that occurred to primary storage.
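A minimal sketch of what synchronous replication means for a single write: the IO is acknowledged only after the primary and every replica have it. The in-memory `Replica` class is a stand-in, and dropping failed replicas from the in-sync set is our assumption about how such systems typically behave, not Ondat’s documented protocol.

```python
class Replica:
    """Stand-in for one node's copy of a volume: an in-memory block map."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.blocks: dict[int, bytes] = {}

    def write(self, lba: int, data: bytes) -> bool:
        """Apply a write; an unreachable node reports failure instead."""
        if not self.healthy:
            return False
        self.blocks[lba] = data
        return True


def sync_write(primary: Replica, replicas: list[Replica], lba: int, data: bytes) -> list[Replica]:
    """Write to the primary and every replica before the IO is acknowledged.

    The caller acknowledges only after this returns. Replicas that failed are
    returned so the caller can drop them from the in-sync set and rebuild them,
    keeping every replica still marked active identical to the primary.
    """
    if not primary.write(lba, data):
        raise OSError(f"primary {primary.name} unavailable")
    return [r for r in replicas if not r.write(lba, data)]
```

The cost of this design is that write latency includes the slowest replica’s round trip, which is one reason compressing data in flight matters on shared networks.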

Ondat compresses all data that goes over the network as well as encrypts data in flight. Dev can easily specify that the data-at-rest also be compressed and/or encrypted. Compressing data in flight helps supply consistent performance where networks are shared.

Alex also mentioned that they support both single reader/writer k8s block storage volumes and multi-reader/multi-writer k8s file storage volumes for containers.

In Ondat, each storage volume includes a mini-brain used to determine primary and replica data placement. Ondat also uses disaggregated consensus to decide what happens to primary and replica data after a k8s split cluster occurs. After a split cluster, isolated replicas are invalidated and replicas are recreated, where possible, on the surviving nodes of the cluster portion that holds the primary copy of the data.
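The sketch below captures only the outcome rule described for split clusters: the partition holding the primary wins, stranded replicas are invalidated, and replacements go to surviving nodes where possible. It does not implement Ondat’s per-volume mini-brain or its disaggregated consensus; the function and its deterministic placement order are hypothetical.

```python
def resolve_split(partitions: list[set[str]], primary: str, replicas: set[str],
                  want_replicas: int) -> tuple[set[str], set[str], set[str]]:
    """Decide replica fate after a network split.

    Returns (kept, invalidated, newly_placed): replicas in the primary's
    partition are kept, the rest are invalidated, and new replicas are placed
    on other surviving nodes until the requested count is met or nodes run out.
    """
    survivor = next((p for p in partitions if primary in p), None)
    if survivor is None:
        raise ValueError("primary node not found in any partition")
    kept = replicas & survivor
    invalidated = replicas - survivor
    candidates = sorted(survivor - kept - {primary})   # sorted for determinism
    shortfall = max(0, want_replicas - len(kept))
    return kept, invalidated, set(candidates[:shortfall])
```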

Replicas can also optionally be located across AZs, if available in your k8s cluster. Ondat doesn’t currently support replication across k8s clusters.
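A minimal sketch of volume placement combining two ideas from the discussion: the primary is co-located with the app’s node, and replicas can optionally be spread across availability zones. The greedy “unused zone first” rule and the node/zone names are our own assumptions, not Ondat’s placement algorithm.

```python
def place_volume(app_node: str, node_zone: dict[str, str], replicas: int,
                 spread_zones: bool = True) -> tuple[str, list[str]]:
    """Return (primary node, replica nodes) for a new volume.

    The primary is co-located with the app. With spread_zones, each replica
    first goes to a zone that holds no copy yet; only when zones run out do
    replicas double up inside an already-used zone.
    """
    if app_node not in node_zone:
        raise ValueError(f"unknown node {app_node}")
    others = sorted(n for n in node_zone if n != app_node)   # sorted for determinism
    if len(others) < replicas:
        raise ValueError("not enough nodes for the requested replica count")
    if not spread_zones:
        return app_node, others[:replicas]
    used_zones = {node_zone[app_node]}
    chosen: list[str] = []
    for n in others:                    # pass 1: one replica per unused zone
        if len(chosen) < replicas and node_zone[n] not in used_zones:
            chosen.append(n)
            used_zones.add(node_zone[n])
    for n in others:                    # pass 2: fill any remaining slots
        if len(chosen) < replicas and n not in chosen:
            chosen.append(n)
    return app_node, chosen
```

For example, with the app on a node in zone A and one node each in zones B and C, two replicas land in B and C, so losing any single zone still leaves a copy.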

Ondat storage works on any hyperscaler k8s solution as well as any on-prem k8s system. I asked if Ondat supports VMware TKG and Alex said yes, but when pushed mentioned that they have not tested it yet.

Keith asked what happens when things go south, i.e., an application starts to suffer worse performance. Alex said that Ondat supplies system telemetry to k8s logging systems, which can be used to understand what’s going on. But he also mentioned they are working on a cloud-based Management-aaS offering to provide multi-cluster operational views of Ondat storage in operation, to help understand, isolate and fix problems like this.

Keith mentioned he had attended a talk by Google engineers who developed kubernetes, and they said stateful containers don’t belong under kubernetes. So why are stateful containers becoming so ubiquitous now?

Alex said that may have been the case originally, but k8s has come a long way since then. Nowadays, as many enterprises shift enterprise applications from their old system environments to run as containers, those applications all require state for processing. Having that stateful information, or storage volumes, accessible directly under k8s makes application re-implementation much easier.

What’s a typical Ondat configuration? Alex said there doesn’t appear to be one. Current Ondat deployments range from a few hundred to thousands of k8s cluster nodes and from tens to hundreds of TB of usable data storage.

Ondat has a simple pricing model: licensing costs are determined by the number of nodes in your k8s cluster. Node pricing differs depending on deployment options, but other than that it’s pretty straightforward.

Alex Chircop, CEO Ondat

Alex Chircop is the founder and CEO of Ondat (formerly StorageOS), which makes it possible to easily deploy and manage stateful Kubernetes applications with persistent data volumes. He also serves as co-chair of the CNCF (Cloud Native Computing Foundation) Storage Technical Advisory Group.

Alex comes from a technical background working in IT that includes more than 10 years with Nomura and Goldman Sachs.

125: GreyBeards talk K8s storage with Tad Lebeck, US CTO for ionir

We had some technical difficulties with Matt getting on the podcast, so Ray had to fly solo. This month we continue our investigations into K8s storage with a discussion with Tad Lebeck (@TadLebeck), US CTO of ionir, a software defined storage system that only runs under K8s. ionir’s Kubernetes Data Services platform is an outgrowth of Reduxio, a “tin-wrapped” software defined storage system, which pivoted to K8s as the environment to target and left the tin behind.

ionir offers a deduplicating, continuous data protection storage system for PVs (persistent volumes) under K8s that uses 3-way mirroring across data nodes for data protection. Their solution offers a number of unique services that we haven’t seen in other K8s storage systems. Listen to the podcast to learn more.

Tad opened with a long spiel on what ionir is and we spent the next 40 minutes unpacking that to understand what exactly they were doing.

Let’s start with why stateful containers are all the rage these days. Tad had a slightly different rationale than we’ve heard before. From his perspective, it all comes from current enterprise applications that used database servers/machines. As these apps are re-factored to run as K8s containerized micro services, developers need and want their data to be containerized right along with the application.

ionir constructs a block storage system across K8s data nodes or K8s worker nodes with direct attached storage. In the cloud, this storage can be ephemeral (storage that only exists as long as the compute instance operates) or normal block storage (e.g., EBS in AWS). It’s unclear how ephemeral works on-prem. But in any case, they cluster together a set of data nodes into one massive block storage pool and map PVs onto that. K8s data nodes can be added to the ionir cluster while it’s operating.

As mentioned earlier, they use 3-way mirroring for data protection, and ionir ensures the 3 copies are stored on different data nodes. As such, when one data node goes down, copies of PV data are available from the other 2 nodes, and the data can then be rewritten elsewhere to ensure 3-way mirroring continues. We suppose this means a minimum configuration requires at least 3 data nodes.
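A minimal sketch of restoring 3-way mirroring after a data node fails, assuming each PV’s copies are tracked as a set of node names. The least-loaded-node rule for choosing replacements is our own assumption, not ionir’s algorithm; the check for at least 3 live nodes reflects the minimum configuration implied above.

```python
def rebalance_mirrors(placement: dict[str, set[str]], live_nodes: set[str],
                      copies: int = 3) -> dict[str, set[str]]:
    """Restore `copies` mirrors per PV on distinct live nodes after a failure.

    placement maps PV name -> nodes holding a copy. Copies on dead nodes are
    dropped; replacements go to the least-loaded live nodes that don't already
    hold that PV, so no node ever holds two copies of the same PV.
    """
    if len(live_nodes) < copies:
        raise ValueError(f"need at least {copies} live data nodes")
    result = {pv: nodes & live_nodes for pv, nodes in placement.items()}
    load = {n: 0 for n in live_nodes}
    for nodes in result.values():
        for n in nodes:
            load[n] += 1
    for pv in sorted(result):
        while len(result[pv]) < copies:
            target = min((n for n in live_nodes if n not in result[pv]),
                         key=lambda n: (load[n], n))   # tie-break by name
            result[pv].add(target)
            load[target] += 1
    return result
```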

ionir also provides deduplicating block storage, which should theoretically reduce the physical storage footprint of any PV. Data blocks are deduplicated across the cluster. ionir also has a metadata service (also 3-way replicated, to a different data space) that records the manifest of all blocks associated with a PV, their hashes and (logical/physical) locations.
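A minimal sketch of content-addressed deduplication with a per-PV manifest, in the spirit of the metadata service described above. Fixed-size chunking, SHA-256 digests and using the digest itself as the “physical location” are simplifying assumptions, not ionir’s design.

```python
import hashlib


class DedupStore:
    """Content-addressed block store: identical blocks are stored once.

    `blocks` maps SHA-256 digest -> data (the physical store); each PV's
    manifest is the ordered list of digests for its logical blocks.
    """

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}
        self.manifests: dict[str, list[str]] = {}

    def write_pv(self, pv: str, data: bytes) -> None:
        manifest = []
        for off in range(0, len(data), self.block_size):
            chunk = data[off:off + self.block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)   # store only if not seen before
            manifest.append(digest)
        self.manifests[pv] = manifest

    def read_pv(self, pv: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.manifests[pv])

    def dedupe_ratio(self) -> float:
        """Logical bytes referenced by all PVs divided by physical bytes stored."""
        logical = sum(len(self.blocks[d]) for m in self.manifests.values() for d in m)
        physical = sum(len(b) for b in self.blocks.values())
        return logical / physical if physical else 1.0
```

As the discussion notes, the ratio achieved depends entirely on how much repetition the stored data contains.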

There was no mention of data compression or encryption, so those are probably not present. We find deduplication very effective for backup storage but less effective for primary storage. Any deduplication ratio for ionir primary storage is likely specific to the data being stored, e.g., columnar databases, row databases, text, office files, etc. Each of these would likely have different dedupe ratios for primary storage.

Furthermore, ionir supplies continuous data protection (CDP) for PV data. PV data written to ionir is immutable, i.e., never modified, AND they keep previous versions of PV blocks in storage until they age out. This allows ionir to provide any prior version (well, any recent one) of a PV. ionir uses a timestamp to distinguish different PV versions. So, if ransomware attacked your site, users could ask for a PV version from just prior to the time of the attack, and you’d have that version of the PV to restart operations. Customers can limit how far back ionir saves prior versions of blocks for PVs.
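A minimal sketch of timestamped CDP for one volume, assuming every write appends a new version rather than overwriting, and an age-out limit trims old history. The `CdpVolume` class and its methods are hypothetical, not ionir’s API.

```python
import bisect
from typing import Optional


class CdpVolume:
    """Immutable block history: writes append (timestamp, data) versions, never overwrite."""

    def __init__(self):
        self.history: dict[int, list[tuple[float, bytes]]] = {}

    def write(self, lba: int, data: bytes, ts: float) -> None:
        versions = self.history.setdefault(lba, [])
        if versions and ts < versions[-1][0]:
            raise ValueError("timestamps must not go backwards")
        versions.append((ts, data))

    def read_as_of(self, lba: int, ts: float) -> Optional[bytes]:
        """Block contents as of time `ts`: the newest version written at or before ts."""
        versions = self.history.get(lba, [])
        i = bisect.bisect_right([t for t, _ in versions], ts)
        return versions[i - 1][1] if i else None

    def age_out(self, oldest_ts: float) -> None:
        """Discard versions older than `oldest_ts`, keeping the one that was current then."""
        for lba in list(self.history):
            versions = self.history[lba]
            keep_from = max(0, bisect.bisect_right([t for t, _ in versions], oldest_ts) - 1)
            self.history[lba] = versions[keep_from:]
```

Recovering from an attack at time T amounts to reading every block as of a moment just before T.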

Having CDP for PVs makes DevOps qualification and testing significantly faster. Normally DevOps would need to copy production data to test environments in order to validate new app code. But ionir can easily instantiate a separate copy of any PV (at any time in their saved set) in a matter of seconds. This can take DevOps deployment testing down from days to minutes or less.
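Why can a copy appear in seconds? With a manifest of block digests, a “copy” only needs to duplicate the manifest, not the data. This sketch assumes a shared digest-to-data block store and clones a PV’s current manifest only; cloning from an older point in time would start from that time’s manifest instead. The helper names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def clone_pv(manifests: dict[str, list[str]], source: str, new_name: str) -> None:
    """Instant copy of a PV: duplicate its manifest (list of block digests), not its data."""
    if new_name in manifests:
        raise ValueError(f"{new_name} already exists")
    manifests[new_name] = list(manifests[source])


def write_block(manifests: dict[str, list[str]], blocks: dict[str, bytes],
                pv: str, index: int, data: bytes) -> None:
    """Write one block of one PV: store data by digest and repoint only that PV's manifest."""
    d = digest(data)
    blocks.setdefault(d, data)
    manifests[pv][index] = d
```

Test writes to the clone never touch production blocks, so the copy costs only the new data it writes.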

In addition, ionir can teleport PV data to other, remote K8s clusters running ionir. Essentially, this copies PV metadata and its “hot” blocks over to any remote ionir cluster. During teleportation, the remote cluster can access PV data as soon as all PV metadata has been copied. The remote site accesses this PV data from the originating cluster (albeit much slower than accesses within the cluster) while “hot” blocks are being copied. Any writes to PV data at the remote site would be considered new data, deduplicated at the remote site, and only available at the remote site. Somewhat surprisingly, the PV’s data is never completely copied to the remote system, leaving the PV in a permanent teleported access mode.

We’re not sure we like the implications of teleporting PVs from a data integrity perspective. While it does make for near-instant access to PV data from other clusters and offers a solution to data gravity (it takes forever to move TB of data across the web), it’s incomplete, as the data is never fully copied to the remote site. Once hot blocks have been copied, remote cluster PV access should run faster. But if 20% of the requested blocks are not in the heat map, those IOs will take 100s of msec longer to perform, depending on the wire distance between the sites. And writes at the remote site cause the two copies of the PV (one at the source site and one at the remote site) to diverge.
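A quick expected-value calculation shows how much the cold-block misses matter. The 0.5 ms local read and 100 ms cross-site read are our own illustrative latencies; the 20% miss rate comes from the example above.

```python
def avg_read_latency_ms(hit_fraction: float, local_ms: float, remote_ms: float) -> float:
    """Expected read latency when some blocks are local and the rest cross the WAN."""
    return hit_fraction * local_ms + (1 - hit_fraction) * remote_ms


# 80% of reads hit copied "hot" blocks; 20% go back to the source cluster.
print(f"{avg_read_latency_ms(0.8, 0.5, 100.0):.1f} ms")   # prints 20.4 ms
```

Even with most reads served locally, the average is about 40 times the local latency, because the remote misses dominate.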

Their storage system is priced on a per data node basis, which makes it easy to price out their various deployment options. And it works on any standard K8s environment; although Tad admits they haven’t tested VMware Tanzu yet, they have tested it on GCP, Microsoft Azure, AWS, and Red Hat OpenShift.

They offer a fully functional free trial of ionir storage, capped only by the number of data nodes in use. So, if you only need a small amount of storage (OK, 3 data nodes with 24 14TB SSDs each makes for a large amount of storage) for your K8s environment, you can probably run forever on the free version.
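Just how large? A quick capacity calculation for that 3-node example, assuming raw drive capacity, 3-way mirroring, and no allowance for metadata overhead or deduplication savings (all assumptions on our part):

```python
def usable_tb(nodes: int, drives_per_node: int, drive_tb: float,
              copies: int = 3) -> tuple[float, float]:
    """Raw capacity, and usable capacity after N-way mirroring (before any dedupe)."""
    raw = nodes * drives_per_node * drive_tb
    return raw, raw / copies


raw, usable = usable_tb(3, 24, 14)
print(f"raw {raw:.0f} TB, usable {usable:.0f} TB")   # prints raw 1008 TB, usable 336 TB
```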

Tad Lebeck, US CTO, ionir

Tad Lebeck is a global technology executive with over two decades of experience in startups and large vendors. Prior to ionir, he founded and led Nuvoloso, an innovator in Kubernetes data services. Earlier, Lebeck served as CTO at Huawei Symantec Technologies, Vice President at Symantec/Veritas, co-founder/CTO at Invio, and CTO at Legato Systems, where he helped create the modern enterprise data-protection market.

Tad was a founding member of the SNIA Technical Council. He earned an MS/CS from the University of Wisconsin, and a combined MBA from the Columbia, London, and HKU Schools of Business.