131: GreyBeards talk native K8s data protection using Veritas NetBackup with Reneé Carlisle

The GreyBeards have been discussing K8s storage services a lot over the last year or so and it was time to understand how container apps and data could be protected. Recently, we saw an article about a Veritas funded survey, discussing the need for data protection in K8s. As such, it seemed a good time to have a talk with Reneé Carlisle (@VeritasTechLLC), Staff Product Manager for NetBackup (K8S), Veritas.

It turns out that Veritas NetBackup (NBU) has just released their 2nd version of K8s data protection. It’s gone completely (K8s) native. That is, Veritas have completely re-implemented all 3 tiers of NBU as K8s micro services. Moreover, the new release still supports all other NBU infrastructure implementations, such as bare metal or VM NBU primary server/media server services. It’s almost like you have all the data protection offered by NBU for the enterprise over the years, now also available for K8s container apps. Listen to the podcast to learn more.

To make use of NBU K8s, backup admins establish named gold, silver, bronze backup policies selecting frequency of backups, retention periods, backup storage, etc. Then DevOps would tag a namespace, pods, containers, or PVs with those data protection policy names. Once this is done, NBU K8S will start protecting that namespace, pod, container, or PV.

In addition, backup admins can include or exclude specific K8s namespace(s), pod(s), container(s), labels (tags), or PVs to be backed up with a specific policy. When that policy is triggered it will go out into the cluster to see if those K8s elements are active and start protecting them or excluding them from protection as requested.

NBU K8s has an Operator service, Data Mover services and other micro services that execute in the cluster. That is, at least one Operator service must be deployed in the cluster (recommended to be in a separate namespace but this is optional). The Operator service is the control plane for NBU K8S services. It will spin up data movers when needed and spin them down when done.

The Operator service supports a CLI but more importantly to DevOps, a complete implemented RESTful API service. Turns out the CLI is implemented ontop of the NBU (Operator) API. With the NBU API DevOps CI/CD tools or other automation can perform all the data protection services to protect K8s.

One historical issue with backup processing is that it can consume every ounce of network/storage and sometimes compute power in an environment. The enterprise class data movers (or maybe the Operator control plane) has various mechanisms to constrain or limit NBU K8S resource consumption so that this doesn’t become a problem.

But as the Operator and its Data Mover are just micro services, if there’s need for more throughput, more can be spun up or if there’s a need to reduce bandwidth, some of them can be spun down, all with no manual intervention whatsoever.

Furthermore, NBU K8s can be used to restore/recover PVs, containers, applications or namespaces to other, CNCF compliant K8s infrastructure. So, if you wanted to say, move your K8s namespace from AKS to GKE or onprem to RedHat OpenShift, it becomes a simple matter of moving the last NBU backup to the target environment, deploying NBU K8s in that environment and restoring the namespace.

NBU K8s can also operate in the cloud just as well as on prem and works in any CNCF compatible K8s environment which includes AKS, EKS, GKE, VMware Tanzu and OpenShift.

In the latest NBU K8s they implemented new, enterprise class Data Movers as micro services in order to more efficiently protect and recover K8S resources. Enterprise class Data Movers can perform virus-scanning/ransomware detection, encryption, data compression, and other services that enterprise customers have come to expect from NBU data protection.

NBU K8S accesses PV data, container, pod and namespace data and metadata using standard CSI storage provider and normal K8s API services.

As mentioned earlier, in the latest iteration of NBU K8s, they have completely implemented their NBU infrastructure, natively as containers. That adds, K8s auto-scaling, full CI/CD automation via APIs, to all the rest of NBU infrastructure operating completely in the K8s cluster.

So, now backup admins can run NBU completely in K8s or run just the Operator and its data mover services connecting to other NBU infrastructure (primary server and media servers) executing elsewhere in the data center.

NBU K8s supports all the various, disk, dedicated backup appliances, object/cloud storage or other backup media options that NBU uses. So that means you can store your K8s backup data on the cloud, in secondary storage appliances, or anyplace else that’s supported by NBU.

Licensing for NBU K8s follows the currently available Veritas licensing such as front end TB protected, subscription and term licensing options are available.

Reneé Carlisle, Staff Product Manager, Veritas NetBackup (K8S)

Reneé (LinkedIn) has been with Veritas Technologies for eleven years in various focus areas within the NetBackup Product Management Team.  In her current role she is the Product Manager responsible for the NetBackup strategic direction of Modern Platforms including Kubernetes and OpenStack.   She has a significant technical background into many of the NetBackup features including Kubernetes, virtualization, Accelerator, and cloud.  

Prior to working for Veritas, she was a customer running a large-scale NetBackup operation as well as a partner implementing, designing, and integrating NetBackup in many different companies.

128: GreyBeards talk containers, K8s, and object storage with AB Periasamy, Co-Founder&CEO MinIO

Sponsored by:

Once again Keith and I are talking K8s storage, only this time it was object storage. Anand Babu (AB) Periasamy, Co-founder and CEO of MinIO, has been on our show a couple of times now and its always an insightful discussion. He’s got an uncommon perspective on IT today and what needs to change.

Although MinIO is an open source, uber-compatible, S3 object store, AB more often talks like a revolutionary, touting the benefits of containerization, scale and automation with K8s. Object storage is just one of the vehicles to help get there. Listen to the podcast to learn more.

We started our discussion on the changing role of object storage in applications. Object storage started out as an archive solution. But then, over time, something happened, modern database startups adopted object storage to hold primary data, then analytics moved over to objects in a big way, and finally AI/ML came out with an unquenchable thirst for data and object storage was its only salvation.

Keith questioned the use of objects in analytics. Both AB and I pointed out that Splunk (and Spark) fully supported objects. But Keith said R (and Python) data scientists prefer to use protocols they learned in school, and these were all about (CSV, JPEGs, JSON) files. AB said what usually happens is this data is stored as object storage and then downloaded onto local disk as files to be processed. That’s not to say, that R or Python can’t process objects directly, but when they don’t, the ultimate source of data truth is object storage.

Somehow, we got onto the multi-cloud question. AB said the multi-cloud is really all about containers and K8s. When customers talk multi-cloud, what they really mean is they want applications that can run anywhere, in any cloud, on premise, or anyplace else for that matter.

I thought multi cloud was a DR solution. But AB reiterated it’s more a solution to vendor lock-in. What containerization gives IT is the option (ability) to run applications anywhere, but IT is not obligated to execute that option unless it makes sense

AB said that dev today doesn’t develop apps in the cloud anymore. They develop locally using minikube, once it’s working there they then add CI/CD tool chains and then move it to its final resting place (the cloud or wherever it ultimately needs to run). It turns out, containers, YAML files, scripts etc. are small and trivial to upload, migrate, or move to any internet location. And with ubiquitous K8s support available everywhere, they can move anywhere unchanged.

But where’s the data. AB said anywhere the app executes. It’s never moved, it takes too much time and effort to move this amount of data. But as applications move, any data it generates grows in that location over time.

We next turned to how MinIO was supported in K8s. AB mentioned they have a DirectPV CSI driver that creates a distributed PV to support MinIO services on local disks. In this way, containers needing access to MinIO S3 object storage can directly allocate data to user storage.

Then we asked about opinionated stacks. AB said most customers don’t want these. They may have some value in preserving an infrastructure environment but they’re better off transitioning to containerization and build any stack within those containers and the K8s cluster services.

On the other hand, MinIO object storage is available with the same S3 API, in bare metal, on VMware, OpenShift, K8s, every public cloud and most private clouds, as well. The advantage of the same, single storage interface, available everywhere can’t be beat.

MinIO recently closed a new funding round of $103M. AB mentioned they had new investments from Intel and Softbank, but I was more interested in plans he had for the new cash. And Keith asked where the new funding left MinIO with respect to its competitors in this space.

AB said it was never about the money, it was more about what you did with your team that mattered in the long run. AB’s imperative was to enter an existing market with a better product and succeed with that. Creating a new market plus a new product always cost more, takes longer and is riskier.

As for the new funds, there are really two ways to go: 1) improve the current product or 2) create a new one. My sense is that AB leans towards improving the current product.

For instance, MinIO is often asked to support a different object storage API. But AB’s perspective is that S3 was an early bet that paid off well by becoming the de facto standard for object storage. Supporting another API would divide his resources and probably make their current product worse not better. AB mentioned they are getting 1.1M downloads of their Docker container version so they seem to be succeeding well with the current product

Anand Babu (AB) Periasamy, Co-founder and CEO

AB Periasamy is the co-founder and CEO of MinIO, an open-source provider of high performance, object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu where he serves on the board to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).

AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat’s Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to scaling of the commodity cluster computing to supercomputing class performance. His work there resulted in the development of Lawrence Livermore Laboratory’s “Thunder” code, which, at the time was the second fastest in the world.  

AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.

127: Annual year end wrap up podcast with Keith, Matt & Ray

[Ray’s sorry about his audio, it will be better next time he promises, The Eds] This was supposed to be the year where we killed off COVID for good. Alas, it was not to be and it’s going to be with us for some time to come. However, this didn’t stop that technical juggernaut we call the GreyBeards on Storage podcast.

Once again we got Keith, Matt and Ray together to discuss the past year’s top 3 technology trends that would most likely impact the year(s) ahead. Given our recent podcasts, Kubernetes (K8s) storage was top of the list. To this we add AI-MLops in the enterprise and continued our discussion from last year on how Covid & WFH are remaking the world, including offices, data centers and downtowns around the world. Listen to the podcast to learn more.

K8s rulz

For some reason, we spent many of this year’s podcasts discussing K8s storage. TK8s was never meant to provide (storage) state AND as a result, any K8s data storage has had to be shoe horned in.

Moreover, why would any IT group even consider containerizing enterprise applications let alone deploy these onto K8s. The most common answers seem to be automatic scalability, cloud like automation and run-anywhere portability.

Keith chimed in with enterprise applications aren’t going anywhere and we were off. Just like the mainframe, client-server and OpenStack applications before them, enterprise apps will likely outlive most developers, continuing to run on their current platforms forever.

But any new apps will likely be born, live a long life and eventually fade away on the latest runtime environment. which is K8s.

Matt mentioned hybrid and multi-cloud as becoming the reason-d’etre for enterprise apps to migrate to containers and K8s. Further, enterprises have pressing need to move their apps to the hybrid- & multi-cloud model. AWS’s recent hiccups, notwithstanding, multi-cloud’s time has come.

Ray and Keith then discussed which is bigger, K8s container apps or enterprise “normal” (meaning virtualized/bare metal) apps. But it all comes down to how you define bigger that matters, Sheer numbers of unique applications – enterprise wins, Compute power devoted to running those apps – it’s a much more difficult race to cal/l. But even Keith had to agree that based on compute power containerized apps are inching ahead.

AI-MLops coming on strong

AI /MLops in the enterprise was up next. For me the most significant indicator for heightened interest in AI-ML was VMware announced native support for NVIDIA management and orchestration AI-MLops technologies.

Just like K8s before it and VMware’s move to Tanzu and it’s predecessors, their move to natively support NVIDIA AI tools signals that the enterprise is starting to seriously consider adding AI to their apps.

We think VMware’s crystal ball is based on

  • Cloud rolling out more and more AI and MLops technologies for enterprises to use. on their infrastructure
  • GPUs are becoming more and more pervasive in enterprise AND in cloud infrastructure
  • Data to drive training and inferencing is coming out of the woodwork like never before.

We had some discussion as to where AMD and Intel will end up in this AI trend.. Consensus is that there’s still space for CPU inferencing and “some” specialized training which is unlikely to go away. And of course AMD has their own GPUs and Intel is coming out with their own shortly.

COVID & WFH impacts the world (again)

And then there was COVID and WFH. COVID will be here for some time to come. As a result, WFH is not going away, at least not totally any time soon. And is just becoming another way to do business.

WFH works well for some things (like IT office work) and not so well for others (K-12 education). If the GreyBeards were into (non-crypto) investing, we’d be shorting office real estate. What could move into those millions of square feet (meters) of downtime office space is anyones guess. But just like the factories of old, cities and downtowns in particular can take anything and make it useable for other purposes.

That’s about it, 2021 was another “interesteing” year for infrastructure technology. It just goes to show you, “May you live in interesting times” is actually an old (Chinese) curse.

Keith Townsend, (@TheCTOadvisor)

Keith is a IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIN.

Matt Leib, (@MBLeib)

Matt Leib has been blogging in the storage space for over 10 years, with work experience both on the engineering and presales/product marketing. His blog is at Virtually Tied to My Desktop and he’s on LinkedIN.

Ray Lucchesi, (@RayLucchesi)

Ray is the host and co-founder of GreyBeardsOnStorage and is President/Founder of Silverton Consulting, and a prominent (AI/storage/systems technology) blogger at RayOnStorage.com. Signup for SCI’s free, monthly industry e-newsletter here, published continuously since 2007. Ray can also be found on LinkedIn

126: GreyBeards talk k8s storage with Alex Chircop, CEO, Ondat

Keith and I had an interesting discussion with Alex Chircop (@chira001), CEO of Ondat, a kubernetes storage provider. They have a high performing system, laser focused on providing storage for k8s stateful container applications. Their storage is entirely containerized and has a number of advanced features for data availability, performance and security that developers need the run stateful container apps. Listen to the podcast to learn more.

We started by asking Alex how Ondats different from all the other k8s storage solutions out there today (which we’ve been talking with lately). He mentioned three crucial capabilities:

  • Ondat was developed from the ground up to run as k8s containers. Doing this would allow any k8s distribution to run their storage to support stateful container apps. .
  • Ondat was designed to allow developers to run any possible container app. Ondat supports both block as well as file storage volumes.
  • Ondat provides consistent, superior performance, at scale, with no compromises. Sophisticated data placement insures that data is located where it is consumed and their highly optimized data path provides low-latency access that data storage.

Ondat creates a data mesh (storage pool) out of all storage cluster nodes. Container volumes are carved out of this data mesh and at creation time, data and the apps that use them are co-located on the same cluster nodes.

At volume creation, Dev can specify the number of replicas (mirrors) to be maintained by the system. Alex mentioned that Ondat uses synchronous replication between replica clusters nodes to make sure that all active replica’s are up to date with the last IO that occurred to primary storage.

Ondat compresses all data that goes over the network as well as encrypts data in flight. Dev can easily specify that the data-at-rest also be compressed and/or encrypted. Compressing data in flight helps supply consistent performance where networks are shared.

Alex also mentioned that they support both the 1 reader/writer, k8s block storage volumes as well as multi-reader/multi-writer, k8s file storage volumes for containers.

In Ondat each storage volume includes a mini-brain used to determine primary and replica data placement. Ondat also uses desegregated consensus to decide what happens to primary and replica data after a k8s split cluster occurs. After a split cluster, isolated replica’s are invalidated and replicas are recreate, where possible, in the surviving nodes of the cluster portion that holds the primary copy of the data.

Also replica’s can optionally be located across AZs if available in your k8s cluster. Ondat doesn’t currentlysupport replication across k8s clusters.

Ondat storage works on any hyperscaler k8s solution as well as any onprem k8s system. I asked if Ondat supports VMware TKG and Alex said yes but when pushed mentioned that they have not tested it yet.

Keith asked what happens when things go south, i.e., an application starts to suffer worse performance. Alex said that Ondat supplies system telemetry to k8s logging systems which can be used to understand what’s going on. But he also mentioned they are working on a cloud based, Management-aaS offering, to provide multi-cluster operational views of Ondat storage in operation to help understand, isolate and fix problems like this.

Keith mentioned he had attended a talk by Google engineers that developed kubernetes and they said stateful containers don’t belong under kubernetes. So why are stateful containers becoming so ubiquitous now.

Alex said that may have been the case originally but k8s has come a long way from then and nowadays as many enterprises shift left enterprise applications from their old system environment to run as containers they all require state for processing. Having that stateful information or storage volumes accessible directly under k8s makes application re-implementation much easier.

What’s a typical Ondat configuration? Alex said there doesn’t appear to be one. Current Ondat deployments range from a few 100 to 1000s of k8s cluster nodes and 10 to 100s of TB of usable data storage.

Ondat has a simple pricing model, licensing costs are determined by the number of nodes in your k8s cluster. There’s different node pricing depending on deployment options but other than that it’s pretty straightforward.

Alex Chircop, CEO Ondat

Alex Chircop is the founder and CEO of Ondat (formerly StorageOS), which makes it possible to easily deploy and manage stateful Kubernetes applications with persistent data volumes. He also serves as co-chair of the CNCF (Cloud Native Computing Foundation) Storage Technical Advisory Group.

Alex comes from a technical background working in IT that includes more than 10 years with Nomura and Goldman Sachs.

124: GreyBeards talk k8s storage orchestration using CNCF Rook Project with Sébastien Han & Travis Nielsen, Red Hat

Stateful containers are becoming a hot topic these days so we thought it a good time to talk to the CNCF (Cloud Native Computing Foundation) Rook team about what they are doing to make storage easier to use for k8s container apps. CNCF put us into contact with Sébastien Han (@leseb_), Ceph Storage Architect and Travis Nielsen (@STravisNielsen), both Principal Software Engineers at Red Hat and active on the Rook project. Rook is a CNCF “graduated” open source project just like Kubernetes, Prometheus, ContainerD, etc., this means it’s mature enough to run production workloads.

Rook is used to configure, deploy and manage a Red Hat Ceph(r) Storage cluster under k8s. Rook creates all the k8s deployment scripts to set up a Ceph Storage cluster as containers, start it and monitor its activities. Rook monitoring of Ceph operations can restart any Ceph service container or scale any Ceph services up/down as needed by container apps using its storage. Rook is not in the Ceph data path, but rather provides a k8s based Ceph control or management plane for running Ceph storage under k8s.

Readers may recall we talked to SoftIron, an appliance provider, for Ceph Storage in the enterprise for our 120th episode. Rook has another take on using Ceph storage, only this time running it under k8s,. Listen to the podcast to learn more.

The main problem Rook is solving is how to easily incorporate storage services and stateful container apps within k8s control. Containerized apps can scale up or down based on activity and storage these apps use needs the same capabilities. The other option is to have storage that stands apart or outside k8s cluster and control. But then tho container apps and their storage have 2 (maybe more) different control environments. Better to have everything under k8s control or nothing at all.

Red Hat Ceph storage has been available as a standalone storage solutions for a long time now and has quite the extensive customer list, many with multiple PB of storage. Rook-Ceph and all of its components run as containers underneath k8s.

Ceph supports replication (mirroring) of data 1 to N ways typically 3 way or erasure coding for data protection and also supports file, block and object protocols or access methods. Ceph normally consumes raw block DAS for it’s backend but Ceph can also support a file gateway to NFS storage behind it. Similarly, Ceph can offers an object storage gateway option. But with either of these approaches, the (NFS or object) storage exists outside k8s scaling and resiliency capabilities and Rook management.

Ceph uses storage pools that can be defined using storage performance levels, storage data protection levels, system affinity, or any combination of the above. Ceph storage pools are mapped to k8s storage classes using the Ceph CSI. Container apps that want to use storage would issue a persistent volume claim (PVC) request specifying a Ceph storage class which would allocate the Ceph storage from the pool to the container.  

Besides configuring, deploying and monitoring/managing your Ceph storage cluster, Rook can also automatically upgrade your Ceph cluster for you. 

We discussed the difference between running Rook-Ceph within k8s and running Ceph outside k8s. Both approaches depend on Ceph CSI but with Rook, Ceph and all its software is all running under k8s control as containers and Rook manages the Ceph cluster for you. When it’s run outside 1) you manage the Ceph cluster and 2) Ceph storage scaling and resilience are not automatic. 

Sébastien Han, Principal Software Engineer, Ceph Architect, Red Hat

Sebastien Han currently serves as a Senior Principal Software Engineer, Storage Architect for Red Hat. He has been involved with Ceph Storage since 2011 and has built strong expertise around it.

Curious and passionate, he loves working on bleeding edge technologies and identifying opportunities where Ceph can enhance the user experience. He did that with various technology such as OpenStack, Docker.

Now on a daily basis, he rotates between Ceph, Kubernetes, and Rook in an effort to strengthen the integration between all three. He is one of the maintainers of Rook-Ceph.

Travis Nielson, Principal Software Engineer, Red Hat

Travis Nielsen is a Senior Principal Software Engineer at Red Hat with the Ceph distributed storage system team. Travis leads the Rook project and is one of the original maintainers, integrating Ceph storage with Kubernetes.

Prior to Rook, Travis was the storage platform tech lead at Symform, a P2P storage startup, and an engineering lead for the Windows Server group at Microsoft.