117: GreyBeards talk HPC file systems with Frank Herold, CEO of ThinkParQ, makers of BeeGFS

We return to our storage thread with a discussion of HPC file systems with Frank Herold (@BeeGFS), CEO of ThinkParQ GmbH, the makers of BeeGFS. I’ve seen BeeGFS start to show up in some IO500 top storage benchmark results, and as more and more data keeps coming online every day, we thought it was time to find out how our friends in the HPC world handle their data deluge.

Frank’s a former rocket scientist who’s been in and around the storage industry for years, and he was very knowledgeable about BeeGFS’s software defined, parallel file system. He seemed to have a great grasp of the IO requirements in HPC, Life Sciences and other HPC-like applications. Listen to the podcast to learn more.

It turns out that ThinkParQ is a spinoff of the German research institute that originally developed the BeeGFS parallel file system. There are two versions of the product: one that is publicly available (downloadable from their website) and another with commercial support. It’s not quite 100% open source, but there’s a lot of open source in it, and their Git repository is available.

BeeGFS was originally focused on HPC workloads, but as this type of work has become more mainstream they have moved beyond HPC and now have significant installations in Life Sciences, Oil & Gas and many other big data environments.

It runs on x86/AMD, OpenPower, and ARM CPUs. BeeGFS comes as a number of services, one of which is a storage service that uses ZFS or XFS as its backend file system. Hosts access the system through (POSIX compliant) client software. There are also metadata and monitoring services. Most of the time these services run on separate servers, but BeeGFS also supports a “converged mode” where all of the services run on a single server, and you can have multiple converged-mode servers in a cluster.

BeeGFS is a parallel file system. This means it intrinsically supports multiple metadata services/servers and multiple storage servers, which allows it to scale storage bandwidth and performance considerably beyond single appliance systems. Data is automatically distributed across all the storage servers in the configuration, unless you specify that data should reside on specific storage servers, say an all-flash subset. Similarly, metadata is automatically distributed across all metadata servers in the system.
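
To make the distribution idea concrete, here's a minimal sketch of how a parallel file system might stripe file chunks round-robin across storage targets and hash directory entries across metadata servers. This is purely illustrative: the target names, chunk size and hashing are our assumptions, not BeeGFS's actual placement algorithm.

```python
# Conceptual sketch of parallel file system data/metadata distribution.
# Not BeeGFS code -- target names, chunk size and hashing are illustrative.
import hashlib

STORAGE_TARGETS = ["storage01", "storage02", "storage03", "storage04"]  # assumed
METADATA_SERVERS = ["meta01", "meta02"]                                 # assumed
CHUNK_SIZE = 1 * 1024 * 1024  # 1 MiB stripe chunk (illustrative default)

def chunk_placement(file_size: int) -> list[tuple[int, str]]:
    """Assign each chunk of a file to a storage target, round-robin."""
    chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
    return [(i, STORAGE_TARGETS[i % len(STORAGE_TARGETS)]) for i in range(chunks)]

def metadata_server_for(path: str) -> str:
    """Hash a directory entry to one of the metadata servers."""
    digest = int(hashlib.sha1(path.encode()).hexdigest(), 16)
    return METADATA_SERVERS[digest % len(METADATA_SERVERS)]

print(chunk_placement(3_500_000))          # 4 chunks spread over all 4 targets
print(metadata_server_for("/proj/run42"))  # which metadata server owns this entry
```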

They don’t support any specific RAID protection other than mirroring, and that’s really there to speed up read throughput. Rather, they depend on the underlying XFS/ZFS file systems to provide drive failure protection (RAID5/6).

One of BeeGFS’s selling points is that it has few tuning parameters that a customer needs to fiddle with. Frank said it runs quite well right out of the box.

BeeGFS offers a single name space that spans the cluster (of metadata servers/storage servers). But customers can elect to split this name space across a subset of these metadata and storage servers, and by doing so they create multiple BeeGFS clusters.

There’s no inherent support for NFS or SMB, but customers can configure NFS or Samba servers that use BeeGFS as backend storage. Also, there’s no data reduction built into BeeGFS and no automatic data tiering across the backend storage (file systems).

But as noted above, customers can direct which backend storage holds their data. And BeeGFS does offer a CLI data movement primitive, which customers can use in conjunction with other software to implement storage tiering, or they can do it themselves.
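
For example, a site could wrap that data movement primitive in a simple policy script. The sketch below is hypothetical: the migrate command is a placeholder, not actual BeeGFS CLI syntax, and the 90-day threshold is just an assumption.

```python
# Hypothetical tiering policy wrapper. MIGRATE_CMD is a placeholder, not the
# real BeeGFS CLI syntax -- substitute whatever data movement primitive your
# installation provides.
import os
import subprocess
import time

COLD_AFTER_DAYS = 90                                # assumed "cold" threshold
MIGRATE_CMD = ["echo", "migrate-to-capacity-pool"]  # placeholder command

def tier_cold_files(root: str) -> None:
    cutoff = time.time() - COLD_AFTER_DAYS * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:     # not accessed recently
                subprocess.run(MIGRATE_CMD + [path], check=True)

if __name__ == "__main__":
    tier_cold_files("/mnt/beegfs/project_data")     # placeholder mount point
```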

Metadata performance is extremely important for small files and for large, multi-billion object file systems. BeeGFS uses extensive metadata caching to provide faster access to this information.

Speaking of small file performance, we had a good discussion of the tradeoffs between small and large file performance. Although BeeGFS has decent small file performance, it’s not a cure-all for every small-file-intensive application; according to Frank, not every small file workload is optimal for BeeGFS.

They also offer BeeOND, which is BeeGFS On Demand. This is an integration with the Slurm workload scheduler (an HPC job scheduler) that allows customers to spin up a scratch BeeGFS parallel file system across compute servers with storage.

Slurm’s BeeOND integration brings up all the BeeGFS services and deploys them on the compute nodes you specify. At that point you have a fully installed BeeGFS (scratch) parallel file system. Customers may use this scratch file system to support any compute- or data-intensive workload they need to run. When it’s no longer needed, Slurm can be directed to automatically dismantle the BeeGFS file system.
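
As a rough illustration of what such an integration does, the Python sketch below starts and stops BeeOND around a job, as a prolog/epilog might. The beeond command options and paths shown are assumptions from memory; check the BeeOND documentation for the exact syntax your version expects.

```python
# Rough sketch of starting/stopping a BeeOND scratch file system around a job.
# The beeond options and paths below are assumptions -- verify them against
# the BeeOND documentation before using anything like this.
import subprocess

NODEFILE = "/tmp/beeond_nodes"  # file listing the job's compute nodes (site-specific)
DATA_DIR = "/local/beeond"      # per-node local storage path (assumed)
MOUNT_DIR = "/mnt/beeond"       # scratch mount point on each node (assumed)

def start_scratch_fs() -> None:
    """Bring up BeeGFS services on the job's nodes and mount the scratch FS."""
    subprocess.run(
        ["beeond", "start", "-n", NODEFILE, "-d", DATA_DIR, "-c", MOUNT_DIR],
        check=True,
    )

def stop_scratch_fs() -> None:
    """Tear the scratch file system back down when the job completes."""
    subprocess.run(["beeond", "stop", "-n", NODEFILE], check=True)
```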

We talked about BeeGFS partners. They have a number of regional partners that provide installation and onsite support and a number of technical partners, such as NetApp, Dell, HPE and INSPUR, that supply BeeGFS configured servers and systems for deployment/installation.

Frank Herold, CEO ThinkParQ

Frank Herold is the CEO of ThinkParQ GmbH – the company behind BeeGFS. He actively leads the company and the product strategy of BeeGFS as a global player for parallel high-performance file systems.

Prior to joining ThinkParQ, he held various senior management positions within ADIC and Quantum Corporation, responsible for market segments within the academic and scientific research, oil and gas, broadcast and video surveillance sectors, focusing on large scale, high-performance and enterprise accounts within EMEA. 

Frank has over 25 years of experience in the IT industry and holds a master’s degree in engineering (Dipl. -Ing.) in rocket science.

108: GreyBeards talk DNA storage with David Turek, CTO, Catalog DNA

The Greybeards get off the beaten (enterprise) path this month to see what lies ahead, with a discussion of DNA storage. David Turek, CTO, Catalog DNA (@CatalogDNA), is a long-time IBMer who had focused on HPC systems at IBM but left for Catalog DNA to pursue the commercialization of DNA storage, an “emerging” technology. CatalogDNA is a company out of Boston that recently closed a round of funding and is focused on bringing DNA storage out into the world of IT.

David was a pleasure to talk with and has lots of knowledge about HPC and enterprise data center solutions. He also has a good grasp of what it will take to bring DNA storage to market. Keith has some prior experience with DNA technologies in BioPharma, so he could talk in more detail about the technology and its ecosystem. [We’re trying out a new format, let us know what you think; The Eds.]

Ray has written about DNA storage in his RayOnStorage Blog, most recently in April of this year and May of last year. It’s been an ongoing blog topic of his for almost a decade now. When Ray was interviewed about the technology, he thought it interesting but saw serious obstacles in read and write latencies and throughput, as well as in the size of the storage device.

Well, CatalogDNA seems to have a good handle on write throughput and is seriously working on the rest.

However, DNA storage’s volumetric density has always been exceptional. Early on in the podcast, David mentioned that DNA storage is 6 orders of magnitude (1 million times) denser in bytes/mm³ than magnetic tape today. An LTO8 tape cartridge stores 12TB (uncompressed) in 14.2 in³ (230.3 cm³), or roughly 845GB/in³ (52GB/cm³). One million times that would be 12EB in the same volume.
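
The back-of-the-envelope arithmetic checks out (all figures from the paragraph above):

```python
# Density arithmetic using the LTO8 figures quoted above.
lto8_capacity_tb = 12      # uncompressed cartridge capacity, TB
cartridge_in3 = 14.2       # cartridge volume, cubic inches
cartridge_cm3 = 230.3      # cartridge volume, cubic centimeters

gb_per_in3 = lto8_capacity_tb * 1000 / cartridge_in3   # ~845 GB per cubic inch
gb_per_cm3 = lto8_capacity_tb * 1000 / cartridge_cm3   # ~52 GB per cubic centimeter

dna_density_advantage = 1_000_000                      # ~6 orders of magnitude
eb_in_same_volume = lto8_capacity_tb * dna_density_advantage / 1_000_000  # TB -> EB
print(round(gb_per_in3), round(gb_per_cm3), eb_in_same_volume)   # 845, 52, 12.0
```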

The challenge with LTO8, disk or SSD storage today is that at some point you have to move the data from one device to a more modern one. This could be every 3-5 years (for disk or SSD) or 25-30 years (for tape). In either case, at some point IT needs to incur the cost and time to move the data. That’s not much of a problem for 100TB or so, but when you start talking PB or EB of data, it can become a never ending task.

DNA storage

David mentioned Catalog uses “synthetic DNA” in their storage. This means the DNA it uses is designed to be incompatible with natural DNA, such that it wouldn’t work in a cell. It has stops or other biological mechanisms to inhibit its use in nature. Yes, it uses the same sugars, backbone and other chemistry of biologically active DNA, but it has been specifically modified to inhibit its use by normal cellular machinery.

DNA storage has a number of unique capabilities:

  • It can be made to last essentially forever, by being dried out (desiccated) and encased in a crystal, and it takes zero power/energy to store for eons.
  • It can be cheaply and easily replicated, almost an infinite number of times, for only the cost of chemical feedstock, chemical interactions and energy. Yes, this may take time, but the process scales up nicely. One could make 2 copies in the first cycle, 4 in the 2nd, 8 in the 3rd, etc., so it would only take 20 cycles to create a million copies. If each cycle takes 10 minutes, in 3:20 you could have a million copies of 1EB of data (see the arithmetic sketch after this list).
  • It can be easily searched for target information. This involves fabricating a DNA search molecule and inserting it into the storage solution. Once there it would match up with the DNA segment that held your key. And of course, the search molecule and the data could be replicated to speed up any search process.
  • We already mentioned the extreme density advantage above.
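
Here's the replication arithmetic from the second bullet, spelled out:

```python
# Doubling arithmetic for the replication example above.
copies, cycles, minutes_per_cycle = 1, 0, 10
while copies < 1_000_000:
    copies *= 2                     # each cycle doubles the number of copies
    cycles += 1
total_minutes = cycles * minutes_per_cycle
print(cycles, copies, f"{total_minutes // 60}:{total_minutes % 60:02d}")
# -> 20 cycles, 1,048,576 copies, 3:20 (hours:minutes)
```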

Speed of DNA storage access

David said they can already write Catalog DNA storage at MB/sec rates.

The process they use to write is like a conveyor belt, which starts with a polyethylene sheet (a web, actually). Digital data comes in, is chunked, and is transformed into DNA strand (25-50 base pair) molecules, or dots. The polyethylene sheet rolls into a machine that uses multiple 3D print heads to deposit dots (the DNA strand data chunks) at points on the web. This machine/process deposits 100K or more of these dots onto the web. The sheet then moves to the next stage, where the DNA molecules are scraped off and drained into a solution. Then a wet chemistry process makes the DNA more readable and enables the separate DNA molecules to connect into a data strand. Finally, this data strand goes into another process where it gets reduced in volume so that it is more stable.
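
To make "digital data is chunked and transformed into DNA" concrete, here's a textbook-style 2-bits-per-base encoding. This is only an illustration of the general idea; Catalog's actual encoding is proprietary and, as the Lego analogy below suggests, built by combining a library of pre-made molecules rather than synthesizing arbitrary sequences.

```python
# Textbook-style 2-bits-per-base encoding, shown only to make "bits -> bases"
# concrete. This is NOT Catalog's scheme, which is proprietary and (per the
# Lego analogy below) built from a library of pre-made molecules.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Map every 2 bits of input to one DNA base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Reverse the mapping: 4 bases back into each byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"hello")
assert decode(strand) == b"hello"
print(strand)   # CGGACGCCCGTACGTACGTT
```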

If needed, one can add another step that dries out (desiccates) the data strand into an even smaller volume, which can then be embedded into a crystalline structure that could last for centuries.

David compared the DNA molecules (data chunks) to Legos, only they are the same pieces in a million different colors. Each piece represents some segment of data bits/bytes. Using chemistry and proprietary IP, each separate DNA molecule self-organizes (connects) into a data strand, representing the information you want to store.

Reading DNA involves off-the-shelf DNA sequencers. The one Catalog currently uses is the Oxford Nanopore device, but there are others. David didn’t say how fast they could read DNA data. But current DNA reading devices destroy the data, so replicas of the data would be required in order to read it.

David said their current write device is L shaped with one leg about 14’ (4.3m) long and the other about 12’ (3.7m) long with each leg being about 3’ (0.9m) wide.

Searching EB of data in minutes?!

DNA strands can be searched (matched) by fabricating a search molecule and inserting it into the storage solution (that holds the data strands). Such a molecule will find a place in the data that has a matching (DNA) data element and, I believe, attach itself to the data strand.

For example, let’s say you had recorded all of a country’s emails for a month or so and you wanted to search them for the words “bomb”, “terrorist”, “kill”, etc. One could create a set of search molecules, replicate them any number of times (depending on how quickly you wanted to search the data and how many matches you expected), and insert them into a data pool with multiple data strands that stored the email traffic.

After some time, you’d come back and your search would be done. You’d need to then extract the search hits, and read out the portion of the data strands (emails) that matched. I’m guessing extraction would involve some sort of (wet) chemical process or filtration.

State of Catalog DNA storage

David mentioned that, as a publicity stunt, they wrote the whole of Wikipedia onto Catalog DNA storage. The whole of Wikipedia fit into a cylinder about the height of a big knuckle on your hand and narrower than a finger. The size of the whole of Wikipedia, with complete edit history, is 10TB uncompressed; storing all the edit versions plus its media, such as images, videos, audio and other graphics, would add another 23TB (as of the end of 2014), so ~33TB uncompressed.

David believes in 18 months they could have a WORM (write once, read many times) data storage solution that could be deployed in customer data centers which would supply immense data repositories in relatively small solution containers.

CatalogDNA is currently in a number of PoCs with major corporations (not labs or universities) to show how DNA storage technology can be used to solve problems.

David believes that at some point they will be able to make compute engines entirely of DNA. At that point, one could have a combined compute and storage (HCI-like) DNA server using the same technology in a single solution. And as mentioned previously, one could replicate from one DNA server & storage to a million DNA servers & storage in just 20 cycles. How’s that for scale-out?


David Turek, CTO Catalog DNA

Dave Turek is Catalog’s Chief Technology Officer. He comes to Catalog from IBM where he held numerous executive positions in High Performance Computing and emerging technologies.

He was the development executive for the IBM SP program which produced the first commercially successful massively parallel system; he started IBM’s Linux Cluster business; launched an early offering in Cloud computing called Deep Computing Capacity on Demand; produced the Roadrunner system, the world’s first petascale computer; and was responsible for IBM’s exascale strategy which led to the deployment of the Summit and Sierra systems at Oak Ridge and Lawrence Livermore National Laboratories respectively.

David has been invited to testify to Congress on numerous occasions regarding the future of computing in the US and has helped establish technical collaborations with universities, businesses, and government agencies around the world.



107: GreyBeards talk MinIO’s support of VMware’s new Data Persistence Platform with AB Periasamy, CEO MinIO

Sponsored by:

The GreyBeards have talked with Anand Babu (AB) Periasamy (@ABPeriasamy), CEO MinIO, before (see 097: GreyBeards talk open source S3… episode). And we also saw him earlier this year at their headquarters for Storage Field Day 19 (SFD19), where AB gave a great discussion of what they were doing and how it worked (see MinIO’s SFD19 presentation videos).

The podcast runs ~26 minutes. AB is very technically astute and always a delight to talk with. He’s extremely knowledgeable about the cloud, containerized applications and high performing, S3 compatible object storage. And now, with MinIO and vSAN Data Persistence under VCF Tanzu, he’s very knowledgeable about the virtualized IT environment as well. Listen to the podcast to learn more. [We’re trying out a new format, placing the podcast up front. Let us know what you think; The Eds.]


VMware VCF vSAN Data Persistence Platform with MinIO

Earlier this month VMware announced a new capability, available with the next updates of vSAN, vSphere & VCF, called the vSAN Data Persistence Platform. The Data Persistence Platform is a VMware framework designed to integrate stateful, independent-vendor, software defined storage services into vSphere. By doing so, VCF can provide API access to persistent storage services for containerized applications running under Tanzu Kubernetes (k8s) Grid service clusters.

At the announcement, VMware identified three object storage technical partners and one database (Cassandra) technical partner that had been integrated with the solution. MinIO was an open source, object storage partner.

VMware’s VCF vSAN Data Persistence framework allows vCenter administrators to use vSphere cluster infrastructure to configure and deploy these new stateful storage services, like MinIO, into namespaces, and it gives app developers direct k8s API access to these storage namespaces to provide persistent, stateful object storage for their applications.

With VCF Tanzu and the vSAN Data Persistence Platform using MinIO, developers can have full support for their CI/CD pipelines, using native k8s tools to deploy and scale containerized apps on prem, in the public cloud and in hybrid cloud, all using VCF vSphere.

MinIO on the Data Persistence Platform

AB said MinIO with Data Persistence takes advantage of a new capability called vSAN Direct, which gives vSAN almost JBOF-like IO control and performance. With MinIO on vSAN Direct, storage and k8s cluster applications can co-reside on the same ESX node hardware, so IO activity doesn’t have to hop off host to be performed. In addition, customers can now populate ESX server nodes with lots (100s to 1000s?) of storage devices and be assured the storage will be used by applications running on that host.

As a result, MinIO’s object storage IO performance on VCF Tanzu is very good due to its use of vSAN Direct and MinIO’s inherent superior IO performance for S3 compatible object storage.

With MinIO on the VCF vSAN Data Persistence Platform, VMware takes over all the work of deploying MinIO software services on the VCF cluster. This way customers can take advantage of MinIO’s fully compatible S3 object storage system operating in their VCF cluster. App developers get the best of both worlds: infrastructure configured, deployed and managed by admins, yet completely controllable, scalable and accessible through k8s API services.

If developers want to take advantage of MinIO specialized services, such as data security or replication, they can do so directly using MinIO’s APIs, just as they would when operating on bare metal or in the cloud.
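
As a concrete example of that developer-facing access, here's a minimal sketch using MinIO's Python SDK. The endpoint, credentials and names are placeholders for whatever a particular Tanzu/Data Persistence deployment exposes.

```python
# Minimal MinIO Python SDK usage (pip install minio). The endpoint, credentials
# and names below are placeholders, not values from any real deployment.
from minio import Minio

client = Minio(
    "minio.example.internal:9000",   # placeholder service endpoint
    access_key="EXAMPLEKEY",         # placeholder credentials
    secret_key="EXAMPLESECRET",
    secure=True,
)

bucket = "training-data"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a local file as an S3 object. The same code works against AWS S3 or
# any other S3-compatible endpoint -- that's the point of S3 compatibility.
client.fput_object(bucket, "datasets/run01.parquet", "/tmp/run01.parquet")
```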

AB said the VMware development team was very responsive during the development of Data Persistence. AB was surprised to see such a big company, like VMware, operate with almost startup-like responsiveness. Keith mentioned he’s seen this in action, as vSAN has matured very rapidly to a point of near feature parity with just about any storage system out there today.

With MinIO object storage, container applications that need PBs of data now have a home on VCF Tanzu. And it’s as easily usable as any public cloud storage. With VCF Tanzu configuring and deploying the storage over its own infrastructure, and vCenter admins managing and administering it all, it’s simple to create and use PBs of object storage.

MinIO is already the most popular S3 compatible object storage provider for applications running in the cloud and on prem. And VMware is easily the most popular virtualization platform on the planet. Now with the two together on VCF Tanzu, there seems to be nothing in the way of conquering containerized applications running in IT as well.

With that, MinIO is available everywhere containers want to run: natively in the cloud, on prem and in hybrid cloud, or running with VCF Tanzu.


AB Periasamy, CEO MinIO

AB Periasamy is the CEO and co-founder of MinIO, and one of the leading thinkers and technologists in the open source software movement.

AB was a co-founder and CTO of GlusterFS, which was acquired by Red Hat in 2011. Following the acquisition, he served in the office of the CTO at Red Hat prior to founding MinIO in late 2015.

AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India.

He earned his BE in Computer Science and Engineering from Annamalai University.




106: Greybeards talk Intel’s new HPC file system with Kelsey Prantis, Senior Software Eng. Manager, Intel

We had talked with Intel at Storage Field Day 20 (SFD20) about a month ago. At the virtual event, Intel’s focus was on their Optane PMEM (persistent memory) technology. Kelsey Prantis (@kelseyprantis), Senior Software Engineering Manager at Intel, was on the show and gave an introduction to Intel’s DAOS (Distributed Asynchronous Object Storage, DAOS.io), a new HPC (high performance computing, supercomputer) file system they developed from scratch to use leading edge Intel technologies, Optane PMEM being one of them.

Kelsey has worked on Lustre and other HPC file systems for a long time now and came to Intel with the acquisition of Whamcloud. Currently, she manages the development team working on DAOS. DAOS is a new HPC object storage file system which is completely open source (available on GitHub).

DAOS was designed from the start to take advantage of NVMe SSDs and Optane PMEM. With PMEM, current servers can support up to 20TB of memory. Besides the large memory sizes, Optane PMEM also offers non-volatility and byte addressability (just like DRAM). These two characteristics open up new functionality that allows DAOS to move beyond the legacy, block-oriented storage architectures that have been the only storage solution for HPC (and the enterprise) for decades.

What’s different about DAOS

DAOS uses PMEM for all metadata and for storing small files. HPC IO has always been heavily bandwidth oriented (IO using large blocks), but lately newer applications have emerged, such as AI/ML/DL and data analytics, that use smaller files/blocks. Indeed, most new HPC clusters and supercomputers are deploying almost as many GPUs as CPUs in their configurations to support AI activities.

The problem is that these newer applications typically consume much smaller files. Matt mentioned one HPC client he worked with was processing small batches of seismic data, to predict, in real time, earthquakes that were happening around the world.

By using PMEM for metadata and small files, DAOS can be much more responsive to file requests (open, close, delete, status) as well as provide higher performing IO for small files. All this leads to a much better performing system for the new HPC workloads as well as great sustainable performance for the more traditional large file workloads.
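
Conceptually, the placement decision looks something like the toy policy below. It is only meant to illustrate the idea of steering metadata and small IO to byte-addressable PMEM and large bulk data to NVMe SSDs; the threshold and the policy itself are our assumptions, not DAOS internals.

```python
# Toy placement policy -- illustrates the PMEM vs. NVMe split, not DAOS's
# actual implementation. The 4 KiB cut-off is an assumption.
SMALL_IO_THRESHOLD = 4 * 1024   # bytes

def placement(kind: str, size: int) -> str:
    if kind == "metadata":
        return "pmem"                     # all metadata lives in PMEM
    return "pmem" if size <= SMALL_IO_THRESHOLD else "nvme"

print(placement("metadata", 256))    # pmem
print(placement("data", 2_048))      # pmem (small file)
print(placement("data", 8_388_608))  # nvme (large, bandwidth-oriented IO)
```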

DAOS storage

DAOS provides a clustered storage system that can be configured with as few as 1 node (no data protection), more normally a minimum of 3 nodes (with data protection), and up to 512 nodes (as lab tested). Data protection in DAOS is currently based on mirroring data and can use from 0 up to the number of nodes in a cluster as data mirrors.

DAOS system nodes are homogeneous; that is, they all come with the same amount of PMEM and NVMe SSDs. Note that DAOS doesn’t support disk drives. Kelsey mentioned DAOS node hardware can be tailored to suit any particular application environment, but they typically require an average of 6% of overall DAOS system capacity in PMEM for metadata and small file activity.
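
Reading that 6% figure as a rule of thumb against usable capacity, node sizing works out roughly like this (our interpretation, not an official Intel sizing formula):

```python
# Rough sizing based on the ~6% rule of thumb mentioned above (our reading of
# it, not an official Intel sizing formula).
def pmem_needed_tb(nvme_capacity_tb: float, pmem_fraction: float = 0.06) -> float:
    return nvme_capacity_tb * pmem_fraction

# A node contributing 100 TB of NVMe capacity would want roughly 6 TB of PMEM
# for metadata and small file activity.
print(pmem_needed_tb(100))   # 6.0
```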

DAOS currently supports its own API, plus POSIX, HDF5, MPI-IO and Apache Spark storage protocols. Kelsey mentioned that standard POSIX uses a pessimistic conflict resolution mode, which leads to performance bottlenecks during parallel access. In contrast, DAOS’s version of POSIX uses optimistic conflict resolution, which means DAOS starts writes assuming there’s no conflict, and if one occurs it handles the conflict in real time. Of course, with all the metadata byte addressable and in PMEM, this doesn’t take up a lot of (IO) time.
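
For readers unfamiliar with the distinction, here's a generic optimistic-concurrency sketch: the write proceeds on the assumption that nobody else touched the data, and conflicts are detected and resolved only at commit time. This is a toy version-check loop to illustrate the concept, not DAOS's actual protocol.

```python
# Generic optimistic conflict resolution -- a toy version-check loop, not
# DAOS's actual mechanism.
import threading

class VersionedValue:
    def __init__(self, value=None):
        self._lock = threading.Lock()      # guards only the tiny commit step
        self.value, self.version = value, 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version) -> bool:
        """Commit only if nothing changed since we read; otherwise the caller retries."""
        with self._lock:
            if self.version != expected_version:
                return False               # conflict detected at commit time
            self.value, self.version = new_value, self.version + 1
            return True

record = VersionedValue()
value, version = record.read()
while not record.write("updated", version):   # assume no conflict; retry if wrong
    value, version = record.read()
```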

As mentioned earlier, DAOS data protection uses mirror replicas. However, unlike most other major file systems, DAOS mirroring can be done at the object level. DAOS internally is an object store. Data organization in DAOS starts at the pool level; underneath that are data containers, and under those are objects. Any object in DAOS can have its own mirroring configuration. DAOS is working toward supporting erasure coding as another form of data protection in a future release.

DAOS performance

There’s a new storage benchmark that was developed specifically for HPC, called the IO500. The IO500 benchmark simulates a number of different HPC workloads, measures performance for each of them, and computes an (aggregate) performance score to rank HPC storage systems.
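
As we understand the published IO500 methodology, per-test results are combined using geometric means, and the overall score is the geometric mean of a bandwidth score (GiB/s) and a metadata score (kIOPS). The numbers below are made up purely to show the calculation.

```python
# IO500-style scoring sketch. The combination rule is our reading of the
# published methodology; the test results below are invented for illustration.
from math import prod

def geometric_mean(values):
    return prod(values) ** (1 / len(values))

bandwidth_tests_gibps = [35.0, 12.0, 28.0, 9.0]     # e.g. ior easy/hard read/write
metadata_tests_kiops = [450.0, 120.0, 300.0, 80.0]  # e.g. mdtest phases, find

bw_score = geometric_mean(bandwidth_tests_gibps)
md_score = geometric_mean(metadata_tests_kiops)
overall = (bw_score * md_score) ** 0.5              # geometric mean of the two
print(round(bw_score, 2), round(md_score, 2), round(overall, 2))
```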

IO500 ranks system performance using two lists: one for any size configuration, which typically ranges from 50 to 1000s of nodes, and another that limits the configuration to 10 nodes. The first ranking can sometimes be gamed by throwing more hardware into a cluster. The 10-node rankings are much harder to game this way and, from our perspective, show a fairer comparison of system performance.

As presented (virtually) at ISC 2020, DAOS took the top spot on the IO500 any-size configuration list and performed better than 2X the next best solution. And on the IO500 10-node list, Intel’s DAOS configuration, the Texas Advanced Computing Center (TACC) DAOS configuration, and the Argonne National Laboratory DAOS configuration took the top 3 spots, with 3X better performance than the next best, non-DAOS storage system.

Argonne National Laboratory has already stated that it will be using DAOS in its new HPC system, to be deployed in the near future. Early specifications for storage at the new Argonne system required support for 230PB of data and 25TB/sec of bandwidth.

The podcast ran ~43 minutes. Kelsey was great to talk with and very knowledgeable about HPC systems and HPC IO in particular. Matt has worked at Argonne in the past, so he understood these systems better than I did. Sadly, we lost Matt’s end of the conversation about halfway into the recording. Both Matt and I think DAOS represents the birth of a new generation of HPC storage. Listen to the podcast to learn more.



Kelsey Prantis, Senior Software Engineering Manager, Intel

 Kelsey Prantis heads the Extreme Storage Architecture and Development division at Intel Corporation. She leads the development of Distributed Asynchronous Object Storage (DAOS), an open-source, low-latency and high IOPS object store designed from the ground up for massively distributed Non-Volatile Memory (NVM).

She joined Intel in 2012 with the acquisition of Whamcloud, where she led the development of the Intel Manager for Lustre* product.

Prior to Whamcloud, she was a software developer at personal genomics and biotechnology company 23andMe.

Prantis holds a Bachelor’s degree in Computer Science from Rochester Institute of Technology.

103: Greybeards talk scale-out file and cloud data with Molly Presley & Ben Gitenstein, Qumulo

Sponsored by:

Ray has known Molly Presley (@Molly_J_Presley), Head of Global Product Marketing, for just about a decade now, and we both just met Ben Gitenstein (@Qumulo_Product), VP of Products & Solutions, Qumulo, on this podcast. Both Molly and Ben were very knowledgeable about the problems customers have with massive data troves.

Molly has been on our podcast before, with another company (see: GreyBeards talk HPC storage with Molly Rector, CMO & EVP, DDN). And we have talked with Qumulo before as well (see: GreyBeards talk data-aware, scale-out file systems with Peter Godman, Co-founder & CEO, Qumulo).

Qumulo has a long history of dealing with customer issues around data center application access to data, usually large data repositories with billions of small or large files accumulated over time. But recently Qumulo has taken on similar problems in the cloud as well.

Qumulo’s secret has always been to allow researchers to run their applications wherever their data resides. This has led Qumulo’s software defined storage to offer multiple protocol access as well as completely native AWS and GCP cloud versions of their solution.

That way customers can run Qumulo in their data center or in the cloud and have the same great access to data. Molly mentioned one customer that creates and gathers data using SMB protocol on prem and then, after replication, processes it in the cloud. 

Qumulo Shift

Ben mentioned that many competitive storage systems are business-model focused. That is, they are all about keeping customer data within their solutions so they can charge for capacity. Although Qumulo also charges for capacity, with the new Qumulo Shift service customers can easily move data off Qumulo and into native cloud storage. Using Shift, customers can free up Qumulo storage space (and cost) for any data that only needs to be accessed as objects.

With Shift, customers can replicate or move on-prem or in-cloud Qumulo file data to AWS S3 objects. Once in S3, customers can access it with AWS-native applications or other applications that make use of AWS S3 data, or they can make that data accessible around the world.

Qumulo customers can select directories to Shift to an AWS S3 bucket. The Qumulo directory name is mapped to an S3 bucket name, and each file in that directory is copied to an S3 object in that bucket with the same file name.
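
The mapping itself is easy to picture. The boto3 sketch below is not the Shift implementation, just an illustration of the directory-to-bucket, file-to-object relationship; the directory and bucket names are placeholders.

```python
# Illustration of the Shift mapping (directory -> bucket, file -> object),
# done by hand with boto3. This is NOT how Shift is implemented; names are
# placeholders.
import os
import boto3

s3 = boto3.client("s3")
source_dir = "/qumulo/instrument-data"   # placeholder Qumulo directory
bucket = "instrument-data"               # placeholder S3 bucket (pre-created)

for dirpath, _dirs, files in os.walk(source_dir):
    for name in files:
        local_path = os.path.join(dirpath, name)
        key = os.path.relpath(local_path, source_dir)   # object key mirrors the file name
        s3.upload_file(local_path, bucket, key)
```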

At the moment, Qumulo Shift only supports AWS S3. Over time, Qumulo plans to offer support for other public cloud storage targets for Shift.

Shift is based on Qumulo replication services. Qumulo has a number of patents on replication technology that provides for sophisticated monitoring, control and high performance for moving vast amounts of data.

How customers use Shift

One large customer uses Qumulo cloud file services to process seismic data but then makes the results of that analysis available to other clients as S3 objects. 

Customers can also take advantage of AWS and other applications that support objects only. For example, AWS SageMaker Machine Learning (ML) processes S3 object data. Qumulo customers could gather training data as files and Shift it to S3 objects for ML training.

Moreover, customers can use Shift to create AWS S3 object backups, archives and DR repositories of Qumulo file data. Ben mentioned DevOps teams could also use Qumulo Shift via APIs to move file data to S3 objects as part of new application deployments.

Finally, using Shift to copy or move file data to AWS S3 makes it ideal for collaboration by researchers, analysts and just about any other entity that needs access to the data.

The podcast ran ~26 minutes. Molly has always been easy to talk with, and Ben turned out to be easy to talk with as well; he knew an awful lot about the product and how customers can use it. Keith and I enjoyed our time with Molly and Ben discussing Qumulo and their new Shift service. Listen to the podcast to learn more.

Ben Gitenstein, VP of Products and Solutions, Qumulo

Ben Gitenstein runs Product at Qumulo. He and his team of product managers and data scientists have conducted nearly 1,000 interviews with storage users and analyzed millions of data points to understand customer needs and the direction of the storage market.

Prior to working at Qumulo, Ben spent five years at Microsoft, where he split his time between Corporate Strategy and Product Planning.

Molly Presley, Head of Global Product Marketing, Qumulo

Molly Presley joined Qumulo in 2018 and leads worldwide product marketing. Molly brings over 15 years of file system and archive technology leadership experience to the role. 

Prior to Qumulo, Molly held executive product and marketing leadership roles at Quantum, DataDirect Networks (DDN) and Spectra Logic.

Presley also created the term “Active Archive”, founded the Active Archive Alliance and has served on the Board of the Storage Networking Industry Association (SNIA).

(Updated due to formatting problem, The Eds.)