130: GreyBeards talk high-speed database access using Apache Arrow Flight, with James Duong and David Li

We had heard about Apache Arrow and Arrow Flight as being a hi-performing database with access speeds to match for a while now and finally got a chance to hear what it was all about with James Duong, Co-Fourder of Bit Quill Technologies/Senior Staff Developer at Dremio and David Li (@lidavidm), Apache PMC and software developer at Voltron Data.

First, Apache Arrow is an open source, in memory data base (GitHub repo) for columnar data that enables lightening fast access and processing of data. Apache Arrow Flight is a set of interfaces, protocols, and services that parallelizes access to load and unload Arrow data over the network, from storage to memory and back, very fast. Listen to the podcast to learn more.

Columnar databases are all the rage these days and have more or less taken over from row oriented data bases. With row based database, data is stored (and accessed) row by row. In a columnar database, data is stored in columns, i.e, all data for one column is stored in sequence and then the next column is stored in sequence. Columnar databases can be queried/processed faster than row databased (depending on whether you are looking at/accessing multiple columns per row or not). And columnar data should compress better as all the data in a single column is of the same type..

Also the fact that columns are located contiguous in memory means if you process a column at a time, CPU data caches should work better. This is because they can grab a whole vector (columns worth of data) with one request.

Arrow data is processed and accessed in record batches. These are 2D segments which represent all the columns in a sequence/set of rows. And record batches are the unit of parallelism in Arrow and Arrow Flight. So an Arrow client operating on a CPU thread/core/chip or server could be processing one record batch while another CPU thread/core/CPU or server could process a different record batch.

Arrow Flight (GitHub RPC format doc repo) is an RPC framework that includes API’s, protocols, standards (for on storage, on wire and in memory) and libraries used to transfer Arrow data and metadata (record batches) across the network. For the typical system there exists Flight clients and Flight services in a system.

Arrow Flight currently uses Google’s gRPC for data transfers. gRPC is a open source remote procedure call (RPC) service that supports within data center, across data centers and out to the edge processing services. Although Arrow Flight is currently implemented on top of gRPC, other network protocols will be supported in the future.

What makes Arrow Flight so fast is its ability to support parallel transfers. That is customers can configure Arrow (Flight) clients across clusters of servers and Arrow (Flight) services residing on one or more other servers. Any client can request metadata and record batches from any end point (Flight service) in the data center. And yes Arrow data can be supplied from multiple end points by being mirrored/replicated. All data transfers can operate in parallel across all Flight client and services, with no known bottleneck other than the network.

A single stream of Arrow Flight data was able to deliver 20GB/sec. The fact that you can have any (?) number of Arrow Flight data streams in operation at the same time makes that a very interesting number.

Also, Arrow data can be stored on or sourced from typical data lakes such as Azure Data Lake, AWS S3, Google Cloud storage, etc.

Another advantage of Arrow Flight is the ability to use the same format on the wire and in storage. Normally JDBC (and ODBC) have on storage and on wire formats which require format conversion (serialization) to move data from storage/memory to wire and another conversion (deserialization) to move data from on wire format to in storage/memory format. Arrow Flight does away with serialization and deserialization of data all together and uses the same format for on wire and in storage.

Arrow Flight SQL allows Arrow processing of SQL database data. My understanding is that customers using non Arrow databases such as Oracle, SQL Server, Postgres, etc. can use Arrow Flight SQL to provide Arrow in-memory database processin/query execution for their data.

Arrow and Arrow flight are primarily used to process data analytics workloads but Arrow also has a new execution engine, the Arrow Gandiva project, that enables vectorized processing of Arrow data. This is a special execution engine for Arrow that supports X86 cores with AVX instructions, (NVIDIA) GPUs, and FPGAs.

There’s also an open source package, Fletcher, used to create Arrow and Arrow flight processing HDLs so that customers can add Arrow data processing and Arrow Flight data transfer functionality to custom built FPGAs.

One challenge with open source software is support for problems/bugs that crop up. An active developer community helps, but enterprise customers require professional, on call 7×24 (5×12?) support for all their critical (and most non-critical) software. Voltron Data (David’s) company provides paid for support for Arrow Flight and Arrow data services.

The other major problem with open source software has been use complexity. At the moment the Arrow Flight team is very responsive in clarifying documentation and are trying to make it easier to use. But at the moment Arrow Flight is mostly a set of APIs, libraries and connectors that end users can use to standup Arrow (Flight) clients and servers to transfer Arrow data between them.

James Duong, Co-Founder Bit Quill Technologies & Sr. Staff Developer at Dremio

An Apache Arrow contributor, cofounder at Bit Quill Technologies, and contributor to Dremio Corporation projects, James Duong has worked with databases for over 15 years, from backend query engines to drivers and protocols. He’s worked with a variety of relational, big data, and cloud databases including Dremio, SQL Server, Redshift, and Hive.

Previously at Simba Technologies, James architected and built connectors for sources, as well as designing the Simba Engine SDK for developing connectivity solutions for any data source.

Bit Quill Technologies, the company James helped co-found, builds back end software in the data and cloud space. Bit Quill has built a name for itself as a producer of high-quality software, a collaborative approach to design and development, and a love for good tech and happy people.

Balancing his passion for the data ecosystem with a young family, James occasionally steps away from it all to go hiking.

David Li, Apache Arrow PMC and software engineer at Voltron Data

David is a PMC member for Apache Arrow and a software engineer at Voltron Data (formerly known as Ursa Computing). Prior to that, he worked on data services and Apache Arrow at Two Sigma.

David holds an M.Eng. in Computer Science from Cornell University.

124: GreyBeards talk k8s storage orchestration using CNCF Rook Project with Sébastien Han & Travis Nielsen, Red Hat

Stateful containers are becoming a hot topic these days so we thought it a good time to talk to the CNCF (Cloud Native Computing Foundation) Rook team about what they are doing to make storage easier to use for k8s container apps. CNCF put us into contact with Sébastien Han (@leseb_), Ceph Storage Architect and Travis Nielsen (@STravisNielsen), both Principal Software Engineers at Red Hat and active on the Rook project. Rook is a CNCF “graduated” open source project just like Kubernetes, Prometheus, ContainerD, etc., this means it’s mature enough to run production workloads.

Rook is used to configure, deploy and manage a Red Hat Ceph(r) Storage cluster under k8s. Rook creates all the k8s deployment scripts to set up a Ceph Storage cluster as containers, start it and monitor its activities. Rook monitoring of Ceph operations can restart any Ceph service container or scale any Ceph services up/down as needed by container apps using its storage. Rook is not in the Ceph data path, but rather provides a k8s based Ceph control or management plane for running Ceph storage under k8s.

Readers may recall we talked to SoftIron, an appliance provider, for Ceph Storage in the enterprise for our 120th episode. Rook has another take on using Ceph storage, only this time running it under k8s,. Listen to the podcast to learn more.

The main problem Rook is solving is how to easily incorporate storage services and stateful container apps within k8s control. Containerized apps can scale up or down based on activity and storage these apps use needs the same capabilities. The other option is to have storage that stands apart or outside k8s cluster and control. But then tho container apps and their storage have 2 (maybe more) different control environments. Better to have everything under k8s control or nothing at all.

Red Hat Ceph storage has been available as a standalone storage solutions for a long time now and has quite the extensive customer list, many with multiple PB of storage. Rook-Ceph and all of its components run as containers underneath k8s.

Ceph supports replication (mirroring) of data 1 to N ways typically 3 way or erasure coding for data protection and also supports file, block and object protocols or access methods. Ceph normally consumes raw block DAS for it’s backend but Ceph can also support a file gateway to NFS storage behind it. Similarly, Ceph can offers an object storage gateway option. But with either of these approaches, the (NFS or object) storage exists outside k8s scaling and resiliency capabilities and Rook management.

Ceph uses storage pools that can be defined using storage performance levels, storage data protection levels, system affinity, or any combination of the above. Ceph storage pools are mapped to k8s storage classes using the Ceph CSI. Container apps that want to use storage would issue a persistent volume claim (PVC) request specifying a Ceph storage class which would allocate the Ceph storage from the pool to the container.  

Besides configuring, deploying and monitoring/managing your Ceph storage cluster, Rook can also automatically upgrade your Ceph cluster for you. 

We discussed the difference between running Rook-Ceph within k8s and running Ceph outside k8s. Both approaches depend on Ceph CSI but with Rook, Ceph and all its software is all running under k8s control as containers and Rook manages the Ceph cluster for you. When it’s run outside 1) you manage the Ceph cluster and 2) Ceph storage scaling and resilience are not automatic. 

Sébastien Han, Principal Software Engineer, Ceph Architect, Red Hat

Sebastien Han currently serves as a Senior Principal Software Engineer, Storage Architect for Red Hat. He has been involved with Ceph Storage since 2011 and has built strong expertise around it.

Curious and passionate, he loves working on bleeding edge technologies and identifying opportunities where Ceph can enhance the user experience. He did that with various technology such as OpenStack, Docker.

Now on a daily basis, he rotates between Ceph, Kubernetes, and Rook in an effort to strengthen the integration between all three. He is one of the maintainers of Rook-Ceph.

Travis Nielson, Principal Software Engineer, Red Hat

Travis Nielsen is a Senior Principal Software Engineer at Red Hat with the Ceph distributed storage system team. Travis leads the Rook project and is one of the original maintainers, integrating Ceph storage with Kubernetes.

Prior to Rook, Travis was the storage platform tech lead at Symform, a P2P storage startup, and an engineering lead for the Windows Server group at Microsoft.

120: GreyBeards talk CEPH storage with Phil Straw, Co-Founder & CEO, SoftIron

GreyBeards talk universal CEPH storage solutions with Phil Straw (@SoftIronCEO), CEO of SoftIron. Phil’s been around IT and electronics technology for a long time and has gone from scuba diving electronics, to DARPA/DOD researcher, to networking, and is now doing storage. He’s also their former CTO and co-founder of the company. SoftIron make hardware storage appliances for CEPH, an open source, software defined storage system.

CEPH storage includes file (CEPHFS, POSIX), object (S3) and block (RBD, RADOS block device, Kernel/librbd) services and has been out since 2006. CEPH storage also offers redundancy, mirroring, encryption, thin provisioning, snapshots, and a host of other storage options. CEPH is available as an open source solution, downloadable at CEPH.io, but it’s also offered as a licensed option from RedHat, SUSE and others. For SoftIron, it’s bundled into their HyperDrive storage appliances. Listen to the podcast to learn more.

SoftIron uses the open source version of CEPH and incorporates this into their own, HyperDrive storage appliances, purpose built to support CEPH storage.

There are two challenges to using open source solutions:

  • Support is generally non-existent. Yes, the open source community behind the (CEPH) project supplies bug fixes and can possibly answer some questions but this is not considered enterprise support where customers require 7x24x365 support for a product
  • Useability is typically abysmal. Yes, open source systems can do anything that anyone could possibly want (if not, code it yourself), but trying to figure out how to use any of that often requires a PHD or two.

SoftIron has taken both of these on to offer a CEPH commercial product offering.

Take support, SoftIron offers enterprise level support that customers can contract for on their own, even if they don’t use SoftIron hardware. Phil said the would often get kudos for their expert support of CEPH and have often been requested to offer this as a standalone CEPH service. Needless to say their support of SoftIron appliances is also excellent.

As for ease of operations, SoftIron makes the HyperDrive Storage Manager appliance, which offers a standalone GUI, that takes the PHD out of managing CEPH. Anything one can do with the CEPH CLI can be done with SoftIron’s Storage Manager. It’s also a very popular offering with SoftIron customers. Similar to SoftIron’s CEPH support above, customers are requesting that their Storage Manager be offered as a standalone solution for CEPH users as well.

HyperDrive hardware appliances are storage media boxes that offer extremely low-power storage for CEPH. Their appliances range from high density (120TB/1U) to high performance NVMe SSDs (26TB/1U) to just about everything in between. On their website, I count 8 different storage appliance offerings with various spinning disk, hybrid (disk-SSD), SATA and NVMe SSDs (SSD only) systems.

SoftIron designs, develops and manufacturers all their own appliance hardware. Manufacturing is entirely in the US and design and development takes place in the US and Europe only. This provides a secure provenance for HyperDrive appliances that other storage companies can only dream about. Defense, intelligence and other security conscious organizations/industries are increasingly concerned about where electronic systems come from and want assurances that there are no security compromises inside them. SoftIron puts this concern to rest.

Yes they use CPUs, DRAMs and other standardized chips as well as storage media manufactured by others, but SoftIron has have gone out of their way to source all of these other parts and media from secure, trusted suppliers.

All other major storage companies use storage servers, shelves and media that come from anywhere, usually sourced from manufacturers anywhere in the world.

Moreover, such off the shelf hardware usually comes with added hardware that increases cost and complexity, such as graphics memory/interfaces, Cables, over configured power supplies, etc., but aren’t required for storage. Phil mentioned that each HyperDrive appliance has been reduced to just what’s required to support their CEPH storage appliance.

Each appliance has 6Tbps network that connects all the components, which means no cabling in the box. Also, each storage appliance has CPUs matched to its performance requirements, for low performance appliances – ARM cores, for high performance appliances – AMD EPYC CPUs. All HyperDrive appliances support wire speed IO, i.e, if a box is configured to support 1GbE or 100GbE, it transfers data at that speed, across all ports connected to it.

Because of their minimalist hardware design approach, HyperDrive appliances run much cooler and use less power than other storage appliances. They only consume 100W or 200W for high performance storage per appliance, where most other storage systems come in at around 1500W or more.

In fact, SoftIron HyperDrive boxes run so cold, that they don’t need fans for CPUs, they just redirect air flom from storage media over CPUs. And running colder, improves reliability of disk and SSD drives. Phil said they are seeing field results that are 2X better reliability than the drives normally see in the field.

They also offer a HyperDrive Storage Router that provides a NFS/SMB/iSCSI gateway to CEPH. With their Storage Router, customers using VMware, HyperV and other systems that depend on NFS/SMB/iSCSI for storage can just plug and play with SoftIron CEPH storage. With the Storage Router, the only storage interface HyperDrive appliances can’t support is FC.

Although we didn’t discuss this on the podcast, in addition to HyperDrive CEPH storage appliances, SoftIron also provides HyperCast, transcoding hardware designed for real time transcoding of one or more video streams and HyperSwitch networking hardware, which supplies a secure provenance, SONiC (Software for Open Networking in [the Azure] Cloud) SDN switch for 1GbE up to 100GbE networks.

Standing up PB of (CEPH) storage should always be this easy.

Phil Straw, Co-founder & CEO SoftIron

The technical visionary co-founder behind SoftIron, Phil Straw initially served as the company’s CTO before stepping into the role as CEO.

Previously Phil served as CEO of Heliox Technologies, co-founder and CTO of dotFX, VP of Engineering at Securify and worked in both technical and product roles at both Cisco and 3Com.

Phil holds a degree in Computer Science from UMIST.

117: GreyBeards talk HPC file systems with Frank Herold, CEO of ThinkParQ, makers of BeeGFS

We return back to our storage thread with a discussion of HPC file systems with Frank Herold, (@BeeGFS) CEO of ThinkParQ GmbH, the makers of BeeGFS. I’ve seen BeeGFS start to show up in some IO500 top storage benchmark results and as more and more data keeps coming online every day, we thought it time to start finding out how our friends in the HPC world handle their data deluge.

Frank’s a former rocket scientist, that’s been in and around the storage industry for years, and was very knowledgeable about BeeGFS’s software defined, parallel file system. He seemed to have a great grasp of the IO requirements in HPC, Life Sciences and other HPC-like applications. Listen to the podcast to learn more.

Turns out that ThinkParQ is a spinoff of the research institute in Germany that originally developed BeeGFS parallel file system. There are apparently two version of their product one which is publicly available (downloadable from their website) and another with commercial support. It’s not quite 100% open source but it’s got a lot of open source in it and their GIT repository is available

BeeGFS was primarily focused on HPC workloads but as this type of work has become more mainstream, they have moved beyond HPC and now have significant installations in Life Sciences, Oil&Gas and many other big data environments.

It runs on x86/AMD, OpenPower, and ARM CPUs. BeeGFS comes as a number of services, one of which is a storage service which uses a backend with ZFS or XFS file system. It also uses (POSIX compliant) host client software to access their system. There’s also a metadata and monitoring service. Most of the time these services run on separate servers but BeeGFS also supports a “converged mode”, where all these services run on a single server. And you can have multiple converged mode servers in a cluster.

BeeGFS is a parallel file system. This means that it intrinsically supports multiple metadata services/servers and multiple storage servers which allow it to scale up storage bandwidth and performance considerably beyond single appliance systems. Data is automatically distributed across all the storage servers in the configuration, unless you specify that data reside on specific, say all flash storage servers. Similarly, metadata is automatically distributed across all metadata servers in the system.

They don’t support any specific RAID protection other than mirroring and that really to speed up read throughput. Rather they depend on the underlying XFS/ZFS file system to provide drive failure protection (RAID5/6).

One of BeeGFS’s selling points is that it has few tuning parameters that a customer needs to fiddle with. Frank said it runs quite well right out of the box.

BeeGFS offers a single name space that spans the cluster (of metadata servers/storage servers). But customers can elect to split this name space across a subset of these metadata and storage servers, and by doing so they create multiple BeeGFS clusters.

There’s no inherent support for NFS or SMB but customers can configure NFS or SAMBA servers that use BeeGFS as backend storage. Also, there’s no data reduction built into BeeGFS and no automatic data tiering across the backend storage (file systems).

But as noted above, customers can direct which backend storage to use to hold their data. And they do offer a CLI data movement primitive and customers can use this in conjunction with other software to implement storage tiering or do it themselves.

Metadata performance is extremely important for small files and for large multi Billion object file systems. BeeGFS uses extensive metadata caching to provide faster access to this information.

Speaking of small file performance, we had a decent discussion on the tradeoffs involved between small and large file performance. And although BeeGFS has decent small file performance it’s not a be all for every small file intensive application. According to Frank, not every small file workload is optimal for BeeGFS.

They offer BeeOND which is BeeGFS on demand. This is an integration with Slurm workload scheduler (HPC work scheduler) that allows customers to spin up a scratch BeeGFS parallel file system across compute servers with storage.

Slurm’s BeeOND integration brings all BeeGFS services up and deploys them on compute nodes you specify. At this point you have a fully installed BeeGFS (scratch) parallel file system. Customers may use this scratch file system to support any compute-data intensive workload theyneed to run. When no longer needed, Slurm can be directed to automatically dismantle the BeeGFSl file system.

We talked about BeeGFS partners. They have a number of regional partners that provide installation and onsite support and a number of technical partners, such as NetApp, Dell, HPE and INSPUR, that supply BeeGFS configured servers and systems for deployment/installation.

Frank Herold, CEO ThinkparQ

Frank Herold is the CEO of ThinkParQ GmbH – the company behind BeeGFS. He actively leads the company and the product strategy of BeeGFS as a global player for parallel high-performance file systems.

Prior to joining ThinkParQ, he held various senior management positions within ADIC and Quantum Corporation, responsible for market segments within the academic and scientific research, oil and gas, broadcast and video surveillance sectors, focusing on large scale, high-performance and enterprise accounts within EMEA. 

Frank has over 25 years of experience in the IT industry and holds a master’s degree in engineering (Dipl. -Ing.) in rocket science.

107: GreyBeards talk MinIO’s support of VMware’s new Data Persistence Platform with AB Periasamy, CEO MinIO

Sponsored by:

The GreyBeards have talked with Anand Babu (AB) Periasamy (@ABPeriasamy), CEO MinIO, before (see 097: GreyBeards talk open source S3… episode). And we also saw him earlier this year, at their headquarters for Storage Field Day 19 (SFD19) where AB gave a great discussion of what they were doing and how it worked (see MinIO’s SFD18 presentation videos).

The podcast runs ~26 minutes. AB is very technically astute and always a delight to talk with. He’s extremely knowledgeable about the cloud, containerized applications and high performing S3 compatible object storage. And now with MinIO and vSAN Data Persistence under VCF Tanzu, very knowledgeable about the virtualized IT environment as well. Listen to the podcast to learn more. [We’re trying out a new format placing the podcast up front. Let us know what you think; The Eds.]

VMware VCF vSAN Data Persistence Platform with MinIO

Earlier this month VMware announced a new capability available with the next updates of vSAN, vSphere & VCF called the vSAN Data Persistence Platform. The Data Persistence Platform is a VMware framework designed to integrate stateful, independent vendor software defined storage services in vSphere. By doing so, VCF can provide API access to persistent storage services for containerized applications running under Tanzu Kubernetes (k8s) Grid service clusters.

At the announcement, VMware identified three object storage and one (Cassandra) database technical partners that had been integrated with the solution.  MinIO was an object storage, open source partner.

VMware’s VCF vSAN Data Persistence framework allows vCenter administrators to use vSphere cluster infrastructure to configure and deploy these new stateful storage services, like MinIO, into namespaces and enables app developers direct k8s API access to these storage namespaces to provide persistent, stateful object storage for applications. 

With VCF Tanzu and the vSAN Data Persistence Platform using MinIO, dev can have full support for their CiCd pipeline using native k8s tools to deploy and scale containerized apps on prem, in the public cloud and in hybrid cloud, all using VCF vSphere.

MinIO on the Data Persistence Platform

AB said MinIO with Data Persistence takes advantage of a new capability called vSAN Direct which gives vSAN almost JBOF types of IO control and performance. With MinIO vSAN Direct, storage and k8s cluster applications can co-reside on the same ESX node hardware so that IO activity doesn’t have to hop off host to be performed. In addition, can now populate ESX server nodes with lots (100s to 1000s?) of storage devices and be assured the storage will be used by applications running on that host.

As a result, MinIO’s object storage IO performance on VCF Tanzu is very good due to its use of vSAN Direct and MinIO’s inherent superior IO performance for S3 compatible object storage.

With MinIO on the VCF vSAN Data Persistence Platform, VMware takes over all the work of deploying MinIO software services on the VCF cluster. This way customers can take advantage of MiniO’s fully compatible S3 object storage system operating in their VCF cluster. For app developers they get the best of all worlds, infrastructure configured, deployed and managed by admins but completely controllable, scaleable and accessible through k8s API services.

If developers want to take advantage of MinIO specialized services such as data security or replication, they can do so directly using MinIOs APIs, just like they would when operating bare metal or in the cloud.

AB said the VMware development team was very responsive during development of Data Persistence. AB was surprised to see such a big company, like VMware, operate with almost startup like responsiveness. Keith mentioned he’s seen this in action as vSAN has matured very rapidly to a point of almost feature parity, with just about any storage system out there today .

With MinIO object storage, container applications that need PB of data, now have a home on VCF Tanzu. And it’s as easily usable as any public cloud storage. And with VCF Tanzu configuring and deploying the storage over its own infrastructure, and then having it all managed and administered by vCenter admins, its simple to create and use PB of object storage.

MinIO is already the most popular S3 compatible object storage provider for applications running in the cloud and on prem. And VMware is easily the most popular virtualization platform on the planet. Now with the two together on VCF Tanzu, there seems to be nothing in the way of conquering containerized applications running in IT as well.

With that, MinIO is available everywhere containers want to run, natively available in the cloud, on prem and hybrid cloud or running with VCF Tanzu everywhere as well.

AB Periasamy, CEO MinIO

AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement,

AB was a co-founder and CTO of GlusterFS which was acquired by RedHat in 2011. Following the acquisition, he served in the office of the CTO at RedHat prior to founding MinIO in late 2015.

AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India.

He earned his BE in Computer Science and Engineering from Annamalai University.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png
This image has an empty alt attribute; its file name is Spotify_Logo_CMYK_Black-1024x307.png