122: GreyBeards talk big data archive with Floyd Christofferson, CEO StrongBox Data Solutions

The GreyBeards had a great discussion with Floyd Christofferson, CEO, StrongBox Data Solutions on their big data/HPC file and archive solution. Floyd is very knowledgeable about the problems of extremely large data repositories and has been around HPC and other data intensive industries for decades.

StrongBox’s StrongLink solution offers a global namespace file system that virtualizes NFS, SMB, S3 and POSIX file environments and maps these to a software-only, multi-tier, multi-site data repository that can span onsite flash, disk, S3-compatible or Azure object, and LTFS tape library storage, as well as offsite versions of all the above tiers.

Typical StrongLink customers range in the 10s to 100s of PB, ingesting or processing PBs a day. 200TB is the minimum StrongLink configuration, but Floyd said any shop with over 500TB has problems with data silos and other issues, even if they don’t realize it yet. StrongLink manages data placement and movement throughout this hierarchy to better support data access and economical storage. In the process, StrongLink eliminates the data silos caused by the limitations of NAS systems, while providing the most economical placement of data to meet user performance requirements.


Floyd said that StrongLink first installs in the customer’s environment and then operates in the background to discover and ingest metadata from the customer’s primary file storage environment. At some point later, the customer reconfigures their end users’ share and mount points to the StrongLink servers and it’s up and running.

The minimal StrongLink HA environment consists of 3 nodes. They use a NoSQL metadata database which is replicated and sharded across the nodes. It’s sharded for performance load balancing and fully replicated (2-way or 3-way) across all the StrongLink server nodes for HA.
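To give a rough feel for the shard-plus-replicate idea, here is a minimal sketch (purely illustrative, not StrongLink’s actual placement algorithm) that hashes each metadata key to a primary node and then places replicas on the next node(s) of a 3-node cluster:

```python
# Minimal sketch of sharding metadata keys across a 3-node cluster and
# replicating each shard on the next node(s). Illustrative only -- this is
# not StrongLink's actual placement logic.
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical StrongLink servers
REPLICAS = 2                             # 2-way replication

def placement(key: str, nodes=NODES, replicas=REPLICAS):
    """Return the ordered list of nodes holding this key's shard."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    primary = digest % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replicas)]

if __name__ == "__main__":
    for k in ("/projects/climate/run42/output.nc", "/archive/cars/batch-01"):
        print(k, "->", placement(k))
```

Sharding spreads metadata load across the nodes, while the replica list is what lets any surviving node keep serving a shard if its primary goes down.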

The StrongLink nodes create a cluster, called a star in StrongBox vernacular. Multiple clusters onsite can be grouped together to form a StrongLink constellation. And multiple data center sites, can be grouped together to form a StrongLink galaxy. Presumably if you have a constellation or a galaxy, the same metadata is available to all the star clusters across all the sites.

They support any tape library and any NFS, SMB, S3 or Azure compatible object or file storage. StrongLink can move or copy data from one tier/cluster to another based on policies, AND end users never see any difference in their workflow or mount/share points.

One challenge with typical tape archives is that they often use proprietary tape data formats which are not accessible outside those systems. StrongLink has gone with the completely open LTFS file format on tape, which is well documented and available to anyone.

Floyd also made a point of saying they don’t use any stubs or soft links to provide their data placement magic. They only use standard file metadata.

File data moves across the hierarchy based on policies or by request. One of the secrets to StrongLink’s success is all the work they have done to ensure that any data movement can occur at line-rate speeds. They heavily parallelize any data movement required to support data placement, across as many servers as the customer wants to throw at it. StrongBox services will help right-size the customer deployment to support whatever data movement performance is required.
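To give a flavor of why parallelism matters here, the sketch below is a generic worker-pool file copy (not StrongBox’s proprietary mover): fanning the copies out across many workers is what lets aggregate transfer rates approach line rate.

```python
# Generic sketch of parallelized file movement between two tiers.
# StrongLink's actual data mover is proprietary; this just illustrates why
# spreading copies across many workers helps approach line rate.
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def copy_one(src: Path, dst_dir: Path) -> Path:
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dst_dir / src.name))

def parallel_copy(files, dst_dir, workers=16):
    """Copy a batch of files into dst_dir using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: copy_one(Path(f), Path(dst_dir)), files))

# Example (hypothetical paths): stage a project from a disk tier onto flash.
# parallel_copy(Path("/mnt/tier2/project").glob("*.dat"), "/mnt/flash/project")
```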

StrongLink supports up to 3-way replication of a customer’s data archives, that is, a primary archive plus additional replicas of the data.

Floyd mentioned a couple of big customers:

  • One autonomous automobile supplier was downloading 2PB of data from cars in the field, processing this data, and then moving it off their servers to get ready for the next day’s data load.
  • Another, a weather science research organization, had 150PB of data in an old tape archive and brought in StrongLink to migrate all this data onto the LTFS tape format, as well as to support their research activities, which entail staging a significant chunk of file data on research servers to do a climate run/simulation.

NASA, another StrongLink customer, operates slightly differently than the above, in that they have integrated StrongLink functionality directly into their applications by making use of StrongBox’s API.

StrongLink can work in three ways.

  • Using normal file access services, where StrongLink virtualizes your NFS, SMB, S3 or POSIX file environment. For this service StrongLink is in the data path and you can use policy-based management to have data moved or staged as the need arises.
  • Using StrongLink CLI to move or copy data from one tier to another. Many HPC customers use this approach through SLURM scripts or other orchestration solutions.
  • Using the StrongLink API to move or copy data from one tier to another. This requires application changes to take advantage of data placement (a hypothetical sketch of such a call follows below).
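StrongBox’s actual API isn’t documented in these show notes, so the following is purely a hypothetical sketch of what an application-driven copy request might look like; the endpoint URL, payload fields and token are all invented for illustration, and the real interface should be taken from StrongBox’s API documentation.

```python
# Hypothetical sketch: an application asks an archive manager to copy a
# dataset from one tier to another via a REST call. The URL, JSON fields and
# token below are invented for illustration only.
import requests

STRONGLINK_URL = "https://stronglink.example.com/api"   # assumed endpoint
TOKEN = "REPLACE_WITH_REAL_TOKEN"                        # placeholder

def stage_dataset(path: str, target_tier: str) -> dict:
    resp = requests.post(
        f"{STRONGLINK_URL}/jobs",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"action": "copy", "source": path, "destination_tier": target_tier},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()   # e.g., a job id the caller could poll for completion

# stage_dataset("/archive/climate/run42", "flash")
```

HPC shops using the CLI mode would do the equivalent from a SLURM prolog or orchestration script rather than from application code.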

StrongBox customers can, of course, use all three modes of operation at the same time for their StrongLink data galaxy. StrongLink is billed by CPU/vCPU count and not by the amount of data customers throw into the archive. This has the effect of giving customers a flat expense once StrongLink is deployed, at least until they decide to modify their server configuration.

Floyd Christofferson, CEO StrongBox Data Solutions

As a professional involved in content management and storage workflows for over 25 years, Floyd has focused on methods and technologies needed to manage massive volumes of data across many different storage types and use cases.

Prior to joining SBDS, Floyd worked with software and hardware companies in this space, including over 10 years at SGI, where he managed storage and data management products. In that role, he was part of the team that provided solutions used in some of the largest data environments around the world.

Floyd’s background includes work at CBS Television Distribution, where he helped implement file-based content management and syndicated content distribution strategies, and Pathfire (now ExtremeReach), where he led the team that developed and implemented a satellite-based IP-multicast content distribution platform that manages delivery of syndicated content to nearly 1,000 TV stations throughout the US.

Earlier in his career, he ran Potomac Television, a news syndication and production service in Washington DC, and Manhattan Center Studios, an audio, video, graphics, and performance facility in New York.

120: GreyBeards talk CEPH storage with Phil Straw, Co-Founder & CEO, SoftIron

GreyBeards talk universal CEPH storage solutions with Phil Straw (@SoftIronCEO), CEO of SoftIron. Phil’s been around IT and electronics technology for a long time and has gone from scuba diving electronics, to DARPA/DOD research, to networking, and is now doing storage. He’s also a co-founder of the company and its former CTO. SoftIron makes hardware storage appliances for CEPH, an open source, software defined storage system.

CEPH storage includes file (CEPHFS, POSIX), object (S3) and block (RBD, RADOS block device, kernel/librbd) services and has been around since 2006. CEPH storage also offers redundancy, mirroring, encryption, thin provisioning, snapshots, and a host of other storage options. CEPH is available as an open source solution, downloadable at CEPH.io, but it’s also offered as a licensed option from RedHat, SUSE and others. For SoftIron, it’s bundled into their HyperDrive storage appliances. Listen to the podcast to learn more.
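For a quick taste of CEPH’s native object interface, here is a minimal librados example using the standard Python bindings. It assumes the `rados` package is installed, that `/etc/ceph/ceph.conf` and a valid keyring are present, and that a pool named `demo` already exists; adjust those for your own cluster.

```python
# Minimal librados example: write and read one object in a CEPH pool.
# Assumes a reachable cluster, /etc/ceph/ceph.conf + keyring, and an
# existing pool called "demo".
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("demo")
    try:
        ioctx.write_full("greeting", b"hello from the GreyBeards")
        print(ioctx.read("greeting"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```

The S3 and CephFS/RBD front ends layer on top of this same RADOS object store, which is what lets one cluster serve file, block and object clients at once.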

SoftIron uses the open source version of CEPH and incorporates this into their own, HyperDrive storage appliances, purpose built to support CEPH storage.

There are two challenges to using open source solutions:

  • Support is generally non-existent. Yes, the open source community behind the (CEPH) project supplies bug fixes and can possibly answer some questions, but this is not considered enterprise support, where customers require 7x24x365 support for a product.
  • Usability is typically abysmal. Yes, open source systems can do anything that anyone could possibly want (if not, code it yourself), but trying to figure out how to use any of that often requires a PhD or two.

SoftIron has taken both of these on to offer a commercial CEPH product.

Take support: SoftIron offers enterprise-level support that customers can contract for on their own, even if they don’t use SoftIron hardware. Phil said they would often get kudos for their expert support of CEPH and have often been asked to offer this as a standalone CEPH service. Needless to say, their support of SoftIron appliances is also excellent.

As for ease of operations, SoftIron makes the HyperDrive Storage Manager appliance, which offers a standalone GUI that takes the PhD out of managing CEPH. Anything one can do with the CEPH CLI can be done with SoftIron’s Storage Manager. It’s also a very popular offering with SoftIron customers. Similar to SoftIron’s CEPH support above, customers are requesting that the Storage Manager be offered as a standalone solution for CEPH users as well.

HyperDrive hardware appliances are storage media boxes that offer extremely low-power storage for CEPH. Their appliances range from high density (120TB/1U) to high performance NVMe SSDs (26TB/1U) to just about everything in between. On their website, I count 8 different storage appliance offerings with various spinning disk, hybrid (disk-SSD), SATA and NVMe SSDs (SSD only) systems.

SoftIron designs, develops and manufactures all their own appliance hardware. Manufacturing is entirely in the US, and design and development take place in the US and Europe only. This provides a secure provenance for HyperDrive appliances that other storage companies can only dream about. Defense, intelligence and other security-conscious organizations/industries are increasingly concerned about where electronic systems come from and want assurances that there are no security compromises inside them. SoftIron puts this concern to rest.

Yes, they use CPUs, DRAM and other standardized chips, as well as storage media manufactured by others, but SoftIron has gone out of their way to source all of these other parts and media from secure, trusted suppliers.

All other major storage companies use storage servers, shelves and media sourced from manufacturers anywhere in the world.

Moreover, such off-the-shelf hardware usually comes with added components that increase cost and complexity, such as graphics memory/interfaces, cables, over-configured power supplies, etc., which aren’t required for storage. Phil mentioned that each HyperDrive appliance has been reduced to just what’s required to support CEPH storage.

Each appliance has a 6Tbps network that connects all the components, which means no cabling in the box. Also, each storage appliance has CPUs matched to its performance requirements: ARM cores for low performance appliances, AMD EPYC CPUs for high performance appliances. All HyperDrive appliances support wire-speed IO, i.e., if a box is configured to support 1GbE or 100GbE, it transfers data at that speed, across all ports connected to it.

Because of their minimalist hardware design approach, HyperDrive appliances run much cooler and use less power than other storage appliances. They consume only 100W, or 200W for high performance storage, per appliance, where most other storage systems come in at around 1500W or more.

In fact, SoftIron HyperDrive boxes run so cool that they don’t need fans for the CPUs; they just redirect air flow from the storage media over the CPUs. And running cooler improves the reliability of disk and SSD drives. Phil said they are seeing field results with 2X better reliability than those drives normally see in the field.

They also offer a HyperDrive Storage Router that provides an NFS/SMB/iSCSI gateway to CEPH. With their Storage Router, customers using VMware, Hyper-V and other systems that depend on NFS/SMB/iSCSI for storage can just plug and play with SoftIron CEPH storage. With the Storage Router, the only storage interface HyperDrive appliances can’t support is FC.

Although we didn’t discuss this on the podcast, in addition to HyperDrive CEPH storage appliances, SoftIron also provides HyperCast, transcoding hardware designed for real time transcoding of one or more video streams and HyperSwitch networking hardware, which supplies a secure provenance, SONiC (Software for Open Networking in [the Azure] Cloud) SDN switch for 1GbE up to 100GbE networks.

Standing up PB of (CEPH) storage should always be this easy.

Phil Straw, Co-founder & CEO SoftIron

The technical visionary co-founder behind SoftIron, Phil Straw initially served as the company’s CTO before stepping into the role as CEO.

Previously Phil served as CEO of Heliox Technologies, co-founder and CTO of dotFX, VP of Engineering at Securify and worked in both technical and product roles at both Cisco and 3Com.

Phil holds a degree in Computer Science from UMIST.

116: GreyBeards talk VCF on VxBlock 1000 with Martin Hayes, DMTS, Dell Technologies

Sponsored By:

This past week, we had a great talk with Martin Hayes (@hayes_martinf), Distinguished Member Technical Staff at Dell Technologies, about running VMware Cloud Foundation (VCF) on VxBlock 1000 converged infrastructure (CI). It used to be that Cloud Foundation required VMware vSAN primary storage, but that changed a few years ago. When that happened, the Dell Technologies team saw it as a great opportunity to support VCF on VxBlock CI.

This is the first GreyBeards podcast for Martin, but he was extremely knowledgeable about VxBlock and Cloud Foundation technologies. He’s been a technical product manager on the VxBlock converged infrastructure at Dell Technologies for many years. He’s an expert on Cloud Foundation and he knows an awful lot more about VMware NSX-T networking than seems reasonable (good thing). In any case, Martin’s expertise covers the whole gamut of VCF services as well as VxBlock 1000 infrastructure. The podcast is a bit longer than our normal sponsored podcast but there was a lot of information to cover. Listen to the podcast to learn more.

With VCF enabling primary storage on networked storage systems, all the storage vendors in the world gave a mighty cheer. But VMware Cloud Foundation still requires vSAN servers to run its management domain. Late in 2020, the VxBlock 1000 from Dell Technologies released a new, software-defined version of its Advanced Management Platform (AMP) that runs on vSAN Ready Nodes. AMP is VxBlock’s management platform, but it also runs the management domains for VCF and NSX-T.

For workload domains, VxBlock 1000 offers Cisco UCS M5 rack and blade servers, that can be configured to support just about any workload needed by a data center.

Historically, VMware vSphere problems with DR weren’t as much storage replication issues as networking problems. But NSX-T and VCF seem to have solved that problem.

And with vRealize Automation plugins and NSX-T APIs, customers can have zero-touch network provisioning, which enables the use of IaaS or infrastructure as code for their data center.
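To give a feel for what "network as code" looks like, here is a rough sketch of creating a segment through the NSX-T Policy REST API from Python. It is a sketch only: the path and payload fields are from memory and should be checked against the NSX-T API documentation for your release, and the manager address and credentials are placeholders.

```python
# Rough sketch of zero-touch segment creation via the NSX-T Policy API.
# Endpoint path and payload fields may need adjusting for your NSX-T version;
# the manager URL and credentials are placeholders.
import requests

NSX_MGR = "https://nsx-mgr.example.com"     # placeholder manager address
AUTH = ("admin", "REPLACE_ME")              # placeholder credentials

def create_segment(segment_id: str, gateway_cidr: str) -> dict:
    body = {
        "display_name": segment_id,
        "subnets": [{"gateway_address": gateway_cidr}],   # e.g. "10.10.10.1/24"
    }
    resp = requests.put(
        f"{NSX_MGR}/policy/api/v1/infra/segments/{segment_id}",
        auth=AUTH, json=body, verify=False, timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# create_segment("app-tier-01", "10.10.10.1/24")
```

Wrap calls like this in vRealize Automation workflows or a pipeline and the network side of a workload deployment never needs a human in the loop.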

VMware vVols are now available with Dell EMC PowerMax storage. So now VxBlock 1000 customers can use vSphere storage policy-based management (SPBM) as well as automated vVol replication for data on PowerMax.

VMware NSX-T implements Application Virtual Networks (AVNs) using a GENEVE overlay network, which makes extensive use of encapsulation. But where there’s encapsulation, de-encapsulation must follow to access outside networks. All this (encapsulation on ingress, de-encapsulation on egress) is done through NSX-T Edge clusters.
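The practical consequence of that encapsulation is extra header bytes on every frame, which is why overlay transport networks need a larger MTU. A back-of-the-envelope tally (assuming an IPv4 underlay and no GENEVE option TLVs):

```python
# Back-of-the-envelope GENEVE overhead: outer Ethernet + IPv4 + UDP + GENEVE
# headers wrapped around the inner Ethernet frame. Assumes an IPv4 underlay
# and no GENEVE options; option TLVs make the overhead larger.
inner_payload  = 1500   # standard guest MTU
inner_ethernet = 14
geneve_header  = 8      # base header, no options
outer_udp      = 8
outer_ipv4     = 20

overhead = inner_ethernet + geneve_header + outer_udp + outer_ipv4   # 50 bytes
min_transport_mtu = inner_payload + overhead                         # 1550

print(f"encap overhead: {overhead} bytes, minimum transport MTU: {min_transport_mtu}")
# NSX-T docs generally call for a transport MTU of at least 1600 to leave
# headroom for options and the outer Ethernet framing.
```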

The net result of all this is that VMware customers have more choice, i.e., they can now run VCF on HCI or CI. And with VxBlock 1000 CI, VCF customers can select best-of-breed components for each level of their 3-tier infrastructure.

Martin Hayes, DMTS, Dell Technologies

Martin Hayes is a Technical Product Manager at Dell Technologies, where he develops and executes data center product strategies that incorporate virtualization, software-defined networking (SDN) and converged systems.

Previously, he served in network advisory and architect roles at Dell EMC, converged systems pioneer VCE and Irish broadband provider eircom.

113: GreyBeards talk storage for next gen. workloads with Liran Zvibel, Co-Founder & CEO WekaIO

Sponsored By:

I’ve known Liran Zvibel, Co-founder and CEO of WekaIO, for many years now and this is the second time he’s been on our show (see: Episode 56: GreyBeards talk high performance file storage...). In those days, WekaIO was just coming out and hitting the world with its extremely high-performing, scale-out unstructured data solution. Well, since then, they’ve just gotten better.

Keith and I had a great time talking with Liran again. Liran has deep knowledge about unstructured data and how enterprises use it these days. WekaIO’s story over the last two years has gone beyond great performance to real-world, hybrid cloud offerings, as well as going after cloud native apps’ (read Kubernetes [K8S]) persistent storage. Listen to the podcast to learn more.

We started with a history lesson on WekaIO. Back in those days (and it persists today, I might add) there were many IO workloads that required companies to purchase different solutions for different work. For example, they needed DAS or SAN for performance, NAS for ease of access and object for scale. WekaIO came out with an answer to all these problems in a single, scalable storage system. That is, they performed IO as fast as DAS or SAN block, had all the ease of access of NAS, and could scale as much as object.

However, the real culprit holding the world back was NFS. At the outset, NFS was designed (back in the 1990s) for the networking speeds then available (10-100Mbps), and it performed just fine at those speeds. But when 10-100GbE came out in the 2000s, NFS’s metadata overhead was too chatty to support wire speeds. Thus, any storage that depended on NFS protocols couldn’t supply (small) files fast enough for modern applications.
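A quick back-of-the-envelope shows why chatty metadata caps small-file throughput regardless of link speed. The per-file RPC count and round-trip time below are illustrative assumptions, not measurements:

```python
# Illustrative arithmetic: when every small file costs several synchronous
# metadata round trips, throughput is bounded by latency, not link speed.
# RPC count, RTT and file size are assumed values for illustration only.
rtt_s         = 0.0005      # 0.5 ms round trip on the LAN
rpcs_per_file = 4           # e.g. lookup + getattr + access + read
file_size     = 64 * 1024   # a 64 KiB "small" file

files_per_sec  = 1 / (rpcs_per_file * rtt_s)      # ~500 files/s per stream
throughput_bps = files_per_sec * file_size * 8    # ~0.26 Gb/s

print(f"{files_per_sec:.0f} files/s -> {throughput_bps/1e9:.2f} Gb/s "
      "on what could be a 100 Gb/s link")
```

Per this rough math, a single client stream sits at a fraction of a percent of the wire, no matter how fast the network or the storage behind it.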

This is why WekaIO has moved to not only support NFS and SMB but also POSIX and NVIDIA® GPUDirect® Storage interfaces. By offering POSIX, WekaIO is able to plug into standard Linux and Windows server systems and provide excellent small file performance. Of course applications that demand small file performance today are mostly data analytics and AI/ML/DL workloads.

Consequently, NVIDIA came out with their GPUDirect Storage protocol to address getting small file (data) into GPUs faster. With GPUDirect, storage systems can RDMA data directly from storage to GPU memory and vice versa, with no OS intervention (other than to set up the transfer). If you happen to have a small-file, high-performing storage system attached to your fabric that supports GPUDirect, like WekaIO, you can significantly speed up your AI/ML/DL workloads.

Next, we started talking K8S storage. WekaIO uses their POSIX interface in their CSI plugin to support K8S container persistent storage. Again, supplying high performance for small files seems tailor-made for the K8S container applications that exist today and will for the foreseeable future.
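For flavor, consuming a CSI driver boils down to a PersistentVolumeClaim against a storage class. The sketch below uses the official Kubernetes Python client; the storage class name is a placeholder for whatever the WekaIO CSI plugin registers in your cluster, and depending on your client version the resources object may be `V1VolumeResourceRequirements` instead.

```python
# Sketch: request a PVC against a CSI-backed storage class using the official
# Kubernetes Python client. "weka-fs" is a placeholder storage class name --
# substitute whatever your cluster admin configured for the WekaIO CSI driver.
from kubernetes import client, config

config.load_kube_config()   # or load_incluster_config() when running in a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="scratch-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],
        storage_class_name="weka-fs",                 # placeholder class name
        resources=client.V1ResourceRequirements(
            requests={"storage": "100Gi"}
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```

Pods then mount the claim like any other volume, while the CSI driver handles provisioning and attachment against the backing file system.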

Enter the cloud. Among other things, WekaIO is an AWS primary storage vendor. It also offers snap-to-cloud. And with both of these in tandem, it’s just become a lot easier to move and access your unstructured data in the cloud. Liran mentioned that WekaIO primary storage in AWS operates across AZs. This means it can be configured to support better availability than EBS.

Large BioPharma companies are using WekaIO in AWS to store and process field data and research data, so that this work can be done around the world. Some companies have run out of compute in a single AZ (unbelievable, I know, but it’s COVID). By offering multi-AZ access to unstructured data with WekaIO, these companies can spread their compute across AZs and regions and still access their data. And when their products are ready for gov’t certification, having all this data in the cloud can provide an easy way to give the gov’t access to this same data.

Liran Zvibel, Co-founder and CEO WekaIO

As Co-Founder and CEO, Mr. Liran Zvibel guides long-term vision and strategy at WekaIO. Prior to creating the opportunity at WekaIO, he ran engineering at social startups and Fortune 100 organizations, including Fusic, where he managed product definition, design, and development for a portfolio of rich social media applications.

Liran also held principal architectural responsibilities for the hardware platform, clustering infrastructure and overall systems integration for XIV Storage System, acquired by IBM in 2007.

Mr. Zvibel holds a BSc. in Mathematics and Computer Science from Tel Aviv University.

112: GreyBeards annual year end wrap-up with Keith & Matt

It’s the end of the year, so time for our regular year end wrap up discussion with the GreyBeards. 2020 has been an interesting year to say the least. It started out just fine, then COVID19 showed up and threw a wrench in everyone’s plans, and as the year closes, we were just starting to see some semblance of a new normal when one of the largest security breaches in years showed up. Whew, almost glad that’s over and on to 2021.

As always the GreyBeards had a great discussion on these and other topics to highlight the year just past. The talk was wide ranging and hard to characterize but I did my best below. Listen to the podcast to learn more.

COVID19’s impact on the enterprise

It will probably take some time before we learn the true, long term impacts of COVID19 on IT but one major change has to be the massive Work From Home (WFH) transition that took place overnight.

While WFH can be more productive for some, the lack of face2face interaction can be challenging for others. The fact that many of the GreyBeards have been working from home for decades now left us a bit oblivious to how jarring this transition can be for newcomers.

There are definitely some psychological changes that need to occur to be productive at WFH. Organization skills become even more important. Structured interactions (read: conference calls, Zoom/WebEx and other forms of communication) become much more important. And then there’s security.

Turns out VMware and others have been touting VDI solutions for the past decade or so to better support remote work, while at the same time providing corporate levels of security for that remote work. While occasionally this doesn’t work quite as well as expected, it’s certainly much, much better than having end users access corporate data without any security around that data, or worse yet, on a “bring your own device” basis. All these VDI solutions had a field day when WFH happened.

Many workers found they could be more productive at WFH, due to fewer distractions, no commute time and more flexible hours. What happens to all these current WFHers when COVID19 is vanquished is anyone’s guess.

We thought there might be less need for large office campuses/buildings. But there’s something to be said for more collaboration and random interactions through face2face meetings that can only occur in an office setting with workers present at the same time. Some organizations will take to this new way of work while others will try to dial WFH back to non-existent. Where your organization fits on this spectrum and why, will be telling across a number of dimensions.

The rise of ARM

There’s been slow but steady improvement in ARM processors over the last few decades. Nowadays ARM is starting to make a place for itself in the enterprise. ARM has always been the go-to microprocessor for low power solutions (like smartphones), but nowadays ARM cores are being deployed in the cloud and even the enterprise. They can be used as server processors, but even outside servers, ARM cores are showing up in hardware accelerators as the brains behind SmartNICs, DPUs, SPUs, etc.

Keith made mention of AWS’s 2nd generation Graviton 64-bit ARM processor EC2 instances. And yes, there are significant cost (& power) savings to be had using AWS Graviton ARM instances. So the cloud is starting to adopt them. Somewhere over the past couple of years, I heard that VMware was porting ESX to work on ARM cores.

But apparently, it’s not as simple as dropping an ARM multi-core processor into a server, recompiling your code and away you go. Applications need a certain amount of optimization to run effectively on ARM processors. And the speed-up between non-optimized and optimized versions of an application running on ARM cores is significant.

As for SmartNICs and DPUs, these are data networking hardware accelerators that provide the real-time processing capabilities needed to keep up with higher speed networking, 100GbE and beyond. These DPUs perform deep packet inspection, data compression, encryption and other services, all at wire speed. Yes, you could devote 1 or more X86 cores to do this, but it’s much cheaper (and more effective) to do this outside the CPU cores. Moreover, performing this activity at the network entry point to the server means that much of this data doesn’t have to be transferred back and forth through server memory. So not only does it save CPU core cycles, but also memory capacity and memory & PCIe bus bandwidth. We published a recent podcast with Kevin Deierling of NVIDIA Networking discussing DPUs, if you want to learn more.

Pat mentioned VMware’s plans, announced at (virtual) VMworld, to port ESX to the DPU. Keith followed up on this and asked some other execs at VMware about it, and they said VMware will more likely support DPUs as just another hardware accelerator in the cluster. In either case, CPU cycles should be freed up, which should help VMware use X86 cores more efficiently. And perhaps this will help them engage in more CPU-constrained environments such as Telecom.

Then there’s computational storage. We have been watching this technology for a couple of years now and it’s seeing some success in being deployed in public cloud environments. These devices seem to be used to provide outboard data compression. It’s unclear whether they depend on ARM processing or not, but my bet is that they do. To learn more about computational storage, check out these podcasts: our FMS2020 wrap-up with Jim Handy and our talk with Scott Shadley on NGD’s computational storage.

System security

At year end, we are learning of a massive security breach throughout US government IT facilities, all based on what is believed to be a Russian hack of a software package embedded in a popular network monitoring tool from SolarWinds. They are calling this a software supply chain hack. Although we are mainly hearing about government agencies being hacked, SolarWinds is pervasive in the enterprise as well.

There have been many hardware supply chain hacks in the past, where a board supplier used chips or logic that weren’t properly vetted. Over time, hardware suppliers have started to scrutinize their supply chains more closely and have reduced this risk.

And the US government has been lobbying for the industry to use a security chip with a backdoor, or to supply back doors to smartphone encryption capabilities. Luckily, so far, none of these have been implemented by industry.

What Russia has shown us is that this kind of attack is not limited to the hardware sphere. Software supply chain risk can’t be ignored anymore.

This means that any software application supplier will need to secure their supply chain or bring it all in house. Which may mean that costs for these packages will go up. It’s possible that using a pure open source supply chain may reduce this risk as well. At least that’s the promise of open source.
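One small, concrete control in that direction (an illustrative example, not anything discussed on the podcast and certainly not a complete supply-chain defense) is pinning and verifying checksums of the artifacts you pull in before they ever get installed or executed:

```python
# Illustrative supply-chain hygiene: verify a downloaded artifact against a
# pinned SHA-256 before using it. One small control, not a complete defense
# against attacks like the SolarWinds compromise.
import hashlib
import sys

PINNED = {
    # filename -> expected SHA-256, recorded when the dependency was vetted
    # (the value below is a placeholder)
    "agent-installer.tar.gz":
        "0000000000000000000000000000000000000000000000000000000000000000",
}

def verify(path: str) -> bool:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == PINNED.get(path.split("/")[-1])

if __name__ == "__main__":
    ok = verify(sys.argv[1])
    print("checksum OK" if ok else "CHECKSUM MISMATCH -- do not install")
    sys.exit(0 if ok else 1)
```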

We said 2020 was an interesting year and it’s going out with a bang.

Matt Leib (@MBLeib), one of our co-hosts, has been blogging in the storage space for over 10 years, with work experience both in engineering and presales/product marketing. His blog is at Virtually Tied to My Desktop and he’s on LinkedIn.

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.