The GreyBeards had a great discussion with Floyd Christofferson, CEO, StrongBox Data Solutions on their big data/HPC file and archive solution. Floyd’s is very knowledgeable on problems of extremely large data repositories and has been around the HPC and other data intensive industries for decades.
StrongBox’s StrongLink solution offers a global namespace file system that virtualizes NFS, SMB, S3 and Posix file environments and maps this to a software-only, multi-tier, multi-site data repository that can span onsite flash, disk, S3 compatible or Azure object and LTFS tape iibrary storage as well as offsite versions of all the above tiers.
Typical StrongLink customers range in the 10s to 100s of PB, and ingesting or processing PBs a day. 200TB is a minimum StrongLink configuration, but Floyd said any shop with over 500TB has problems with data silos and other issues, but may not understand it yet. StrongLink manages data placement and movement, throughout this hierarchy to better support data access and economical storage. In the process StrongLink eliminates any data silos due to limitations of NAS systems while providing the most economic placement of data to meet user performance requirements.
Floyd said that StrongLink first installs in customer environment and then operates in the background to discover and ingest metadata from the primary customers file storage environment. Some point later the customer reconfigures their end-users share and mount points to StrongLink servers and it’s up and starts running.
The minimal StrongLink, HA environment consists of 3 nodes. They use a NoSQL metadata database which is replicated and sharded across the nodes. It’s shared for performance load balancing and fully replicated (2-way or 3-way) across all the StrongLink server nodes for HA.
The StrongLink nodes create a cluster, called a star in StrongBox vernacular. Multiple clusters onsite can be grouped together to form a StrongLink constellation. And multiple data center sites, can be grouped together to form a StrongLink galaxy. Presumably if you have a constellation or a galaxy, the same metadata is available to all the star clusters across all the sites.
They support any tape library and any NFS, SMB, S3 orAzure compatible object or file storage. Stronglink can move or copy data from one tier/cluster to another based on policies AND the end-users never sees any difference in their workflow or mount/share points.
One challenge with typical tape archives is that they can make use of proprietary tape data formats which are not accessible outside those systems. StrongLink has gone with a completely open-source, LTFS file format on tape, which is well documented and is available to anyone.
Floyd also made it a point of saying they don’t use any stubs, or soft links to provide their data placement magic. They only use standard file metadata.
File data moves across the hierarchy based on policies or by request. One of the secrets to StrongLink success is all the work they have done to ensure that any data movement can occur at line rate speeds. They heavily parallelize any data movement that’s required to support data placement across as many servers as the customer wants to throw at it. StrongBox services will help right-size the customer deployment to support any data movement performance that is required.
StrongLink supports up to 3-way replication of a customer’s data archives. This supports a primary archive and 3 more replicas of data.
Floyd mentioned a couple of big customers:
- One autonomous automobile supplier, was downloading 2PB of data from cars in the field, processing this data and then moving it off their servers to get ready for the next day’s data load.
- Another weather science research organization, had 150PB of data in an old tape archive and they brought in StrongLink to migrate all this data off and onto LTFS tape format as well as support their research activities which entail staging a significant chunk of file data on research servers to do a climate run/simulation.
NASA, another StrongLink customer, operates slightly differently than the above, in that they have integrated StrongLink functionality directly into their applications by making use of StrongBox’s API.
StrongLink can work in three ways.
- Using normal file access services where StrongLink virtualizes your NFS, SMB, S3 or Posix file environment. For this service StrongLink is in the data path and you can use policy based management to have data moved or staged as the need arises.
- Using StrongLink CLI to move or copy data from one tier to another. Many HPC customers use this approach through SLURM scripts or other orchestration solutions.
- Using StrongLink API to move or copy data from one tier to another. This requires application changes to take advantage of data placement.
StrongBox customers can of course, use all three modes of operation, at the same time for their StrongLink data galaxy. StrongLink is billed by CPU/vCPU level and not for the amount of data customers throw into the archive. This has the effect of Customers gaining a flat expense cost, once StrongLink is deployed, at least until they decide to modify their server configuration.
Floyd Christofferson, CEO StrongBox Data Solutions
As a professional involved in content management and storage workflows for over 25 years, Floyd has focused on methods and technologies needed to manage massive volumes of data across many different storage types and use cases.
Prior to joining SBDS, Floyd worked with software and hardware companies in this space, including over 10 years at SGI, where he managed storage and data management products. In that role, he was part of the team that provided solutions used in some of the largest data environments around the world.
Floyd’s background includes work at CBS Television Distribution, where he helped implement file-based content management and syndicated content distribution strategies, and Pathfire (now ExtremeReach), where he led the team that developed and implemented a satellite-based IP-multicast content distribution platform that manages delivery of syndicated content to nearly 1,000 TV stations throughout the US.
Earlier in his career, he ran Potomac Television, a news syndication and production service in Washington DC, and Manhattan Center Studios, an audio, video, graphics, and performance facility in New York.