Supercomputing Conference 2019 (SC19) is coming to Denver next week and in anticipation of that show, we thought it would be a good to talk with some HPC storage group. We contacted HPE and given their recent acquisition of Cray, they offered up Larry and Mark to talk about their new ClusterStor E1000 storage system.
There are a number of components that go into Cray supercomputers and besides the ClusterStor, Larry and Mark mentioned their new SlingShot cluster interconnect which is Ethernet based with significant enhancements to congestion handling. But the call focused on ClusterStor.
What is ClusterStor
ClusterStor, is a Lustre file system hardwareappliance. Lustre has always been popular with the HPC crowd as it offered high bandwidth file services. But Lustre often took a team of (PhD) scientists to configure, deploy and run properly because of all the parameters that had to be setup for optimum performance.
Cray’s ClusterStor was designed to make configuring, deploying and running Lustre a lot simpler with a GUI and system defaults that provided an optimal running environment. But if customers still want access to all Lustre features and functionality, all the Lustre parameters can still be tweaked to personalize it.
What sort of appliance
The ClusterStore team has created a Lustre storage appliance using two systems, a 2U-24 NVMe SSD system and a 4U-106 disk drive system. Both systems use PCIe Gen 4 buses which offer 2X the bandwidth of Gen 3 and NVMe Gen 4 SSDs. Each ClusterStore E1000 appliance comes with 2 servers for HA and the storage behind it.
Larry said the 2U NVMe Gen 4 appliance offers 80GB/sec of read and 60GB/sec of write data bandwidth. And a full rack of these, could support ~2.5TB/sec of data bandwidth. One TB/sec seems like an awful lot to the GreyBeards, 2.5TB/sec, out of this world.
We asked if it supported InfiniBAND interconnects? Yes, they said it supports the latest generation of InfiniBAND but it also offers Cray’s own (SlingShot) Ethernet interconnect, unusual for HPC environments. And as in any Lustre parallel file system, servers accessing storage use Lustre client software.
ClusterStor Data Services
But on the backend, where normally one would see only LDISKFS for backend storage, ClusterStor also offers ZFS. Larry and Mark said that LDISKFS is faster but ZFS offers more functionality like snapshots and data compression.
Many of the Top 100 & Top 500 supercomputing environments are starting to deploy ML DL (machine learning-deep learning) workloads along with their normal HPC activities. But whereas HPC work has historically depended on bandwidth to read, write and move large files around, ML DL deals with small files and needs high IOPS. ClusterStor was designed to satisfy both high bandwidth and high IOPS workloads.
In previous HPC Lustre flash solutions, customers had to deal with the complexity of where to place data, such as on flash or on disk. But with net ClusterStor E1000, the system can do all this for you. That is it will move data from disk to flash when it sees an advantage to doing so and move it back again when that advantage is gone. But, just as with Lustre configuration parameters above, customers can still pre-stage data to flash.
The other challenge for HPC environments is extreme size. Cray and others are starting to see requirements for Exascale (exabyte, 10**18) byte) storage systems. In fact, Cray has a couple of ClusterStor E1000 configurations of 400PB or more already, As these systems age they may indeed grow to exceed an exabyte.
With an exabyte of data, systems need to support billions of files/inodes and better metadata services and indexing. ClusterStor offers optimized inode indexing and search to enable HPC users to quickly find the data they are looking for. Further, ClusterStor offers, data at rest encryption and supports virtual file systems, for multi-tenancy.
With a ZFS backend, ClusterStor can supply data compression and snapshots. Cray has tested ZFS compression on HPC scientific ( mostly already application compressed) data and still see ~30% reduction is storage footprint. At an exabyte of storage 30% can be a significant cost reduction
The podcast ran long, ~46 minutes. Larry and Mark had a good knowledge of the HPC storage space and were easy to talk with. Matt’s an old ZFS hand, so wanted to take even more about ZFS. I had a good time discussing ClusterStor and Lustre features/functionalit and how the HPC workloads are changing. Listen to the podcast to learn more. [The podcast was recorded on November 6th, not the 5th as mentioned in the lead in, Ed.]
Larry Jones, Director Storage Product Management
Larry Jones is a director of storage product management for Cray, a Hewlett Packard Enterprise company.
Jones previously held senior product management roles at Seagate, DDN and Panasas.
Mark Wiertalla, Director Storage Product Marketing
Mark Wiertalla is a product marketing director for Cray, a Hewlett Packard Enterprise company.
Prior to Cray, Wiertalla held product manager roles at EMC and SGI.