I was at AI Infrastructure Field Day 3 (AIIFD3) last week in CA, where Hammerspace presented (videos here). Molly and Floyd talked about their solution and some of their recent MLCommons performance results, but Kurt discussed the Open Flash Platform (OFP) Consortium, announced last July, which they and their partners have been working on.
OFP currently has 6 partners, including Hammerspace (storage software supplier), SK Hynix (NAND and SSDs) and the Linux Foundation, as well as end users (Los Alamos National Laboratory), computational storage (ScaleFlux) and AI solution providers (Xsight).
As I understand it, the OFP is pushing to become a standard adopted by the Open Compute Project (OCP).
OFP is an attempt to redefine NAS as we know it. Hammerspace has been on this journey for a long time with their software-only solution, but technology is now at a place where it’s time to tackle hardware changes to NAS that would enable even better performance and throughput for AI and other data-intensive workloads.
Some of the technology changes driving the need for a different approach to NAS storage:
- NAND capacities are going through the roof. Accessing all that capacity in an effective and performant way requires re-architecting the storage stack.
- Compute is becoming more widespread and ubiquitous. Everything seems to have so much more compute capability that it’s causing a rethink of how to take advantage of it to better address IT (and AI) performance needs.
- AI bandwidth and performance requirements are extreme and are only becoming more so.
- Power has become a limiting factor in many AI deployments.
Hammerspace has addressed much of this from a software perspective with their Linux standards efforts to implement Parallel File System (PFS) and FlexFiles in the Linux kernel and in NFS standards as NFSv4.2. PFS and FlexFiles allow Hammerspace to offer very high file bandwidth and data mobility that can’t be supplied any other way.
So it’s time to see what can be done in hardware to make this even better. Enter OFP.
OFP, NAS storage reborn

The idea is to come up with a new packaging of an NFS (v3) server that’s all storage, with high amounts of networking and just enough compute to serve the storage. Effectively they are putting a DPU (compute-intensive networking card) with 1-800Gbps Ethernet connection in front of a train (or toboggan) of NVMe SSDs and calling this a sled.
Their first version, using U.2 NVMe SSDs, offers 1PB of capacity with 800Gbps of networking in a 3.5″ X 1.75″ form factor. They would load an NFSv3 Linux-based storage server onto the DPU and have it run there, along with the networking stack (and more), with access to all this storage capacity. The result is essentially an NFSv3 (relatively dumb) storage sled.

Package 6 of these together with a couple of power supplies and now you have 6PB raw capacity in 1RU, with 4.8Tbps of bandwidth, consuming 0.6 kW of power (presumably this is power consumption at idle).
You will no doubt note that the sled, as configured above, does not allow for hot (or even cold) drive replacement. So when drives fail, the NFSv3 code would need to recover from the failure and take those drives out of service, so that over time the sled could still be used even though some SSDs have failed.
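The 1RU arithmetic above is simple enough to sketch in a few lines of Python. This is only a back-of-the-envelope model using the numbers quoted in the post (1PB and 800Gbps per sled, 6 sleds, 0.6 kW); the class and function names are my own, decimal units (1PB = 10^15 bytes) are an assumption, and the "hours to read a full sled" figure is a derived best case at line rate, not something Hammerspace presented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sled:
    """One OFP sled, using the figures quoted in the post."""
    capacity_pb: float = 1.0     # raw capacity per sled (U.2 version)
    network_gbps: float = 800.0  # Ethernet bandwidth per sled


def shelf_totals(sled: Sled, sleds_per_ru: int = 6, shelf_power_kw: float = 0.6) -> dict:
    """Roll per-sled figures up to a 1RU shelf.

    Assumes decimal units (1 PB = 1e15 bytes, 1 Gbps = 1e9 bits/s)
    and that the quoted 0.6 kW covers the whole shelf.
    """
    capacity_pb = sled.capacity_pb * sleds_per_ru
    bandwidth_tbps = sled.network_gbps * sleds_per_ru / 1000
    # Best-case time to stream one sled's full capacity over its own link.
    sled_bits = sled.capacity_pb * 1e15 * 8
    full_read_hours = sled_bits / (sled.network_gbps * 1e9) / 3600
    return {
        "capacity_pb": capacity_pb,
        "bandwidth_tbps": bandwidth_tbps,
        "pb_per_kw": capacity_pb / shelf_power_kw,
        "full_read_hours": full_read_hours,
    }


if __name__ == "__main__":
    for name, value in shelf_totals(Sled()).items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions, a single sled would take a bit under 3 hours to read end to end at full line rate, which gives a feel for how much capacity sits behind each network port.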
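To make the fail-in-place idea a bit more concrete, here is a minimal sketch of how raw sled capacity shrinks as SSDs are retired. Everything in it is hypothetical: the post doesn't say how many SSDs are in a sled, so the drive count is a parameter, and the "swap the sled" threshold is a number I made up for illustration. It also only tracks raw capacity and says nothing about how data gets re-protected, which wasn't covered.

```python
def usable_capacity_pb(total_pb: float, drive_count: int, failed_drives: int) -> float:
    """Raw capacity left in a sled after some drives are taken out of service.

    Assumes all drives are the same size (hypothetical; not stated in the post).
    """
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    if not 0 <= failed_drives <= drive_count:
        raise ValueError("failed_drives must be between 0 and drive_count")
    return total_pb * (drive_count - failed_drives) / drive_count


def should_swap_sled(drive_count: int, failed_drives: int,
                     min_healthy_fraction: float = 0.75) -> bool:
    """True once the healthy fraction of drives drops below the threshold.

    min_healthy_fraction is an illustrative policy knob, not an OFP parameter.
    """
    healthy_fraction = (drive_count - failed_drives) / drive_count
    return healthy_fraction < min_healthy_fraction


if __name__ == "__main__":
    drives = 8  # hypothetical drive count for a 1PB sled
    for failed in range(4):
        left = usable_capacity_pb(1.0, drives, failed)
        print(failed, "failed:", left, "PB left, swap sled:", should_swap_sled(drives, failed))
```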

In the future, moving from U.2 SSDs to E2(E) NVMe SSDs in the storage sled quadruples the capacity while staying in the same power envelope and supplying the same bandwidth. Again the SSDs are not intended to be (hot or cold) swappable, so drive failure would need to be handled by software. With E2(E) SSDs in a sled and 6 of these in a 1RU, one would have 24PB of storage capacity.
Presumably, OFP sleds could be hot swapped once enough SSDs in a sled fail.
And of course QLC capacities are not standing still so another doubling of these capacities could easily be possible within the next couple of years (imagine 48PB in a single RU, boggles the mind).
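The capacity progression is easy to tabulate. The sketch below starts from the U.2 sled (1PB), applies the 4X from E2 SSDs and then a speculative further 2X, and also shows what happens to bandwidth per PB if networking stays at 800Gbps per sled. The post only says bandwidth stays the same for the E2 step; carrying that over to the doubling step is my assumption, and the generation labels and function name are mine.

```python
SLEDS_PER_RU = 6
SLED_GBPS = 800.0  # assumed constant across all generations

# (label, capacity multiplier relative to the previous generation)
GENERATIONS = [
    ("U.2", 1.0),
    ("E2", 4.0),              # "quadruples the capacity", per the post
    ("E2 + 2x denser", 2.0),  # speculative future doubling
]


def project_capacity(base_sled_pb: float = 1.0) -> list:
    """Return (label, PB per 1RU, Gbps per PB) for each generation."""
    rows = []
    sled_pb = base_sled_pb
    for label, multiplier in GENERATIONS:
        sled_pb *= multiplier
        ru_pb = sled_pb * SLEDS_PER_RU
        gbps_per_pb = SLED_GBPS * SLEDS_PER_RU / ru_pb
        rows.append((label, ru_pb, gbps_per_pb))
    return rows


if __name__ == "__main__":
    for label, ru_pb, gbps_per_pb in project_capacity():
        print(f"{label}: {ru_pb:.0f} PB per RU, {gbps_per_pb:.0f} Gbps per PB")
```

The thing to notice is that if networking doesn't grow with capacity, bandwidth per PB drops by the same factor capacity grows, which matters for workloads that need to sweep through most of the data.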
The NAS software one runs in the OFP sled could be any NFSv3 server software, but Hammerspace has their own, called DSX. And when you combine DSX servers with lots of capacity and lots of networking bandwidth, Hammerspace’s NFSv4.2 PFS and FlexFiles can really fly.
And with the power and space efficiency as well as the extreme bandwidth available, it could be a winning formula for AI environments, in contrast to scale-out NAS, which is the current alternative.

~~~~
But it seems to me any organization (hyperscalers, are you listening?) with intense storage capacity and storage bandwidth needs would be very interested in the OFP for their own environment.
Comments?











