Researchers at MIT (see Storage system for ‘big data’ dramatically speeds access to information) have come up with a novel storage cluster using FPGAs and flash chips to create a new form of database machine.
In their system they have an FPGA that supports limited computational offload/acceleration along with flash controller functionality for a set of flash chips. They call their system the BlueDBM or Blue Database Machine.
Their storage device is used as a PCIe flash card in a host PC. But in their implementation each of the PCIe flash cards is interconnected via an FPGA serial link. This approach creates a distributed controller across all the PCIe flash cards in the host servers and allows any host PC to access any of the flash card data at high speed.
They claim that node-to-node access latencies are on the order of 60-80 microseconds and that their distributed controller can sustain 70% of theoretical system bandwidth. Performance testing on their prototype 4-node system shows that it’s an order of magnitude faster than Microsoft Research’s CORFU (Clusters of Raw Flash Units).
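To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It only restates the 60-80 microsecond and 70% claims as arithmetic. The function names and the 10 GB/s theoretical bandwidth are my own placeholders for illustration, not figures from the MIT work.

```python
def serialized_accesses_per_second(latency_us: float) -> float:
    """Upper bound on remote accesses per second when each access waits
    for the previous one to finish (a single outstanding request)."""
    return 1_000_000 / latency_us


def sustained_bandwidth(theoretical: float, efficiency: float = 0.70) -> float:
    """Bandwidth actually delivered at the claimed 70% efficiency.
    The unit of the result is whatever unit `theoretical` is given in."""
    return theoretical * efficiency


# Latency range claimed for node-to-node access.
for latency_us in (60, 80):
    rate = serialized_accesses_per_second(latency_us)
    print(f"{latency_us} us per access -> about {rate:,.0f} accesses/s")

# Hypothetical theoretical bandwidth, chosen only to show the calculation.
assumed_theoretical_gb_per_s = 10.0
print(sustained_bandwidth(assumed_theoretical_gb_per_s), "GB/s sustained")
```

In other words, a single thread issuing one remote read at a time would top out somewhere between roughly 12,500 and 16,700 accesses per second. Anything beyond that has to come from keeping many requests in flight at once.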
There are two novel aspects of their system: 1) the computational offload capabilities provided by the FPGA in front of the flash and 2) their implementation of a distributed controller across the storage nodes using the FPGA serial network.
Both of these characteristics depend on the FPGA. Using FPGAs should also lower system cost, and the FPGAs have a readily available, internally supported serial link that could be used.
But by using an FPGA, the computational capabilities are more limited and reconfiguring (re-programming) the storage cluster’s compute capabilities will take more time. If they used a more general purpose CPU in front of the flash chips they could support a much richer computational offload next to the storage chips. For example, in their prototype the FPGAs supported ‘word-counting’ offload functionality.
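To make the appeal of a ‘word-counting’ offload concrete, here is a minimal Python sketch of the idea. This is not the BlueDBM implementation (that logic runs in FPGA hardware). The function names, the page contents and the 8-byte counter size are all assumptions of mine. It simply compares how many bytes would cross the host link if the host does the counting versus if the counting happens next to the flash and only the result table is shipped back.

```python
from collections import Counter


def host_side_count(pages):
    """Baseline: every flash page crosses the host link, then the host counts."""
    bytes_moved = sum(len(page) for page in pages)
    counts = Counter()
    for page in pages:
        counts.update(page.decode("utf-8").split())
    return counts, bytes_moved


def near_storage_count(pages):
    """Offload: count next to the flash and ship only the word/count table.

    Simplifying assumptions: words never straddle a page boundary, and each
    table entry costs the word's bytes plus an 8-byte counter.
    """
    counts = Counter()
    for page in pages:
        counts.update(page.decode("utf-8").split())
    bytes_moved = sum(len(word.encode("utf-8")) + 8 for word in counts)
    return counts, bytes_moved


# Hypothetical flash pages, made up for illustration.
pages = [b"flash fpga flash " * 1000, b"fpga link flash " * 1000]

host_counts, host_bytes = host_side_count(pages)
offload_counts, offload_bytes = near_storage_count(pages)

print("same answer:", host_counts == offload_counts)
print("bytes over host link, host-side count:", host_bytes)
print("bytes over host link, offloaded count:", offload_bytes)
```

The answer is the same either way, but the offloaded version moves only a small table instead of the raw data. The catch, as noted above, is that an FPGA has to be re-programmed for each new kind of computation, whereas a general purpose CPU would just run different code.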
Nonetheless, as most flash storage these days already has a fairly sophisticated controller, it’s not much of a stretch to bump this compute power up to something a bit more programmable and make its functionality more available via APIs. I suppose to gain equivalent performance this would need to use PCIe flash cards.
Where they would get the internal card-to-card serial link with general purpose CPUs may be a concern, which brings up another question.
The distributed controller gives them what exactly?
I believe that with a serial-link-based distributed controller they don’t need a full networking stack to access the PCIe flash storage on other nodes. This should save both access time and compute power.
In follow-on work, the MIT researchers plan to implement a Linux-based, distributed file system across the BlueDBM. This should give them a more normal storage stack for their system. How this may interact with the computational offload capabilities is another question.
I would have to say the reduction in access latency is what they were after with the distributed controller and they seem to have achieved it, as noted above. I suppose something similar could be done with multiple PCIe cards in the same host but with the potential to grow from 4 to 20 nodes, the BlueDBM starts to look more interesting.
What sort of application could use such a device?
They talked about performing near real-time analysis of scientific data or modeling all the particles in a simulation of the universe. But just about any application that required extremely low access time with limited data services could potentially take advantage of their storage system. High Frequency Trading comes to mind.
As for big data applications, I haven’t heard of any big data deployments that use SSDs for basic storage, let alone PCIe flash cards. I don’t believe there’s going to be a lot of big data analytics that needs a storage system this fast.
Utilizing excess compute power in a storage controller has been a dream for a long time. Aside from running VMs and a couple of other specialized services such as A-V scanning, not much of this type of functionality has ever been released for use inside a storage controller. With software defined storage coming online, it may not even make that much sense anymore.
The MIT researchers’ BlueDBM solution is somewhat novel, but unless they can more easily generalize the computational offload it doesn’t seem likely to become a very popular approach for analytics applications.
As for their reduction in access latencies, that might have some legs if they can put more storage capacity behind it and continue to support similar access latencies. But they will need to provide a more normal access method to it. The distributed Linux file system might be just the ticket to get this off into the market.
Photo Credits: Lightening by Jolene