Bringing compute to storage

Researchers at MIT (see Storage system for ‘big data’ dramatically speeds access to information) have come up with a novel storage cluster using FPGAs and flash chips to create a new form of database machine.

In their system, an FPGA provides limited computational offload/acceleration along with flash-controller functionality for a set of flash chips. They call their system BlueDBM, or the Blue Database Machine.

Their storage device is used as a PCIe flash card in a host PC. But in their implementation, the PCIe flash cards are interconnected via an FPGA serial link. This approach creates a distributed controller across all the PCIe flash cards in the host servers and allows any host PC to access any flash card's data at high speed.
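To make the distributed-controller idea a bit more concrete, here's a minimal sketch in Python. It is purely my own illustration, not the researchers' design: the node count matches their 4-node prototype, but the striping scheme and the dict-based "flash" are assumptions. The point is simply that a request for a remote page is forwarded card-to-card over the serial link rather than up through the host's network stack.

```python
# Illustrative model only: the striping and serial-link behaviour are my
# assumptions, not details taken from the BlueDBM design.
NUM_NODES = 4                      # the prototype described is a 4-node system

# Each node's flash, modelled as a simple dict of local page number -> data.
node_flash = [dict() for _ in range(NUM_NODES)]

def owner_of(global_page):
    """Stripe global page numbers across the nodes' flash cards."""
    return global_page % NUM_NODES

def read_page(local_node, global_page):
    """Service a read locally, or forward it one hop over the serial link."""
    target = owner_of(global_page)
    local_page = global_page // NUM_NODES
    if target == local_node:
        return node_flash[local_node].get(local_page)   # local flash access
    # Remote access goes FPGA-to-FPGA over the serial link, so the host
    # never touches a network stack to reach another card's flash.
    return node_flash[target].get(local_page)

# Example: node 0 reads a page that happens to live on node 2's card.
node_flash[2][5] = b"remote page contents"
print(read_page(0, 5 * NUM_NODES + 2))
```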

They claim that node-to-node access latencies are on the order of 60-80 microseconds and that their distributed controller can sustain 70% of theoretical system bandwidth. In their prototype 4-node system, performance testing shows it's an order of magnitude faster than Microsoft Research's CORFU (Cluster of Raw Flash Units).

Why FPGAs?

There are two novel aspects to their system: 1) the computational offload capabilities provided by the FPGA in front of the flash, and 2) the distributed controller implemented across the storage nodes using the FPGA serial network.

Both of these characteristics depend on the FPGA. Using FPGAs also keeps system cost down, and the FPGAs have a readily available, internally supported serial link that could be pressed into service.

But using an FPGA limits the computational capabilities, and re-configuring (re-programming) the storage cluster's compute capabilities takes more time. If they used a more general-purpose CPU in front of the flash chips, they could support a much richer computational offload next to the storage chips. For example, in their prototype the FPGAs supported 'word-counting' offload functionality.
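As a rough idea of what such an offload might look like from the host's side, here's a hedged Python sketch. The `device` handle and its `offload_word_count`/`read_pages` methods are hypothetical stand-ins, not the BlueDBM API; the sketch just contrasts letting the card count words next to the flash with pulling all the raw data across PCIe and counting it on the host.

```python
def word_count(device, page_range, use_offload=True):
    """Count words in a range of flash pages.

    `device`, `offload_word_count`, and `read_pages` are hypothetical stand-ins
    for whatever interface a programmable flash card might actually expose.
    """
    if use_offload:
        # The FPGA (or a small CPU) scans pages next to the flash and returns
        # only the count, so the raw data never crosses the PCIe bus.
        return device.offload_word_count(page_range)
    # Host-side fallback: ship all the data across PCIe and count it here.
    pages = device.read_pages(page_range)
    return sum(len(page.split()) for page in pages)
```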

Nonetheless, as most flash storage these days already has a fairly sophisticated controller, it's not much of a stretch to bump this compute power up to something a bit more programmable and make its functionality more available via APIs. I suppose that to gain equivalent performance this would need to use PCIe flash cards.

Where they would get the internal card-to-card serial link with general-purpose CPUs may be a concern, which brings up another question.

The distributed controller gives them what exactly?

I believe that with a serial-link-based distributed controller they don't need a full networking stack to access the PCIe flash storage on other nodes. This should save both access time and compute power.
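As a back-of-the-envelope illustration of why that matters, here's a small Python sketch. All of the figures are my own rough assumptions for illustration only, chosen to be loosely consistent with the reported 60-80 microsecond node-to-node latency; they are not measurements from the paper.

```python
# All figures are rough assumptions, for illustration only.
flash_read_us = 50    # assumed raw flash page read time
serial_hop_us = 20    # assumed FPGA-to-FPGA serial-link hop
net_stack_us  = 100   # assumed kernel network-stack overhead per request

direct_path  = flash_read_us + serial_hop_us   # serial-link path, ~70 us
network_path = flash_read_us + net_stack_us    # network-stack path, ~150 us

print(f"serial-link path: ~{direct_path} us, network path: ~{network_path} us")
```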

In follow-on work, the MIT researchers plan to implement a Linux-based, distributed file system across BlueDBM. This should give them a more normal storage stack for their system. How this will interact with the computational offload capabilities is another question.

I would have to say the reduction in access latency is what they were after with the distributed controller, and they seem to have achieved it, as noted above. I suppose something similar could be done with multiple PCIe cards in the same host, but with the potential to grow from 4 to 20 nodes, BlueDBM starts to look more interesting.

What sort of application could use such a device?

They talked about performing near real-time analysis of scientific data or modeling all the particles in a simulation of the universe. But just about any application that requires extremely low access time with limited data services could potentially take advantage of their storage system. High-frequency trading comes to mind.

As for big data applications, I haven't heard of any big data deployments that use SSDs for basic storage, let alone PCIe flash cards. I don't believe there's going to be a lot of big data analytics that needs this fast a storage system.

~~~~

Utilizing excess compute power in a storage controller has been an ongoing dream for a long time. Aside from running VMs and a couple of other specialized services such as A-V scanning, not much of this type of functionality has ever been released for use inside a storage controller. With software-defined storage coming online, it may not even make that much sense anymore.

MIT research’s BlueDBM solution is somewhat novel but unless they can more easily generalize the computational offload it doesn’t seem as if it will become a very popular way to go for analytics applications.

As for their reduction in access latencies, that might have some legs if they can put more storage capacity behind it and continue to support similar access latencies. But they will need to provide a more normal access method to it. The distributed Linux file system might be just the ticket to get this into the market.

Comments?

Photo Credits: Lightening by Jolene

Racetrack memory gets rolling

A recent MIT study showed how a new technology can be used to control and write magnetized bits in nano-structures using voltage alone. This new technique also consumes much less power than using magnets or magnetic fields.

They envision a sort of nano-circuit, -wire, or -racetrack with a series of transistor-like structures spaced at regular intervals above it. Nano-bits would race around these nano-wires as a series of magnetized domains. These new transistor-like devices would be a sort of onramp for the bits as well as stop-lights/speed limits for the racetrack.

Magnetic based racetrack memory issues

The problem with using magnets to write the bits in a nano-racetrack is that magnetism casts a wide shadow and can impact adjacent racetracks, sort of like shingled writes (which we last discussed in Shingled magnetic recording disks). The other problem has been finding a way to (magnetically) control the speed of the racing bits so they can be isolated and read or written effectively.

Magneto-ionic racetrack memory solutions

But the MIT researchers have discovered a way to use voltage to change the magnetic orientation of a bit on a racetrack. They also found a way, using voltage, to precisely control the position of magnetic bits speeding around the track and to electronically isolate and select a bit.

What they have created is a sort of transistor for magnetized domains, using ion-rich materials. Voltages can be used to attract or repel those ions, and the ions in turn interact with the flowing magnetic domains, speeding up or slowing down their movement.

Thus, the transistor-like device can be set to attract (speed up), slow down, or stop magnetized domains, and it can also be used to change the magnetic orientation of a domain. The MIT researchers call these magneto-ionic devices.

Racetrack memory redefined

So now we have a way to (electronically) seek to bit data on a racetrack, a way to precisely (electronically) select bits on the racetrack, and a way to precisely (electronically) write data on a racetrack. And presumably, with an appropriate (magnetic) read head, a way to read this data. As an added bonus, data once written on the racetrack apparently requires no additional power to stay magnetized.

So the transistor-like devices are a combination of write heads, motors, and brakes for the racetrack memory. I'm not sure, but if they can write, slow down, and speed up magnetic domains, why can't they read them as well? That way the transistor-like devices could serve as read heads too.

Why do they need more than one write head per track? It seems to me that one should suffice for a fairly long track, not unlike disk drives. I suppose more of them would make the track faster to write, but they would all have to operate in tandem, stopping the racing bits on the track all together and then starting them all back up together again. Maybe this way they can write a byte, a word, or a chunk of data all at the same time.
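To see why multiple ports on one track would have to move in lockstep, here's a toy Python model of a racetrack as a shift register with access ports at fixed positions. This is my own illustration, not MIT's or IBM's design; the track length and port positions are arbitrary. Shifting the track moves every domain past every port at once, so a write at several ports lands a whole word in one step.

```python
class Racetrack:
    """Toy model: a circular track of magnetic domains with fixed access ports."""

    def __init__(self, length=64, ports=(0, 8, 16, 24)):
        self.bits = [0] * length     # magnetized domains (0/1 orientation)
        self.ports = ports           # positions of the magneto-ionic devices

    def shift(self, steps=1):
        """Move every domain along the track; all ports see new bits together."""
        steps %= len(self.bits)
        self.bits = self.bits[-steps:] + self.bits[:-steps]

    def write_word(self, word_bits):
        """Write one bit at each port in a single step."""
        for port, bit in zip(self.ports, word_bits):
            self.bits[port] = bit

    def read_word(self):
        """Read the bit currently sitting under each port."""
        return [self.bits[port] for port in self.ports]

# Example: write a 4-bit nibble, shift the whole track, write the next nibble.
track = Racetrack()
track.write_word([1, 0, 1, 1])
track.shift()
track.write_word([0, 1, 0, 0])
print(track.read_word())
```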

In any event, it seems that racetrack memory took a (literally) quantum leap forward with this new research out of MIT.

Racetrack memory futures

IBM has been talking about racetrack memory for some time now, and this might be the last hurdle to overcome in getting there (we last discussed this in the A “few exabytes-a-day” from SKA post).

In addition, there don't appear to be any write-cycle, bit-duration, or whole-page-erase issues with this type of technology. So as the underlying storage for a new sort of semiconductor storage device (SSD), it has significant inherent advantages.

Not to mention that it is all based on nano-scale device sizes, which means it can pack a lot of bits into very little volume or area. So SSDs based on these racetrack memory technologies would be denser, faster, and require less energy. What more could you want?

Image: Nürburgring 2012 by Juriën Minke