IBM using PCM to implement better AI – round 6

Saw a recent article that discussed IBM’s research into new computing architectures that are inspired by brain computational techniques (see A new brain inspired architecture … ). The article reports on research done by IBM R&D into using Phase Change Memory (PCM) technology to implement various versions of computer architectures for AI (see Tutorial: Brain inspired computation using PCM, in the AIP Journal of Applied Physics).

As you may recall, we have been reporting on IBM Research into different computing architectures to support AI processing for quite awhile now, (see: Parts 1, 2, 3, 4, & 5). In our last post, More power efficient deep learning through IBM and PCM, we reported on a unique hybrid PCM-silicon solution to deep learning computation.

Readers should also be familiar with PCM as well as it’s been discussed at length in a number of our posts (see The end of NAND is near, maybe; The future of data storage is MRAM; and New chip architectures with CPU, storage & sensors …). MRAM, ReRAM and current 3D XPoint seem to be all different forms of PCM (I think).

In the current research, IBM discusses three different approaches to support AI  utilizing PCM devices. All three approaches stem from the physical characteristics of PCM.

(Some) PCM physics

FIG. 2. (a) Phase-change memory is based on the rapid and reversible phase transition of certain types of materials between crystalline and amorphous phases by the application of suitable electrical pulses. (b) Transmission electron micrograph of a mushroom-type PCM device in a RESET state. It can be seen that the bottom electrode is blocked by the amorphous phase.

It turns out that PCM devices have many  characteristics that lend themselves to be useful for specialized computation. PCM devices crystalize and melt in order to change state. The properties associated with melting and crystallization of the PCM media cell can be used to support unique forms of computation. Some of these PCM characteristics include::

  • Analog, not digital memory – PCM devices are, at the core, an analog memory device. We mean that they don’t record just a 0 or 1 (actually resistant or conductive) state, but rather a continuum of values between those two.
  • PCM devices have an accumulation capability –   each PCM cell actually  accumulates a level of activation. This means that one cell can be more or less likely to change state depending on prior activity.
  • PCM devices are noisy – PCM cells arenot perfect recorders of state chang signals  but rather have a well known, random noise which impacts the state level attained, that can be used to introduce randomness into processing.

The other major advantage of PCM devices is that they take a lot less power than a GPU-CPU to work.

Three ways to use PCM for AI learning

FIG. 4. “In-memory computing,” computation is performed in place by exploiting the physical attributes of memory devices organized as a “computational memory” unit. For example, if data A is stored in a computational memory unit and if we would like to perform f(A), then it is not required to bring A to the processing unit. This saves energy and time that would have to be spent in the case of conventional computing system and memory unit. Adapted from Ref. 19.

The Applied Physics article describes three ways to use PCM devices in AI learning. These three include:

  1. Computational storage – which uses the analog capabilities of PCM to perform  arithmetic and learning computations. In a sort of combined compute and storage device.
  2. AI co-processor – which uses PCM devices, in an “all PCM nodes connected to all other PCM nodes” operation that could be used to perform neural network learning. In an AI co-processor there would be multiple all connected PCM modules, each emulating a neural network layer.
  3. Spiking neural networks –  which uses PCM activation accumulation characteristics & inherent randomness to mimic, biological spiking neuron activation.
FIG. 11.
A proposed chip architecture for a co-processor for deep learning based on PCM arrays.28

It’s the last approach that intrigues me.

Spiking neural nets (SNN)

FIG. 12. (a) Schematic illustration of a synaptic connection and the corresponding pre- and post-synaptic neurons. The synaptic connection strengthens or weakens based on the spike activity of these neurons; a process referred to as synaptic plasticity. (b) A well-known plasticity mechanism is spike-time-dependent plasticity (STDP), leading to weight changes that depend on the relative timing between the pre- and post-synaptic neuronal spike activities. Adapted from Ref. 31.

Biological neurons accumulate charge from all input (connected) neurons and when they reach some input threshold, generate an output signal or spike. This spike is then used to start the process with another neuron up stream from it

Biological neurons also exhibit randomness in their threshold-spiking process.

Emulating spiking neurons, n today’s neural nets, takes computation.  Also randomness takes more.

But with PCM SNN, both the spiking process and its randomness, comes from device physics. Using PCM to create SNN seems a logical progression.

PCM as storage, as memory, as compute or all the above

In the storage business, we look at Optane (see our 3D Xpoint post) SSDs as blazingly fast storage. Intel has also announced that they will use 3D Xpoint in a memory form factor which should provide sadly slower, but larger memory devices.

But using PCM for compute, is a radical departure from the von Neumann computer architectures we know and love today. HPE has been discussing another new computing architecture with their memristor technology, but only in prototype form.

It seems IBM, is also prototyping hardware done this path.

Welcome to the next computing revolution.

Photo & Caption Credit(s): Photo and caption from Figure 2 in AIP Journal of Applied Physics article

Photo and caption from Figure 4 in AIP Journal of Applied Physics article

Photo and caption from Figure 11 in AIP Journal of Applied Physics article

Photo and caption from Figure 12 in AIP Journal of Applied Physics article

 

 

ReRAM to the rescue

I was at the Solid State Storage Symposium a couple of weeks ago where Robin Harris (StorageMojo) gave the keynote presentation. In his talk, Robin mentioned a new technology on the horizon which holds the promise of replacing DRAM, SRAM and NAND called resistive random access memory (ReRAM or RRAM).

If so, ReRAM will enter the technological race pitting MRAM, Graphene Flash, PCM and racetrack memory as followons for NAND technology.  But none of these have any intention of replacing DRAM.

Problems with NAND

There are a few problems with NAND today but the main problem that affects future NAND technologies is as devices shrink they lose endurance. For instance, today’s SLC NAND technology has an endurance of ~100K P/E (program/erase) cycles, MLC NAND can endure around 5000 P/E cycles and eMLC somewhere in between.  Newly emerging TLC (three bits/cell) has less even endurance than MLC.

But that’s all at 30nm or larger.  The belief is that as NAND feature size shrinks below 20nm its endurance will get much worse, perhaps orders of magnitude worse.

While MLC may be ok for enterprise storage today, much less than 5000 P/E cycles could become a problem and would require ever more sophistication in order to work around these limitation.    Which is why most enterprise class, MLC NAND based storage uses specialized algorithms and NAND controller functionality to support storage reliability and durability.

ReRAM solves NAND, DRAM and NvRAM problems.

Enter ReRAM, it has the potential to be faster than PCM-RAM, has smaller features than MRAM which means more bits per square inch and uses lower voltage than racetrack memory and NAND.    The other nice thing about ReRAM is that it seems readily scaleable to below 30nm feature geometries.  Also as it’s a static memory it doesn’t have to be refreshed like DRAM and thus uses less power.

In addition, it appears that  ReRAM is much more flexible than NAND or DRAM which can be designed and/or tailored to support different memory requirements.   Thus, one ReRAM design can be focused on standard  DRAM applications while another ReRAM design can be targeted at mass storage or solid state drives (SSD).

On the negative side there are still some problems with ReRAM, namely the large “sneak parasitic current” [whatever that is] that impacts adjacent bit cells and drains power.  There are a few solutions to this problem but none yet completely satisfactory.

But it’s a ways out, isn’t it?

No it’s not. BBC and Tech-On reported that Panasonic will start sampling devices soon and plan to reach volume manufacturing next year.   Elpida-Sharp  and HP-Hynix are also at work on ReRAM (or memristor) devices and expect to ship sometime in 2013.  But for the moment it appears that Panasonic is ahead of the pack.

At first, these devices will likely emerge in low power applications but as vendors ramp up development and mass production it’s unclear where it will ultimately end up.

The allure of ReRAM technology is significant in that it holds out the promise of replacing both RAM and NAND used in consumer devices as well as IT equipment with the same single technology.  If you consider that the combined current market for DRAM and NAND is over $50B, people start to notice.

~~~~

Whether ReRAM will meet all of its objectives is yet TBD.  But we seldom see any one technology which has this high a potential.  The one remaining question is why everybody else isn’t going after ReRAM as well, like Samsung, Toshiba and Intel-Micron.

I have to thank StorageMojo and the Solid State Storage Symposium team for bringing ReRAM to my attention.

[Update] @storagezilla (Mark Twomey) said that “… Micron’s aquisition of Elpida gives them a play there.”

Wasn’t aware of that but yes they are definitely in the hunt now.

Comments?

Image: Memristor by Luke Kilpatrick

 

A “few exabytes-a-day” from SKA

A number of radio telescopes, positioned close together pointed at a cloudy sky
VLA by C. G. P. Grey (cc) (from Flickr)

ArsTechnica reported today on the proposed Square Kilometer Array (SKA) radio telescope and it’s data requirements. IBM is in collaboration with the Netherlands Institute for Radio Astronomy (ASTRON) to help develop the SKA called the DOME project.

When completed in ~2024, the SKA will generate over an exabyte a day (10**18) of raw data.  I reported in a previous post how the world was generating an exabyte-a-day, but that was way back in 2009.

What is the SKA?

The new SKA telescope will be a configuration of “millions of radio telescopes” which when combined together will create a telescope with an aperture of one square kilometer, which is no small feet.  They hope that the telescope will be able to shed some light on galaxy evolution, cosmology and dark energy.  But it will go beyond that to investigating “strong-field tests of gravity“, “origins and evolution of cosmic magnetism” and search for life on other planets.

But the interesting part from a storage perspective is that the SKA will be generating a “few exabytes a day” of radio telescopic data for every full day of operation.   Apparently the new radio telescopes will make use of a new, more sensitive detector able to generate data of up to 10GB/second.

How much data, really?

The team projects final storage needs at between 300 to 1500 PB per year. This compares to the LHC at CERN which consumes ~15PB of storage per year.

It would seem that the immediate data download would be the few exabytes and then it would be post- or inline-processed into something more mangeable and store-able.  Unless they have some hellaciously fast processing, I am hard pressed to believe this could all happen inline.  But then they would need at least another “few exabytes” of storage to buffer the data feed before processing.

I guess that’s why it’s still a research project.  Presumably, this also says that the telescope won’t be in full operation every day of the year, at least at first.

The IBM-ASTRON DOME collaboration project

The joint research project was named for the structure that covers a major telescope and for a famous Swiss mountain.  Focus areas for the IBM-ASTRON DOME project include:

  • Advanced high performance computing utilizing 3D chip stacks for better energy efficiency
  • Optical interconnects with nanophotonics for high-speed data transfer
  • Storage for both high access performance access and for dense/energy efficient data storage.

In this last focus area, IBM is considering the use of phase change memories (PCM) for high access performance and new generation tape for dense/efficient storage.  We have discussed PCM before in a previous post as an alternative to NAND based storage today (see Graphene Flash Memory).  But IBM has also been investigating MRAM based race track memory as a potential future storage technology.  I would guess the advantage of PCM over MRAM might be access speed.

As for tape, IBM has already demonstrated in their labs technologies for a 35TB tape. However storing 1500 PB would take over 40K tapes per year so they may need another even higher capacities to support SKA tape data needs.

Of course new optical interconnects will be needed to move this much data around from telescope to data center and beyond.  It’s likely that the nanophotonics will play some part as an all optical network for transceivers, amplifiers, and other networking switching gear.

The 3D chip stacks have the advantage of decreasing chip IO and more dense packing of components will make efficient use of board space.  But how these help with energy efficiency is another question.  The team projects very high energy and cooling requirements for their exascale high performance computing complex.

If this is anything like CERN, datasets gathered onsite are initially processed then replicated for finer processing elsewhere (see 15PB a year created by CERN post.  But moving PBs around like SKA will require is way beyond today’s Internet infrastructure.

~~~~

Big science like this gives a whole new meaning to BIGData. Glad I am in the storage business.  Now just what exactly is nanophotonics, mems based phote-electronics?

Graphene Flash Memory

Model of graphene structure by CORE-Materials (cc) (from Flickr)
Model of graphene structure by CORE-Materials (cc) (from Flickr)

I have been thinking about writing a post on “Is Flash Dead?” for a while now.  Well at least since talking with IBM research a couple of weeks ago on their new memory technologies that they have been working on.

But then this new Technology Review article came out  discussing recent research on Graphene Flash Memory.

Problems with NAND Flash

As we have discussed before, NAND flash memory has some serious limitations as it’s shrunk below 11nm or so. For instance, write endurance plummets, memory retention times are reduced and cell-to-cell interactions increase significantly.

These issues are not that much of a problem with today’s flash at 20nm or so. But to continue to follow Moore’s law and drop the price of NAND flash on a $/Gb basis, it will need to shrink below 16nm.  At that point or soon thereafter, current NAND flash technology will no longer be viable.

Other non-NAND based non-volatile memories

That’s why IBM and others are working on different types of non-volatile storage such as PCM (phase change memory), MRAM (magnetic RAM) , FeRAM (Ferroelectric RAM) and others.  All these have the potential to improve general reliability characteristics beyond where NAND Flash is today and where it will be tomorrow as chip geometries shrink even more.

IBM seems to be betting on MRAM or racetrack memory technology because it has near DRAM performance, extremely low power and can store far more data in the same amount of space. It sort of reminds me of delay line memory where bits were stored on a wire line and read out as they passed across a read/write circuit. Only in the case of racetrack memory, the delay line is etched in a silicon circuit indentation with the read/write head implemented at the bottom of the cleft.

Graphene as the solution

Then along comes Graphene based Flash Memory.  Graphene can apparently be used as a substitute for the storage layer in a flash memory cell.  According to the report, the graphene stores data using less power and with better stability over time.  Both crucial problems with NAND flash memory as it’s shrunk below today’s geometries.  The research is being done at UCLA and is supported by Samsung, a significant manufacturer of NAND flash memory today.

Current demonstration chips are much larger than would be useful.  However, given graphene’s material characteristics, the researchers believe there should be no problem scaling it down below where NAND Flash would start exhibiting problems.  The next iteration of research will be to see if their scaling assumptions can hold when device geometry is shrunk.

The other problem is getting graphene, a new material, into current chip production.  Current materials used in chip manufacturing lines are very tightly controlled and  building hybrid graphene devices to the same level of manufacturing tolerances and control will take some effort.

So don’t look for Graphene Flash Memory to show up anytime soon. But given that 16nm chip geometries are only a couple of years out and 11nm, a couple of years beyond that, it wouldn’t surprise me to see Graphene based Flash Memory introduced in about 4 years or so.  Then again, I am no materials expert, so don’t hold me to this timeline.

 

—-

Comments?