Hybrid digital training-analog inferencing AI

Read an article from IBM Research the other day, Iso-accuracy DL inferencing with in-memory computing, that referred to an article in Nature, Accurate DNN inferencing using computational PCM (phase change memory, a memristive technology), which discussed using a hybrid digital-analog computational approach to DNN (deep neural network) training-inferencing AI systems. It's important to note that the PCM device is both a storage device and a computational device, thus performing two functions in one circuit.

In the past, we have seen PCM circuitry used in neuromorphic AI. The use of PCM here is not that (see our Are neuromorphic chips a dead end? post).

Hybrid digital-analog AI has the potential to be more energy efficient and use a smaller footprint than digital AI alone. Presumably, the new approach is focused on edge devices for IoT and other energy or space limited AI deployments.

What's different in Hybrid digital-analog AI

As researchers began examining analog circuitry for use in AI deployments, the nature of analog technology led to inaccuracy and underperformance in DNN inferencing. This was because of the "non-idealities" of analog circuitry. In other words, analog electronics has intrinsic characteristics that create difficulties when modeling digital logic, and digital exactitude is difficult to implement precisely in analog circuitry.

The caption for Figure 1 in the article runs to great length, but to summarize: (a) is the DNN model for an image classification DNN, with fewer inputs and outputs so that it can ultimately fit on a 512×512 PCM array; (b) shows how noise is injected during the forward propagation phase of DNN training, and how the DNN weights are flattened into a 2D matrix and programmed into the PCM device as differential conductances, with additional normalization circuitry.

As a result, the researchers had to come up with some slight modifications to the typical DNN training and inferencing process to improve analog PCM inferencing. Those changes involve:

  • Injecting noise during DNN training, so that the resultant DNN model becomes more noise resistant;
  • Flattening the resultant DNN model from 3D to 2D, so that neural network node weights can be implemented as differential conductances in the analog PCM circuitry; and
  • Normalizing each internal DNN layer's outputs before they are input to the next layer in the model.

Analog devices are intrinsically noisier than digital devices, so the DNN's noise sensitivity had to be reduced. During normal DNN training there is both a forward pass of inputs to generate outputs and a backward propagation pass (to adjust node weights) to fit the model to the required outputs. The researchers found that by injecting noise during the forward pass they were able to create a more noise-resistant DNN.
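
To make the noise-injection idea concrete, here's a minimal numpy sketch of a noisy forward pass. It is purely illustrative, not the researchers' training code: the layer sizes, ReLU activation, and noise level (scaled to each layer's maximum weight) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, weights, noise_std=0.04, training=True):
    """Forward pass that perturbs each layer's weights with Gaussian noise.

    Injecting the noise during training pushes the network toward weights
    that still classify correctly when the analog PCM array adds its own
    noise at inference time.
    """
    a = x
    for W in weights:
        noise = rng.normal(0.0, noise_std * np.abs(W).max(), W.shape) if training else 0.0
        a = np.maximum(a @ (W + noise), 0.0)   # ReLU activation (assumed)
    return a

# Hypothetical 2-layer network: 32 inputs -> 64 hidden -> 10 outputs
weights = [rng.normal(0, 0.1, (32, 64)), rng.normal(0, 0.1, (64, 10))]
logits = noisy_forward(rng.normal(size=(8, 32)), weights)
```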

Differential conductance uses the difference between the conductances of two circuits, so a single node weight is mapped to two different conductance values in the PCM device. By using differential conductance, the impact of the PCM device's inherent noisiness on DNN node propagation can be reduced.
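
Here is a small numpy sketch of how a weight matrix might be mapped onto conductance pairs and read back with noise; the conductance range, scaling, and noise level are assumed values for illustration, not the device's actual parameters.

```python
import numpy as np

def weights_to_conductance_pair(W, g_max=25.0):
    """Map each weight onto a (G+, G-) pair so that W ~ (G+ - G-) / scale.
    g_max is a hypothetical maximum device conductance (microsiemens):
    positive weights are carried by G+, negative weights by G-."""
    scale = g_max / np.abs(W).max()
    G_plus = np.clip(W, 0, None) * scale
    G_minus = np.clip(-W, 0, None) * scale
    return G_plus, G_minus, scale

def analog_matvec(x, G_plus, G_minus, scale, read_noise_std=0.5, seed=1):
    """Emulate the in-memory multiply: the currents from the two columns
    are subtracted, read noise is added, and the result is rescaled back
    into weight units."""
    noise = np.random.default_rng(seed).normal(0, read_noise_std, G_plus.shape[1])
    return (x @ G_plus - x @ G_minus + noise) / scale

W = np.random.default_rng(2).normal(0, 0.1, (32, 10))
G_plus, G_minus, scale = weights_to_conductance_pair(W)
y = analog_matvec(np.ones(32), G_plus, G_minus, scale)
```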

In addition, each layer's outputs are normalized via additional circuitry before being used as input for the next layer in the model. This has the effect of counteracting PCM circuitry drift over time (see below).

Hybrid AI results

The researchers modeled their new approach and also performed some physical testing of a digital-analog DNN, using CIFAR-10 image data and the ResNet-32 DNN model. The process began with an already trained DNN, which was then retrained while injecting noise during forward pass processing. The resultant DNN was then modeled and programmed into a PCM circuit for implementation testing.

Part D of Figure 4 shows: Baseline, which represents a completely digital implementation using FP32 multiplication logic; Experiment, which represents the actual use of the PCM device, with a global drift calibration performed on each layer before inferencing; and Model, which represents their digital model of the PCM device and its expected accuracy. The blue band is one standard deviation on the modeled result.

One challenge with any memristive device is that its behavior can drift over time. The researchers implemented a global drift calibration, or normalization circuitry, to counteract this. One can see evidence of drift in the experimental results between ~20 seconds and ~60 seconds into testing. During this interval PCM inferencing accuracy dropped from 93.8% to 93.2%, but then stayed there for the remainder of the experiment (~28 hrs). The baseline noted in the chart used digital FP32 arithmetic for inferencing and achieved ~93.9% for the duration of the test.
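
To show roughly what a global drift calibration could look like, here's a toy numpy sketch. The power-law drift model, drift exponent, and calibration-input approach are assumptions for illustration, not the paper's actual circuitry.

```python
import numpy as np

def drifted_conductance(G0, t, t0=1.0, nu=0.05):
    """PCM conductance typically decays roughly as a power law over time;
    the drift exponent nu here is an assumed, device-dependent value."""
    return G0 * (t / t0) ** (-nu)

def global_drift_calibration(G_fresh, G_drifted, x_cal):
    """Estimate a single per-layer scale factor by comparing the summed
    response of the drifted array to that of the freshly programmed array
    for a known calibration input."""
    expected = np.sum(x_cal @ G_fresh)
    measured = np.sum(x_cal @ G_drifted)
    return expected / measured   # multiply the layer's outputs by this factor

rng = np.random.default_rng(3)
G_fresh = rng.uniform(1.0, 25.0, (32, 10))         # programmed conductances (uS)
G_later = drifted_conductance(G_fresh, t=3600.0)   # same array, one hour later
alpha = global_drift_calibration(G_fresh, G_later, rng.uniform(0, 1, (16, 32)))
```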

Certainly not as accurate as the baseline all-digital implementation, but implementing the DNN inferencing model in PCM and only losing ~0.7% accuracy seems more than offset by the clear gains in energy and footprint reduction.

While the simple global drift calibration (GDC) worked fairly well during testing, the researchers developed another, adaptive approach (adaptive batch normalization statistics, or AdaBS), which uses a calibration image set drawn from the training data: at idle times, these images are fed through the PCM device to calculate updated statistics used to adjust the PCM circuitry's outputs. As modeled and tested, the AdaBS approach increased accuracy and (at least in modeling) retained accuracy over longer time frames.
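
A minimal sketch of the adaptive idea as I understand it: run calibration images through the drifting analog layer at idle time, re-estimate the normalization statistics, and use them to correct the layer's outputs. The toy "drift" factor, layer shape, and statistics below are assumptions, not the AdaBS implementation.

```python
import numpy as np

def recalibrate_stats(pcm_layer_fn, calibration_images):
    """Re-estimate per-channel normalization statistics by running a small
    calibration set (drawn from the training data) through the drifting
    analog layer; pcm_layer_fn stands in for reading that layer's
    pre-normalization outputs."""
    acts = np.stack([pcm_layer_fn(img) for img in calibration_images])
    return acts.mean(axis=0), acts.std(axis=0) + 1e-6

def normalize(layer_out, mean, std, gamma=1.0, beta=0.0):
    """Batch-norm style correction applied before the next layer."""
    return gamma * (layer_out - mean) / std + beta

rng = np.random.default_rng(4)
W_drifted = rng.normal(0, 0.1, (32, 10)) * 0.9   # pretend 10% drift (assumed)
layer = lambda x: x @ W_drifted
mean, std = recalibrate_stats(layer, rng.normal(size=(64, 32)))
corrected = normalize(layer(rng.normal(size=32)), mean, std)
```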

The researchers were also able to show that implementing part (first and last layers) of the DNN model in digital FP32 and the rest in PCM improved inferencing accuracy even more.

~~~~

As shown above, a hybrid digital-analog PCM AI deployment can provide similar accuracy (at least for CIFAR-10/ResNet-32 image recognition) to an all-digital DNN model, and due to the efficiencies of the PCM analog circuitry it allows for a more energy efficient DNN deployment.


IBM using PCM to implement better AI – round 6

Saw a recent article that discussed IBM’s research into new computing architectures that are inspired by brain computational techniques (see A new brain inspired architecture … ). The article reports on research done by IBM R&D into using Phase Change Memory (PCM) technology to implement various versions of computer architectures for AI (see Tutorial: Brain inspired computation using PCM, in the AIP Journal of Applied Physics).


As you may recall, we have been reporting on IBM Research into different computing architectures to support AI processing for quite a while now (see: Parts 1, 2, 3, 4, & 5). In our last post, More power efficient deep learning through IBM and PCM, we reported on a unique hybrid PCM-silicon solution to deep learning computation.

Readers should also be familiar with PCM, as it's been discussed at length in a number of our posts (see The end of NAND is near, maybe; The future of data storage is MRAM; and New chip architectures with CPU, storage & sensors …). MRAM, ReRAM and current 3D XPoint all seem to be different forms of PCM-like memory (I think).

In the current research, IBM discusses three different approaches to support AI utilizing PCM devices. All three approaches stem from the physical characteristics of PCM.

(Some) PCM physics

FIG. 2. (a) Phase-change memory is based on the rapid and reversible phase transition of certain types of materials between crystalline and amorphous phases by the application of suitable electrical pulses. (b) Transmission electron micrograph of a mushroom-type PCM device in a RESET state. It can be seen that the bottom electrode is blocked by the amorphous phase.

It turns out that PCM devices have many characteristics that lend themselves to specialized computation. PCM devices crystallize and melt in order to change state, and the properties associated with melting and crystallization of the PCM media cell can be used to support unique forms of computation. Some of these PCM characteristics include:

  • Analog, not digital memory – PCM devices are, at their core, analog memory devices. By this we mean that they don't record just a 0 or 1 (actually a resistive or conductive) state, but rather a continuum of values between those two.
  • PCM devices have an accumulation capability – each PCM cell accumulates a level of activation. This means that one cell can be more or less likely to change state depending on prior activity.
  • PCM devices are noisy – PCM cells are not perfect recorders of state change signals, but rather exhibit a well-known, random noise that affects the state level attained and that can be used to introduce randomness into processing.
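
To make those three characteristics concrete, here's a tiny, purely illustrative numpy model of a single PCM cell; the conductance range, accumulation step, and noise level are made-up values, not measured device behavior.

```python
import numpy as np

class ToyPCMCell:
    """Illustrative (not physical) PCM cell model: conductance is a
    continuous analog value, programming pulses accumulate, and every
    update carries random write noise."""

    def __init__(self, g_min=0.1, g_max=25.0, step=1.0, noise_std=0.3, seed=0):
        self.g = g_min                      # analog state, not just 0 or 1
        self.g_min, self.g_max = g_min, g_max
        self.step, self.noise_std = step, noise_std
        self.rng = np.random.default_rng(seed)

    def pulse(self):
        """Each SET pulse partially crystallizes the cell, accumulating
        conductance; the noise makes the exact increment random."""
        self.g += self.step + self.rng.normal(0.0, self.noise_std)
        self.g = float(np.clip(self.g, self.g_min, self.g_max))
        return self.g

cell = ToyPCMCell()
trace = [cell.pulse() for _ in range(10)]   # gradual, noisy accumulation
```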

The other major advantage of PCM devices is that they take a lot less power to operate than a GPU-CPU combination.

Three ways to use PCM for AI learning

FIG. 4. “In-memory computing,” computation is performed in place by exploiting the physical attributes of memory devices organized as a “computational memory” unit. For example, if data A is stored in a computational memory unit and if we would like to perform f(A), then it is not required to bring A to the processing unit. This saves energy and time that would have to be spent in the case of conventional computing system and memory unit. Adapted from Ref. 19.

The Applied Physics article describes three ways to use PCM devices in AI learning. These three include:

  1. Computational storage – which uses the analog capabilities of PCM to perform arithmetic and learning computations, in a sort of combined compute and storage device (see the sketch after this list).
  2. AI co-processor – which uses PCM devices in an "all PCM nodes connected to all other PCM nodes" configuration that could be used to perform neural network learning. In an AI co-processor there would be multiple, fully connected PCM modules, each emulating a neural network layer.
  3. Spiking neural networks – which use PCM's activation accumulation characteristics and inherent randomness to mimic biological spiking neuron activation.
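
A minimal sketch of what "computation in place" means for the computational storage approach (item 1): a matrix-vector multiply carried out directly on a crossbar of conductances, with Ohm's law doing the multiplies and current summation doing the adds. The array size, voltages, and conductance values are made up for illustration.

```python
import numpy as np

def crossbar_matvec(voltages, conductances):
    """In-memory multiply-accumulate: applying read voltages to the rows of
    a conductance crossbar yields column currents I = V @ G, so the stored
    matrix G never has to move to a separate processing unit."""
    return voltages @ conductances

rng = np.random.default_rng(5)
G = rng.uniform(0.1, 25.0, (128, 64))   # stored "matrix" as conductances (uS)
v = rng.uniform(0.0, 0.2, 128)          # input vector encoded as read voltages (V)
currents = crossbar_matvec(v, G)        # result read out as column currents (uA)
```
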
FIG. 11. A proposed chip architecture for a co-processor for deep learning based on PCM arrays. Adapted from Ref. 28.

It’s the last approach that intrigues me.

Spiking neural nets (SNN)

FIG. 12. (a) Schematic illustration of a synaptic connection and the corresponding pre- and post-synaptic neurons. The synaptic connection strengthens or weakens based on the spike activity of these neurons; a process referred to as synaptic plasticity. (b) A well-known plasticity mechanism is spike-time-dependent plasticity (STDP), leading to weight changes that depend on the relative timing between the pre- and post-synaptic neuronal spike activities. Adapted from Ref. 31.

Biological neurons accumulate charge from all input (connected) neurons and, when they reach some input threshold, generate an output signal or spike. This spike is then used to start the same process in the next neuron downstream from it.

Biological neurons also exhibit randomness in their threshold-spiking process.

Emulating spiking neurons in today's neural nets takes computation, and emulating their randomness takes even more.

But with a PCM SNN, both the spiking process and its randomness come from the device physics. Using PCM to create SNNs seems a logical progression.
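
For intuition, here is a minimal leaky integrate-and-fire neuron in numpy with a noisy threshold. It illustrates the accumulate-then-spike behavior described above, not IBM's PCM implementation; the leak, threshold, and noise constants are all assumptions.

```python
import numpy as np

def lif_neuron(input_current, threshold=1.0, leak=0.95, noise_std=0.05, seed=0):
    """Leaky integrate-and-fire: the membrane potential accumulates input,
    leaks a little each step, and the neuron spikes when the potential
    crosses a noisy threshold (the noise stands in for PCM randomness)."""
    rng = np.random.default_rng(seed)
    v, spikes = 0.0, []
    for i_t in input_current:
        v = leak * v + i_t
        if v >= threshold + rng.normal(0.0, noise_std):
            spikes.append(1)
            v = 0.0          # reset after the spike (like re-amorphizing the cell)
        else:
            spikes.append(0)
    return spikes

spike_train = lif_neuron(np.full(50, 0.12))   # constant drive -> regular-ish spiking
```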

PCM as storage, as memory, as compute or all the above

In the storage business, we look at Optane (see our 3D XPoint post) SSDs as blazingly fast storage. Intel has also announced that they will use 3D XPoint in a memory form factor, which should provide slower (sadly), but larger, memory devices.

But using PCM for compute is a radical departure from the von Neumann computer architectures we know and love today. HPE has been discussing another new computing architecture built around their memristor technology, but only in prototype form.

It seems IBM is also prototyping hardware down this path.

Welcome to the next computing revolution.

Photo & Caption Credit(s): Photo and caption from Figure 2 in AIP Journal of Applied Physics article

Photo and caption from Figure 4 in AIP Journal of Applied Physics article

Photo and caption from Figure 11 in AIP Journal of Applied Physics article

Photo and caption from Figure 12 in AIP Journal of Applied Physics article

More power efficient deep learning through IBM and PCM

Read an article today from MIT Technology Review (TR), AI could get 100 times more efficient with IBM’s new artificial synapses, discussing the power efficiency of a new analog approach to neural nets and deep learning.

We have talked about IBM’s TrueNorth and Synapse neuromorphic devices and PCM neural nets before (see: Parts 1, 2, 3, & 4).

The paper in Nature (Equivalent-accuracy accelerated neural-network training using analogue memory) referred to by the TR article is behind a paywall. However, another Ars Technica (Ars) article (Training a neural network in phase change memory beats GPUs) on the new research was a bit more informative.

Both articles discuss a new analog approach, using phase change memory (PCM), which offers significant power/training efficiency gains when compared to today's standard GPU AI processors. Both the TR and Ars articles report on IBM work simulating a new (PCM based) neuromorphic device that reduces training power consumption AND training time by a factor of 100. But the Nature paper abstract says it reduces both power consumption and computational space (computations per sq mm) by a factor of 100, which is not exactly the same claim.

Why PCM

PCM is a nonvolatile memory technology (see part 4 above for more info) that uses electronically induced phase changes in a material to establish a 1 or 0 state for a PCM bit.

However, another advantage of PCM is that it also can take on a state between 0 and 1. This is bad for data memory/storage but good for neural nets.

For a PCM based neural net, you could have a layer of PCM (neuron) structures and standard wiring that connects all the PCM neurons to the next layer down, for however many layers your neural net requires. The PCM value would indicate the strength of the connection between neurons (synapses).

But the problem with a PCM neural net is that PCM states don't provide enough gradations of values between 0 and 1 to fully map today's neural net weights.

IBM’s latest design has two different tiers of neural nets

According to the Ars article, IBM's latest design takes a two tier approach to using PCM in its neural net. The first, top tier uses a PCM structure and the second, lower tier uses a more traditional, silicon based structure; together they implement the neural net.

The Ars article describes the new two tier design as providing two-digit resolution for the weight between neurons. The structure implemented in PCM determines the higher order digit and the more traditional, silicon based neural net segment determines the lower order digit of the two digit neural net weight.

With this approach, training occurs mostly in the more traditional, silicon layer neural net, but every 100 or so training events (epochs), the accumulated information is used to modify the PCM structure as well. In this fashion, the PCM-silicon neural net is fine-tuned using roughly 1 out of every 100 training events to correct the PCM layer and the other 99 or so to modify the silicon layer.
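
A minimal numpy sketch of that two tier schedule as described in the Ars article: the weight is split into a coarse (PCM-like) component and a fine (silicon-like) component, gradient updates land on the fine component every step, and the fine component is periodically folded into the coarse one. The split, learning rate, and transfer interval are assumptions for illustration.

```python
import numpy as np

class TwoTierWeight:
    """Weight = coarse (PCM-like, higher-order 'digit') + fine (silicon-like,
    lower-order 'digit'); only the fine tier is touched on every update."""

    def __init__(self, shape, transfer_every=100, seed=0):
        rng = np.random.default_rng(seed)
        self.coarse = rng.normal(0, 0.1, shape)   # infrequently updated tier
        self.fine = np.zeros(shape)               # frequently updated tier
        self.transfer_every = transfer_every
        self.steps = 0

    @property
    def value(self):
        return self.coarse + self.fine

    def update(self, grad, lr=0.01):
        self.fine -= lr * grad                     # every training event
        self.steps += 1
        if self.steps % self.transfer_every == 0:  # ~1 in every 100 events
            self.coarse += self.fine               # fold the fine tier into the PCM tier
            self.fine[:] = 0.0

w = TwoTierWeight((32, 10))
rng = np.random.default_rng(1)
for _ in range(250):
    w.update(rng.normal(0, 0.01, (32, 10)))
```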

In addition, the silicon layer apparently mimics the analog behavior of the PCM layer, using capacitors and transistors.

~~~~

I wonder why they didn't just use two tiers of PCM to do the same thing, but it's possible that training the silicon layer is more power efficient, faster, or both, compared with training the PCM layer.

The TR and Ars articles seem to make a point of saying this is analogue computing, and I would guess that because the PCM and silicon layers can take on many values between 0 and 1, the computation isn't strictly digital.

Much of the article is based on combined hardware (built using 90nm technology) and software simulations of the new PCM-silicon neuromorphic device. However, simulations like this are a standard step in the ASIC design process, and if successful, we would expect a chip to emerge from a foundry within 6-12 months from now.

The Nature paper's abstract indicated that they simulated the device using standard (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100) training datasets for handwritten digit recognition and color image classification/recognition. The new device was able to come within 1% of the accuracy of a software trained neural net, with 1% of the power and (when updated to the latest foundry technologies) in 1% of the space.

Furthermore, the abstract said that the current device supports ~205K synapses. The previous generation, IBM TrueNorth (see part 2 above), had the "equivalent of 1M neurons" and their earlier IBM SYNAPSE (see part 1 above) chip had "256K programmable synapses" and 256 computational elements. But I believe both of those were single tier devices.

I'd also be very interested in whether the neuromorphic device is compatible with, and could be programmed with, PyTorch or TensorFlow, but I didn't see any information on how the devices were programmed.

Comments?

Photo Credit(s): neuron by mararie 

3D CrossPoint graphic, taken from Intel-Micron session at FMS16

brain-neurons by Fotis Bobolas

Flaming Lotus Girls Neuron by SanFranAnnie (cc) (from Flickr)

IBM Research creates PCM synapses – cognitive computing, round 4

Last year we reported on IBM's progress in taking PCM (phase change memory) and using it to create a new, neuromorphic computing architecture (see Phase Change Memory (PCM) based neuromorphic processors). And earlier we discussed IBM's (2nd generation) TrueNorth chip and IBM's (1st generation) Synapse chip.

This past week IBM made another cognitive computing announcement. This time they have taken their neuromorphic technologies another step closer to precise emulation of neurological processing of the brain.

Their research paper was not directly available, but IBM Research has summarized its contents in a short web article with a video (see IBM Scientists imitate the functionality of neurons with Phase-Change device).