More power efficient deep learning through IBM and PCM

Read an article today from MIT Technical Review (TR) (AI could get 100 times more efficient with IBM’s new artificial synapses). Discussing the power efficiency of a new analog approach to neural nets and deep learning.

We have talked about IBM’s TrueNorth and Synapse neuromorphic devices  and PCM neural nets before (see: Parts 1, 2, 3, & 4).

The paper in Nature (Equivalent accuracy accelerated neural training using analogue memory ) referred to by the TR article is behind a pay wall. However, another ArsTechnica (Ars) article (Training a neural network in phase change memory beats GPUs) on the new research was a bit more informative.

Both articles discuss a new analog approach, using phase change memory (PCM) which has significant power/training efficiency when compared to today’s standard GPU AI processor. Both the TR and Ars papers report on IBM developments simulating a new (PCM based) neuromorphic device that reduces training  power consumption AND training time by a factor of 100.   But the Nature paper abstract says it reduces both power consumption and computational space (computations per sq mm) by a factor of 100, not exactly the same.


PCM is a nonvolatile memory technology (see part 4 above for more info) that uses electronically induced phase changes in a material to establish a 1’s or 0’s state for a PCM bit.

However, another advantage of PCM is that it also can take on a state between 0 and 1. This is bad for data memory/storage but good for neural nets.

For a PCM based neural net you could have a layer of PCM (neuron) structures and standard wiring that wires all the PCM neurons to the next layer down, for however many layers required for your neural net. The PCM value would indicate the strength of the connection between neurons (synapses).

But, the problem with a PCM neural net is that PCM states don’t provide enough graduations of values between 0 and 1 to fully map today’s neural net weights.

IBM’s latest design has two different tiers of neural nets

According to Ars article, IBM’s latest design has a two tier approach to using PCM in its neural net. The first, top tier uses a PCM structure and the second lower tier uses a more traditional, silicon based structure and together they implement the neural net.

The Ars article speaks of the new two tier design as providing two digit resolution for the weight between  neuron. The structure implemented in PCM determines the higher order digit and the more traditional, silicon based, neural net segment determines the lower order digit in the two digit neural net weight.

With this approach, training occurs mostly in the more traditional, silicon layer neural net, but every 100 or so training events (epochs),  information is used to modify the PCM structure as well. In this fashion, the PCM-silicon neural net is fine tuned using 1 out of 100 or so training events to correct the PCM layer and the other 99 or so training events to modify the silicon layer.

In addition, the silicon layer is apparently implemented in silicon to mimic the PCM layer, using capacitors and transistors.


I wonder why not just use two tiers of PCM to do the same thing but it’s possible that training the silicon layer is more power efficient, speedy or both than the PCM layer.

The TR and Ars articles seem to make a point of saying this is analogue computing. And I would guess because the PCM and the silicon layer can take on many values between 0 and 1 that means it’s not digital.

Much of the article is based on combined hardware (built using 90nm technology) and software simulations of the new PCM-silicon neuromorphic device. However, simulations like this are a standard step in ASIC design process, and if successful, we would expect an chip to emerge from foundry within 6-12 months from now.

The Nature paper’s abstract indicated that they simulated the device using standard (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100) training datasets for handwritten digit recognition and color image classification/recognition. The new device was able to approach within 1% accuracy of software trained neural net with 1% the power and (when updated to latest foundry technologies) in 1% the space.

Furthermore, the abstract said that the current device supports ~205K synapses. The previous generation, IBM TrueNorth (see part 2 above) had the “equivalent of 1M neurons” and their earlier IBM SYNAPSE (see part 1 above) chip had “256K programable synapses” and 256 computational elements. But I believe both of those were single tier devices.

I’d also be very interested in whether the neuromorphic device is compatible with and could be programmed with PyTorch or TensorFlow but I didn’t see any information on how the devices were programmed.


Photo Credit(s): neuron by mararie 

3D CrossPoint graphic, taken from Intel-Micron session at FMS16

brain-neurons by Fotis Bobolas

IBM’s next generation, TrueNorth neuromorphic chip

Ok, I admit it, besides being a storage nut I also have an enduring interest in AI. And as the technology of more sophisticated neuromorphic chips starts to emerge it seems to me to herald a whole new class of AI capabilities coming online. I suppose it’s both a bit frightening as well as exciting which is why it interests me so.

IBM announced a new version of their neuromorphic chip line, called TrueNorth with +5B transistors and the equivalent of ~1M neurons. There were a number of articles on this yesterday but the one I found most interesting was in MIT Technical Review, IBM’s new brainlike chip processes data the way your brain does, (based on a Journal Science article requires login, A million spiking neuron integrated circuit with a scaleable communications network and interface).  We discussed an earlier generation of their SyNAPSE chip in a previous post (see my IBM research introduces SyNAPSE chip post).

How does TrueNorth compare to the previous chip?

The previous generation SyNAPSE chip had a multi-mode approach which used  65K “learning synapses” together with ~256K “programming synapses”. Their current generation, TrueNorth chip has 256M “configurable synapses” and 1M “programmable spiking neurons”.  So the current chip has quadrupled the previous chips “programmable synapses” and multiplied the “configurable synapses” by a factor of a 1000.

Not sure why the configurable synapses went up so high but it could be an aspect of connectivity, something akin to what happens to a “complete graph” which has a direct edge connection to every node in the graph. In a complete graph if you have N nodes then the number of edges is given as [N*(N-1)]/2, which for 1M nodes would be ~500M edges. So it must not be a complete graph, but it’s “close to complete” with 1/2 the number of edges.

Analog vs. Digital?

When last I talked with IBM on their earlier version chip I wondered why they used digital logic to create it rather than analog. They said to be able to better follow along the technology curve of normal chip electronics digital was the way to go.

It seemed to me at the time that if you really  wanted to simulate a brains neural processing then you would want to use an analog approach and this should use much less power. I wrote a couple of posts on the subject, one of which was on MIT’s analog neuromorphic chip (see my MIT builds analog neuromorphic chip post) and the other was on why analog made more sense than digital technology for neuromorphic computation (see my Analog neural simulation or Digital neuromorphic computing vs. AI post).

The funny thing is that IBM’s TrueNorth chip uses a lot less power (1000X, milliwatts vs watts) than normal CMOS chips in e use today. Not sure why this would be the case with digital logic but if this is true maybe there’s more of a potential to utilize these sorts of chips in wider applications beyond just traditional AI domains.

How do you program it?

I would really like to get a deeper look at the specs for TrueNorth and its programming model.  But there was a conference last year where IBM presented three technical papers on TrueNorth architecture and programming capabilities (see MIT Technical Report: IBM scientists show blueprints for brain like computing).

Apparently the 1M programming spike neurons are organized into blocks of 256 neurons each (with a prodigious amount of “configurable” synapses as well). These seem equivalent to what I would call a computational unit. One programs these blockss with “corelets” which map out the neural activity that the 256-neuron blocks can perform. Also these corelets “programs” can be linked together or one be subsumed within another sort of like subroutines.  IBM as of last year had a library of 150 corelets which do stuff like detect visual artifacts, motion in a visual image, detect color, etc.

Scale-out neuromorphic chips?

The abstract of the Journal Science paper talked specifically about a communications network interface that allows the TrueNorth chips to be “tiled in two dimensions” to some arbitrary size. So it is apparent that with the TrueNorth design, IBM has somehow extended a within chip block interface that allows corelets to call one another, to go off chip as well. With this capability they have created a scale-out model with the TrueNorth chip.

Unclear why they felt it had to go only two dimensional rather than three but, it seems to mimic the sort of cortex layer connections we have in our brains today. But even with only two dimensional scaling there are all sorts of interesting topologies that are possible.

There doesn’t appear to be any theoretical limit to the number of chips that can be connected in this fashion but I would suppose they would all need to be on a single board or at least “close” together because there’s some sort of time frame that couldn’t be exceeded for propagation delay, i.e., the time it takes for a spike to transverse from one chip to the farthest chip in the chain couldn’t exceed say 10msec. or so.

So how close are we to brain level computations?

In one of my previous post I reported Wikipedia stating that  a typical brain has 86B neurons with between 100M and 500M synapses. I was able to find the 86B number reference today but couldn’t find the 100M to 500M synapses quote again.  However, if these numbers are close to the truth, the ratio between human neurons and synapses is much less in a human brain than in the TrueNorth chip. And TrueNorth would need about 86,000 chips connected together to match the neuronal computation of a human brain.

I suppose the excess synapses in the TrueNorth chip is due to the fact that electronic connection have to be fixed in place for a neuron to neuron connection to exist. Whereas in the brain, we can always grow synapse connections as needed. Also, I read somewhere (can’t remember where) that a human brain at birth has a lot more synapse connections than an adult brain and that part of the learning process that goes on during early life is to trim excess synapses down to something that is more manageable or at least needed.

So to conclude, we (or at least IBM) seem to be making good strides in coming up with a neuromorphic computational model and physical hardware, but we are still six or seven generations away from a human brain’s capabilities (assuming a 1000 of these chips could be connected together into one “brain”).  If a neuromorphic chip generation takes ~2 years then we should be getting pretty close to human levels of computation by 2028 or so.

The Tech Review article said that the 5B transistors on TrueNorth are more transistors than any other chip that IBM has produced. So they seem to be at current technology capabilities with this chip design (which is probably proof that their selection of digital logic was a wise decision).

Let’s just hope it doesn’t take it 18 years of programming/education to attain college level understanding…


Photo Credit(s): New 20x [view of mouse cortex] by Robert Cudmore

Top 10 blog posts for 2011

Merry Christmas! Buon Natale! Frohe Weihnachten! by Jakob Montrasio (cc) (from Flickr)
Merry Christmas! Buon Natale! Frohe Weihnachten! by Jakob Montrasio (cc) (from Flickr)

Happy Holidays.

I ranked my blog posts using a ratio of hits to post age and have identified with the top 10 most popular posts for 2011 (so far):

  1. Vsphere 5 storage enhancements – We discuss some of the more interesting storage oriented Vsphere 5 announcements that included a new DAS storage appliance, host based (software) replication service, storage DRS and other capabilities.
  2. Intel’s 320 SSD 8MB problem – We discuss a recent bug (since fixed) which left the Intel 320 SSD drive with only 8MB of storage, we presumed the bug was in the load leveling logic/block mapping logic of the drive controller.
  3. Analog neural simulation or digital neuromorphic computing vs AI – We talk about recent advances to providing both analog (MIT) and digital versions (IBM) of neural computation vs. the more traditional AI approaches to intelligent computing.
  4. Potential data loss using SSD RAID groups – We note the possibility for catastrophic data loss when using equally used SSDs in RAID groups.
  5. How has IBM researched changed – We examine some of the changes at IBM research that have occurred over the past 50 years or so which have led to much more productive research results.
  6. HDS buys BlueArc – We consider the implications of the recent acquisition of BlueArc storage systems by their major OEM partner, Hitachi Data Systems.
  7. OCZ’s latest Z-Drive R4 series PCIe SSD – Not sure why this got so much traffic but its OCZ’s latest PCIe SSD device with 500K IOPS performance.
  8. Will Hybrid drives conquer enterprise storage – We discuss the unlikely possibility that Hybrid drives (NAND/Flash cache and disk drive in the same device) will be used as backend storage for enterprise storage systems.
  9. SNIA CDMI plugfest for cloud storage and cloud data services – We were invited to sit in on a recent SNIA Cloud Data Management Initiative (CDMI) plugfest and talk to some of the participants about where CDMI is heading and what it means for cloud storage and data services.
  10. Is FC dead?! – What with the introduction of 40GbE FCoE just around the corner, 10GbE cards coming down in price and Brocade’s poor YoY quarterly storage revenue results, we discuss the potential implications on FC infrastructure and its future in the data center.


I would have to say #3, 5, and 9 were the most fun for me to do. Not sure why, but #10 probably generated the most twitter traffic. Why the others were so popular is hard for me to understand.


Analog neural simulation or digital neuromorphic computing vs. AI

DSC_9051 by Greg Gorman (cc) (from Flickr)
DSC_9051 by Greg Gorman (cc) (from Flickr)

At last week’s IBM Smarter Computing Forum we had a session on Watson, IBM’s artificial intelligence machine which won Jeopardy last year and another session on IBM sponsored research helping to create the SyNAPSE digital neuromorphic computing chip.

Putting “Watson to work”

Apparently, IBM is taking Watson’s smarts and applying it to health care and other information intensive verticals (intelligence, financial services, etc.).  At the conference IBM had Monoj Saxena, senior director Watson Solutions and Dr. Herbert Chase, a professor of clinical medicine a senior medical professor from Columbia School of Medicine come up and talk about Watson in healthcare.

Mr. Saxena’s contention and Dr. Chase concurred that Watson can play at important part in helping healthcare apply current knowledge.  Watson’s core capability is the ability to ingest and make sense of information and then be able to apply that knowledge.  In this case, using medical research knowledge to help diagnose patient problems.

Dr. Chase had been struck at a young age by one patient that had what appeared to be an incurable and unusual disease.  He was an intern at the time and was given the task to diagnose her issue.  Eventually, he was able to provide a proper diagnosis but it irked him that it took so long and so many doctors to get there.

So as a test of Watson’s capabilities, Dr. Chase input this person’s medical symptoms into Watson and it was able to provide a list of potential diagnosises.  Sure enough, Watson did list the medical problem the patient actually had those many years ago.

At the time, I mentioned to another analyst that Watson seemed to represent the end game of artificial intelligence. Almost a final culmination and accumulation of 60 years in AI research, creating a comprehensive service offering for a number of verticals.

That’s all great, but it’s time to move on.

SyNAPSE is born

In the next session IBM had Dr. Dharmenrad Modta come up and talk about their latest SyNAPSE chip, a new neueromorphic digital silicon chip that mimicked the brain to model neurological processes.

We are quite a ways away from productization of the SyNAPSE chip.  Dr. Modha showed us a real-time exhibition of the SyNAPSE chip in action (connected to his laptop) with it interpreting a handwritten numeral into it’s numerical representation.  I would say it’s a bit early yet, to see putting “SyNAPSE to work”.

Digital vs. analog redux

I have written about the SyNAPSE neuromorphic chip and a competing technology, the direct analog simulation of neural processes before (see IBM introduces SyNAPSE chip and MIT builds analog synapse chip).  In the MIT brain chip post I discussed the differences between the two approaches focusing on the digital vs. analog divide.

It seems that IBM research is betting on digital neuromorphic computing.  At the Forum last week, I had a discussion with a senior exec in IBM’s STG group, who said that the history of electronic computing over the last half century or so has been mostly about the migration from analog to digital technologies.

Yes, but that doesn’t mean that digital is better, just more easy to produce.

On that topic, I asked the Dr. Modha, on what he thought of MIT’s analog brain chip.  He said

  • MIT’s brain chip was built on 180nm fabrication processes whereas his is on 45nm or over 3X finer. Perhaps the fact that IBM has some of the best fab’s in the world may have something to do with this.
  • The digital SyNAPSE chip can potentially operate at 5.67Ghz and will be absolutely faster than any analog brain simulation.   Yes, but each analog simulated neuron is actually one of a parallel processing complex and with a 1’000 or a million of them operating even 1000X or million X slower it’s should be able to keep up.
  • The digital SyNAPSE chip was carefully designed to be complementary to current digital technology.   As I look at IT today we are surrounded by analog devices that interface very well with the digital computing environment, so I don’t think this will be a problem when we are ready to use it.

Analog still surrounds us and defines the real world.  Someday the computing industry will awaken from it’s digital hobby horse and somehow see the truth in that statement.


In any case, if it takes another 60 years to productize one of these technologies then the Singularity is farther away than I thought, somewhere around 2071 should about do it.