Ok, maybe neuromorphic chips aren’t a deadend

Those of you who follow my blog will no doubt recall that I pronounced neuromorphic chips dead (see our Are neuromorphic chips a deadend blog post). Not because the hardware technology wasn't improving or wasn't good enough, but because software support for the technology was sorely lacking, making it extremely complex, if not nigh impossible, to program and use.


Meanwhile, GPUs, TPUs and other more "normal" neural network hardware and accelerators could all be used with standard, easy to use, mostly open source AI DL frameworks. And all that hardware was steadily improving, with new generations coming out regularly with more power and performance, and no end in sight.

But then I attended AIFD1 (AI Field Day 1) and at one of the sessions, Anil Mankar, COO & Co-Founder of a company named BrainChip Inc. (see video of their talk), presented yet another neuromorphic chip, called the AKIDA Neural Processor. Their current generation of the technology is available in their AKD 1000 SoC chip, focused on IoT solutions. But they had created a software development environment that allowed one to take standard TensorFlow-trained neural network models and deploy them on their hardware. And that got my interest.

BrainChip’s AKIDA AKD 1000 hardware AND software

Their AI DL neuromorphic chip is made up of Event Domain Neural Processing Units (NPUs). AKIDA technology is focused on low power, sensor-like applications. They claim to save power by only consuming power (i.e., running) when an event takes place. They are also able to save on memory requirements by using 1, 2 or 4 bits (vs. 8, 16, 32 or more bits) for model weights/activations.
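BrainChip didn't go into the details of its quantization scheme, but just to illustrate what 4-bit weight quantization means, here's a minimal NumPy sketch (my own toy illustration, not BrainChip's actual quantizer) that maps float32 weights onto a 16-level grid:

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Toy uniform, symmetric quantization of a weight tensor to 2**bits levels.

    Illustration only (works for 2 or more bits); BrainChip's real scheme may differ.
    """
    levels = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit weights
    scale = np.max(np.abs(w)) / levels            # map the largest weight to the top level
    codes = np.clip(np.round(w / scale), -levels - 1, levels)  # small integer codes
    return codes * scale                          # de-quantized values the model "sees"

w = np.random.randn(128, 64).astype(np.float32)
w4 = quantize_weights(w, bits=4)
print("mean abs quantization error:", float(np.mean(np.abs(w - w4))))
```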

Their hardware seems to run spiking neural networks (SNN, see our blog post on another chip technology using SNNs). In their SDK, they have a CNN2SNN tool that can take any (TensorFlow) trained CNN model and convert it to an SNN that can then run on their AKIDA technology.
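I didn't capture the exact SDK calls, so the sketch below is only conceptual: the quantize()/convert() calls are hypothetical stand-ins for whatever BrainChip's CNN2SNN tool actually exposes, and only the Keras line is standard TensorFlow.

```python
import tensorflow as tf
# Hypothetical import standing in for BrainChip's CNN2SNN tooling; the real
# module/function names may differ:
# from cnn2snn import quantize, convert

# 1. Start from an ordinary trained Keras CNN (standard TensorFlow).
model = tf.keras.applications.MobileNet(weights="imagenet")

# 2. Quantize weights/activations down to 4 bits or fewer (hypothetical call).
# quantized = quantize(model, weight_bits=4, activation_bits=4)

# 3. Convert the quantized CNN into a spiking model and map it onto the
#    AKD 1000 hardware (hypothetical calls).
# snn_model = convert(quantized)
# snn_model.map(akida_device)
```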

They also have an AKIDA Model Zoo with a handful of pre-trained CNN-type models that have already been converted to run on their technology, along with a tutorial on the technology. Mankar said that if you know how to use TensorFlow Keras today to construct and train your models, it shouldn't be too hard to learn to use their tools to do what you want.
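For anyone who hasn't used Keras, this is roughly the level of familiarity Mankar was referring to; a small, completely standard Keras CNN built and trained in a few lines (nothing BrainChip-specific here):

```python
import tensorflow as tf

# A small, standard Keras CNN -- the kind of model a CNN-to-SNN flow would start from.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0      # add a channel dim, scale to [0, 1]
model.fit(x_train, y_train, epochs=1, batch_size=128)
```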

Their chip hardware is available today as a separate PCIe card, as an M.2 form factor card, or as a chip. Finally, they also license their AKIDA IP to other chip designers.

AKIDA AKD 1000 performance

At AIFD1, Mankar showed statistics on the performance and accuracy attained using their chip vs. standard 32-bit floating point CNN implementations.

As discussed above, their processor uses 1-4 bits for weight quantization and as such loses some accuracy, but as you can see it's a matter of one to a few percent vs. the same models using a 32-bit floating point CNN implementation.

Because of their smaller weights, AKIDA uses less memory and less bandwidth to update models vs. models using larger weights.

As shown in the chart, the memory required for the 8-bit deep learning algorithms (DLAs) was significantly larger than the memory required for the AKIDA solution. For one of the algorithms, the AKIDA version required about half the memory of the 8-bit DLA version of the model.

Mankar also provided information on the number of calculations required per inference using AKIDA vs. 8-bit DLAs.

Just to set the stage, MMACs/inference is the number of multiply-accumulate operations (in millions) required to perform a single inference with the selected CNN model. ImageNet (1000), ImageNette (20) and Visual Wake Word models are all standard CNN models that have been pre-trained on vast repositories of data and can run in many hardware environments. The non-AKIDA solutions above were all running 8-bit DLA CNN models. Activity regularization is a training technique that adds a penalty on a layer's activations (its outputs) to the loss function, which discourages large or overly dense activations and helps reduce model overfit.
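As an aside, in Keras activity regularization is just a one-line option on a layer. A minimal illustration (mine, not from BrainChip's material):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A Dense layer with an L1 penalty on its activations: Keras adds
# 1e-5 * sum(|outputs|) to the training loss, nudging activations toward
# sparsity and reducing overfit.
layer = layers.Dense(64, activation="relu",
                     activity_regularizer=regularizers.l1(1e-5))
```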

He also showed some comparisons of their technology vs. Intel's Loihi hardware. Loihi is another neuromorphic chip, whose original introduction prompted me to write the "Are neuromorphic chips a deadend" post (link above). Unfortunately, I didn't capture any of these charts, but from my recollection, they showed that AKIDA technology used slightly less power than Loihi technology in all their comparisons.

AKIDA technology demo

In their live, on-camera demo, they used a previously downloaded, pre-trained VGG16 (if I recall correctly) CNN model. Offline, they had replaced the last classification layer with a (blank, untrained) dense layer, converted the result to an SNN, and downloaded it onto one of their boards. They had also developed an application that used this board with a camera to perform further CNN training or CNN image inferencing (classification).
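Replacing the classification head like that is standard transfer learning. Leaving out the SNN conversion and the download to the board, in plain Keras it looks roughly like this (the class count and the training data names are placeholders of mine):

```python
import tensorflow as tf

# Pre-trained VGG16 feature extractor with its original 1000-class head removed.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False      # keep the pre-trained features frozen

# New, blank classification head for a handful of classes (background, tiger,
# elephant, car...). Only this layer gets trained on the new examples.
num_classes = 5
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(few_shot_images, few_shot_labels, epochs=1)   # one/few-shot update
```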

They first (one-shot) trained their board's model to recognize the background of what the camera was seeing, and then performed (one-shot) training to classify toy tigers, elephants and cars. All of this was completed in real time during the demo. They were able to verify that the training took effect using pictures of tigers, elephants and cars, as well as by classifying the toys in different orientations and a different toy car.

The AIFD1 crowd (a tough one) said they had seen all this before, but would be really interested to see if the chip could distinguish between different cars (one a toy race car and the other a toy police car). On camera, they were able to re-train their CNN to distinguish between (toy) car 1 and car 2 and classify the two of them properly. There were one or two instances where the CNN model was confused, but they were able to re-train it to recognize the toy car and place it into the correct classification (using two-shot[?] learning).

At AIFD1, Mankar also presented detailed, real-world data on how they were able to perform keyword spotting, person detection, E-nose classification, E-tongue classification, and auditory (E-ear?) classification in embedded sensor systems.

AKIDA technology limitations

At the moment, their chip doesn't support neural networks that use memory, such as LSTMs or RNNs, but it seems to work fine for any CNN, as was shown multiple times in the data they presented and in their demo.

We were really impressed with their software stack, liked what we saw of their hardware/IP, and enjoyed their demo and its one-shot learning. Check out their videos (link above) for more information on them.

Photo Credit(s): all charts are from BrainChip Inc’s website or were presented at their AIFD1 session

University of Manchester fires up world’s largest neuromorphic computer

Read an article the other day about SpiNNaker, the University of Manchester's neuromorphic supercomputer (see the Live Science article: World's largest supercomputer brain…). There's also a Wikipedia page on SpiNNaker and a SpiNNaker project page.


SpiNNaker is part of the European Union Human Brain Project (HBP) Brain Simulation Program.

SpiNNaker supercomputing hardware

(Most of the following information is from the SpiNNaker project home page and SpiNNaker architectural overview page.)

The system has 1 million ARM9 (ARM968) cores and ~7TB of memory, with each core emulating 1000 spiking neurons. With this amount of computing power, it should be able to emulate a 1B (1 billion, 10^9) neuron brain (region).

The system will consist of 1,200 PCBs, with each PCB containing a 48-chip array plus associated networking hardware & memory. Each node contains a SpiNNaker chip with its 18 ARM9 cores.

Each node has two chips stitch-bonded together, one on top of the other. The bottom contains the 18 ARM9 cores and the top the (DDR) SDRAM memory and networking layer.

Total bisection networking bandwidth is 5 billion packets/second, with each packet carrying 5 or 9 bytes of data.

SpiNNaker operates at 1W per chip, or ~90kW of power to run the entire machine. Given that each chip has 18 cores and each core simulates 1000 neurons, each neuron simulation takes about 55.5µW of power to run.
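Working out that arithmetic from the figures above:

```latex
\[
\frac{1\,\text{W/chip}}{18\ \text{cores/chip} \times 1000\ \text{neurons/core}}
  = \frac{1\,\text{W}}{18{,}000\ \text{neurons}}
  \approx 55.5\,\mu\text{W per simulated neuron}
\]
```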

You can deploy a single board as an IoT solution, but at ~48W per board it may be too power hungry for IoT.

SpiNNaker supercomputing software

According to the home page and the Live Science article, SpiNNaker is intended to be used to model critical segments of the human brain, such as the basal ganglia, for the EU HBP brain simulation program.

The system architecture has three tiers:

  • a host machine (layer), which starts and monitors application execution and uses "ybug" to communicate with the monitor layer;
  • a monitor core (layer), which interacts with ybug on the host and uses "scamp" to communicate with the application processors; and
  • the application processors (layer), consisting of the ARM cores, memory and packet networking hardware, which run the SpiNNaker Application Runtime Kernel (sark).

Applications which run on sark can consist of spiking neural networks or multi-layer perceptrons (MLPs), i.e., classical deep learning neural networks.

  • MLP applications use back propagation, with the separate training and inference phases familiar from any deep learning application, and a fixed neural network topology.
  • Spiking neural network applications use ongoing learning, so there are no separate training and inference phases (the network is always learning); they use a variable network topology (reconfiguring the ARM core packet network) and currently support the PyNN spiking neural network framework.

Unfortunately, most of the links in the SpiNNaker project pages referring to PyNN spiking network applications are broken. But PyNN is a Python-based API for describing spiking neural networks that can run on a number of different simulators and hardware platforms (including sark/SpiNNaker).
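To give a flavor of what PyNN code looks like, here's a minimal spiking network: a Poisson spike source driving a population of leaky integrate-and-fire neurons. This is a generic PyNN sketch (written against the NEST backend, assuming it's installed); the commented-out import is what the sPyNNaker docs use for SpiNNaker hardware, which I haven't tried myself.

```python
import pyNN.nest as sim           # generic software backend (requires NEST installed)
# import pyNN.spiNNaker as sim    # SpiNNaker backend per the sPyNNaker docs (untested by me)

sim.setup(timestep=1.0)           # 1 ms simulation timestep

# 100 Poisson spike sources feeding 100 leaky integrate-and-fire neurons.
stimulus = sim.Population(100, sim.SpikeSourcePoisson(rate=20.0))
neurons = sim.Population(100, sim.IF_curr_exp())

# All-to-all static (non-plastic) synapses between the two populations.
sim.Projection(stimulus, neurons, sim.AllToAllConnector(),
               synapse_type=sim.StaticSynapse(weight=0.05, delay=1.0))

neurons.record("spikes")
sim.run(1000.0)                   # simulate 1 second of activity
spikes = neurons.get_data("spikes")
sim.end()
```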

Most of the AI groups I’ve talked with mention PyTorch or TensorFlow as AI frameworks of choice these days. But it’s unclear to me whether these two support spiking neural network generation/simulation.

If you want to learn more about programming SpiNNaker please check out their Software for SpiNNaker wiki page.

~~~~

As you may recall, a Homo sapiens brain has an estimated 16B to 86B neurons (the low estimate counts the cerebral cortex alone, the high one the whole brain; see Wikipedia's "list of animals by number of neurons" article for the low estimate and the EU HBP Brain Simulation page for the high one), which puts SpiNNaker today at a bit less than the equivalent of an average tufted capuchin cerebral cortex (~1.2B neurons).

Given the above, and with SpiNNaker at ~1B neurons, we are only 4 to 7 generations (doublings) away from human equivalence. That means we have at most ~14 years left before a 128B spiking-neuron simulation machine is available.
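The doubling math behind those numbers:

```latex
\[
\left\lceil \log_2 \tfrac{16\text{B}}{1\text{B}} \right\rceil = 4,
\qquad
\left\lceil \log_2 \tfrac{86\text{B}}{1\text{B}} \right\rceil = 7
\;(2^7 = 128\text{B}),
\qquad
7\ \text{doublings} \times 2\ \text{years/doubling} \approx 14\ \text{years}.
\]
```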

But SpiNNaker today is based on ARM9 cores, and ARM11 cores already exist. So if they redesigned/reimplemented the chip today, it would already have 2X the core count, making human equivalence at most ~12 years away.

The mean estimate for AGI (artificial general intelligence) seems to be 2040-2050 (see Wikipedia's Technological Singularity article). But given what the University of Manchester's SpiNNaker is capable of doing today, I don't think we have that long to wait.

Photo Credits: All photos/charts above are from the SpiNNaker Project pages at the University of Manchester website

PCM based neuromorphic processors

Read an interesting article from The Register the other day about IBM's Almaden Research lab using standard non-volatile memory devices to implement a neural net. They apparently used 2-PCM (Phase Change Memory) devices to implement a 913 neuron/165K synapse pattern recognition system.

This seems to be another (simpler, cheaper) way to create neuromorphic chips. We’ve written about neuromorphic chips before (see my posts on IBM SyNAPSE, IBM TrueNorth and MIT’s analog neuromorphic chip). The latest TrueNorth chip from IBM uses ~5B transistors and provides 1M neurons with 256M synapses.

But none of the other research I have read described the neuromorphic "programming" process at this level of detail, or provided a "success rate" on a standard AI pattern matching benchmark, the way IBM has with the PCM device.

PCM based AI

The IBM summary report on the research discusses at length how the pattern recognition neural network (NN) was “trained” and how the 913 neuron/165K synapse NN was able to achieve 82% accuracy on NIST’s handwritten digit training database.

The paper has many impressive graphics. The NN was designed as a 3-layer network and used back propagation for its learning process. They show how the back propagation training was used to determine the weights.
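In ordinary software terms, a 3-layer network trained with back propagation on handwritten digits looks like the Keras sketch below. This is just to make the structure concrete; it obviously isn't IBM's PCM implementation, and the layer sizes are my own placeholders rather than the paper's.

```python
import tensorflow as tf

# A small 3-layer perceptron trained with back propagation on MNIST digits.
# Layer sizes are illustrative only; the PCM paper partitioned its 913 neurons
# differently.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),        # 784 inputs
    tf.keras.layers.Dense(250, activation="sigmoid"),     # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),      # 10 digit classes
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
model.fit(x_train / 255.0, y_train, epochs=5, batch_size=10)
print(model.evaluate(x_test / 255.0, y_test))
```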

The other interesting thing was they analyzed how hardware faults (stuck-ats, dead conductors, number of resets, etc.) and different learning parameters (stochasticity, learning batch size, variable maxima, etc.) impacted NN effectiveness on the test database.

Turns out the NN could tolerate ~30% dead conductors (in the synapses) or 20% stuck-at faults in the PCM memory and still generate pretty good accuracy on the training set. I'm not sure I understand all the learning parameters, but they varied batch size from 1 to 10 and this didn't seem to impact NN accuracy whatsoever.
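You can get a feel for that kind of fault-tolerance experiment in software by zeroing out a random fraction of a trained model's weights (a crude stand-in for dead conductors) and re-measuring accuracy. A sketch, assuming the model, x_test and y_test from the previous example:

```python
import numpy as np

def inject_dead_weights(model, fraction=0.30):
    """Zero out a random fraction of each layer's weights, crudely mimicking
    dead conductors in a hardware synapse array."""
    for layer in model.layers:
        params = layer.get_weights()
        if not params:
            continue                                    # e.g. Flatten has no weights
        w = params[0]                                   # the layer's weight matrix
        mask = np.random.rand(*w.shape) >= fraction     # keep ~70% of the weights
        params[0] = w * mask
        layer.set_weights(params)

inject_dead_weights(model, fraction=0.30)
loss, acc = model.evaluate(x_test / 255.0, y_test)
print(f"accuracy with ~30% dead weights: {acc:.3f}")
```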

Which PCM was used?

In trying to understand which PCM devices were in use, the only information available said it was a 180nm device. According to a 2012 Flash Memory Summit report on alternative NVM technologies, 180nm PCM devices have been around since 2004, a 90nm PCM device was introduced in 2008 with 128Mb, and even newer PCM devices at 45nm were introduced in 2010 with 1Gb of memory. So I would conclude that the 180nm PCM device supported ~16 to 32Mb.

What can we do with today's PCM technology?

With the industry supporting a doubling of transistors/chip every 2 years, a PCM device in 2014 should have 4X the transistors of the 45nm, 2010 device above and ~4-8X the memory. So today we should be seeing 4-16Gb PCM chips at ~22nm. Given this, current PCM technology should support 32-64X more neurons than the 180nm devices, or somewhere around ~29K to ~58K neurons.

It's unclear what technology was used for the 'synapses', but based on the time frame for the PCM devices, these should also be able to scale up by a factor of 32-64X, or to between ~5.3M and ~10.6M synapses.
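Spelling out that scaling arithmetic (starting from the 913 neurons and 165K synapses above):

```latex
\[
913 \times 32 \approx 29\text{K}, \qquad 913 \times 64 \approx 58\text{K} \ \text{neurons};
\]
\[
165\text{K} \times 32 \approx 5.3\text{M}, \qquad 165\text{K} \times 64 \approx 10.6\text{M} \ \text{synapses}.
\]
```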

Still, this doesn't approach TrueNorth's neuron/synapse levels, but it's close. And two 4-16Gb PCM chips probably don't cost nearly as much to purchase as TrueNorth costs to create.

The programming model for the TrueNorth/SyNAPSE chips doesn't appear to be neural-network-like. So perhaps another advantage of the PCM approach to hardware based AI is that you can use standard, well-known NN programming methods to train and simulate it.

So, PCM based neural networks seem an easier way to create hardware based AI. I'm not sure they will ever match the neuron/synapse levels that the dedicated, special purpose neuromorphic chips in development can reach, but in the end, both are hardware based AI that can support better pattern recognition.

Using commodity PCM devices any organization with suitable technological skills should be able to create a hardware based NN that operates much faster than any NN software simulation. And if PCM technology starts to obtain market acceptance, the funding available to advance PCMs will vastly exceed that which IBM/MIT can devote to TrueNorth and its descendants.

Now, what is HP up to with their memristor technology and The Machine?

Photo Credits: Neurons by Leandro Agrò

IBM’s next generation, TrueNorth neuromorphic chip

Ok, I admit it, besides being a storage nut I also have an enduring interest in AI. And as more sophisticated neuromorphic chips start to emerge, they seem to me to herald a whole new class of AI capabilities coming online. I suppose it's both a bit frightening and exciting, which is why it interests me so.

IBM announced a new version of their neuromorphic chip line, called TrueNorth, with +5B transistors and the equivalent of ~1M neurons. There were a number of articles on this yesterday, but the one I found most interesting was in MIT Technology Review, IBM's new brainlike chip processes data the way your brain does (based on a Science journal article [login required], A million spiking-neuron integrated circuit with a scalable communication network and interface). We discussed an earlier generation of their SyNAPSE chip in a previous post (see my IBM research introduces SyNAPSE chip post).


How does TrueNorth compare to the previous chip?

The previous generation SyNAPSE chip had a multi-mode approach which used 65K "learning synapses" together with ~256K "programming synapses". The current generation TrueNorth chip has 256M "configurable synapses" and 1M "programmable spiking neurons". So, going by the raw numbers, the current chip has about 4X the previous chip's ~256K "programming synapses" in programmable neurons alone, and roughly 1,000X as many synapses (256K → 256M).

Not sure why the configurable synapse count went up so high, but it could be an aspect of connectivity, something akin to a "complete graph", which has a direct edge between every pair of nodes. In a complete graph with N nodes the number of edges is N(N-1)/2, which for 1M nodes works out to ~5×10^11 (500 billion) edges. So at 256M synapses the chip is nowhere near a chip-wide complete graph; the count looks more like all-to-all connectivity within much smaller groups of neurons (e.g., within the 256-neuron blocks described below).
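The arithmetic behind that comparison (using the 256-neuron block size described in the programming section below):

```latex
\[
\underbrace{\tfrac{N(N-1)}{2}}_{\text{complete graph}}\Big|_{N=10^6} \approx 5\times10^{11}\ \text{edges}
\qquad \text{vs.} \qquad
\underbrace{\tfrac{10^6}{256}}_{4096\ \text{blocks}} \times 256 \times 256 = 268{,}435{,}456 \approx 256\text{M synapses}.
\]
```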

Analog vs. Digital?

When I last talked with IBM about their earlier version of the chip, I wondered why they used digital logic to create it rather than analog. They said digital was the way to go in order to better follow the technology curve of normal chip electronics.

It seemed to me at the time that if you really wanted to simulate a brain's neural processing, then you would want to use an analog approach, and that this should use much less power. I wrote a couple of posts on the subject, one on MIT's analog neuromorphic chip (see my MIT builds analog neuromorphic chip post) and another on why analog made more sense than digital technology for neuromorphic computation (see my Analog neural simulation or Digital neuromorphic computing vs. AI post).

The funny thing is that IBM's TrueNorth chip uses a lot less power (1000X less, milliwatts vs. watts) than normal CMOS chips in use today. I'm not sure why this would be the case with digital logic, but if it's true, there may be potential to use these sorts of chips in a much wider set of applications beyond traditional AI domains.

How do you program it?

I would really like to get a deeper look at the specs for TrueNorth and its programming model. But there was a conference last year where IBM presented three technical papers on TrueNorth architecture and programming capabilities (see the MIT Technology Review article: IBM scientists show blueprints for brain-like computing).

Apparently the 1M programmable spiking neurons are organized into blocks of 256 neurons each (with a prodigious number of "configurable" synapses as well). These seem equivalent to what I would call a computational unit. One programs these blocks with "corelets", which map out the neural activity that the 256-neuron blocks can perform. These corelet "programs" can also be linked together, or one can be subsumed within another, sort of like subroutines. As of last year, IBM had a library of 150 corelets which do things like detect visual artifacts, detect motion in a visual image, detect color, etc.

Scale-out neuromorphic chips?

The abstract of the Science paper talked specifically about a communication network interface that allows TrueNorth chips to be "tiled in two dimensions" to some arbitrary size. So it is apparent that with the TrueNorth design, IBM has somehow extended the within-chip block interface that allows corelets to call one another to go off-chip as well. With this capability they have created a scale-out model for the TrueNorth chip.

It's unclear why they went only two-dimensional rather than three, but it seems to mimic the sort of cortex-layer connections we have in our brains today. And even with only two-dimensional scaling, all sorts of interesting topologies are possible.

There doesn't appear to be any theoretical limit to the number of chips that can be connected in this fashion, but I would suppose they would all need to be on a single board, or at least "close" together, because there's presumably some propagation-delay budget that can't be exceeded, i.e., the time it takes for a spike to traverse from one chip to the farthest chip in the chain couldn't exceed, say, 10 msec or so.

So how close are we to brain level computations?

In one of my previous posts I reported Wikipedia stating that a typical brain has 86B neurons along with somewhere between 100M and 500M synapses. I was able to find the 86B number again today but couldn't find the synapse quote, and that figure was almost certainly off by orders of magnitude: the usual estimates are on the order of 100 trillion synapses, or very roughly 1,000 to 10,000 per neuron. If those numbers are close to the truth, a human brain carries far more synapses per neuron than TrueNorth's 256. And TrueNorth would need about 86,000 chips connected together just to match the neuron count of a human brain.

One obvious difference is that TrueNorth's synapses are electronic connections that have to be fixed in place at design time for a neuron-to-neuron connection to exist, whereas the brain can grow new synapse connections as needed. Also, I read somewhere (can't remember where) that a human brain at birth has many more synapse connections than an adult brain, and that part of the learning process during early life is to prune excess synapses down to the ones that are actually needed.

So, to conclude, we (or at least IBM) seem to be making good strides in coming up with a neuromorphic computational model and physical hardware, but we are still six or seven generations away from a human brain's capabilities (assuming 1,000 of these chips could be connected together into one "brain"). If a neuromorphic chip generation takes ~2 years, then we should be getting pretty close to human levels of computation by 2028 or so.

The Tech Review article said that the 5B transistors on TrueNorth are more transistors than any other chip that IBM has produced. So they seem to be at current technology capabilities with this chip design (which is probably proof that their selection of digital logic was a wise decision).

Let's just hope it doesn't take 18 years of programming/education to attain college level understanding…

Comments?

Photo Credit(s): New 20x [view of mouse cortex] by Robert Cudmore

MIT builds analog synapse chip

2011 Wikimedia Commons: Synapse illustration (unlabeled)

Recently MIT announced a new brain chip, a breakthrough device that simulates a single brain synapse with an analog chip.

We have discussed digital neuromorphic chip activity before (see my IBM introducing their SyNAPSE chip and Electro-human interface posts). However, both of those were digital; this new MIT chip is analog. The chip uses ~400 transistors and was fabricated using VLSI processing.


Analog, what's that?

Given that the world has gone digital, analog devices may be foreign to most of us. But analog dominated the way electronics worked for the first half of the last century and was still pretty prominent during the second half.

Nowadays, such devices are used primarily in signal processing and where streams of data are transformed from one mode to another (e.g., serializers/deserializers). An analog signal theoretically has infinite resolution (Wikipedia), which should make it closer to real life and may be why some stereophiles prefer records to CDs.

Neurons are analog devices

That being said, it's a treat to see some new analog technology come out that's better than digital implementations. One would have to say that neural activity is by definition analog, and as such an analog approach should make simulating brain activity much easier.

The advantage of analog can be seen in the neural synapse, the connection between two neurons. Information is transferred between the two neurons by the uptake of ions. In the case of the MIT synapse chip, the same sort of process occurs, but here information flows based on gradients of electrical potential.

In testament to the capabilities of the new synapse chip, they were able to address a long-standing debate in neurobiology. The question was how long-term potentiation (LTP) and long-term depression (LTD), which enhance or depress information transfer across the synapse, are accomplished in real neurons. Previously, it had been postulated that LTP and LTD depend on two different mechanisms in real cells, but one theory held that with a specific type of receptor, both LTP and LTD could be produced in a single way.

MIT researchers were able to configure their synapse-chip to mimic that new receptor and were able to show how LTP and LTD could work with this single receptor in the brain.

Onto the brain

Of course, a single synapse is not much, considering the brain has 100B neurons, each with many hundreds if not thousands of synapses. But it's a start.

Naturally, considering it's built out of transistors using CMOS technology, it should follow Moore's law, and after 18 months or so we should have a chip with two synapses on it. Another 47 or so doublings (~70 years from now, around 2081), if Moore's law holds, and we could have a brain-chip with 100B neurons and 100T synapses on it.

Of course, this being a prototype, I suppose with today's fabrication capable of creating 40M transistors/chip, we may already be able to simulate 100K synapses and 100 neurons, which means we should have a brain's level of neurons and synapses in 30 more doublings, or by ~2056.
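The doubling arithmetic behind that date (at ~1.5 years per doubling):

```latex
\[
\log_2\!\left(\frac{10^{14}\ \text{synapses}}{10^{5}\ \text{synapses}}\right) \approx 30\ \text{doublings},
\qquad
30 \times 1.5\ \text{years} = 45\ \text{years} \;\Rightarrow\; 2011 + 45 \approx 2056.
\]
```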

Analog is better than biological

The other nice thing about analog logic and transistors is that information processing in the brain-chip should be orders of magnitude faster than the brain's biological processing, which is probably even more frightening.

The IBM SyNAPSE chip mentioned earlier was an all-digital creation with two chip cores: one provided "learning synapses" and the other "programmable synapses". This was probably an attempt to mimic neural processing in digital logic.

The analog brain-chip that MIT has invented has no such distinction, supplying all synapse functionality in 400 transistors. Nonetheless, any accurate simulation of neural processes can help us understand how to mimic them better. The fact that we have an analog simulation of neural processes should help us improve the digital simulations to more closely match the brain.

—-

Not sure what we should call this chip. It's certainly not neuromorphic, because it's a real simulation of analog neural synapses, not a digital approximation. I would use synapse-chip, but that's already in use. I kind of like brain-chip, but that may be stretching it a bit. Maybe neuron-chip is best for now.

Now that we know the date for the singularity, hopefully we can be ready to deal with whatever happens then.

Comments?