The myth of AGI

Sorry seem to be on an AGI bent this month…

Read an article the other day about a new book (The myth of AI, by Erik. J. Larson) that explains how the present direction of AI-ML-DL will be very unlikely to achieve artificial general intelligence (AGI) given it’s current direction. Amazon and others offer a short preview of the book which is where most of this discussion comes from.

Types of (human) reasoning

Near as I can tell, (don’t have the book), the book discusses the three types of reasoning that exist in human intellect, i.e., deduction, induction and abduction.

  • Deduction uses formal logic (or its equivalents) to derive facts or theorems from basic principles.
  • Induction uses a multitude of samples and constructs general principles from the analysis of them
  • Abduction uses a set of probabilistic assertions and formal logic, to come up with a probabilistic principle.

Deduction is most famously observed in geometry and arithmetic proofs and was most evident in the early years of AI through its use of expert systems. The challenge with expert systems is that the real world is vastly more complex than any geometrical or arithmetical artifice that humankind can produce.

Expert systems became champions of checkers, chess and some other games but in the end was not easily generalizable beyond a few (gaming and medically) restricted domains.

Induction is presently all the rage and represents what machine learning and deep neural networks (DNN) are doing with all that training data and resultant classification inferencing.

Today we have DNNs that can classify the objects in an image, can learn to play any game on the planet better than humans, and can even safely drive a car down the road.

The current AI world view is that this form of reasoning, DNN induction, will if taken to its extreme will ultimately result in some level of AGI, or human-equivalent levels of intelligence in a system. The author of the book begs to differ.

Abduction is less well known or discussed in rational circles. It’s essentially what any human does when presented with real world examples/experiences to derive an understanding (or principe) of what happened.

For example, a plate full of cookies last night becomes an almost empty plate of crumbs and two cookies. So what happened, your son woke up early, consumed most if not all of them, and left for work. This is a probabilistic (most likely) inference, but has a high probability of being true.

Any AGI will need all forms of reasoning

The challenge is that AI has been through the deduction phase through the rise of expert systems which crashed and burned because of the cost and time required to produce an exhaustive and correct expert system. And AI is currently in the induction phase, via DNN training, which seems to be entirely more generalizable and successfully usable in many different domains, but no one is talking seriously about doing abduction in AI (anymore).

The author claims (again, have not read the book) that any AGI will require as much abduction as induction (as well as perhaps deduction), and therefore, AGI is not inevitable based on our current AI DNN (or induction) intensive path.

Previous and current attempts at abduction reasoning

Some may recall fuzzy logic as one of the avenues taken after expert systems seemed to fail at doing successful and realistic inferencing around the end of last century. Fuzzy logic was a way of bring probabilities into deduction, not unlike abduction as defined above. With fuzzy logic each assertion or base assumption was given a probabilistic value (of being true) and the final derivation was assigned some level of probability of being true.

The wikipedia article has definitions for fuzzy logic and, or and not which of course would allow any system to make these assertions. But fuzzy logic (like expert systems above) suffered from the inability to exhaustively cover all examples in a real world situation.

Furthermore, the (funny) thing about DNNs is that they are much more probabilistic than it appears. If one examines classification outputs of any DNN, it is extremely rare to see some sort of boolean (true or false) yes or no answers. Mostly one sees a series of probabilities that are assigned to each classification bucket.

DNN systems hide these probabilities by just selecting the maximum (or minimum) probability generated as its final classification. This is entirely an artifact of needing to have some discrete output (classification selection). But DNN (internal) results always result in probabilistic values.

So although, pure induction doesn’t include probabilities, DNN induction as practiced today in AI systems, uses probabilistic reasoning in every layer of a DNN and in its final results.

What else may be missing from AI to allow AGI to be developed

Personally, AGI seems to require not just the reasoning approaches above, but a more workable and general purpose planning solution. I’ve tried to identify to see whether some researchers are using DNNs to provide general purpose planning solutions but have been yet to find any (in publcly available research). These are probably the one place where expert (or control) fuzzy systems still shine. But again they are hard to generalize and prove almost impossible to be completely exhaustive.

Nonetheless, in the end, I think that all the above just proves, that there are a number of distinct reasoning and other (planning) techniques that may need to come together to provide AGI. As any of us can attest, all of these different approaches are available within any human intellect.

And if we assume that any AGI will need to follow the human design to intelligence (not a given), they will all need to be stitched together, combined and brought to bear to realize AGI.

But, at present, with all the focus on DNN/induction, we, as AI researchers, are not making any progress on using these other techniques or in combining them into a single system.

And for that I am happy. I would be very pleased to have any AGI be farther out than nearer term. Because for the life of me, AGI scares the s&#t out of me.

Mostly because I don’t see any real way to control AGI, once unleashed. That and given the diversity of motives around this world, I don’t see any realistic mechanism to instill a universal and firm (unalterable) belief in the sanctity of human and other life, the dependance this life has on our environment/biosphere and the rule of law needed to maintain peace across humankind (and I’m probably missing a half dozen more things that we would want any AGI to adhere to).

Maybe, if I saw more effort on how, we as a species can come up with universal views on these and other topics and can come up with some way of instilling, essentially a system of programs, with these unalterable beliefs and AGI controls based on these, I’d be less fearful of AGI emerging.

Lacking that, any way of delaying its emergence, is fine by me.

Comments?

Photo Credit(s):

Is hardware innovation accelerating – hardware vs. software innovation (round 6)

There’s something happening to the IT industry, that maybe has not happened in a couple of decades or so but hardware innovation is back. We’ve been covering bits and pieces of it in our hardware vs software innovation series (see Open source ASiCs – HW vs. SW innovation [round 5] post).

But first please take our new poll:

Hardware innovation never really went away, Intel, AMD, Apple and others had always worked on new compute chips. DRAM and NAND also have taken giant leaps over the last two decades. These were all major hardware suppliers. But special purpose chips, non CPU compute engines, and hardware accelerators had been relegated to the dustbins of history as the CPU giants kept assimilating their functionality into the next round of CPU chips.

And then something happened. It kind of made sense for GPUs to be their own electronics as these were SIMD architectures intrinsically different than SISD, standard von Neumann X86 and ARM CPUs architectures

But for some reason it didn’t stop there. We first started seeing some inklings of new hardware innovation in the AI space with a number of special purpose DL NN accelerators coming online over the last 5 years or so (see Google TPU, SC20-Cerebras, GraphCore GC2 IPU chip, AI at the Edge Mythic and Syntiants IPU chips, and neuromorphic chips from BrainChip, Intel, IBM , others). Again, one could look at these as taking the SIMD model of GPUs into a slightly different direction. It’s probably one reason that GPUs were so useful for AI-ML-DL but further accelerations were now possible.

But it hasn’t stopped there either. In the last year or so we have seen SPUs (Nebulon Storage), DPUs (Fungible, NVIDIA Networking, others), and computational storage (NGD Systems, ScaleFlux Storage, others) all come online and become available to the enterprise. And most of these are for more normal workload environments, i.e., not AI-ML-DL workloads,

I thought at first these were just FPGAs implementing different logic but now I understand that many of these include ASICs as well. Most of these incorporate a standard von Neumann CPU (mostly ARM) along with special purpose hardware to speed up certain types of processing (such as low latency data transfer, encryption, compression, etc.).

What happened?

It’s pretty easy to understand why non-von Neumann computing architectures should come about. Witness all those new AI-ML-DL chips that have become available. And why these would be implemented outside the normal X86-ARM CPU environment.

But SPU, DPUs and computational storage, all have typical von Neumann CPUs (mostly ARM) as well as other special purpose logic on them.

Why?

I believe there are a few reasons, but the main two are that Moore’s law (every 2 years halving the size of transistors, effectively doubling transistor counts in same area) is slowing down and Dennard scaling (as you reduce the size of transistors their power consumption goes down and speed goes up) has stopped almost. Both of these have caused major CPU chip manufacturers to focus on adding cores to boost performance rather than just adding more transistors to the same core to increase functionality.

This hasn’t stopped adding instruction functionality to each CPU, but it has slowed considerably. And single (core) processor speeds (GHz) have reached a plateau.

But what it has stopped is having the real estate available on a CPU chip to absorb lots of additional hardware functionality. Which had been the case since the 1980’s.

I was talking with a friend who used to work on math co-processors, like the 8087, 80287, & 80387 that performed floating point arithmetic. But after the 486, floating point logic was completely integrated into the CPU chip itself, killing off the co-processors business.

Hardware design is getting easier & chip fabrication is becoming a commodity

We wrote a post a couple of weeks back talking about an open foundry (see HW vs. SW innovation round 5 noted above)that would take a hardware design and manufacture the ASICs for you for free (or at little cost). This says that the tool chain to perform chip design is becoming more standardized and much less complex. Does this mean that it takes less than 18 months to create an ASIC. I don’t know but it seems so.

But the real interesting aspect of this is that world class foundries are now available outside the major CPU developers. And these foundries, for a fair but high price, would be glad to fabricate a 1000 or million chips for you.

Yes your basic state of the art fab probably costs $12B plus these days. But all that has meant is that A) they will take any chip design and manufacture it, B) they need to keep the factory volume up by manufacturing chips in order to amortize the FAB’s high price and C) they have to keep their technology competitive or chip manufacturing will go elsewhere.

So chip fabrication is not quite a commodity. But there’s enough state of the art FABs in existence to make it seem so.

But it’s also physics

The extremely low latencies that are available with NVMe storage and, higher speed networking (100GbE & above) are demanding a lot more processing power to keep up with. And just the physics of how long it takes to transfer data across a distance (aka racks) is starting to consume too much overhead and impacting other work that could be done.

When we start measuring IO latencies in under 50 microseconds, there’s just not a lot of CPU instructions and task switching that can go on anymore. Yes, you could devote a whole core or two to this process and keep up with it. But wouldn’t the data center be better served keeping that core busy with normal work and offloading that low-latency, realtime (like) work to a hardware accelerator that could be executing on the network rather than behind a NIC.

So real time processing has become faster, or rather the amount of time to execute CPU instructions to switch tasks and to process data that needs to be done in realtime to keep up with faster line speed is becoming shorter.

So that explains DPUs, smart NICS, DPUs, & SPUs. What about the other hardware accelerator cards.

  • AI-ML-DL is becoming such an important and data AND compute intensive workload that just like GPUs before them, TPUs & IPUs are becoming a necessary evil if we want to service those workloads effectively and expeditiously.
  • Computational storage is becoming more wide spread because although data compression can be easily done at the CPU, it can be done faster (less data needs to be transferred back and forth) at the smart Drive.

My guess we haven’t seen the end of this at all. When you open up the possibility of having a long term business model, focused on hardware accelerators there would seem to be a lot of stuff that needs to be done and could be done faster and more effectively outside the core CPU.

There was a point over the last decade where software was destined to “eat the world”. I get a lot of flack for saying that was BS and that hardware innovation is really eating the world. Now that hardtware innovation’s back, it seems to be a little of both.

Comments?

Photo Credits:

  • Cerebras chip, Cerebras (see SC20 post)
  • Mythic architecture, Mythic computing (see AI at the edge post)
  • TPU2-iot, Google (see TPU post)
  • 130nm layouts (see Open source ASICs post)
  • Moore’s law chart – wikipedia, By Max Roser – https://ourworldindata.org/uploads/2019/05/Transistor-Count-over-time-to-2018.png, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=79751151

Ok, maybe neuromorphic chips aren’t a deadend

Those of you who followe my blog will no doubt recall that I pronounced neuromorphic chips dead (see our Are neuromorphic chips a deadend blog post). Not because the hardware technology wasn’t improving or good enough, but because software support for the technology was sorely lacking and it was extremely complex or nigh impossible to program and use.

But first please take our new poll:

And, in the meantime GPUs, TPUs and other more “normal” neural network hardware and accelerators, all were able to utilize standard, easy to use, mostly open source, AI DL frameworks. And all this hardware was steadily improving, coming out regularly with more power and performance, with no end in sight.

But then I attended AIFD1 (AI Field Day 1) and at one of the sessions, Anil Mankar, COO & Co-Founder of a company named BrainChip Inc, (see video of their talk) presented yet another neuromorphic chip, called the AKIDA Neural Processor. Their current generation of the technology is available in their AKD 1000 SoC chip, focused on IoT solutions. But they had created a a software development environment that allowed one to use standard TensorFlow neural network trained models and deploy these on their hardware. And that got my interest.

BrainChip’s AKIDA AKD 1000 hardware AND software

Their AI DL nueromoryhic chip is made app of Event Domain Neural Processing Units (NPUs). AKIDA technology is focused on low power, sensor like applications. They claim to save power by only consumuing power (or is running) when an event takes place. They are also able to save on memory requirements by using 1, 2 or 4 bits (vs. 8, 16, 32 or more bits) for model weights/activations

Their hardware seems to run spiking neural networks (SNN, see our blog post on another chip technology using SNNs). In their SDK, they have a CNN2SNN tool that could take a any (TensorFlow) trained CNN model and convert it to a SNN, that could then run on their AKIDA tecnology.

They also have an AKIDA Model Zoo with a handful of pre-trained CNN type models that have already been converted to run on their technology. They also provide a tutorial on their technology. Mankar, said that if you understand how to use TensorFlow Keras today, to construct and train your models, it shouldn’t be too hard to understand how to use their tools to do what you want.

Their chip hardware is available today on a separate PCIe card, M.2 form factor card. or as a chip. Finally, they also license their AKIDA IP to other chip designers.

AKIDA AKD 1000 performance

At the AIFD1 Mankar showed statistics on the performance and accuracy attained using their chip vs. using standard 32 bit floating point CNN implementations.

As discussed above, their processor uses 1-4 bits for weight quantization and as such loses some accuracy but as you can see it’s a matter of one to a few percent vs. these same models using a 32bit floating point CNN implementation.

Because of their smaller weights, AKIDA uses less memory and less bandwidth to update models vs. models using larger weights.

As shown in the chart the the memory required for the 8-bit deep learning algorithms (DLAs) were all significantly larger than the memory requirements for the AKIDA solution. For one algorithm, they required ~1/2 the memory size of the 8-bit DLA version of the model.

Mankar also provided information on the amount of calculations required per inference using AKIDA vs. 8-bit DLAs.

Just to set the stage, MMACs/Inference is (matrix or multiple) multiplications and accumulations required to perform a single inference with the selected CNN model. ImageNet (1000), ImageNette (20) and Visual Wake Word models are all standard CNN models, that have pre-trained on vast repositories of data, that can run in many hardware environments. The non-AKIDA solutions above were all running using an 8-bit DLA CNN model. Activity regularization is a method of reducing the learning rate and weights used during training that shrinks the weight changes during training to reduce model overfit.

He also showed some comparisons of their technology vs. Intel’s LoiHi hardware. LoiHi is another neuromorphic chip, whose original introduction prompted me to write the “Are neuromorphic chips a deadend” post (link above). Unfortunately, I didn’t capture any of these charts, but from my recollection, they showed that AKIDA technology used slightly less power than LoiHi technology in all their comparisons.

AKIDA technology demo

In their live, on camera, demo, they used a previously downloaded VGG16 (if I recall correctly) CNN trained model. Offline they had replaced the last classification layer with a (blank, untrained) dense network and they converted this to a SNN and downloaded onto one of their boards. They had developed an application that used this board with a camera to perform more CNN training or CNN image inferencing (classification).

They first (one-shot) trained their board’s model to recognize the background of what the camera was seeing and then proceeded to perform (one-shot) trainings to classify toys of tigers, elephants and cars. All these were completed in real time in the demo. They were able to verify the training took using pictures of tigers, elephants and cars as well as classify all the toys in different orientations and a different toy car

The AIFD1 (a tuff) crowd, said had seen all this before but would be really interested to see if their chip could distinguish between different cars (one a toy race car and the other a toy police car). On camera, they were able to re-train their CNN to distinguish between (toy) car 1 and car 2 to classify properly between the two of them. They had one or two instances where their CNN model was confused, but they were able to re-train it to recognize the toy car and place it into the correct classification (using two-shot[?] learning).

At AIFD1, Mankar also presented detailed, real world data on how they were able to perform Keyword spotting, person detection, E-nose classification, E-tongue classification, and auditory (E-ear?) classification in embedded sensor systems.

AKIDA technology limitations

At the moment, their chip doesn’t support neural networks that use memory such as LSTM or RNN’s but it seems to work fine for any CNN, which was shown multiple times in the data they presented and in their demo.

We were really impressed with their software stack, liked what we saw of their hardware/IP, and enjoyed their demo and its one-shot learning. Check out their videos (link above) for more information on them.

Photo Credit(s): all charts are from BrainChip Inc’s website or were presented at their AIFD1 session

Cambrian Explosion of AI DL app’s in industry and the world

I was at the NetApp Insight conference last week and recorded a podcast (see: GreyBeards Podcast) on what NetApp is doing in the AI DL (Deep Learning) space. On the podcast, we talked about a number of verticals that were deploying AI DL right now and using it to improve outcomes.

It was only is 2012 that AI DL broke out and pretty much conquered the speech recognition contest by improving recognition accuracy by leaps and bounds. Prior to that improvements had been very small and incremental at best. Here we are, just 7 years later and AI DL models are proliferating across industry and every other sector of the world economy.

DL applications in the real world

At the show. we talked about AI DL models being used in healthcare (radiological image analysis, cell counts for infection assessments), automotive (self driving cars), financial services (fraud detection), and retail (predicting how make up would look on someone).

And early this year, at HPE Discover, they discussed a new technique to share training data but still keep it private. In this case, they use block chain technology to publish and share a DL neural network model weights and other hyper parameters trained for some real world purpose.

Customers download and use the model in their day to day activities but record the data that their model analyzes and its predictions. They use this data to update (re-train) their DL neural net. They then publish their new neural net model weights and other parameters to all the other customers. Each customer of the model do the same, updating (re-training) their DL neural net.

At some point an owner or global model arbitrator takes all these individual model updates and aggregates the neural net weights, into a new neural net model and publishes the new model. And then the process starts over again. In this way, training data is never revealed, kept secure and private but DL model updates that result from re-training the model with secured private data would be available to any customer.

Recently, there’s been a slew of articles across many different organizations that show how AI DL is being adopted to work in different areas:

And that’s just a sample of the last few weeks of papers of AI DL activity.

Next Steps

All it takes is data, that can be quantified and classified. With data and classifications in hand, anyone can train a DL model that performs that classification. It doesn’t require GPU farms, decent CPUs are up to the task for TB of data.

But if you want better prediction/classificatoin accuracy, you will need more data which means longer AI DL training runs. So at some point, maybe at >100TB of data, or use AI DL training a lot, you may want that GPU farm.

The Deep Learning with Python book (my favorite) has a number of examples such as, sentiment analysis of text, median real estate pricing predictions, generating text that looks like an authors work, with maybe a dozen more that one can use to understand AI DL technology. But it’s not rocket science, I believe any qualified programmer could do it, with some serious study.

So the real question is what are you doing with your data to make use of AI DLmodels now?

I suppose the other question ought to be, how can you collect more data and classification information, to train more AI DL models?

~~~~

It’s great to be in the storage business.

Photo Credit(s):

Are neuromorphic chips a dead end?

Read a recent article about Intel’s Pohoiki Beach neuromorphic system and their Loihi chips, that has scaled up to 8M neurons in IEEE Spectrum (Intel’s neuromorphic system hits 8 M neurons). In the last month or so I wrote up about a two startups one of which seemed (?) to be working on a neuromorphic chip development (see my Photonics computing sees the light of day post).

But first please take our new poll:

I’ve been writing about neuromorphic chips since 2011, 8 long years (see IBM SyNAPSE chip post from 2011 or search my site for “neuromorphic”) and none have been successfully reached the market. The problems with neurmorphic architectures have always been twofold, scaling AND software.

Scaling up neurons

The human brain has ~86B neurons (see wikipedia human brain article). So, 8 million neuromorphic neurons is great, but it’s about 10K X too few. And that doesn’t count the connections between neurons. Some human neurons have over 1000 connections between nerve cells (can’t seem to find this reference anymore?).

Wikimedia commons (481px-Chemical_synapse_schema_cropped)
Wikimedia commons (481px-Chemical_synapse_schema_cropped)

To get from a single chip with 125K neurons to their 8M neuron system, Intel took 64 chips and put them on a couple of boards. To scale that to 86B or so would take ~690, 000 of their neuromorphic chips. Now, no one can say if there’s not some level below 85B neuromorphic neurons, that could support a useful AI solution, but the scaling problem still exists.

Then there’s the synapse connections between neuromorphic neurons problem. The article says that Loihi chips are connected in a heirarchical routing network, which implies to me that there are switches and master switches (and maybe a really big master switch) in their 8M neuromorphic neuron system. Adding another 4 orders of magnitude more neuromorphic neurons to this may be impossible or at least may require another 4 sets of progressively larger switches to be added to their interconnect network. There’s a question of how many hops and the resultant latency in connecting two neuromorphic neurons together but that seems to be the least of the problem with neuromorphic architectures.

Missing software abstractions

The first time I heard about neuromorphic chips I asked what the software looks like and the only thing I heard was that it was complex and not very user friendly and they didn’t want to talk about it.

I keep asking about software for neuromorphic chips and still haven’t gotten a decent answer. So, what’s the problem. In today’s day and age, software is easy to do, relatively inexpensive to produce and can range from spaghetti code to a hierarchical masterpieces, so there’s plenty of room to innovate here.

But whenever I talk to engineers about what the software looks like, it almost seems like a software version of an early plug board unit-record computer (essentially card sorters). Only instead of wires, you have software neuromorphic network connections and instead of electro-magnetic devices, one has software spiking neuromorphic neuron hardware.

The way we left plugboards behind was by building up hardware abstractions such as adders, shifters, multipliers, etc. and moving away from punch cards as a storage medium. Somewhere along this transition, we created programing languages like (macro) Assemblers, COBOL, FORTRAN, LISP, etc. It’s the software languages that brought computing out of the labs and into the market.

It’s been at least 8 years now, and yet, no-one has built a spiking neuromorphic computer language yet. Why not?

I think the problem is there’s no level of abstraction above a neuron. Where’s the aritmetic logic unit (ALU) or register equivalents in neuromorphic computers? They don’t exist as far as I can see.

Until we can come up with some higher levels of abstraction, coding neuromorphic chips is going to be an engineering problem not a commercial endeavor.

But neuromorphism has advantages

The IEEE article states a couple of advantages for neuromorphic computing: less energy to perform inferencing (and possibly training) and the ability to train on incremental data rather than having to train across whole datasets again.

Yes these are great, but there’s a gaggle of startups (e.g., see New GraphCore GC2 chip…, AI processing at the edge, TPU and HW-SW innovation) going after the energy problem in AI DL using Von Neumann architectures.

And the incremental training issue doesn’t seem any easier when you have ~80B neurons, with an occasional 1000s of connections between them to adjust correctly. From my perspective, its training advantage seems illusory at best.

Another advantage of neuromorphism is that it simulates the real analog logic of a human brain. Again, that’s great but a brain takes ~22 years to train (college level). Maybe because neuromorphic chips are electronic perhaps training can be done 100 times faster. But there’s still the software issue

~~~~

I hate to be the bearer of bad news. There’s been some major R&D spend on neuromorphism and it continues today with no abatement.

I just think we’d all be better served figuring out how to program the beast than on –spending more to develop more chip hardware..

This is hard for me to say, as I have always been a proponent of hardware innovation. It’s just that neuromorphic software tools don’t exist yet. And I’m afraid, I don’t see any easy way forward to make any progress on this.

Comments?.

Picture credit(s):

University of Manchester fires up world’s largest neuromorphic computer

Read an article the other day about SpiNNaker, the University of Manchester’s neuromorphic supercomputer (see Live Science Article: Worlds largest supercomputer brain…). There’s also a wikipedia page on SpiNNaker and a SpiNNaker project page.

 

SpiNNaker is part of the European Union Human Brain Project (HBP), Brain Simulation Program.

SpiNNaker supercomputing hardware

(Most of the following information is from the SpiNNaker project home page and SpiNNaker architectural overview page.)

The system has 1 million ARM9 (968) cores and ~7TB of memory, with each core emulating 1000 spiking neurons. With this amount of computing power, it should be able to emulate a 1B (1 billion, 10^9) neuron brain (region).

The system will consist of 1200 PCBs with each PCB containing a 48 chip array and associated networking hardware & memory. Each node contains a SpiNNaker chip with its 18 ARM9 cores.

Each node has two chips stitch bonded together on top of one another. The bottom consists of the 18-ARM9 cores and the top the double DDR memory and networking layer.

Total bisectional networking bandwidth is 5 B packets/second with each packet consisting of 5 or 9 bytes of data.

SpiNNaker operates on 1W per chip or 90KW of power to run the entire machine. Given that each chip is 18 cores and each core is 1000 neurons, this means each neuron simulation takes about 55.5µW of power to run.

You can deploy a single board as IoT solution but @ ~48W per board it may be be too energy consumptive for IoT.

SpiNNaker supercomputing software

According to the home page and the Live Science article, SpiNNaker is intended to be used to model critical segments of the human brain such as the  basal ganglia brain area for the EU HBP brain simulation program.

The system architecture has three tiers, a host machine (layer) which communicates with the monitor layer to start and monitor application execution and uses “ybug” to communicate,  a monitor core (layer) which interacts with ybug at the host and uses “scamp” to communicate with the application processors, and the application processors (layer) consisting of the ARM cores, memory and packet networking hardware which runs the  SpiNNaker Application Runtime Kernel (sark).

Applications which run on sark can consist of spiking neural networks or multi-layer perceptrons (MLP), classical deep learning neural networks.

  • MLP applications use back propagation and a training and inference phases, familiar to any deep learning application and uses a fixed neural network topology.
  • Spiking neural network applications use ongoing learning so there’s no training or inference phases (it’s always learning), use a variable network topology (reconfiguring the ARM core-packet network) and currently supports the PyNN spiking neural network simulator.

Unfortunately most of the links in the SpiNNaker project pages referring to PyNN spiking networking applications are broken. But PyNN is a Python based spiking neural network simulator that can run on a number of different hardware platforms (including sark/SpiNNaker).

Most of the AI groups I’ve talked with mention PyTorch or TensorFlow as AI frameworks of choice these days. But it’s unclear to me whether these two support spiking neural network generation/simulation.

If you want to learn more about programming SpiNNaker please check out their Software for SpiNNaker wiki page.

~~~~

As you may recall, a homo sapiens brain has an estimated 16B to 86B neurons in its average cerebral cortex (see wikipedia “animals listed by neuron count” article, for low estimate, EU’s HBP Brain Simulation page, for high estimate), which puts SpiNNaker today, at about the equivalent of less than a average tufted capuchin cerebral cortex (@1.2B neurons).

Given the above and with SpiNNaker @1B neurons, we are only  4 to 7 generations away from human equivalence. That means we have at most ~14 years left before a 128B spiking neuronal simulation machine is available.

But SpiNNaker today is based on ARM9 cores and ARM11 cores already exist. So, if they redesigned/reimplemented the chip today, it would already be 2X the core count. aake that human equivalence is only a max of 12 years away.

The mean estimate for AGI (artificial general intelligence) seems to be 2040-2050 (see wikipedia Technological Singularity article). But given what University of Manchester’s SpiNNaker is capable of doing today, I don’t think we have that long to wait.

Photo Credits: All photos/charts above are from the SpiNNaker Project pages at the University of Manchester website

IBM using PCM to implement better AI – round 6

Saw a recent article that discussed IBM’s research into new computing architectures that are inspired by brain computational techniques (see A new brain inspired architecture … ). The article reports on research done by IBM R&D into using Phase Change Memory (PCM) technology to implement various versions of computer architectures for AI (see Tutorial: Brain inspired computation using PCM, in the AIP Journal of Applied Physics).

But first please take our new poll:

As you may recall, we have been reporting on IBM Research into different computing architectures to support AI processing for quite awhile now, (see: Parts 1, 2, 3, 4, & 5). In our last post, More power efficient deep learning through IBM and PCM, we reported on a unique hybrid PCM-silicon solution to deep learning computation.

Readers should also be familiar with PCM as well as it’s been discussed at length in a number of our posts (see The end of NAND is near, maybe; The future of data storage is MRAM; and New chip architectures with CPU, storage & sensors …). MRAM, ReRAM and current 3D XPoint seem to be all different forms of PCM (I think).

In the current research, IBM discusses three different approaches to support AI  utilizing PCM devices. All three approaches stem from the physical characteristics of PCM.

(Some) PCM physics

FIG. 2. (a) Phase-change memory is based on the rapid and reversible phase transition of certain types of materials between crystalline and amorphous phases by the application of suitable electrical pulses. (b) Transmission electron micrograph of a mushroom-type PCM device in a RESET state. It can be seen that the bottom electrode is blocked by the amorphous phase.

It turns out that PCM devices have many  characteristics that lend themselves to be useful for specialized computation. PCM devices crystalize and melt in order to change state. The properties associated with melting and crystallization of the PCM media cell can be used to support unique forms of computation. Some of these PCM characteristics include::

  • Analog, not digital memory – PCM devices are, at the core, an analog memory device. We mean that they don’t record just a 0 or 1 (actually resistant or conductive) state, but rather a continuum of values between those two.
  • PCM devices have an accumulation capability –   each PCM cell actually  accumulates a level of activation. This means that one cell can be more or less likely to change state depending on prior activity.
  • PCM devices are noisy – PCM cells arenot perfect recorders of state chang signals  but rather have a well known, random noise which impacts the state level attained, that can be used to introduce randomness into processing.

The other major advantage of PCM devices is that they take a lot less power than a GPU-CPU to work.

Three ways to use PCM for AI learning

FIG. 4. “In-memory computing,” computation is performed in place by exploiting the physical attributes of memory devices organized as a “computational memory” unit. For example, if data A is stored in a computational memory unit and if we would like to perform f(A), then it is not required to bring A to the processing unit. This saves energy and time that would have to be spent in the case of conventional computing system and memory unit. Adapted from Ref. 19.

The Applied Physics article describes three ways to use PCM devices in AI learning. These three include:

  1. Computational storage – which uses the analog capabilities of PCM to perform  arithmetic and learning computations. In a sort of combined compute and storage device.
  2. AI co-processor – which uses PCM devices, in an “all PCM nodes connected to all other PCM nodes” operation that could be used to perform neural network learning. In an AI co-processor there would be multiple all connected PCM modules, each emulating a neural network layer.
  3. Spiking neural networks –  which uses PCM activation accumulation characteristics & inherent randomness to mimic, biological spiking neuron activation.
FIG. 11.
A proposed chip architecture for a co-processor for deep learning based on PCM arrays.28

It’s the last approach that intrigues me.

Spiking neural nets (SNN)

FIG. 12. (a) Schematic illustration of a synaptic connection and the corresponding pre- and post-synaptic neurons. The synaptic connection strengthens or weakens based on the spike activity of these neurons; a process referred to as synaptic plasticity. (b) A well-known plasticity mechanism is spike-time-dependent plasticity (STDP), leading to weight changes that depend on the relative timing between the pre- and post-synaptic neuronal spike activities. Adapted from Ref. 31.

Biological neurons accumulate charge from all input (connected) neurons and when they reach some input threshold, generate an output signal or spike. This spike is then used to start the process with another neuron up stream from it

Biological neurons also exhibit randomness in their threshold-spiking process.

Emulating spiking neurons, n today’s neural nets, takes computation.  Also randomness takes more.

But with PCM SNN, both the spiking process and its randomness, comes from device physics. Using PCM to create SNN seems a logical progression.

PCM as storage, as memory, as compute or all the above

In the storage business, we look at Optane (see our 3D Xpoint post) SSDs as blazingly fast storage. Intel has also announced that they will use 3D Xpoint in a memory form factor which should provide sadly slower, but larger memory devices.

But using PCM for compute, is a radical departure from the von Neumann computer architectures we know and love today. HPE has been discussing another new computing architecture with their memristor technology, but only in prototype form.

It seems IBM, is also prototyping hardware done this path.

Welcome to the next computing revolution.

Photo & Caption Credit(s): Photo and caption from Figure 2 in AIP Journal of Applied Physics article

Photo and caption from Figure 4 in AIP Journal of Applied Physics article

Photo and caption from Figure 11 in AIP Journal of Applied Physics article

Photo and caption from Figure 12 in AIP Journal of Applied Physics article

A new way to compute

I read an article the other day on using using random pulses rather than digital numbers to compute with, see Computing with random pulses promises to simplify circuitry and save power, in IEEE Spectrum. Essentially they encode a number as a probability in a random string of bits and then use simple logic to compute with. This approach was invented in the early days of digital logic and was called stochastic computing.

Stochastic numbers?

It’s pretty easy to understand how such logic can work for fractions. For example to represent 1/4, you would construct a bit stream that had one out of every four bits, on average, as a 1 and the rest 0’s. This could easily be a random string of bits which have an average of 1 out of every 4 bits as a one.

A nice result of such a numerical representation is that it easily results in more precision as you increase the length of the bit stream. The paper calls this progressive precision.

Progressive precision helps stochastic computing be more fault tolerant than standard digital logic. That is, if the string has one bit changed it’s not going to make that much of a difference from the original string and computing with an erroneous number like this will probably result in similar results to the correct number.  To have anything like this in digital computation requires parity bits, ECC, CRC and other error correction mechanisms and the logic required to implement these is extensive.

Stochastic computing

2 bit multiplier

Another advantage of stochastic computation and using a probability  rather than binary (or decimal) digital representation, is that most arithmetic functions are much simpler to implement.

 

They discuss two examples in the original paper:

  • AND gate

    Multiplication – Multiplying two probabilistic bit streams together is as simple as ANDing the two strings.

  • 2 input stream multiplexer

    Addition – Adding two probabilistic bit strings together just requires a multiplexer, but you end up with a bit string that is the sum of the two divided by two.

What about other numbers?

I see a couple of problems with stochastic computing:,

  • How do you represent  an irrational number, such as the square root of 2;
  • How do you represent integers or for that matter any value greater than 1.0 in a probabilistic bit stream; and
  • How do you represent negative values in a bit stream.

I suppose irrational numbers could be represented by taking a near-by, close approximation of the irrational number. For instance, using 1.4 for the square root of two, or 1.41, or 1.414, …. And this way you could get whatever (progressive) precision that was needed.

As for integers greater than 1.0, perhaps they could use a floating point representation, with two defined bit strings, one representing the mantissa (fractional part) and the other an exponent. We would assume that the exponent rather than being a probability from 0..1.0, would be inverted and represent 1.0…∞.

Negative numbers are a different problem. One way to supply negative numbers is to use something akin to complemetary representation. For example, rather than the probabilistic bit stream representing 0.0 to 1.0 have it represent -0.5 to 0.5. Then progressive precision would work for negative numbers as well a positive numbers.

One major downside to stochastic numbers and computation is that high precision arithmetic is very difficult to achieve.  To perform 32 bit precision arithmetic would require a bit streams that were  2³² bits long. 64 bit precision would require streams that were  2**64th bits long.

Good uses for stochastic computing

One advantage of simplified logic used in stochastic computing is it needs a lot less power to compute. One example in the paper they use for stochastic computers is as a retinal sensor for in the body visual augmentation. They developed a neural net that did edge detection that used a stochastic front end to simplify the logic and cut down on power requirements.

Other areas where stochastic computing might help is for IoT applications. There’s been a lot of interest in IoT sensors being embedded in streets, parking lots, buildings, bridges, trucks, cars etc. Most have a need to perform a modest amount of edge computing and then send information up to the cloud or some edge consolidator intermediate

Many of these embedded devices lack access to power, so they will need to make do with whatever they can find.  One approach is to siphon power from ambient radio (see this  Electricity harvesting… article), temperature differences (see this MIT … power from daily temperature swings article), footsteps (see Pavegen) or other mechanisms.

The other use for stochastic computing is to mimic the brain. It appears that the brain encodes information in pulses of electric potential. Computation in the brain happens across exhibitory and inhibitory circuits that all seem to interact together.  Stochastic computing might be an effective way, low power way to simulate the brain at a much finer granularity than what’s available today using standard digital computation.

~~~~

Not sure it’s all there yet, but there’s definitely some advantages to stochastic computing. I could see it being especially useful for in body sensors and many IoT devices.

Comments?

Photo Credit(s):  The logic of random pulses

2 bit by 2 bit multiplier, By Sodaboy1138 (talk) (Uploads) – Own work, CC BY-SA 3.0, wikimedia

AND ANSI Labelled, By Inductiveload – Own work, Public Domain, wikimedia

2 Input multiplexor

A battery free implantable neural sensor, MIT Technology Review article

Integrating neural signal and embedded system for controlling a small motor, an IntechOpen article