Photonic [AI] computing seeing the light of day – part 2

Read an interesting article in Analytics India Magazine (MIT Researchers Make New Chips That Work On Light) about a startup out of MIT focused on using photonics for AI/ML/DL activities. These aren’t exactly neuromorphic chips; rather, they use analog photonic interactions to perform the computationally intensive operations required by today’s deep neural net training.

We’ve written about photonics computing before (see Photonic computing seeing the light of day [-part 1]). That post was about spin-outs from Princeton and MIT back in 2019. There we showed a bit more about how photonics can perform multiplication and other computations with less power.

The article (noted above) talked about LightIntelligence, an MIT spinout/startup that’s been around since ~2017, but there’s another company in the same space, also out of MIT, called LightMatter, that just announced early access to their hardware system.

The CEOs of both companies collaborated on a paper (as the #1 and #2 authors of the 10-author paper) written back in 2017 on Deep Learning with Coherent Nanophotonic Circuits. This seemed to be the event that launched both companies.

LightMatter just received $80M in Series B funding (bringing total funding to $113M) last month, and LightIntelligence seems to have $40M in total funding. So both have decent funding, but LightMatter seems further ahead in funding and product technology.

LightMatter

LightMatter Envise Photonics-RISC AI processing chip

The LightMatter Envise AI chip uses standard RISC electronic cores together with Photo Arithmetic Units for accelerated AI computations. Each Envise chip has 500MB of SRAM for large models, a 400Gbps chip-to-chip interconnect fabric, 256 RISC cores, a Graph processor, 294 photonic arithmetic units and PCIe 4.0 connectivity.

LightMatter has just announced early access for their Envise AI photonics server. It’s a 4U AI server with 16 Envise chips, 2 AMD EPYC CPUs, a (16×400Gbps=) 6.4Tbps optical fabric for inter-chip communications, 1TB of DDR4 DRAM, 3TB of NVMe SSD, and support for two 200GbE SmartNICs for outside communications.

Envise also offers Idiom software that interfaces with standard AI frameworks to transform models to run on Envise photonic hardware. Developers select Envise hardware to run their AI models on, and Idiom automatically re-compiles (IdCompile) their model into more parallelized, photonic operations. Idiom also has a model profiler (IdProfiler) to help debug and visualize photonic models in operation (training or inferencing?) on Envise hardware, and an AI model library (IdML) which provides a PyTorch frontend to help compress and quantize a standard set of AI models.
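To make that workflow a bit more concrete, here’s a hypothetical sketch of what the developer flow might look like from a PyTorch model to photonic hardware. To be clear, the `idiom` module, its function names and the "envise" target string below are invented for illustration only; they are not LightMatter’s actual Idiom API.

```python
# Hypothetical sketch only -- the "idiom" package, its function names and the
# "envise" target string are invented for illustration, not LightMatter's API.
import torch
import torchvision.models as models
# import idiom   # hypothetical Idiom frontend (IdML / IdCompile / IdProfiler)

model = models.resnet50().eval()          # a standard PyTorch model

# 1. (IdML) compress/quantize the model for photonic precision
# model = idiom.idml.quantize(model)

# 2. (IdCompile) re-compile the graph into parallelized photonic operations
# compiled = idiom.idcompile(model, target="envise")

# 3. Run inference; (IdProfiler) could then visualize the photonic ops
# with idiom.idprofiler.trace():
#     out = compiled(torch.randn(1, 3, 224, 224))
```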

LightMatter also announced their Passage optical interconnect chip, which supplies a 100Tbps optical switch for photonic, CPU or GPU processing. It’s huge, 8″x8″, and built on a 5nm/7nm node process. Passage can connect up to 48 photonic, CPU or GPU chips that are built on top of it (one can see the space for each of these 48 [sub-]chips on the chip). LightMatter states that 40 Passage (photonic/optical) lanes fit in the width of one optical fibre. Passage chips are sampling now.

LightMatter Passage photonics-transistor chip (carrier) that provides a photonics programmable interconnect for inter-[photonics-electronic-]chip communications.

LightIntelligence

They don’t appear to be announcing any specific hardware just yet, but they are at work creating the world’s largest integrated photonics processing system. LightIntelligence has published a number of research papers focused on photonic approaches to CNNs, RNNs/LSTMs/GRUs, recurrent ISING machines, statistical computing, and invisibility cloaking.

It turns out the processing power needed to provide invisibility cloaking is very intensive, and as it’s all pixels, photonics offers serious speedups (for invisibility, see the Nature article, behind a paywall).

Photonic Recurrent ISING Sampler (PRIS)

LightIntelligence did produce a prototype photonics processor in 2019. And they believe they will have de-risked 80-90% of their photonics technology by year-end 2021.

If I had to guess, it would appear as if LightIntelligence is trying to re-imagine deep learning by taking a predominantly all-photonics approach.

Why photonics for AI DL

It turns out that one can use the interaction/interference between two light beams to perform matrix multiplication and other computations a lot faster, with a lot less power than using standard RISC (or CISC) electronic processor architectures. Typical GPUs run 400W each and multi-GPU training activities are commonplace today.
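For the curious, here’s a minimal numpy sketch of the math behind the coherent nanophotonic approach in that 2017 paper: a weight matrix is factored (via SVD) into two unitary matrices, each realizable as a mesh of Mach-Zehnder interferometers, plus a diagonal attenuation stage, so the matrix multiply happens passively as light propagates. This is just an illustration of the linear algebra, not either company’s actual implementation.

```python
# Minimal numpy sketch of the idea behind photonic matrix multiply: any real
# weight matrix W can be factored (SVD) into two unitary matrices and a
# diagonal scaling, W = U @ diag(S) @ Vh. The unitaries map onto meshes of
# Mach-Zehnder interferometers (beam splitters + phase shifters) and the
# diagonal onto optical attenuators, so the multiply happens as light propagates.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))          # a layer's weight matrix
x = rng.normal(size=4)               # input activations (modulated onto light)

U, S, Vh = np.linalg.svd(W)          # W = U @ diag(S) @ Vh

# "Photonic" evaluation: interferometer mesh, attenuation stage, second mesh
y_photonic = U @ (np.diag(S) @ (Vh @ x))

assert np.allclose(y_photonic, W @ x)   # same result as an electronic matmul
```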

The research documented in the (Deep learning using nanophotonics) paper was based on using an optical FPGA, which we have talked about before (see Photonics or Optical FPGAs on the horizon), to prototype the technology back in 2017.

Can photonics change the technology underpinning AI or computing?

If, by using photonics, one could speed up AI inferencing by 3-5X and do it with 5-6X less power, you might have a market. Those are LightMatter’s Envise performance numbers on ResNet50 with ImageNet and BERT-Base with SQuAD v1.1 against an NVIDIA DGX-A100 (state-of-the-art) AI processing system.

The challenge in changing the technology behind a multi-million/billion/trillion dollar industry is that it’s not sufficient to offer a product better than the competition. One has to offer a technology that’s enough better to fund the building of a new (multi-million/billion/trillion dollar) ecosystem surrounding that technology. In order to do that, it’s got to be orders of magnitude faster/lower power/better, so that commercial customers adopt it en masse.

I like where LightMatter is going with their Passage chip. But their Envise server doesn’t seem fast enough to give them enough traction to build a photonics ecosystem or to fund Envise 2, 3, 4, etc. to change the industry.

The 2017 (Deep learning using nanophotonics) paper predicted that an all-optical/photonics implementation of a CNN would use 3 orders of magnitude less power for small models, and that advantage would only go up for larger models (not counting power for data movement, photo detectors, etc.). Now if that’s truly feasible, and maybe it takes a more photonics-intensive processor to get there, then photonics technology could truly transform the AI industry or, for that matter, the computing industry.

But the other thing that LightIntelligence and LightMatter may be counting on is the slowdown in Moore’s law which may inhibit further advances in electronics processing power. Whether the silicon industry is ready to throw in the towel yet on Moore’s law is TBD.

Comments?

Photo Credit(s):

OFA DNNs, cutting the carbon out of AI

Read an article (Reducing the carbon footprint of AI… in Science Daily) the other day about a new approach to reducing the energy demands of AI deep neural net (DNN) training and inferencing. The article was reporting on a similar piece in MIT News, but both were discussing a technique originally outlined in an ICLR 2020 (Int. Conf. on Learning Representations) paper, Once-for-all: Train one network & specialize it for efficient deployment.

The problem stems from the amount of energy it takes to train a DNN and use it for inferencing. In most cases, training and (more importantly) inferencing can take place in many different computational environments, from IoT devices, to cars, to HPC super clusters and everything in between. In order to create DNN inferencing algorithms for use in all these environments, one would have to train a different DNN for each. Moreover, if you’re doing image recognition applications, resolution levels matter, and each resolution level would represent yet another set of DNNs that would need to be trained.

The authors of the paper suggest there’s a better approach: train one large OFA (once-for-all) DNN that covers the finest resolution and largest neural net required, in such a way that smaller sub-nets can be extracted and deployed for less computationally weighty and lower resolution deployments.

The authors contend the OFA approach takes less overall computation (and energy) to create and deploy than training multiple times for each possible resolution and deployment environment. It does take more energy to train than training a few (4-7 judging by the chart) DNNs, but that can be amortized over a vastly larger set of deployments.

OFA DNN explained

Essentially the approach is to train one large (OFA) DNN, with sub-nets that can be used by themselves. The OFA DNN sub-nets have been optimized for different deployment dimensions such as DNN model width, depth and kernel size as well as resolution levels.

While DNN width is essentially the number of channels (units) in each layer and DNN depth is the number of layers, kernel size is not as well known. Kernels were introduced in convolutional neural networks (ConvNets) as the filters that detect the features to be recognized; for human faces, these could be mouths, noses, eyes, etc. All these dimensions plus resolution levels are used to identify all possible deployment options for an OFA DNN.

OFA secrets

One key to OFA’s success is that any model (sub-network) selected actually shares the weights of all of its larger brethren. That way, all the (sub-network) models can be represented by the same DNN, just by selecting the dimensions of interest for your application. If you were to create each and every DNN separately, the number would be on the order of 10**19 DNNs for the example cited in the paper, with depth chosen from {2,3,4} layers, width from {3,4,6}, various kernel sizes, and 25 different resolution levels.

In order to do something like OFA naively, one would need to train for each different objective (once for every combination of resolution, depth, width and kernel size). Rather than doing that, OFA uses an approach that progressively shrinks those dimensions and then fine-tunes that subset’s NN weights for accuracy. They call this approach progressive shrinking.

Progressive shrinking, training for different dimensions

Essentially, they train first with the largest value for each dimension (the complete DNN) and then, in subsequent training epochs, reduce one or more of the dimensions required for the various deployments and train just that subset. These subsequent training passes always start from the pre-trained larger DNN weights. As they gradually pick off and train for every possible deployment dimension, the process modifies just those weights in that configuration. This way the weights of the largest DNN are optimized for all the smaller dimensions required, and as a result, one can extract a (defined) sub-net with the dimensions needed for your inferencing deployments.

They use a couple of tricks when training the subsets. For example, when training for smaller kernel sizes, they use the center-most portion of each kernel and transform its weights with a transformation matrix to retain accuracy with the smaller kernel. When training for smaller depths, they use the first layers in the DNN and ignore any layers deeper in the model. When training for smaller widths, they sort each layer’s channels by weight magnitude, thus ensuring they retain those parameters that provide the most sensitivity.
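Here’s a rough numpy sketch of those sub-network extraction tricks (center-cropping the kernel and keeping the highest-magnitude channels). The kernel transformation matrix and elastic depth handling are omitted, and this is my own illustration, not the authors’ code.

```python
# Rough sketch of OFA-style sub-network extraction from shared weights:
# crop the center of each kernel (elastic kernel size) and keep the output
# channels with the largest L1 norm (elastic width). Illustration only.
import numpy as np

rng = np.random.default_rng(1)
# One conv layer of the "full" net: 64 output channels, 32 input channels, 7x7 kernels
W_full = rng.normal(size=(64, 32, 7, 7))

def crop_kernel(W, k):
    """Take the center k x k of each kernel (elastic kernel size)."""
    K = W.shape[-1]
    s = (K - k) // 2
    return W[:, :, s:s+k, s:s+k]

def select_channels(W, width):
    """Keep the `width` output channels with the largest L1 norm (elastic width)."""
    importance = np.abs(W).sum(axis=(1, 2, 3))
    keep = np.argsort(importance)[::-1][:width]
    return W[keep]

# Extract a smaller sub-net layer that *shares* the full net's weights
W_sub = select_channels(crop_kernel(W_full, 3), 32)
print(W_sub.shape)   # (32, 32, 3, 3)
```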

It’s sort of like multiple video encodings in a single file. Rather than having a separate file for every video encoding format (MPEG-2, MPEG-4, HEVC, etc.), you have one file with all encoding formats embedded within it. If, for example, you needed MPEG-4, you could just extract those elements of the video file representing that encoding.

OFA DNN results

In order to do OFA, one must identify, ahead of time, all the potential inferencing deployments (depth, width, kernel sizes) and resolution levels to support. But in the end, you have a one size fits all trained DNN whose sub-nets can be selected and deployed for any of the pre-specified deployments.

The authors have shown (see table and figure above) that OFA beats (in energy consumed and accuracy level) other State of the Art (SOTA) and Neural Architecture Search (NAS) approaches to training multiple DNNs.

The report goes on to discuss how OFA could be optimized to support different latency (inferencing response time) requirements as well as diverse hardware architectures (CPU, GPU, FPGA, etc.).

~~~~

When I first heard of OFA DNNs, I thought we were on the road to artificial general intelligence, but this is much more specialized than that. It’s unclear to me how many AI DNNs have enough different deployment environments to warrant the use of OFA, but with the proliferation of AI DNNs for IoT, automobiles, robots, etc., there will come a time soon when OFA DNNs and their competition become much more important.

Comments?

Photo Credit(s):

Better core allocation for congested web apps

Read an article in ScienceDaily (Achieving greater efficiency for fast datacenter operations) today that discussed some research done at MIT CSAIL, to be presented next week at NSDI’19, on Shenango, a new algorithm to allocate idle CPU cores to latency-sensitive transaction workloads. The paper is to be presented on February 27th. (I may update this with more details on Shenango after the paper is published.)

It appears that for many web-scale applications, response time is driven mostly by tail latencies (the slowest service determines web page response). For these 10K-100K server environments, operators have always had to over-provision CPU cores to reduce service tail latency. This has led to 100s to 1000s of cores, mostly sitting idle (but powered on) much of the time.

There have been some solutions that try to make better use of idle cores, but their core allocation responsiveness has been in the milliseconds. With the 10s-100s of threads that make up a web service, allocating CPU resources in milliseconds is too slow.

Arachne, a core aware thread scheduler

One approach to better core allocation uses Arachne: Core Aware Thread Management, out of Stanford.

With Arachne, threads are assigned to an application and each is given a priority. Arachne attempts to schedule them in priority order across an array of cores at its disposal.

Arachne’s Core Arbiter code is what assigns application threads to cores and runs under Linux at the user level. Some of its timings seem pretty fast. In the paper cited above, Arachne was able to schedule a thread to a core in under 300nsec.

Under Arachne, there are two sets of cores and applications: managed and unmanaged. Unmanaged cores run normal (non-Arachne, unmanaged) applications and threads. Managed applications use Arachne to have cores assigned to them.

Arachne uses a Linux construct called cpusets, collections of cores and memory banks, to allocate resources to run application threads. Cores and memory banks move between the managed and unmanaged sets based on the applications being run. Arachne assumes that managed apps have higher priority than unmanaged apps.

That is, at the start of Arachne, all cores exist in the unmanaged set. The Core Arbiter executes here as well. As applications are scheduled to run, the Arbiter grabs cpusets from unmanaged applications or a free pool and assigns them to run application threads. When the application completes, the cpusets are returned to the unmanaged pool.
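For reference, the cpuset machinery Arachne builds on is ordinary Linux plumbing. Below is a minimal sketch (assuming a cgroup-v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset and root privileges) of carving out a set of cores and pinning a process to them. Arachne’s Core Arbiter is of course far more dynamic than this, but these are the underlying knobs.

```python
# Minimal sketch: create a cpuset and move a process into it.
# Assumes a cgroup-v1 cpuset hierarchy at /sys/fs/cgroup/cpuset and root.
import os

CPUSET_ROOT = "/sys/fs/cgroup/cpuset"

def make_managed_set(name, cpus, mems="0"):
    path = os.path.join(CPUSET_ROOT, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "cpuset.cpus"), "w") as f:
        f.write(cpus)                 # e.g. "4-7": the cores granted to managed apps
    with open(os.path.join(path, "cpuset.mems"), "w") as f:
        f.write(mems)                 # memory bank(s) associated with those cores
    return path

def assign(path, pid):
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(str(pid))             # pid now runs only on the managed cores

if __name__ == "__main__":
    managed = make_managed_set("arachne_managed", "4-7")
    assign(managed, os.getpid())
```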

Arachne allocates cores based on a priority scheme with 8 levels. The highest priority managed applications/threads get cpusets first, lower priority managed application threads next, and unmanaged applications last.

There’s a set of APIs that applications must use to request cores and free them when no longer in use. Arachne seems pretty general purpose, and the fact that it operates with both normal (unmanaged) Linux applications and (Arachne) managed applications is appealing.

Shenango core allocation

Untitled by johnwilson1969 (cc) (from Flickr)

Not much technical information on Shenango was available as we published this post, but there is some information in the MIT/ScienceDaily piece and some in the Arachne paper.

It appears as if Shenango detects applications suffering from high tail latency by interfacing with the network stack and seeing whether packets have been waiting to be processed. It does this every 5 µsec and, if a packet has been waiting since the last check, the application is considered congested, has tail latency problems, and is a candidate for more cores.

It seems to do the same for computational processes that have been waiting for some service response. Shenango implements an IOKernel that handles core allocation to apps.

Shenango apps use an API to indicate when they are processing time-sensitive services and when they are not. If they are not, their cores can be released to more time-sensitive apps that are encountering congestion.
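Based only on the descriptions above, the IOKernel’s congestion check might look something like the following sketch; the names and data structures are my own reconstruction, not Shenango’s actual (C) implementation.

```python
# Rough sketch of the congestion-detection idea described above -- my own
# reconstruction from the article, not Shenango's actual IOKernel code.
import time

POLL_INTERVAL = 5e-6          # re-examine the queues every 5 microseconds

def detect_congestion(app, prev_waiting):
    """An app is congested if something queued at the last poll is still queued."""
    waiting = set(app["packet_queue"]) | set(app["task_queue"])
    congested = bool(waiting & prev_waiting)
    return congested, waiting

def iokernel_loop(apps, idle_cores):
    prev = {name: set() for name in apps}
    while True:
        for name, app in apps.items():
            congested, prev[name] = detect_congestion(app, prev[name])
            if congested and app["latency_sensitive"] and idle_cores:
                app["cores"].append(idle_cores.pop())   # grant an idle core
        time.sleep(POLL_INTERVAL)
```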

Presumably Shenango does not execute at the user level. And it’s unclear whether it can operate with both normal (Linux) and Shenango-managed applications. It also appears to be tied tightly to the network stack. Whether any of this matters to web-scale application users/developers is subject to debate.

However, the fact that it only alters core allocations when applications are congested seems like a nice feature.

~~~~

The Arachne paper said it "improved SLO MemCached by 37% and reduced tail latency by 10X". The only metric available in the Shenango discussion was that it increased typical web-scale server CPU core allocation from 60% to 100%.

If Shenango or Arachne can reduce over-provisioning of CPU cores and memory, it could lead to significant energy and server savings, especially for customers running 10K servers or more.

AI processing at the edge

Read a couple of articles over the past few weeks (TechCrunch: Google is making a fast, specialized TPU chip for edge devices … and IEEE Spectrum: Two startups use processing in flash for AI at the edge) about chips for AI at the IoT edge.

The two startups, Syntiant and Mythic, are moving to analog-only or analog-digital solutions to provide the AI processing needed at the edge, while Google is taking their TPU technology to the edge. We have written about Google’s TPU before (see: TPU and hardware vs. software innovation (round 3) post).


The major challenge in AI processing at the edge is power consumption. Both startups attack the power problem by using flash and other analog circuitry to provide power-efficient compute.

Google attacked the power problem with their original TPU by reducing computational precision down to 8-bit integers. By reducing the transistor count needed per operation, they lowered power requirements proportionally.

AI today is based on neural networks (NNs), which connect simulated neurons via simulated synapses, with weights attached to indicate whether to boost or decrease the signal being transmitted. AI learning is done by setting those weights and creating the connections between simulated neurons and synapses. So learning is setting weights and establishing connections. Actual inferencing (using AI to do something) is a process of exciting input simulated neurons/synapses and letting the signal flow through the NN, with each weight being used to determine the output(s).

AI with standard compute

The problem with doing AI learning or inferencing with normal CPUs or even GPUs (CUDA cores) is that the NN does thousands if not millions of multiply-accumulate operations, one at each simulated synapse-neuron connection. Doing all these multiply-accumulates takes power. CPUs and GPUs can do these sorts of operations on 32- or 64-bit integers or even floating point, but it still takes power.
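To see what a multiply-accumulate workload looks like, and why precision matters, here’s a tiny numpy sketch of one layer’s MACs done in float32 and again with simple 8-bit quantized weights and activations (a single per-tensor scale, which is a simplification of what real int8 inference engines do).

```python
# Tiny sketch: the multiply-accumulates behind one NN layer, in float32 and
# with simple 8-bit quantization (per-tensor scales only, for illustration).
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(256, 784)).astype(np.float32)   # one layer's weights
x = rng.normal(size=784).astype(np.float32)          # input activations

# float32 inference: 256 * 784 ~ 200K multiply-accumulates for this layer alone
y_fp32 = W @ x

# int8 inference: quantize weights and activations, accumulate in integers
sw = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / sw), -127, 127).astype(np.int8)
sx = np.abs(x).max() / 127.0
x_q = np.clip(np.round(x / sx), -127, 127).astype(np.int8)

acc = W_q.astype(np.int32) @ x_q.astype(np.int32)    # integer multiply-accumulate
y_int8 = acc * (sw * sx)                             # rescale back to real values

print(np.max(np.abs(y_fp32 - y_int8)))   # modest error, far fewer bits per MAC
```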

AI processing power

AI processing efficiency is measured in trillions of (multiply-accumulate) operations per second per watt (TOPS/W). Mythic believes it can perform 4 TOPS/W and Syntiant says it can do 20 TOPS/W. In comparison, the NVIDIA Volta V100 can do about 0.4 TOPS/W (according to the article), although comparing Syntiant-Mythic TOPS to NVIDIA TOPS is a little like comparing apples to oranges.

A current Intel Xeon Platinum 8180M (2.5GHz, 28 cores, 205W) can probably do (assuming one multiply-accumulate per core per clock) about 2.5 billion × 28 cores = 70 billion ops/sec ÷ 205W, or ~0.3 GOPS/W (source: Platinum 8180M data sheet).

As for Google’s TPU TOPS/W, TPU2 is rated at 45 TFLOPS/chip and the best guess for power consumption is between 160W and 200W, let’s say 180W. At that power level, TPU2 should hit ~0.25 TFLOPS/W. TPU3 is coming out with 8X the performance, but it uses water cooling (read LOTS MORE POWER).
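Putting the back-of-envelope efficiency numbers above in one place (all figures are the rough estimates quoted in this post, not benchmarks):

```python
# Back-of-envelope efficiency comparison using the figures quoted above
# (approximations from this post, all expressed in TOPS/W or equivalent).
chips = {
    "Syntiant (claimed)": 20.0,
    "Mythic (claimed)": 4.0,
    "NVIDIA V100 (per article)": 0.4,
    "Google TPU2 (45 TFLOPS / ~180W)": 45.0 / 180.0,          # ~0.25
    "Intel Xeon 8180M (70 GOPS / 205W)": 70e9 / 205 / 1e12,   # ~0.0003
}

baseline = chips["Syntiant (claimed)"]
for name, tops_per_w in chips.items():
    print(f"{name:36s} {tops_per_w:8.4f} TOPS/W  "
          f"(Syntiant advantage ~{baseline / tops_per_w:,.0f}x)")
```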

Nonetheless, it appears that Mythic and Syntiant are one to two orders of magnitude better than the best that NVIDIA and TPU2 can do today and many orders of magnitude better than Intel X86.

Improving TOPS/W

Using NAND flash as an analog memory to read, write and hold NN weights is an easy way to reduce power consumption. Combine that with analog circuitry that can do multiplication and addition with those flash values, and you have an AI NN processor. This way you reduce the need to hold weights in memory and do compute in registers, by collapsing both compute and memory into the same componentry.
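The physics that makes this work is just Ohm’s and Kirchhoff’s laws: store each weight as a conductance, apply the inputs as voltages, and the current summed on each output line is the multiply-accumulate. Here’s an idealized numpy model of that idea, ignoring the noise, device variation and ADC/DAC overheads that real flash-based chips have to deal with.

```python
# Idealized model of analog in-memory multiply-accumulate:
# weights stored as conductances G, inputs applied as voltages V,
# each output line's current is I = G @ V (Ohm's law + Kirchhoff summation).
import numpy as np

rng = np.random.default_rng(7)
G = rng.uniform(0.0, 1.0, size=(16, 64))   # conductances = stored weights
V = rng.uniform(0.0, 1.0, size=64)         # input voltages = activations

I = G @ V                                   # a whole layer's MACs, done by physics

# a little read noise hints at why analog precision (and hence TOPS) is limited
I_noisy = I + rng.normal(scale=0.01 * I.std(), size=I.shape)
print(np.max(np.abs(I - I_noisy)))
```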

The major difference between Syntiant and Mythic seems to be the amount of analog circuitry they use. Mythic seems to relegate the analog circuitry to an accelerator, while Syntiant makes more extensive use of analog circuitry throughout their chip. That’s probably why it can perform at 5X the TOPS/W of Mythic’s IPU.

IBM and others have been working on neuromorphic chips some of which are analog based and others which are all digital based. We’ve written extensively on IBM and some on MIT’s approaches (for the latest on IBM see: More power efficient deep learning through IBM and PCM, and for MIT see: MIT builds an analog synapse chip) and follow the links there to learn more.

~~~~

Special purpose AI hardware is emerging from the labs and finally reaching reality. IBM R&D has been playing with it for a long time. Google is working on TPU3, so there’s no stopping them. And startups are seeing an opening and are taking everyone on. Stay tuned, we’re in for a good long ride before someone rises above the crowd and becomes the next chip giant.

Comments?

Photo Credit(s): TechCrunch  Google is making a fast, specialized TPU chip for edge devices … article

Introduction to Digital Design Verification at Mythic, Medium.com Article

Images from Google Cloud Platform Blog on the TPU

Two startups use processing in flash for AI at the edge, IEEE Spectrum article courtesy of Mythic

Information flows everywhere – part 1

Read an article today from Scientific American (Sewage is helping cities flush out the opioid crisis) about how using chemical analysis of wastewater can be used to assess the extent of the opioid crisis in their city.

Wastewater information highway

There’s a lab at ASU (Arizona State University) that chemically analyzes samples of wastewater to determine the amount of drugs that a city’s population excretes. They can provide a near real-time assessment of the proportion of drugs in city sewage and thereby, in a city’s population.

The problem with public drug use surveys and hospital data gathering is that they take time. Moreover, surveys and hospital data gathering typically come long after drugs have become a serious problem in a city’s population.

Wastewater sample drug analysis can be done in a matter of days and can be redone as often as needed. Such data could be used to track intervention activities and see if they have a real impact (positive or negative) on drug use in a population.

Neighborhood health

In addition, by sampling sewage at a neighborhood level, one can gain an assessment of drug problems at any sub-division of a city that’s needed.

The above article talks about an MIT program with Cary, NC (from Biobot.io)  that is designing robots to traverse sewer pipes and analyze wastewater chemical makeup in real time, reporting this back to ground stations around the city.

With such an approach, one could almost zero in (depending on sewer pipe networks) on any neighborhood in a city, target specific interventions at that level and measure impact in (digestion-delayed) real time. Doing so, cities, or states for that matter, could experiment with different interventions on a neighborhood-by-neighborhood basis and gain statistical evidence on drug problem intervention effectiveness.

But, you can analyze wastewater for any number of variables, such as viruses, bacteria, enzymes, etc. Any of which can lead to a better understanding of a population’s health.

~~~~

Two things I want to leave you with:

First, public health has had a major impact on human health and has doubled our lifespan in 200 years. All modern cities have water treatment plants today to ensure water quality and have thereby reduced the incidence of cholera and other waterborne epidemics in their cities. Wastewater analysis has the potential for significant improvements in population health monitoring. Just like water treatment, wastewater analysis will someday become common public health practice in modern cities throughout the world.

Second, I was at a conference this week which presented a slide saying that there is no cold data anymore (Pure//Accelerate 2018). This was in reference to the fact that re-analyzing old, cold data can often lead to insights and process improvements that were not obvious at first glance.

But it’s not just data anymore. Any activity done by man needs to be analyzed for (inherent & invisible) information flows that could be extracted to make the world a better place.

Photo Credit(s):

Analog neural simulation or digital neuromorphic computing vs. AI

DSC_9051 by Greg Gorman (cc) (from Flickr)

At last week’s IBM Smarter Computing Forum we had a session on Watson, IBM’s artificial intelligence machine which won Jeopardy last year, and another session on IBM-sponsored research helping to create the SyNAPSE digital neuromorphic computing chip.

Putting “Watson to work”

Apparently, IBM is taking Watson’s smarts and applying them to health care and other information-intensive verticals (intelligence, financial services, etc.). At the conference IBM had Manoj Saxena, senior director of Watson Solutions, and Dr. Herbert Chase, a senior professor of clinical medicine from Columbia School of Medicine, come up and talk about Watson in healthcare.

Mr. Saxena contended, and Dr. Chase concurred, that Watson can play an important part in helping healthcare apply current knowledge. Watson’s core capability is the ability to ingest and make sense of information and then be able to apply that knowledge; in this case, using medical research knowledge to help diagnose patient problems.

Dr. Chase had been struck at a young age by one patient who had what appeared to be an incurable and unusual disease. He was an intern at the time and was given the task of diagnosing her issue. Eventually, he was able to provide a proper diagnosis, but it irked him that it took so long and so many doctors to get there.

So as a test of Watson’s capabilities, Dr. Chase input this person’s medical symptoms into Watson, and it was able to provide a list of potential diagnoses. Sure enough, Watson did list the medical problem the patient actually had all those years ago.

At the time, I mentioned to another analyst that Watson seemed to represent the end game of artificial intelligence, almost a final culmination and accumulation of 60 years of AI research, creating a comprehensive service offering for a number of verticals.

That’s all great, but it’s time to move on.

SyNAPSE is born

In the next session IBM had Dr. Dharmendra Modha come up and talk about their latest SyNAPSE chip, a new neuromorphic digital silicon chip that mimics the brain to model neurological processes.

We are quite a ways away from productization of the SyNAPSE chip. Dr. Modha showed us a real-time exhibition of the SyNAPSE chip in action (connected to his laptop), with it interpreting a handwritten numeral into its numerical representation. I would say it’s a bit early yet to see putting “SyNAPSE to work”.

Digital vs. analog redux

I have written before about the SyNAPSE neuromorphic chip and a competing technology, the direct analog simulation of neural processes (see IBM introduces SyNAPSE chip and MIT builds analog synapse chip). In the MIT brain chip post I discussed the differences between the two approaches, focusing on the digital vs. analog divide.

It seems that IBM research is betting on digital neuromorphic computing.  At the Forum last week, I had a discussion with a senior exec in IBM’s STG group, who said that the history of electronic computing over the last half century or so has been mostly about the migration from analog to digital technologies.

Yes, but that doesn’t mean that digital is better, just easier to produce.

On that topic, I asked Dr. Modha what he thought of MIT’s analog brain chip. He said:

  • MIT’s brain chip was built on a 180nm fabrication process whereas his is on 45nm, or 4X finer. Perhaps the fact that IBM has some of the best fabs in the world may have something to do with this.
  • The digital SyNAPSE chip can potentially operate at 5.67GHz and will be absolutely faster than any analog brain simulation. Yes, but each analog simulated neuron is actually part of a parallel processing complex, and with a thousand or a million of them operating, even 1000X or a million X slower, it should be able to keep up.
  • The digital SyNAPSE chip was carefully designed to be complementary to current digital technology. As I look at IT today, we are surrounded by analog devices that interface very well with the digital computing environment, so I don’t think this will be a problem when we are ready to use it.

Analog still surrounds us and defines the real world. Someday the computing industry will get off its digital hobby horse and see the truth in that statement.

~~~~

In any case, if it takes another 60 years to productize one of these technologies, then the Singularity is farther away than I thought; somewhere around 2071 should about do it.

Comments?

MIT builds analog synapse chip

2011 Wikimedia commons (400px-Synapse_Illustration_unlabeled.svg)

Recently MIT announced a new brain chip, a breakthrough device that simulates a single brain synapse with an analog chip.

We have discussed before the digital neuromorphic chip activity going on (see my IBM introducing their SyNAPSE chip and Electro-human interface posts). However, both of those were digital; this new MIT chip is analog. The chip uses ~400 transistors and was fabricated using VLSI processing.


Analog, what’s that?

Given that the world has gone digital, analog devices may be foreign to most of us. But analog dominated the way electronics worked for the first half of the last century and was still pretty prominent during the second half.

Nowadays, such devices are used primarily in signal processing and where streams of data are transformed from one mode to another (serializers/deserializers). An analog signal has theoretically infinite resolution (Wikipedia), which should make it closer to real life and may be why some stereophiles prefer records to CDs.

Neurons are analog devices

That being said, it’s a treat to see some new analog technology come out that’s better than digital implementations. One would have to say that neural activity is by definition analog and, as such, an analog approach should make simulating brain activity much easier.

The advantage of analog can be seen in the neural synapse itself, the connection between two neurons. Information is transferred between the two neurons by the uptake of ions. In the case of the MIT synapse chip, the same sort of process occurs, but here information flows based on gradients of electrical potential.

In testament to the capabilities of the new synapse chip, the researchers were able to resolve a long-standing debate in neurobiology. The question was how long-term potentiation (LTP) and long-term depression (LTD), which enhance or depress information transfer across the synapse, are accomplished in real neurons. Previously, it had been postulated that LTP and LTD would depend on two different mechanisms in real cells, but there was one theory that said that, with a specific type of receptor, both LTP and LTD could be performed in a single way.

MIT researchers were able to configure their synapse-chip to mimic that new receptor and were able to show how LTP and LTD could work with this single receptor in the brain.

Onto the brain

Of course a single synapse is not much, considering the brain has 100B neurons, each with many 100s if not 1000s of synapses. But it’s a start.

Naturally, considering it’s built out of transistors using CMOS technology, it should follow Moore’s law, and after 18 months or so we should have a chip with two synapses on it. After another 40 or so doublings (~60 years from now, in 2071), if Moore’s law holds, we could have a brain-chip with 100B neurons and 100T synapses on it.

Of course, this being a prototype, I suppose with today’s fabrication capable of creating 40M transistors/chip, we may already be able to simulate 100K synapses and 100 neurons, which means we should have a brain’s level of neurons and synapses in 30 doublings, or by ~2056.
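For anyone checking the arithmetic, the doubling math works out roughly like this (using the post’s starting point of ~100K synapses on today’s silicon and an 18-month doubling period):

```python
# Back-of-envelope Moore's-law arithmetic used above: starting from ~100K
# simulated synapses on today's silicon (40M transistors / ~400 per synapse)
# and doubling every 18 months, how long until ~100T synapses?
import math

synapses_today = 40e6 / 400        # ~100K synapses per chip now
synapses_brain = 100e12            # ~100T synapses in a human brain
base_year = 2011                   # roughly when this post was written

doublings = math.log2(synapses_brain / synapses_today)   # ~30
years = doublings * 1.5                                   # 18-month doublings
print(f"{doublings:.0f} doublings x 1.5 years = {years:.0f} years -> ~{base_year + years:.0f}")
```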

Analog is better than biological

The other nice thing about analog logic and transistors is that information processing in the brain-chip should be orders of magnitude faster than the brain’s biological processing. Which is probably even more frightening.

The IBM SyNAPSE chip mentioned earlier was an all-digital creation and had two chip cores: one provided “learning synapses” and the other “programmable synapses”. This was probably an attempt to mimic neural processing in digital logic.

The analog brain-chip that MIT has invented has no such distinction, supplying all synapse functionality in 400 transistors. Nonetheless, any accurate simulation of neural processes can help us understand how to mimic them better. The fact that we have an analog simulation of neural processes should help us improve the digital simulation to more closely match the brain.

~~~~

Not sure what we should call this chip. It’s certainly not neuromorphic, because it’s a real simulation of analog neural synapses, not a digital approximation. I would use synapse-chip but it’s already in use. I kind of like brain-chip, but that may be stretching it a bit. Maybe neuron-chip is best for now.

Now that we know the date for the singularity, hopefully we can be ready to deal with whatever happens then.

Comments?