IBM using PCM to implement better AI – round 6

Saw a recent article that discussed IBM’s research into new computing architectures that are inspired by brain computational techniques (see A new brain inspired architecture … ). The article reports on research done by IBM R&D into using Phase Change Memory (PCM) technology to implement various versions of computer architectures for AI (see Tutorial: Brain inspired computation using PCM, in the AIP Journal of Applied Physics).

As you may recall, we have been reporting on IBM Research into different computing architectures to support AI processing for quite a while now (see Parts 1, 2, 3, 4, & 5). In our last post, More power efficient deep learning through IBM and PCM, we reported on a unique hybrid PCM-silicon solution to deep learning computation.

Readers should also be familiar with PCM, as it’s been discussed at length in a number of our posts (see The end of NAND is near, maybe; The future of data storage is MRAM; and New chip architectures with CPU, storage & sensors …). MRAM and ReRAM are distinct non-volatile memory technologies rather than forms of PCM, although current 3D XPoint is widely reported to be PCM based (I think).

In the current research, IBM discusses three different approaches to supporting AI with PCM devices. All three approaches stem from the physical characteristics of PCM.

(Some) PCM physics

FIG. 2. (a) Phase-change memory is based on the rapid and reversible phase transition of certain types of materials between crystalline and amorphous phases by the application of suitable electrical pulses. (b) Transmission electron micrograph of a mushroom-type PCM device in a RESET state. It can be seen that the bottom electrode is blocked by the amorphous phase.

It turns out that PCM devices have many characteristics that lend themselves to specialized computation. PCM devices crystallize and melt in order to change state. The properties associated with melting and crystallization of the PCM media cell can be used to support unique forms of computation. Some of these PCM characteristics include:

  • Analog, not digital memory – PCM devices are, at the core, analog memory devices. We mean that they don’t record just a 0 or 1 (actually resistive or conductive) state, but rather a continuum of values between those two.
  • PCM devices have an accumulation capability – each PCM cell actually accumulates a level of activation. This means that one cell can be more or less likely to change state depending on prior activity.
  • PCM devices are noisy – PCM cells are not perfect recorders of state change signals but rather have a well known, random noise which impacts the state level attained. That noise can be used to introduce randomness into processing.
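
The three characteristics above can be illustrated with a toy software model. This is only a minimal sketch under my own assumptions, not IBM’s device model: the class name PCMCell, the step and noise parameters and the 0-to-1 conductance scale are all hypothetical.

```python
import random


class PCMCell:
    """Toy PCM cell (hypothetical model, not taken from the article).

    Conductance g runs from 0.0 (amorphous, RESET) to 1.0 (crystalline, SET).
    """

    def __init__(self, step=0.1, noise=0.02, seed=None):
        self.g = 0.0                    # analog state, not just 0 or 1
        self.step = step                # assumed mean growth per pulse
        self.noise = noise              # assumed std. deviation of that growth
        self.rng = random.Random(seed)

    def pulse(self):
        """Apply one partial-SET pulse: accumulate a noisy increment."""
        delta = self.rng.gauss(self.step, self.noise)
        self.g = min(1.0, max(0.0, self.g + delta))  # clamp to the valid range
        return self.g

    def reset(self):
        """Melt the cell back to the amorphous (RESET) state."""
        self.g = 0.0


cell = PCMCell(seed=1)
levels = [cell.pulse() for _ in range(5)]  # state after each of 5 pulses
```

Each pulse nudges the cell toward the SET state by a slightly different amount, so the same pulse train lands on a slightly different analog level every time, which is the accumulation and noise behavior described in the list.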

The other major advantage of PCM devices is that they take a lot less power to operate than a GPU-CPU combination.

Three ways to use PCM for AI learning

FIG. 4. “In-memory computing,” computation is performed in place by exploiting the physical attributes of memory devices organized as a “computational memory” unit. For example, if data A is stored in a computational memory unit and if we would like to perform f(A), then it is not required to bring A to the processing unit. This saves energy and time that would have to be spent in the case of conventional computing system and memory unit. Adapted from Ref. 19.

The Applied Physics article describes three ways to use PCM devices in AI learning. These three include:

  1. Computational storage – which uses the analog capabilities of PCM to perform arithmetic and learning computations, in a sort of combined compute and storage device.
  2. AI co-processor – which uses PCM devices in an “all PCM nodes connected to all other PCM nodes” operation that could be used to perform neural network learning. In an AI co-processor there would be multiple all-connected PCM modules, each emulating a neural network layer.
  3. Spiking neural networks – which use PCM activation accumulation characteristics & inherent randomness to mimic biological spiking neuron activation.
FIG. 11. A proposed chip architecture for a co-processor for deep learning based on PCM arrays (Ref. 28).
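
One common way analog memory arrays are described as doing arithmetic in place is a crossbar multiply: input voltages drive the rows, each device passes a current equal to conductance times voltage, and the currents add up along each column. I’m assuming that is the kind of operation behind the first two approaches; the function name crossbar_mvm and the values below are purely illustrative.

```python
def crossbar_mvm(conductances, voltages):
    """Matrix-vector multiply as an idealized crossbar would compute it.

    conductances[i][j] is the (assumed ideal) conductance of the device
    joining input row i to output column j. Each device contributes a
    current g * v (Ohm's law) and the currents on a column simply add
    (Kirchhoff's current law).
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for g_row, v in zip(conductances, voltages):
        for j, g in enumerate(g_row):
            currents[j] += g * v
    return currents


# Hypothetical 2x2 array: the stored "weights" never leave the memory.
G = [[1.0, 0.5],
     [0.0, 2.0]]
V = [2.0, 1.0]
I = crossbar_mvm(G, V)  # [2.0, 3.0]
```

In a neural network layer the conductances would play the role of the weights, which is why one all-connected PCM module per layer is a natural fit.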

It’s the last approach that intrigues me.

Spiking neural nets (SNN)

FIG. 12. (a) Schematic illustration of a synaptic connection and the corresponding pre- and post-synaptic neurons. The synaptic connection strengthens or weakens based on the spike activity of these neurons; a process referred to as synaptic plasticity. (b) A well-known plasticity mechanism is spike-time-dependent plasticity (STDP), leading to weight changes that depend on the relative timing between the pre- and post-synaptic neuronal spike activities. Adapted from Ref. 31.

Biological neurons accumulate charge from all input (connected) neurons and when they reach some input threshold, generate an output signal or spike. This spike is then used to start the process with another neuron downstream from it.

Biological neurons also exhibit randomness in their threshold-spiking process.

Emulating spiking neurons in today’s neural nets takes computation. Adding randomness takes even more.

But with PCM SNN, both the spiking process and its randomness come from device physics. Using PCM to create SNN seems a logical progression.
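
To make the point concrete, here is roughly what software has to do to emulate one such neuron. It’s a minimal sketch under my own assumptions (an integrate-and-fire neuron with a jittered threshold); the class name SpikingNeuron and its parameters are hypothetical and not from the article.

```python
import random


class SpikingNeuron:
    """Toy integrate-and-fire neuron with a randomized threshold."""

    def __init__(self, threshold=1.0, jitter=0.05, seed=None):
        self.threshold = threshold      # nominal firing threshold
        self.jitter = jitter            # assumed std. deviation of threshold noise
        self.potential = 0.0            # accumulated input so far
        self.rng = random.Random(seed)

    def receive(self, weighted_input):
        """Accumulate input; spike (return True) and reset once the
        noisy threshold is reached."""
        self.potential += weighted_input
        noisy_threshold = self.threshold + self.rng.gauss(0.0, self.jitter)
        if self.potential >= noisy_threshold:
            self.potential = 0.0
            return True
        return False


neuron = SpikingNeuron(seed=42)
spikes = [neuron.receive(0.3) for _ in range(10)]  # list of True/False per step
```

Every step costs an addition, a random number draw and a comparison per neuron. In a PCM cell, the accumulation and the randomness are simply what the material does when pulsed.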

PCM as storage, as memory, as compute or all the above

In the storage business, we look at Optane (see our 3D Xpoint post) SSDs as blazingly fast storage. Intel has also announced that they will use 3D Xpoint in a memory form factor which should provide sadly slower, but larger memory devices.

But using PCM for compute is a radical departure from the von Neumann computer architectures we know and love today. HPE has been discussing another new computing architecture with their memristor technology, but only in prototype form.

It seems IBM is also prototyping hardware down this path.

Welcome to the next computing revolution.

Photo & Caption Credit(s): Photo and caption from Figure 2 in AIP Journal of Applied Physics article

Photo and caption from Figure 4 in AIP Journal of Applied Physics article

Photo and caption from Figure 11 in AIP Journal of Applied Physics article

Photo and caption from Figure 12 in AIP Journal of Applied Physics article

Flaming Lotus Girls Neuron by SanFranAnnie (cc) (from Flickr)

IBM Research creates PCM synapses – cognitive computing, round 4

Last year we reported on IBM’s progress in taking PCM (phase change memory) and using it to create a new, neuromorphic computing architecture (see Phase Change Memory (PCM) based neuromorphic processors). And earlier we discussed IBM’s (2nd generation) TrueNorth chip and IBM’s (1st generation) SyNAPSE chip.

This past week IBM made another cognitive computing announcement. This time they have taken their neuromorphic technologies another step closer to precise emulation of neurological processing of the brain.

Their research paper was not directly available, but IBM Research has summarized its contents in a short web article with a video (see IBM Scientists imitate the functionality of neurons with Phase-Change device).

PCM based neuromorphic processors

Read an interesting article from The Register the other day about IBM’s Almaden Research lab using standard non-volatile memory devices to implement a neural net. They apparently used two PCM (Phase Change Memory) devices to implement a 913 neuron/165K synapse pattern recognition system.

This seems to be another (simpler, cheaper) way to create neuromorphic chips. We’ve written about neuromorphic chips before (see my posts on IBM SyNAPSE, IBM TrueNorth and MIT’s analog neuromorphic chip). The latest TrueNorth chip from IBM uses ~5B transistors and provides 1M neurons with 256M synapses.

But none of the other research I have read actually described the neuromorphic “programming” process at the same level nor provided a “success rate” on a standard AI pattern matching benchmark as IBM has with the PCM device.

PCM based AI

The IBM summary report on the research discusses at length how the pattern recognition neural network (NN) was “trained” and how the 913 neuron/165K synapse NN was able to achieve 82% accuracy on NIST’s handwritten digit training database.

The paper has many impressive graphics. The NN was designed as a 3-layer network and used back propagation for its learning process. They show how the back propagation training was used to determine the weights.

The other interesting thing was they analyzed how hardware faults (stuck-ats, dead conductors, number of resets, etc.) and different learning parameters (stochasticity, learning batch size, variable maxima, etc.) impacted NN effectiveness on the test database.

Turns out the NN could tolerate ~30% dead conductors (in the synapses) or 20% stuck-ats in the PCM memory and still generate pretty good accuracy on the training event. Not sure I understand the learning parameters, but they varied batch size from 1 to 10 and this didn’t seem to impact NN accuracy whatsoever.

Which PCM was used?

In trying to understand which PCM devices were in use, the only information available said it was a 180nm device. According to a 2012 Flash Memory Summit report on alternative NVM technologies, 180nm PCM devices have been around since 2004, a 90nm PCM device was introduced in 2008 with 128Mb and even newer PCM devices at 45nm were introduced in 2010 with 1Gb of memory. So I would conclude that the 180nm PCM device supported ~16 to 32Mb.

What can we do with today’s PCM technology?

With the industry supporting a doubling of transistors/chip every 2 years, a PCM device in 2014 should have 4X the transistors of the 45nm, 2010 device above and ~4-8X the memory. So today we should be seeing 4-16Gb PCM chips at ~22nm. Given this, current PCM technology should support 32-64X more neurons than the 180nm devices, or ~29K to ~58K neurons or so.

Unclear what technology was used for the ‘synapses’ but based on the time frame for the PCM devices, this should also be able to scale up by a factor of 32-64X or between ~5.3M to ~10.6M synapses.
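
For anyone who wants to check the back-of-the-envelope numbers, the scaling above works out as follows. The 32-64X factor is my own estimate from the text, not a measured figure, and the function name scaled is just for illustration.

```python
# Figures from the 180nm demonstration described in the post.
NEURONS_180NM = 913
SYNAPSES_180NM = 165_000

# Assumed scale-up range from the post (an estimate, not a measurement).
SCALE_LOW, SCALE_HIGH = 32, 64


def scaled(count):
    """Return the (low, high) projection for a given starting count."""
    return count * SCALE_LOW, count * SCALE_HIGH


neurons = scaled(NEURONS_180NM)    # (29216, 58432), i.e. ~29K to ~58K
synapses = scaled(SYNAPSES_180NM)  # (5280000, 10560000), i.e. ~5.3M to ~10.6M
```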

Still this doesn’t approach TrueNorth’s neuron/synapse levels, but it’s close. But then two 4-16Gb PCMs probably don’t cost nearly as much to purchase as TrueNorth costs to create.

The programming model for the TrueNorth/SyNAPSE chips doesn’t appear to be neural network like. So perhaps another advantage of the PCM model of hardware based AI is that you can use standard, well known NN programming methods to train and simulate it.

So, PCM based neural networks seem an easier way to create hardware based AI. Not sure this will ever match Neuron/Synapse levels that the dedicated, special purpose neuromorphic chips in development can accomplish but in the end, they both are hardware based AI that can support better pattern recognition.

Using commodity PCM devices any organization with suitable technological skills should be able to create a hardware based NN that operates much faster than any NN software simulation. And if PCM technology starts to obtain market acceptance, the funding available to advance PCMs will vastly exceed that which IBM/MIT can devote to TrueNorth and its descendants.

Now, what is HP up to with their memristor technology and The Machine?

Photo Credits: Neurons by Leandro Agrò

12 atoms per bit vs 35 bits per electron

Shows 6 atom pairs in a row, with coloration of blue for interstitial space and yellow for external facets of the atom
from Technology Review Article

Read a story today in Technology Review on Magnetic Memory Miniaturized to Just 12 Atoms by a team at IBM Research that created a (spin) magnetic “storage device” that used 12 iron atoms to record a single bit (near absolute zero and just for a few hours). The article said it was about 100X denser than the previous magnetic storage record.

Holographic storage beats that

Wikipedia’s (soon to go dark for 24hrs) article on Memory Storage Density mentioned research at Stanford that in 2009 created an electronic quantum holographic device that stored 35 bits/electron using a sheet of copper atoms to record the letters S and U.

The Wikipedia article went on to equate 35 bits/electron to ~3 Exabytes [10**18 bytes]/in**2. (Although how Wikipedia was able to convert from bits/electron to EB/in**2 I don’t know, I’ll accept it as a given.)

Now an iron atom has 26 electrons and copper has 29 electrons.  If 35 bits/electron is 3 EB/in**2 (or ~30Eb/in**2), then 1 bit per 12 iron atoms (or 12*26=312 electrons) should be 0.0032bits/electron or ~275TB/in**2 (or ~2.8Pb/in**2).   Not quite to the scale of the holographic device but interesting nonetheless.
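
The conversion above is a straight proportion, sketched below. It takes Wikipedia’s 35 bits/electron = ~3 EB/in**2 equivalence as a given, just as the post does; the variable names are my own.

```python
IRON_ELECTRONS = 26              # electrons per iron atom
ATOMS_PER_BIT = 12               # IBM's 12-atom magnetic bit

HOLO_BITS_PER_ELECTRON = 35      # Stanford holographic result
HOLO_BYTES_PER_SQIN = 3e18       # ~3 EB/in**2, taken as a given

# One bit spread over 12 * 26 = 312 electrons.
electrons_per_bit = ATOMS_PER_BIT * IRON_ELECTRONS
bits_per_electron = 1 / electrons_per_bit            # ~0.0032

# Scale the holographic areal density by the ratio of bits/electron.
magnetic_bytes_per_sqin = (
    HOLO_BYTES_PER_SQIN * bits_per_electron / HOLO_BITS_PER_ELECTRON
)
magnetic_tb_per_sqin = magnetic_bytes_per_sqin / 1e12  # ~275 TB/in**2
```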

What can that do for my desktop?

Given that today’s recording head/media has demonstrated ~3.3Tb/in**2 (see our Disk drive density multiplying by 6X post), the 12 atoms per bit  is a significant advance for (spin) magnetic storage.

With today’s disk industry shipping 1TB/disk platters using ~0.6Tb/in**2 (see our Disk capacity growing out of sight post), these technologies, if implemented in a disk form factor, could store from 4.6PB to 50EB in a 3.5″ form factor storage device.
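
Those drive capacities come from scaling today’s 1TB platter by the ratio of areal densities. A rough sketch follows, assuming capacity scales linearly with density and using the unrounded ~2.75Pb/in**2 magnetic figure (which rounds to the ~2.8Pb/in**2 quoted earlier); the function name is hypothetical.

```python
PLATTER_TB = 1.0                  # today's platter capacity
PLATTER_DENSITY_BITS = 0.6e12     # ~0.6 Tb/in**2

MAGNETIC_DENSITY_BITS = 2.75e15   # 12-atom bits, ~2.75 Pb/in**2 (assumed unrounded value)
HOLO_DENSITY_BITS = 30e18         # holographic, ~30 Eb/in**2


def scaled_capacity_tb(density_bits):
    """Capacity in TB if a platter's capacity scaled linearly with areal density."""
    return PLATTER_TB * density_bits / PLATTER_DENSITY_BITS


magnetic_pb = scaled_capacity_tb(MAGNETIC_DENSITY_BITS) / 1e3  # ~4.6 PB
holographic_eb = scaled_capacity_tb(HOLO_DENSITY_BITS) / 1e6   # ~50 EB
```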

So there is a limit to (spin) magnetic storage and it’s about 11000X less dense than holographic storage. Once again holographic storage proves it can store significantly more data than magnetic storage, if only it could be commercialized. (Probably a subject to cover in a future post.)

~~~~

I don’t know about you but a 4.6PB drive is probably more than enough storage for my lifetime and then some. But then again those new 4K High Definition videos may take up a lot more space than my (low definition) DVD collection.

Comments?

 


Top 10 blog posts for 2011

Merry Christmas! Buon Natale! Frohe Weihnachten! by Jakob Montrasio (cc) (from Flickr)

Happy Holidays.

I ranked my blog posts using a ratio of hits to post age and have identified the top 10 most popular posts for 2011 (so far):

  1. vSphere 5 storage enhancements – We discuss some of the more interesting storage oriented vSphere 5 announcements that included a new DAS storage appliance, host based (software) replication service, storage DRS and other capabilities.
  2. Intel’s 320 SSD 8MB problem – We discuss a recent bug (since fixed) which left the Intel 320 SSD drive with only 8MB of storage. We presumed the bug was in the load leveling logic/block mapping logic of the drive controller.
  3. Analog neural simulation or digital neuromorphic computing vs AI – We talk about recent advances to providing both analog (MIT) and digital versions (IBM) of neural computation vs. the more traditional AI approaches to intelligent computing.
  4. Potential data loss using SSD RAID groups – We note the possibility for catastrophic data loss when using equally used SSDs in RAID groups.
  5. How has IBM research changed – We examine some of the changes at IBM research that have occurred over the past 50 years or so which have led to much more productive research results.
  6. HDS buys BlueArc – We consider the implications of the recent acquisition of BlueArc storage systems by their major OEM partner, Hitachi Data Systems.
  7. OCZ’s latest Z-Drive R4 series PCIe SSD – Not sure why this got so much traffic but it’s OCZ’s latest PCIe SSD device with 500K IOPS performance.
  8. Will Hybrid drives conquer enterprise storage – We discuss the unlikely possibility that Hybrid drives (NAND/Flash cache and disk drive in the same device) will be used as backend storage for enterprise storage systems.
  9. SNIA CDMI plugfest for cloud storage and cloud data services – We were invited to sit in on a recent SNIA Cloud Data Management Initiative (CDMI) plugfest and talk to some of the participants about where CDMI is heading and what it means for cloud storage and data services.
  10. Is FC dead?! – What with the introduction of 40GbE FCoE just around the corner, 10GbE cards coming down in price and Brocade’s poor YoY quarterly storage revenue results, we discuss the potential implications on FC infrastructure and its future in the data center.

~~~~

I would have to say #3, 5, and 9 were the most fun for me to do. Not sure why, but #10 probably generated the most twitter traffic. Why the others were so popular is hard for me to understand.

Comments?

Analog neural simulation or digital neuromorphic computing vs. AI

DSC_9051 by Greg Gorman (cc) (from Flickr)

At last week’s IBM Smarter Computing Forum we had a session on Watson, IBM’s artificial intelligence machine which won Jeopardy last year and another session on IBM sponsored research helping to create the SyNAPSE digital neuromorphic computing chip.

Putting “Watson to work”

Apparently, IBM is taking Watson’s smarts and applying them to health care and other information intensive verticals (intelligence, financial services, etc.). At the conference IBM had Manoj Saxena, senior director of Watson Solutions, and Dr. Herbert Chase, a professor of clinical medicine from Columbia School of Medicine, come up and talk about Watson in healthcare.

Mr. Saxena contended, and Dr. Chase concurred, that Watson can play an important part in helping healthcare apply current knowledge. Watson’s core capability is the ability to ingest and make sense of information and then be able to apply that knowledge. In this case, it uses medical research knowledge to help diagnose patient problems.

Dr. Chase had been struck at a young age by one patient that had what appeared to be an incurable and unusual disease.  He was an intern at the time and was given the task to diagnose her issue.  Eventually, he was able to provide a proper diagnosis but it irked him that it took so long and so many doctors to get there.

So as a test of Watson’s capabilities, Dr. Chase input this person’s medical symptoms into Watson and it was able to provide a list of potential diagnoses. Sure enough, Watson did list the medical problem the patient actually had those many years ago.

At the time, I mentioned to another analyst that Watson seemed to represent the end game of artificial intelligence. Almost a final culmination and accumulation of 60 years in AI research, creating a comprehensive service offering for a number of verticals.

That’s all great, but it’s time to move on.

SyNAPSE is born

In the next session IBM had Dr. Dharmendra Modha come up and talk about their latest SyNAPSE chip, a new neuromorphic digital silicon chip that mimics the brain to model neurological processes.

We are quite a ways away from productization of the SyNAPSE chip. Dr. Modha showed us a real-time exhibition of the SyNAPSE chip in action (connected to his laptop) with it interpreting a handwritten numeral into its numerical representation. I would say it’s a bit early yet to see putting “SyNAPSE to work”.

Digital vs. analog redux

I have written about the SyNAPSE neuromorphic chip and a competing technology, the direct analog simulation of neural processes before (see IBM introduces SyNAPSE chip and MIT builds analog synapse chip).  In the MIT brain chip post I discussed the differences between the two approaches focusing on the digital vs. analog divide.

It seems that IBM research is betting on digital neuromorphic computing.  At the Forum last week, I had a discussion with a senior exec in IBM’s STG group, who said that the history of electronic computing over the last half century or so has been mostly about the migration from analog to digital technologies.

Yes, but that doesn’t mean that digital is better, just easier to produce.

On that topic, I asked Dr. Modha what he thought of MIT’s analog brain chip. He said:

  • MIT’s brain chip was built on 180nm fabrication processes whereas his is on 45nm or over 3X finer. Perhaps the fact that IBM has some of the best fabs in the world may have something to do with this.
  • The digital SyNAPSE chip can potentially operate at 5.67GHz and will be absolutely faster than any analog brain simulation. Yes, but each analog simulated neuron is actually one element of a parallel processing complex, and with 1,000 or a million of them operating, even at 1000X or a million X slower, it should be able to keep up.
  • The digital SyNAPSE chip was carefully designed to be complementary to current digital technology. As I look at IT today we are surrounded by analog devices that interface very well with the digital computing environment, so I don’t think this will be a problem when we are ready to use it.

Analog still surrounds us and defines the real world. Someday the computing industry will awaken from its digital hobby horse and somehow see the truth in that statement.

~~~~

In any case, if it takes another 60 years to productize one of these technologies then the Singularity is farther away than I thought; somewhere around 2071 should about do it.

Comments?

How has IBM research changed?

IBM Neuromorphic Chip (from Wired story)

What do Watson, Neuromorphic chips and race track memory have in common? They have all emerged out of IBM research labs.

I have been wondering for some time now how it is that a company known for its cutting edge research but lack of product breakthroughs has transformed itself into an innovation machine.

There has been a sea change in the research at IBM that is behind the recent productization of technology.

Talking the past couple of days with various IBMers at STG’s Smarter Computing Forum, I have formulated a preliminary hypothesis.

At first I heard that there was a change in the way research is reviewed for product potential. Nowadays, it almost takes a business case for research projects to be approved and funded. And the business case needs to contain a plan as to how it will eventually reach profitability for any project.

In the past it was often said that IBM invented a lot of technology but productized only a little of it. Much of their technology would emerge in other people’s products and IBM would not receive anything for their efforts (other than some belated recognition for their research contribution).

Nowadays, it’s more likely that research not productized by IBM is at least licensed from them after they have patented the crucial technologies that underpin the advance. But it’s just as likely that, if it has something to do with IT, the project will end up as a product.

One executive at STG sees three phases to IBM research spanning the last 50 years or so.

Phase I The ivory tower:

IBM research during the Ivory Tower Era looked a lot like research universities but without the tenure of true professorships. Much of the research of this era was in materials and pure mathematics.

I suppose one example of this period was Mandelbrot and fractals. It probably had a lot of applications but few of them ended up in IBM products and mostly it advanced the theory and practice of pure mathematics/systems science.

Such research had little to do with the problems of IT or IBM’s customers. The fact that it created pretty pictures and a way of seeing nature in a different light was an advance to mankind but it didn’t have much if any of an impact to IBM’s bottom line.

Phase II Joint project teams

In IBM research’s phase II, the decision process on which research to move forward on now had people from not just IBM research but also product division people. At least now there could be a discussion across IBM’s various divisions on how the technology could enhance customer outcomes. I am certain profitability wasn’t often discussed but at least it was no longer purposefully ignored.

I suppose over time these discussions became more grounded in fact and business cases rather than just the belief in the value of the research for research sake. Technological roadmaps and projects were now looked at from how well they could impact customer outcomes and how such technology enabled new products and solutions to come to market.

Phase III Researchers and product people intermingle

The final step in IBM’s transformation of research involved the human element. People started moving around.

Researchers were assigned to the field and to product groups and product people were brought into the research organization. By doing this, ideas could cross fertilize, applications could be envisioned and the last finishing touches needed by new technology could be envisioned, funded and implemented. This probably led to the most productive transition of researchers into product developers.

On the flip side, when researchers returned from their multi-year product/field assignments they brought a newfound appreciation of problems encountered in the real world. That, combined with their in depth understanding of where technology could go, helped show the path that could take research projects into new, more fruitful (at least to IBM customers) arenas. This movement of people provided the final piece in grounding research in areas that could solve customer problems.

In the end, many research projects at IBM may fail but if they succeed they have the potential to change IT as we know it.

I heard today that there are 700 to 800 projects in IBM research. If any of them have the potential we see in the products shown today, like Watson in Healthcare and Neuromorphic chips, exciting times are ahead.

Graphene Flash Memory

Model of graphene structure by CORE-Materials (cc) (from Flickr)

I have been thinking about writing a post on “Is Flash Dead?” for a while now.  Well at least since talking with IBM research a couple of weeks ago on their new memory technologies that they have been working on.

But then this new Technology Review article came out  discussing recent research on Graphene Flash Memory.

Problems with NAND Flash

As we have discussed before, NAND flash memory has some serious limitations as it’s shrunk below 11nm or so. For instance, write endurance plummets, memory retention times are reduced and cell-to-cell interactions increase significantly.

These issues are not that much of a problem with today’s flash at 20nm or so. But to continue to follow Moore’s law and drop the price of NAND flash on a $/Gb basis, it will need to shrink below 16nm.  At that point or soon thereafter, current NAND flash technology will no longer be viable.

Other non-NAND based non-volatile memories

That’s why IBM and others are working on different types of non-volatile storage such as PCM (phase change memory), MRAM (magnetic RAM), FeRAM (Ferroelectric RAM) and others. All these have the potential to improve general reliability characteristics beyond where NAND Flash is today and where it will be tomorrow as chip geometries shrink even more.

IBM seems to be betting on MRAM or racetrack memory technology because it has near DRAM performance, extremely low power and can store far more data in the same amount of space. It sort of reminds me of delay line memory where bits were stored on a wire line and read out as they passed across a read/write circuit. Only in the case of racetrack memory, the delay line is etched in a silicon circuit indentation with the read/write head implemented at the bottom of the cleft.

Graphene as the solution

Then along comes Graphene based Flash Memory. Graphene can apparently be used as a substitute for the storage layer in a flash memory cell. According to the report, the graphene stores data using less power and with better stability over time, both crucial problems with NAND flash memory as it’s shrunk below today’s geometries. The research is being done at UCLA and is supported by Samsung, a significant manufacturer of NAND flash memory today.

Current demonstration chips are much larger than would be useful.  However, given graphene’s material characteristics, the researchers believe there should be no problem scaling it down below where NAND Flash would start exhibiting problems.  The next iteration of research will be to see if their scaling assumptions can hold when device geometry is shrunk.

The other problem is getting graphene, a new material, into current chip production.  Current materials used in chip manufacturing lines are very tightly controlled and  building hybrid graphene devices to the same level of manufacturing tolerances and control will take some effort.

So don’t look for Graphene Flash Memory to show up anytime soon. But given that 16nm chip geometries are only a couple of years out and 11nm, a couple of years beyond that, it wouldn’t surprise me to see Graphene based Flash Memory introduced in about 4 years or so.  Then again, I am no materials expert, so don’t hold me to this timeline.

 

—-

Comments?