AI reaches a crossroads

There’s been a lot of talk on the extendability of current AI this past week and it appears that while we may have a good deal of runway left on the machine learning/deep learning/pattern recognition, there’s something ahead that we don’t understand.

Let’s start with MIT IQ (Intelligence Quest),  which is essentially a moon shot project to understand and replicate human intelligence. The Quest is attempting to answer “How does human intelligence work, in engineering terms? And how can we use that deep grasp of human intelligence to build wiser and more useful machines, to the benefit of society?“.

Where’s HAL?

The problem with AI’s deep learning today is that it’s fine for pattern recognition, but it doesn’t appear to develop any basic understanding of the world beyond recognition.

Some AI scientists concede that there’s more to human/mamalian intelligence than just pattern recognition expertise, while others’ disagree. MIT IQ is trying to determine, what’s beyond pattern recognition.

There’s a great article in Wired about the limits of deep learning,  Greedy, Brittle, Opaque and Shallow: the Downsides to Deep Learning. The article says deep learning is greedy because it needs lots of data (training sets) to work, it’s brittle because step one inch beyond what’s it’s been trained  to do and it falls down, and it’s opaque because there’s no way to understand how it came to label something the way it did. Deep learning is great for pattern recognition of known patterns but outside of that, there must be more to intelligence.

The limited steps using unsupervised learning don’t show a lot of hope, yet

“Pattern recognition” all the way down…

There’s a case to be made that all mammalian intelligence is based on hierarchies of pattern recognition capabilities.

That is, at a bottom level  human intelligence consists of pattern recognition, such as vision, hearing, touch, balance, taste, etc. systems which are just sophisticated pattern recognition algorithms that label what we are hearing as Bethovan’s Ninth Symphony, tasting as grandma’s pasta sauce, and seeing as the Grand Canyon.

Then, at the next level there’s another pattern recognition(-like) system that takes all these labels and somehow recognizes this scene as danger, romance, school,  etc.

Then, at the next level, human intelligence just looks up what to do in this scene.  Almost as if we have a defined list of action templates that are what we do when we are in danger (fight or flight), in romance (kiss, cuddle or ?), in school (answer, study, view, hide, …), etc.  Almost like a simple lookup table with procedural logic behind each entry

One question for this view is how are these action templates defined and  how many are there. If, as it seems, there’s almost an infinite number of them, how are they selected (some finer level of granularity in scene labeling – romance but only flirting …).

No, it’s not …

But to other scientists, there appears to be more than just pattern recognition(-like) algorithms and lookup and act algorithms, going on inside our brains.

For example, once I interpret a scene surrounding me as in danger, romance, school, etc.,  I believe I start to generate possible action lists which I could take in this domain, and then somehow I select the one to do which makes the most sense in this situation or rather gets me closer to my current goal (whatever that is) in this situation.

This is beyond just procedural logic and involves some sort of memory system, action generative system, goal generative/recollection system, weighing of possible action scripts, etc.

And what to make of the brain’s seemingly infinite capability to explain itself…

Baby intelligence

Most babies understand their parents language(s) and learn to crawl within months after birth. But they haven’t listened to thousands of hours of people talking or crawled thousands of miles.  And yet, deep learning requires even more learning sets in order to label language properly or  learning how to crawl on four appendages. And of course, understanding language and speaking it are two different capabilities. Ditto for crawling and walking.

How does a baby learn to recognize these patterns without TB of data and millions of reinforcements (“Smile for Mommy”, say “Daddy”). And what to make of the, seemingly impossible to contain wanderlust, of any baby given free reign of an area.

These questions are just scratching the surface in what it really means to engineer human intelligence.

~~~~

MIT IQ is one attempt to try to answer the question that: assuming we understand how to pattern recognition can be made to work well on today’s computers what else do we need to do to build a more general purpose intelligence.

There are obvious ethical questions on whether we want to engineer a human level of intelligence (see my Existential risks… post). Our main concern is what it does (to humanity) once we achieve it.

But assuming we can somehow contain it for the benefit of humanity, we ought to take another look at just what it entails.

 

Photo Credits:  Tech trends for 2017: more AI …., the Next Silicon Valley website. 

HAL from 2001 a Space Odyssey 

Design software test labeling… 

Exploration in toddlers…, Science Daily website

Industrial revolutions, deep learning & NVIDIA’s 3U AI super computer @ FMS 2017

I was at Flash Memory Summit this past week and besides the fire on the exhibit floor, there was a interesting keynote by Andy Steinbach, PhD from NVIDIA on “Deep Learning: Extracting Maximum Knowledge from Big Data using Big Compute”.  The title was a bit much but his session was great.

2012 the dawn of the 4th industrial revolution

Steinbach started off describing AI, machine learning and deep learning as another industrial revolution, similar to the emergence of steam engines, mass production and automation of production. All of which have changed the world for the better.

Steinbach said that AI is been gestating for 50 years now but in 2012 there was a step change in it’s capabilities.

Prior to 2012 hand coded AI image recognition algorithms were able to achieve about a 74%  image recognition level but in 2012, a deep learning algorithm achieved almost 85%, in one year.

And since then it’s been on a linear trend of improvements such that in 2015, current deep learning algorithms are better than human image recognition. Similar step function improvements were seen in speech recognition as well around 2012.

What drove the improvement?

Machine and deep learning depend on convolutional neural networks. These are layers of connected nodes. There are typically an input layer and output layer and N number of internal layers in a network. The connection weights between nodes control the response of the network.

Todays image recognition convolutional networks can have ~10 layers, billions of parameters, take ~30 Exaflops to train, using 10M images and took days to weeks to train.

Image recognition covolutional neural networks end up modeling the human visual cortex which has neurons to recognize edges and other specialized characteristics of a visual field.

The other thing that happened was that convolutional neural nets were translated to execute on GPUs in 2011. Neural networks had been around in AI since almost the very beginning but their computational complexity made them impossible to use effectively until recently. GPUs with 1000s of cores all able to double precision floating point operations made these networks now much more feasible.

Deep learning training of a network takes place through optimization of the node connections weights. This is done via a back propagation algorithm that was invented in the 1980’s.  Back propagation typically depends on “supervised learning” which adjust the weights of the connections between nodes to come closer to the correct answer, like recognizing Sarah in an image.

Deep learning today

Steinbach showed multiple examples of deep learning algorithms such as:

  • Mortgage prepayment predictor system which takes information about a mortgagee, location, and other data and predicts whether they will pre-pay their mortgage.
  • Car automation image recognition system which recognizes people, cars, lanes, road surfaces, obstacles and just about anything else in front of a car traveling a road.
  • X-ray diagnostic system that can diagnose diseases present in people from the X-ray images.

As far as I know all these algorithms use supervised learning and back propagation to train a convolutional network.

Steinbach did show an example of “un-supervised learning” which essentially was fed a bunch of images and did clustering analysis on them.  Not sure what the back propagation tried to optimize but the system was used to cluster the images in the set. It was able to identify one cluster of just military aircraft images out of the data.

The other advantage of convolutional neural networks is that they can be reused. E.g. the X-ray diagnostic system above used an image recognition neural net as a starting point and then ran it against a supervised set of X-rays with doctor provided diagnoses.

Another advantage of deep learning is that it can handle any number of dimensions. Mathematical optimization algorithms can handle a relatively few dimensions but deep learning can handle any number of dimensions.  The number of input dimensions, the number of nodes in each layer and number of layers in your network are only limited by computational power.

NVIDIA’s DGX a deep learning super computer

At the end of Stienbach’s talk he mentioned the DGX appliance designed by NVIDIA for AI research.

The appliance has 8 state of the art NVIDIA GPUs, connected over a high speed NVLink with anywhere from ~29K to ~41K cores depending on GPU selected, and is capable of 170 to 960 Flops (FP16).

Steinbach said this single 3u appliance would have been rated the number one supercomputer in 2004 beating out a building full of servers. If you were to connect 13 (I think) DGX’s together, you would qualify to be on the top 500 super computers in the world.

~~~~

Comments?

Photo credit(s): Steinbach’s “Deep Learning: Extracting Maximum Knowledge from Big Data using Big Compute” presentation at FMS 2017.