Learning Machine Learning – part 2

In Learning Machine Learning – part 1, we covered AWS and GCP tutorials on machine learning within each of their clouds. In part 2 we cover Microsoft tutorial(s) on machine learning in Azure.

I found Machine Learning Jump Start in Microsoft Virtual Academy (MVA), with instructors Buck Woody and Seayoung Rhee. This is a series of 4 video tutorials on Azure ML Studio. ML Studio seems similar to AWS SageMaker in that it’s a framework for performing machine learning.

Azure (and probably AWS & GCP) have a number of methods to perform machine learning. ML studio happens to be the one that I found but there are many others worth examining.

Azure’s ML Studio tutorial videos were better than AWS’s but not as good as GCP’s, IMHO, for learning machine learning. There are four videos in the series. I watched the first (~45 minutes), the second (~45 minutes) and most of the third (only 25 of ~45 minutes).

Video 1 Concepts and setting up a ML Studio account

In the first video, the instructors took a long time to get going, and then when they got someplace interesting, it was all play acting (human as a machine learner) to teach concepts.

The tutorials do distinguish between supervised learning and unsupervised learning, both of which can apply to prediction or classification types of problems or outcomes. These are discussed as classic machine learning characteristics.
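
To make the distinction concrete, here is a minimal Python sketch. It is not from the videos: the data, the labels and the function names are all made up for illustration, and the two techniques (1-nearest neighbor and a tiny two-center k-means) are just simple stand-ins for each style of learning.

```python
# Toy 1-D data: hours studied per week (hypothetical numbers).
hours = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]

# Supervised: every example comes with a known label to learn from.
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

def predict_nearest(x, xs, ys):
    """Classify x with the label of its closest training example (1-nearest neighbor)."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]

# Unsupervised: no labels; group the points around two centers (a tiny k-means, k=2).
def two_means(xs, rounds=10):
    lo, hi = min(xs), max(xs)  # start the two centers at the extremes
    for _ in range(rounds):
        near_lo = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return lo, hi

print(predict_nearest(7.5, hours, labels))  # uses the labels it was given
print(two_means(hours))                     # finds two groups with no labels at all
```

The supervised function needs the answers up front, while the unsupervised one only ever sees the raw numbers. The sketch assumes the data really does fall into two distinct groups.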

In the last 1/3 of the first video they discuss Azure ML Studio. It provides a common place to work and collaborate across team members. It also provides a graphical approach to machine learning. ML Studio also supports a programmable API, but I never got to that section in my viewing.

Some Azure ML Studio strengths:

  1. It provides industry-recognized data sets and data science algorithms that can be used as a black box, such as recommendation engines.
  2. It allows you to publish and consume machine learning solutions.

On the Azure portal there’s a machine learning studio icon (it’s now buried under the 100+ services link, in the AI + Machine Learning section). You use this to create a new ML studio workspace.

Inside a workspace you can use Azure ML Studio services. In the workspace you can review all your experiments (these are algorithms or predictive models being worked on).

In the Experiments page you can create a new experiment which is sort of a graphical workflow of the machine learning task.

There you will find a list of Azure sample data sets and sample algorithms that can be used in your experiment. The first video didn’t go into much detail on any of this other than showing you how to get started and create a ML studio workspace.

Video 2 how to use ML studio

Video 2 takes your ML studio workspace and runs a rudimentary experiment with it. In this video they walk you through selecting a data set, selecting algorithms to use and how to connect them into a machine learning workflow.

Creating an ML Studio experiment is almost like flowcharting your workflow. You select the data you want and drop it into the workflow. Next, you select the extraction engine you want to use, drop it into the workflow and connect it to the data. Then you identify what you want to do with the data (like training), drop that algorithm into the workflow, and so on. In the end you have defined a sequence of actions to perform on the data.

In their example, they use a user movie ratings dataset. They connect this to a Bayesian learning model and to an IMDB database to extract movie titles. The tutorial experiment is a movie recommendation engine. Although it wasn’t a neural net, many of the same techniques apply.
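
For readers who think better in code than in flowcharts, here is a rough Python analogue of that kind of workflow. It is only a sketch under my own assumptions: the ratings and titles are invented, the function names are hypothetical, and a plain per-movie average rating stands in for the Bayesian model used in the video.

```python
# Hypothetical ratings: (user, movie_id, stars). Not the tutorial's dataset.
ratings = [
    ("ann", "m1", 5), ("ann", "m2", 3),
    ("bob", "m1", 4), ("bob", "m3", 2),
    ("cat", "m2", 4), ("cat", "m3", 1),
]
# Hypothetical title lookup, standing in for the IMDB join.
titles = {"m1": "Movie One", "m2": "Movie Two", "m3": "Movie Three"}

def train(rows):
    """'Train' step: average stars per movie (a stand-in, not a Bayesian model)."""
    totals = {}
    for _user, movie, stars in rows:
        total, count = totals.get(movie, (0, 0))
        totals[movie] = (total + stars, count + 1)
    return {movie: total / count for movie, (total, count) in totals.items()}

def recommend(model, rows, user):
    """'Score' step: the best-rated movie this user has not rated yet."""
    seen = {movie for u, movie, _stars in rows if u == user}
    unseen = [movie for movie in model if movie not in seen]
    return max(unseen, key=model.get) if unseen else None

def add_title(movie_id):
    """'Join' step: turn a movie id into a readable title."""
    return titles.get(movie_id)

# The workflow is just the boxes connected in order: data -> train -> score -> join.
model = train(ratings)
pick = add_title(recommend(model, ratings, "bob"))
print(pick)
```

Each function plays the role of one box in the graphical experiment, and the last three lines play the role of the connecting arrows.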

ML Studio uses an intuitive graphical approach to defining a machine learning workflow.

Video 3 publishing your ML Studio web service

Video 3 shows you how to publish (on Azure’s Marketplace) the recommendation engine created in video 2 as an OData web service.

I stopped watching the 3rd video after about 25 minutes as it was setting up various aspects of the OData web service to be deployed on Azure Marketplace.

Using Azure ML Studio seemed pretty straightforward. But it was much more of a data science/data analytics activity than neural network training.

The Azure MVA ML Studio tutorial was created in 2014 so some of the concepts are a bit dated but most still apply.

Looking today on the Azure Portal, I was still able to find the ML Studio workspaces under one of the 10 AI + Machine Learning services. Again, I would have to say the GCP tutorial was a better fit for what I wanted, which was how to create a neural net and get it trained.

Other ML approaches under Azure

There are other Azure approaches to machine learning and tutorials that support them. For example, there’s a quick start tutorial to understand how to use Python and Jupyter notebooks under Azure, which is probably closer to the neural net training in GCP.

I found myself skipping ahead a lot in video 1 as it was mainly about concepts and not much technical detail. Video 2 was a good intro to ML Studio and video 3 showed you how to publish an ML Studio web service in Azure, but it was more detail than I wanted to know. I never got to video 4, which probably talked about ML Studio’s programmable API.

If I had to do it over again, I probably would have viewed the quick start tutorial with Python and Jupyter notebooks, which sounded more like the GCP tutorials in the part 1 post.

On the other hand, the Azure ML Studio tutorials supplied a good complement to the GCP tutorial, as a different (more graphical) way to do ML. They would probably be worthwhile to view before taking the AWS SageMaker tutorials, as they’re a bit higher level and a quicker introduction to the workflow of AI and machine learning.

Comments?

Picture credit(s): Screen shots of Videos 1, 2 and 3 in the MVA series, (c) Microsoft 

AI reaches a crossroads

There’s been a lot of talk on the extensibility of current AI this past week, and it appears that while we may have a good deal of runway left for machine learning/deep learning/pattern recognition, there’s something ahead that we don’t understand.

Let’s start with MIT IQ (Intelligence Quest), which is essentially a moon shot project to understand and replicate human intelligence. The Quest is attempting to answer “How does human intelligence work, in engineering terms? And how can we use that deep grasp of human intelligence to build wiser and more useful machines, to the benefit of society?”

Where’s HAL?

The problem with AI’s deep learning today is that it’s fine for pattern recognition, but it doesn’t appear to develop any basic understanding of the world beyond recognition.

Some AI scientists concede that there’s more to human/mammalian intelligence than just pattern recognition expertise, while others disagree. MIT IQ is trying to determine what’s beyond pattern recognition.

There’s a great article in Wired about the limits of deep learning, Greedy, Brittle, Opaque and Shallow: the Downsides to Deep Learning. The article says deep learning is greedy because it needs lots of data (training sets) to work, it’s brittle because step one inch beyond what it’s been trained to do and it falls down, and it’s opaque because there’s no way to understand how it came to label something the way it did. Deep learning is great for pattern recognition of known patterns but outside of that, there must be more to intelligence.

The limited steps using unsupervised learning don’t show a lot of hope, yet.

“Pattern recognition” all the way down…

There’s a case to be made that all mammalian intelligence is based on hierarchies of pattern recognition capabilities.

That is, at the bottom level, human intelligence consists of pattern recognition systems for vision, hearing, touch, balance, taste, etc. These are just sophisticated pattern recognition algorithms that label what we are hearing as Beethoven’s Ninth Symphony, tasting as grandma’s pasta sauce, and seeing as the Grand Canyon.

Then, at the next level there’s another pattern recognition(-like) system that takes all these labels and somehow recognizes this scene as danger, romance, school,  etc.

Then, at the next level, human intelligence just looks up what to do in this scene. It’s almost as if we have a defined list of action templates that are what we do when we are in danger (fight or flight), in romance (kiss, cuddle or ?), in school (answer, study, view, hide, …), etc. Almost like a simple lookup table with procedural logic behind each entry.
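
As a programmer’s caricature of this view, the lookup table might be sketched in Python as below. Everything here is hypothetical: the scene labels come from the examples above, but the function names and the table itself are mine, and nobody is claiming the brain is literally a dictionary.

```python
# Hypothetical action templates: one small procedure per scene label.
def handle_danger():
    return "fight or flee"

def handle_romance():
    return "kiss or cuddle"

def handle_school():
    return "answer, study, view or hide"

# The "simple lookup table": scene label -> procedural logic for that scene.
ACTION_TEMPLATES = {
    "danger": handle_danger,
    "romance": handle_romance,
    "school": handle_school,
}

def act(scene_label):
    """Look up the template for a labeled scene and run its procedure."""
    template = ACTION_TEMPLATES.get(scene_label)
    return template() if template else "no template for this scene"

print(act("danger"))
print(act("beach"))
```

The second call shows the weakness of a pure lookup: any scene without an entry has no response at all.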

One question for this view is how these action templates are defined and how many of them there are. If, as it seems, there’s almost an infinite number of them, how are they selected (some finer level of granularity in scene labeling: romance but only flirting …)?

No, it’s not …

But to other scientists, there appears to be more than just pattern recognition(-like) algorithms and lookup-and-act algorithms going on inside our brains.

For example, once I interpret a scene surrounding me as in danger, romance, school, etc.,  I believe I start to generate possible action lists which I could take in this domain, and then somehow I select the one to do which makes the most sense in this situation or rather gets me closer to my current goal (whatever that is) in this situation.

This is beyond just procedural logic and involves some sort of memory system, action generative system, goal generative/recollection system, weighing of possible action scripts, etc.
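
To show how this differs from a plain lookup, here is a small Python sketch of the generate, weigh and select idea. It is purely illustrative: the candidate actions, the goals and the payoff numbers are all invented, and a real system would have to generate and score its options rather than read them from a table.

```python
# Hypothetical candidate actions for a scene, each with a guessed payoff per goal.
CANDIDATES = {
    "danger": [
        {"action": "run away", "payoff": {"stay safe": 0.9, "protect others": 0.2}},
        {"action": "stand and fight", "payoff": {"stay safe": 0.3, "protect others": 0.8}},
        {"action": "call for help", "payoff": {"stay safe": 0.6, "protect others": 0.6}},
    ],
}

def choose_action(scene_label, goal):
    """List the possible actions for a scene, then pick the one that best serves the goal."""
    options = CANDIDATES.get(scene_label, [])
    if not options:
        return None
    best = max(options, key=lambda option: option["payoff"].get(goal, 0.0))
    return best["action"]

print(choose_action("danger", "stay safe"))
print(choose_action("danger", "protect others"))
```

The same scene produces different actions depending on the current goal, which is exactly what a fixed scene-to-action table cannot do.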

And what to make of the brain’s seemingly infinite capability to explain itself…

Baby intelligence

Most babies understand their parents’ language(s) and learn to crawl within months after birth. But they haven’t listened to thousands of hours of people talking or crawled thousands of miles. And yet, deep learning requires even larger learning sets in order to label language properly or learn how to crawl on four appendages. And of course, understanding language and speaking it are two different capabilities. Ditto for crawling and walking.

How does a baby learn to recognize these patterns without TB of data and millions of reinforcements (“Smile for Mommy”, say “Daddy”)? And what to make of the seemingly impossible to contain wanderlust of any baby given free rein of an area?

These questions are just scratching the surface in what it really means to engineer human intelligence.

~~~~

MIT IQ is one attempt to answer the question: assuming we understand how pattern recognition can be made to work well on today’s computers, what else do we need to do to build a more general purpose intelligence?

There are obvious ethical questions on whether we want to engineer a human level of intelligence (see my Existential risks… post). Our main concern is what it does (to humanity) once we achieve it.

But assuming we can somehow contain it for the benefit of humanity, we ought to take another look at just what it entails.

Photo Credits:  Tech trends for 2017: more AI …., the Next Silicon Valley website. 

HAL from 2001: A Space Odyssey

Design software test labeling… 

Exploration in toddlers…, Science Daily website

IBM’s next generation, TrueNorth neuromorphic chip

Ok, I admit it, besides being a storage nut I also have an enduring interest in AI. And as the technology of more sophisticated neuromorphic chips starts to emerge it seems to me to herald a whole new class of AI capabilities coming online. I suppose it’s both a bit frightening as well as exciting which is why it interests me so.

IBM announced a new version of their neuromorphic chip line, called TrueNorth, with 5B+ transistors and the equivalent of ~1M neurons. There were a number of articles on this yesterday but the one I found most interesting was in MIT Technology Review, IBM’s new brainlike chip processes data the way your brain does (based on an article in the journal Science, login required, A million spiking neuron integrated circuit with a scalable communication network and interface). We discussed an earlier generation of their SyNAPSE chip in a previous post (see my IBM research introduces SyNAPSE chip post).

How does TrueNorth compare to the previous chip?

The previous generation SyNAPSE chip had a multi-mode approach which used 65K “learning synapses” together with ~256K “programming synapses”. The current generation TrueNorth chip has 256M “configurable synapses” and 1M “programmable spiking neurons”. So the current chip has quadrupled the previous chip’s “programmable synapses” and multiplied the “configurable synapses” by a factor of 1000.

Not sure why the configurable synapses went up so high but it could be an aspect of connectivity, something akin to what happens in a “complete graph”, which has a direct edge between every pair of nodes in the graph. In a complete graph with N nodes, the number of edges is given as [N*(N-1)]/2, which for 1M nodes would be ~500B edges. So with 256M synapses it’s nowhere near a complete graph; it works out to about 256 synapses per neuron.
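
As a quick check of the complete graph arithmetic, here is a small Python sketch using the neuron and synapse counts quoted above. The function and variable names are mine, for illustration only, and treating each synapse as one graph edge is a simplifying assumption.

```python
def complete_graph_edges(n):
    """Edges in a complete graph on n nodes: n * (n - 1) / 2."""
    return n * (n - 1) // 2

neurons = 1_000_000      # ~1M neurons, as quoted in the post
synapses = 256_000_000   # 256M configurable synapses, as quoted in the post

edges = complete_graph_edges(neurons)
print(edges)                 # 499999500000, i.e. roughly 5 x 10^11
print(synapses / edges)      # fraction of a complete graph actually wired
print(synapses // neurons)   # average synapses per neuron: 256
```

So a fully connected 1M-neuron graph would need on the order of 500 billion edges, and 256M synapses covers only about 0.05% of that.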

Analog vs. Digital?

When I last talked with IBM about their earlier version of the chip, I wondered why they used digital logic to create it rather than analog. They said that digital was the way to go in order to better follow the technology curve of normal chip electronics.

It seemed to me at the time that if you really wanted to simulate a brain’s neural processing then you would want to use an analog approach, and that this should use much less power. I wrote a couple of posts on the subject, one of which was on MIT’s analog neuromorphic chip (see my MIT builds analog neuromorphic chip post) and the other was on why analog made more sense than digital technology for neuromorphic computation (see my Analog neural simulation or Digital neuromorphic computing vs. AI post).

The funny thing is that IBM’s TrueNorth chip uses a lot less power (1000X, milliwatts vs. watts) than normal CMOS chips in use today. Not sure why this would be the case with digital logic, but if this is true maybe there’s more potential to utilize these sorts of chips in wider applications beyond just traditional AI domains.

How do you program it?

I would really like to get a deeper look at the specs for TrueNorth and its programming model. But there was a conference last year where IBM presented three technical papers on TrueNorth architecture and programming capabilities (see MIT Technology Review: IBM scientists show blueprints for brain like computing).

Apparently the 1M programmable spiking neurons are organized into blocks of 256 neurons each (with a prodigious amount of “configurable” synapses as well). These seem equivalent to what I would call a computational unit. One programs these blocks with “corelets”, which map out the neural activity that the 256-neuron blocks can perform. Also, these corelet “programs” can be linked together, or one can be subsumed within another, sort of like subroutines. IBM as of last year had a library of 150 corelets which do stuff like detect visual artifacts, detect motion in a visual image, detect color, etc.
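
The block arithmetic is easy to work through in Python. This is just a back-of-the-envelope sketch: I am assuming “1M” and “256M” are the binary round numbers 2^20 and 2^28, which the articles don’t actually state, and the variable names are mine.

```python
NEURONS_PER_BLOCK = 256    # block size quoted in the post
total_neurons = 2 ** 20    # assumption: "1M" means 1,048,576
total_synapses = 2 ** 28   # assumption: "256M" means 268,435,456

blocks = total_neurons // NEURONS_PER_BLOCK
synapses_per_block = total_synapses // blocks

print(blocks)                                        # 4096 blocks per chip
print(synapses_per_block)                            # 65536 synapses per block
print(synapses_per_block == NEURONS_PER_BLOCK ** 2)  # True: 256 x 256
```

Under those assumptions each block gets exactly 256 x 256 synapses, which would be consistent with every neuron in a block having 256 configurable inputs.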

Scale-out neuromorphic chips?

The abstract of the Journal Science paper talked specifically about a communications network interface that allows the TrueNorth chips to be “tiled in two dimensions” to some arbitrary size. So it is apparent that with the TrueNorth design, IBM has somehow extended a within chip block interface that allows corelets to call one another, to go off chip as well. With this capability they have created a scale-out model with the TrueNorth chip.

It’s unclear why they felt it had to be only two dimensional rather than three, but it seems to mimic the sort of cortex layer connections we have in our brains today. But even with only two dimensional scaling there are all sorts of interesting topologies that are possible.

There doesn’t appear to be any theoretical limit to the number of chips that can be connected in this fashion, but I would suppose they would all need to be on a single board or at least “close” together because there’s some sort of time frame that couldn’t be exceeded for propagation delay, i.e., the time it takes for a spike to traverse from one chip to the farthest chip in the chain couldn’t exceed, say, 10 msec or so.
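
To get a feel for how a delay budget would cap the size of a tiled array, here is a hedged Python sketch. The nearest-neighbor grid model and the per-hop delay are my own assumptions, not TrueNorth specifications; only the ~10 msec budget comes from my guess above.

```python
def worst_case_hops(rows, cols):
    """Chip-to-chip hops between opposite corners of a rows x cols grid,
    assuming each chip only links to its nearest neighbors."""
    return (rows - 1) + (cols - 1)

def largest_square_grid(budget_us, hop_us):
    """Largest k where a k x k grid keeps corner-to-corner delay within budget."""
    max_hops = budget_us // hop_us
    return max_hops // 2 + 1

BUDGET_US = 10_000   # the "say 10 msec" budget, in microseconds
HOP_US = 10          # hypothetical per-hop delay, NOT a TrueNorth spec

k = largest_square_grid(BUDGET_US, HOP_US)
print(k, k * k)                         # grid side and total chip count
print(worst_case_hops(k, k) * HOP_US)   # worst-case delay in microseconds
```

Change HOP_US and the answer moves a lot, which is the real point: the practical limit depends entirely on how fast a spike can cross each chip boundary.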

So how close are we to brain level computations?

In one of my previous posts I reported Wikipedia stating that a typical brain has 86B neurons with between 100M and 500M synapses. I was able to find the 86B number reference today but couldn’t find the 100M to 500M synapses quote again. However, if these numbers are close to the truth, the ratio of synapses to neurons is much lower in a human brain than in the TrueNorth chip. And TrueNorth would need about 86,000 chips connected together to match the neuronal computation of a human brain.

I suppose the excess synapses in the TrueNorth chip are due to the fact that electronic connections have to be fixed in place for a neuron-to-neuron connection to exist, whereas in the brain, we can always grow synapse connections as needed. Also, I read somewhere (can’t remember where) that a human brain at birth has a lot more synapse connections than an adult brain and that part of the learning process that goes on during early life is to trim excess synapses down to something that is more manageable or at least needed.

So to conclude, we (or at least IBM) seem to be making good strides in coming up with a neuromorphic computational model and physical hardware, but we are still six or seven generations away from a human brain’s capabilities (assuming 1000 of these chips could be connected together into one “brain”). If a neuromorphic chip generation takes ~2 years then we should be getting pretty close to human levels of computation by 2028 or so.
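
For anyone who wants to check the generations estimate, here is the arithmetic as a short Python sketch. Two inputs are my assumptions rather than anything IBM has said: that neurons per chip double with each generation, and that the clock starts in 2014. The other numbers are the ones used in this post.

```python
import math

BRAIN_NEURONS = 86_000_000_000   # 86B neurons, as quoted in the post
CHIP_NEURONS = 1_000_000         # ~1M neurons per TrueNorth chip
CHIPS_TILED = 1_000              # chips assumed to be connected into one "brain"
YEARS_PER_GENERATION = 2         # the post's guess at generation length
START_YEAR = 2014                # assumption: year the clock starts
GROWTH_PER_GENERATION = 2        # assumption: neurons per chip double each generation

chips_needed_today = BRAIN_NEURONS // CHIP_NEURONS
shortfall = BRAIN_NEURONS / (CHIP_NEURONS * CHIPS_TILED)
generations = math.ceil(math.log2(shortfall) / math.log2(GROWTH_PER_GENERATION))
target_year = START_YEAR + generations * YEARS_PER_GENERATION

print(chips_needed_today)   # chips needed at today's density
print(shortfall)            # how far 1000 tiled chips fall short
print(generations)          # doublings needed to close the gap
print(target_year)
```

With 1000 chips tiled together the gap is a factor of 86, which takes between six and seven doublings to close, hence the “six or seven generations” figure.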

The Tech Review article said that the 5B transistors on TrueNorth are more transistors than any other chip that IBM has produced. So they seem to be at current technology capabilities with this chip design (which is probably proof that their selection of digital logic was a wise decision).

Let’s just hope it doesn’t take it 18 years of programming/education to attain college level understanding…

Comments?

Photo Credit(s): New 20x [view of mouse cortex] by Robert Cudmore

Forgetting is important and other news from cognitive research

Study time by Stanković Vlada

It turns out retrieval is more important (at least for the brain) than storage.

Recent research from cognitive scientists such as Robert Bjork at the UCLA Learning & Forgetting Lab has shown that most of what we think we know about learning is wrong. (See Learning and Forgetting Lab, Getting it wrong, UCLA Learning and Forgetting Lab for more).

The researchers have been testing people to see which approaches are better for recalling information they were trying to study. They found that the key to studying and actually remembering better is working on better retrieval, not better storage.

It’s somewhat interesting that the scientists aren’t talking about learning as much as retrieval of information.  Almost as if learning were actually the equivalent to information retrieval.

Stop studying the same items over and over again, just try something different

It seems that studying a single item over and over again is the wrong way to try to learn something.  A better way is to vary your studying, to examine different but related items, which somehow lets you better classify the information and provides more accessible paths for retrieving that data.

Stop studying in the same place, go someplace else

Further guidance is that when trying to learn something new, you should vary the location, decor, or any other characteristic of the environment you are studying in. The key here is that these other locations add more tags/handles/indexes to the data, and the more indexing, the better for retrieval success.

Stop studying, start testing

An additional way to remember better is trying to retrieve information early and often, even if it doesn’t work. It appears that the more you try to recall some tidbit of information, regardless of success, the stronger the access path is burned into your brain, so that the next time you try to recollect that information, it becomes much easier to do. In fact, the suggestion is to test yourself right away after learning something new, as a sort of retrieval exercise without studying it. Struggling to recollect something helps?!

Stop taking notes during class, start taking them afterwards

Following on in that vein, yet almost unbelievable, is another recommendation to abandon note taking altogether and instead spend time after class summarizing (exercising that retrieval path again) what you were taught. The important part is to do this immediately afterwards. (Don’t tell my kids!)

Stop studying continually, wait before you study again

Moreover, another suggestion is to wait before you study something again. It seems if you study something too soon after having just studied it, you are not exercising that recall path well enough. Rather, they advocate waiting around a couple of days/weeks before studying something again to remember it better.  Struggling to recall information is better for remembering it than having an easy time of it.
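
If you wanted to turn this into a study calendar, a minimal Python sketch might look like the following. The specific schedule is my assumption, not the researchers’ recommendation: they only say to wait days or weeks, and the starting gap, the doubling of the gap and the function name are all made up for illustration.

```python
from datetime import date, timedelta

def review_dates(first_study, reviews=4, first_gap_days=2, growth=2):
    """Return review dates whose gaps grow by `growth` after each review."""
    dates = []
    gap = first_gap_days
    current = first_study
    for _ in range(reviews):
        current = current + timedelta(days=gap)
        dates.append(current)
        gap *= growth
    return dates

for review in review_dates(date(2024, 1, 1)):
    print(review.isoformat())
```

With the defaults the gaps are 2, 4, 8 and 16 days, so each review comes after enough of a wait that recalling the material takes some effort.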

With (relatively) infinite storage, forgetting is important

Finally, the cognitive scientists seem to think that forgetting is almost as important as remembering.  From a storage perspective, it appears that the brain has an unlimited capacity to store information.  But the downside is that any retrieval takes time and effort (something akin to searching through a bunch of indexes).

What we really want is to be better able to retrieve information that’s important.  Keeping all that extraneous junk readily recallable just slows down the retrieval of the really good stuff.  So forgetting helps purge un-needed access paths/tags/indexes freeing up space for what needs to be remembered.
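
Being a storage person, I can’t resist sketching this as a toy data store in Python. It is only an analogy under my own assumptions: the class, its methods and the sample memories are all hypothetical, and it says nothing about how the brain actually implements any of this.

```python
from collections import defaultdict

class TaggedMemory:
    """Toy store: items are kept forever but are only reachable through tags."""

    def __init__(self):
        self.items = {}                 # item_id -> content (never deleted)
        self.index = defaultdict(set)   # tag -> item_ids reachable via that tag

    def learn(self, item_id, content, tags):
        """Store an item and add one access path per tag."""
        self.items[item_id] = content
        for tag in tags:
            self.index[tag].add(item_id)

    def recall(self, tag):
        """Return everything reachable through one tag."""
        return sorted(self.items[i] for i in self.index.get(tag, ()))

    def forget(self, item_id):
        """Drop every access path to an item; the stored content stays put."""
        for ids in self.index.values():
            ids.discard(item_id)

memory = TaggedMemory()
memory.learn("offer", "an offer needs acceptance", ["contract law", "library", "tuesday"])
memory.learn("lunch", "had soup for lunch", ["tuesday"])
print(memory.recall("tuesday"))   # both items come back, useful or not
memory.forget("lunch")
print(memory.recall("tuesday"))   # only the item worth remembering
print("lunch" in memory.items)    # still stored, just no longer reachable
```

Studying in several places is like calling learn with more tags, and forgetting is like pruning the index rather than deleting the data.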

~~~~

Gosh, and to think all along all those illegible notes I took in college (and still do) really did help me learn!?

Comments?