Learning machine learning – part 3

Image of the cover of the book Deep Learning with Python

Decided to take the plunge and purchase the Deep Learning with Python book and see what it has to offer. In prior posts (see Learning machine learning – part 1 & part 2) we were working with the cloud tutorials. This Part one is based on the book

It has a great introduction into deep learning which is a subset of machine learning. After what I know today, the Microsoft Azure session was more on traditional (statistical) machine learning and not deep learning.


Installing deep learning

In order to use the book, you need access to Keras, Python, Jupyter and a Keras backend (TensorFlow, Microsoft CNTK or Theanno).

I decided not to use any cloud solutions and rather install Python, Jupiter, TensorFlow and Keras on my MacBook. Although it probably would have been much easier (and more costly) to use any cloud solution.

I followed the directions on the installing TensorFlow website for the PIP install (you have to install a “virtual environment” and “PIP” first). The MacBook didn’t have a NVIDIA GPU so I needed to install the CPU version of TensorFlow.

But I had the hardest time running any of the book examples. Whenever I changed any command cell in a Jupyter notebook with Keras functionality in them (like adding a space to the end of an “import Keras” command line), it would throw a (module not found) error.

After days of web searching for what path is used for Jupyter notebook-iPython/Python imports (sys.path and PYTHONPATH) and where I should be importing Keras from (it’s not “~/ .keras”), I got nowhere closer to running anything.

I finally saw that I could directly install Keras (again, when I installed Tensorflow, it installed Keras as well) into my VENV. After I did that, everything worked. (I probably have one too many Keras environments, but who cares).

Finally getting the environment correct, I could now execute any command cells in a Jupyter notebook (with Keras functionality properly, well most of them anyways).

Jupyter notebooks for dummies…

It took me a while to figure out that the way you run a Jupyter notebook server is by issuing the command “jupyter notebook” (nowhere in the command’s help file, but can be found in Jupyter tutorials). That’s when I started to see the problems in the installation section above with my Keras installation.

Understanding Jupyter notebooks is non-trivial. Yes, I know it’s an interactive code and documentation environment. It’s sort of like BASIC on steroids with WORD functionality built in/escapeable into at any time.

First thing to understand is that when you open up a jupyter notebook, you haven’t executed anything yet. YES there are output lines in the notebook you just opened but NO, they aren’t from executing them under your client-server environment.

The output lines you see in the notebook is output from someone else’s execution run. So while they may look like they worked fine but they haven’t executed in your installation environment yet..

Also, when executing Jupyter notebook command cells, pay special attention to the In [?]: that’s shown to the right of every command cell.

When the ‘?’ is a number, like In:[12] that tells you what sequence (12th in the sequence) that (multi-line command cell) has been executed in and when the ‘?’ a “*”, like In[*], it says that the Jupyter notebook server is executing that command cell. 

Some command cells generate Out [?]: lines and others do not. So can’t use this to tell if something’s been executed or not. The only way to tell if some command cell has been executed is by seeing the In [n]: integer as n be incremented from the last command cell you executed. Of course you can execute command cells out of sequence if you wish.

Jupyter notebook coding/executing was weird as one who is more used to C, Java, and other coding languages and IDEs. A video tutorial on Jupiter notebooks would probably have helped here, but I couldn’t find one.

Running the examples

You can download all of the books current examples from the book’s website.

The book suggests you add model layers, subtract model layers and change the parameters of the number of nodes in a model as examples for you to try at home.

In general, doing so (once the environment was setup properly) seemed to work as desired. Adding layers didn’t seem to change the accuracy of the models, if anything it degraded it, and deleting layers didn’t help either. Ditto for adding or reducing node counts within a layer.

There’s a bunch of datasets that comes with Keras install used in the examples. Many examples have a first step where you modify this data so as to be more amenable to deep learning modeling.

For example, there’s a IMDB dataset that has film reviews. The film reviews are text files. But deep learning doesn’t work on text strings so you need to convert the text files into lists of integers. You do this by looking up each word in a word dictionary and substituting the index for each word in the review, generating an array (list) of integers.

This is all done through the NumPy package. It’s worth the time to become familiar with Python and probably NumPy. I took the verbal Python tutorial (but did nothing to learn NumPy).

Another example is a real estate prediction model that has 13 different parameters across 500 or so neighborhoods. The parameters are all different, some are distances, some %s, some pricing differences, etc. In order to perform deep learning on them, the example normalizes all of them, using distance from mean, in units of standard deviation.

There are other examples of data transformations as well. It seems that transforming your data into something amenable to deep learning is one part of the magic of deep learning.

Back to the book

Getting through chapter 3 of the book i- fairly straightforward when everything is set up properly. I found a iPad app (Juno) that could be used to connect to the Jupyter Server and it seemed to work once I found the proper command to use to start Jupyter (jupyter notebook –ip=”*”) and the proper Jupyter configuration parameters to use.

The examples are pretty self-documenting so you should be able to try out any of them on your own. The book adds great explanations on machine learning, deep learning and and the overall flow of how to approach a deep learning project.

Once you finish chapter 4 of the book you have all the tools one needs to tackle any deep learning project that you want to attack. You may need to read up on how to transform your data and you will probably be using one of the modeling techniques in one of the examples but it seems easy enough.

The rest of the book’s chapters (which I have yet to complete) deal with deep learning in practice and it’s in these chapters that you can learn some of the art of deep learning data science and model science..


I ended up having fun with Jupyter notebooks, once I got them running with the iPad client in one hand and the book in the other. At the end of chapter 4, I startedto see some applications to my consulting business that might be interesting to model.

Using the Mac CPU was fast enough for the examples but I may have to tear down the crypto mine and use it as an AI server for my home network if I plan to tackle something with more data.

Wish me luck…

Google releases new Cloud TPU & Machine Learning supercomputer in the cloud

Last year about this time Google released their 1st generation TPU chip to the world (see my TPU and HW vs. SW … post for more info).

This year they are releasing a new version of their hardware called the Cloud TPU chip and making it available in a cluster on their Google Cloud.  Cloud TPU is in Alpha testing now. As I understand it, access to the Cloud TPU will eventually be free to researchers who promise to freely publish their research and at a price for everyone else.

But first please take our new poll:

What’s different between TPU v1 and Cloud TPU v2

The differences between version 1 and 2 mostly seem to be tied to training Machine Learning Models.

TPU v1 didn’t have any real ability to train machine learning (ML) models. It was a relatively dumb (8 bit ALU) chip but if you had say a ML model already created to do something like understand speech, you could load that model into the TPU v1 board and have it be executed very fast. The TPU v1 chip board was also placed on a separate PCIe board (I think), connected to normal x86 CPUs  as sort of a CPU accelerator. The advantage of TPU v1 over GPUs or normal X86 CPUs was mostly in power consumption and speed of ML model execution.

Cloud TPU v2 looks to be a standalone multi-processor device, that’s connected to others via what looks like Ethernet connections. One thing that Google seems to be highlighting is the Cloud TPU’s floating point performance. A Cloud TPU device (board) is capable of 180 TeraFlops (trillion or 10^12 floating point operations per second). A 64 Cloud TPU device pod can theoretically execute 11.5 PetaFlops (10^15 FLops).

TPU v1 had no floating point capabilities whatsoever. So Cloud TPU is intended to speed up the training part of ML models which requires extensive floating point calculations. Presumably, they have also improved the ML model execution processing in Cloud TPU vs. TPU V1 as well. More information on their Cloud TPU chips is available here.

So how do you code a TPU?

Both TPU v1 and Cloud TPU are programmed by Google’s open source TensorFlow. TensorFlow is a set of software libraries to facilitate numerical computation via data flow graph programming.

Apparently with data flow programming you have many nodes and many more connections between them. When a connection is fired between nodes it transfers a multi-dimensional matrix (tensor) to the node. I guess the node takes this multidimensional array does some (floating point) calculations on this data and then determines which of its outgoing connections to fire and how to alter the tensor to send to across those connections.

Apparently, TensorFlow works with X86 servers, GPU chips, TPU v1 or Cloud TPU. Google TensorFlow 1.2.0 is now available. Google says that TensorFlow is in use in over 6000 open source projects. TensorFlow uses Python and 1.2.0 runs on Linux, Mac, & Windows. More information on TensorFlow can be found here.

So where can I get some Cloud TPUs

Google is releasing their new Cloud TPU in the TensorFlow Research Cloud (TFRC). The TFRC has 1000 Cloud TPU devices connected together which can be used by any organization to train machine learning algorithms and execute machine learning algorithms.

I signed up (here) to be an alpha tester. During the signup process the site asked me: what hardware (GPUs, CPUs) and platforms I was currently using to training my ML models; how long does my ML model take to train; how large a training (data) set do I use (ranging from 10GB to >1PB) as well as other ML model oriented questions. I guess there trying to understand what the market requirements are outside of Google’s own use.

Google’s been using more ML and other AI technologies in many of their products and this will no doubt accelerate with the introduction of the Cloud TPU. Making it available to others is an interesting play but this would be one way to amortize the cost of creating the chip. Another way would be to sell the Cloud TPU directly to businesses, government agencies, non government agencies, etc.

I have no real idea what I am going to do with alpha access to the TFRC but I was thinking maybe I could feed it all my blog posts and train a ML model to start writing blog post for me. If anyone has any other ideas, please let me know.


Photo credit(s): From Google’s website on the new Cloud TPU

TPU and hardware vs. software innovation (round 3)

tpu-2At Google IO conference this week, they revealed (see Google supercharges machine learning tasks …) that they had been designing and operating their own processor chips in order to optimize machine learning.

They called the new chip, a Tensor Processing Unit (TPU). According to Google, the TPU provides an order of magnitude more power efficient machine learning over what’s achievable via off the shelf GPU/CPUs. TensorFlow is Google’s open sourced machine learning  software.

This is very interesting, as Google and the rest of the hype-scale hive seem to have latched onto open sourced software and commodity hardware for all their innovation. This has led the industry to believe that hardware customization/innovation is dead and the only thing anyone needs is software developers. I believe this is incorrect and that hardware innovation combined with software innovation is a better way, (see Commodity hardware always loses and Better storage through hardware posts).
Continue reading “TPU and hardware vs. software innovation (round 3)”