Improving floating point

Read a post this week in Reddit pointing to an article that was from The Next Platform (New approach could sink floating point computation). It was all about changing IEEE floating point format to something better called posits, which was designed by noted computer architect, John Gustafson, et al, (see their paper Beating floating point at its own game: Posit arithmetic, for more info).

But first please take our new poll:

The problems with standard floating point have been known since they were first defined, in 1985 by the IEEE. As you may recall, an IEEE 754 floating point number has three parts a sign, an exponent and a mantissa (fraction or significand part). Both the exponent and mantissa can be negative.

IEEE defined floating point numbers

The IEEE 754 standard defines the following formats (see Floating point Floating -point arithmetic, for more info)

  • Half precision floating point, (added in 2008), has 1 sign bit (for the significand or mantissa), 5 exponent bits (indicating 2**-62 to 2**+64) and 10 significand bits for a total of 16 bits.
  • Single precision floating point, has 1 sign bit, 8 exponent bits (indicating 2**-126 to 2**+128) and 23 significand bits for a total of32 bits.
  • Double precision floating point, has 1 sign bit, 11 exponent bits (2**-1022 to 2**+1024) and 52 significand bits.
  • Quadrouple precision floating point, has 1 sign bit, 15 exponent bits (2**-16,382 to 2**+16,384) and 112 significand bits.

I believe Half precision was introduced to help speed up AI deep learning training and inferencing.

Some problems with the IEEE standard include, it supports -0 and +0 which have different representations and -∞ and +∞ as well as can be used to represent a number of unique, Not-a-Numbers or NaNs which are illegal floating point numbers. So when performing IEEE standard floating point arithmetic, one needs to check to see if a result is a NaN which would make it an illegal result, and must be wary when comparing numbers such as -0, +0 and -∞ , +∞. because, sigh, they are not equal.

Posits to the rescue

It’s all a bit technical (read the paper to find out) but posits don’t support -0 and +0, just 0 and there’s no -∞ or +∞ in posits either, just ∞. Posits also allow for a variable number of exponent bits (which are encoded into Regime scale factor bits [whose value is determined by a useed factor] and Exponent scale factor bits) which means that the number of significand bits can also vary.

So, with a 32 bit, single precision Posit, the number range represented can be quite a bit larger than single precision floating point. Indeed, with the approach put forward by Gustafson, a single 32 bit posit has more numeric range than a single precision IEEE 754 float and about as 1/2 as much range as double precision IEEE floating point number but only uses 32 bits.

Presently, there’s no commercial hardware implementations of posits, but there’s a lot of interest. Mostly because, the same number of bits can represent a lot more numeric range than equivalently sized IEEE 754 floats. And for HPC environments, AI deep learning applications, scientific computing, etc. having more numeric range (or precision), in less space, means they can jam more data in the same storage, transfer more data over the same networking bandwidth and save more numbers in limited amounts of DRAM.

Although, commercial implementations do not exist, there’s been some FPGA simulations of posit floating point arithmetic. Those simulations have shown it to be more energy efficient than IEEE 754 floating point arithmetic for the same number of bits. So, you need to add better energy efficiency to the advantages of posit arithmetic.

Is it any wonder that HPC/big science (weather prediction, Square Kilometer Array, energy simulations, etc.) and many AI hardware accelerator chip designers are examining posits as a potential way to boost precision, reduce storage/memory footprint and reduce energy consumption.

~~~~

Yet, standards have a way of persisting. Just look at how long the QWERTY keyboard has lasted. It was originally designed in the 1870’s to slow down typing and reduce jamming, when typewriters were mechanical devices. But ever since 1934, when the DVORAK keyboard was patented, there’s been much better layouts for keyboards. And there’s no arguing that the DVORAK keyboard is better for typing on non-mechanical typewriters. Yet today, I know of no computer vendor that ships DVORAK labeled keyboards. Once a standard becomes set, it’s very hard to dislodge.

Comments?

Photo Credit(s):

Swarm intelligence at #HPEDiscover

I attended HPE Discover conference in Vegas this past week and among all their product announcements, there was a panel discussion on something called Swarm Intelligence. But it was really about collaborative learning.

Swarm Intelligence at HPE is a way for multiple organizations/edge devices to train a model collaboratively. They end up using their own data to train local models but then share their models (actually model node weights) with one another.

In this fashion, if say one hospital specializes in the detection and treatment of pneumonia and another in TB, they could both train a shared model on their respective sets of data. But during training, they share their model weights between them and after some number of training iterations, have a single model than supports detection of both.

But first please take our new poll:

How does swarm intelligence work?

To make swarm intelligence work:

  1. All parties have to reach consensus on model hyper parameters, i.e, type of model (CNN, RNN, LSM, etc.), number of nodes per layer, number of layers, levels of connections between nodes, etc. So there’s a single model architecture to be trained across all the organizations.
  2. All organization training data needs to be the same type, (e.g., X-rays).
  3. After each model training session all model weights have to be shared with each other
  4. All organizations have to decide on the method used to merge or combine the model weights (e.g. averaging). .

In the end, after N number of training epochs, their combined model would be essentially cross trained on each organizations data. But no one shared any data!

Why attempt swarm intelligence

HPE believes swarm intelligence would be a way to not have to transmit all that edge data to a central repository, but there’s other advantages:

  • A combined model could be trained on more data than any single organization could provide.
  • A combined model would have less organizational bias.

There’s one other possibility, but it’s unclear whether this is legally valid or not, but a combined model could be trained on data that it didn’t have legal access to.

One problem with the edge is the vast amount of data there

It turns out that a self driving car could generate 4TB of data/day of driving. Moving 4TB a day from all the cars in say a major metropolitan area (4 million people with ~1 million cars of which 20% are on the road each week day) could represent as much as 200K*4TB or 200EB of data/day.

There is not enough bandwidth in a fully 5G world to move that amount of data each day wirelessly and probably not enough bandwidth to move that amount of data over wire.

But if each car were to train its own (self-driving) model each day on its own data and then share that training model of say 1024 nodes with 1024 layers, it would represent 1M node weights ( floating point numbers), or ~16MB of data, if done effectively, one could have a cities worth of training data to train your self driving car models.

The allure of swarm intelligence/collaborative learning is high. It seems a small cost to reach consensus on the model hyper-parameters, collaborative learning methodology and synchronized training epochs to create a model trained on multiple organization/edge device training data.

HPE discussed using private blockchains to coordinate the sharing of model training across organizations or edge devices and use the block chain to compensate organizations for the use of their trained models. Certainly this could work well with edge devices but it seems an unnecessary complication for collaborative organizations.

Nonetheless, swarm intelligence may just be one way to address some of serious problems with deep learning today.

Photo Credit(s): “Starling Flock” by Mike Legend is licensed under CC BY-NC-ND 2.0 

“Artificial Intelligence & AI & Machine Learning” by mikemacmarketing is licensed under CC BY 2.0 

“Geese in v-formation, Walberswick” by stephengg is licensed under CC BY-NC-ND 2.0 

Photonic computing seeing the light of day

Read three articles this past week or so that shed some light on Photonic computing, that is doing computing with light rather than electronic signals. We discuss the first two directly below and save the last for the end.

The first was a TechCrunch article, Bill Gates, Neo, [&] Gigafund backing Luminous in photonic computing, was about a new startup that raised $9M in seed funding to focus on AI computing using photonics rather than electronics. Luminous is working on a AI accelerator chip.

The second article was from MIT News, Chip design drastically reduces energy needed to compute with light, which discusses a paper in Physics Review (Large scale optical neural networks based on photo-electric multiplication) again focused on AI DL acceleration using photonics.

AI accelerators are cropping up everywhere (see my posts on Intel DL Boost, New Graphcore GC2 chips, and AI processing at the edge). So having a photonic, AI accelerator should be easy to integrate with.

Luminous from Princeton

Neuromorphic architecture with level-tuned neurons. The internal state of a primary neuron is used to enable a set of level-tuned neurons. Credit: Pantazi et al. ©2016 IOP Publishing

Luminous Computing is based on research that’s been done over the last decade at Princeton University, and seems to have as it’s goal to ship a single chip to replace 3000TPUs. We’ve talked about Google TPUs before (see Google releases new Cloud TPU, and TPU and hardware vs. software innovation – round 3 posts). We’ve also discussed photonics before (see our Photonic or Optical FPGAs on the horizon post).

Luminous Computings, CTO Mitchell Nahmias has helped author a number of papers, articles, & books on photonics with others from Princeton University, perhaps the most impressive being Principals of neuromorphic photonics. If they are following this approach, it would seem likely they are creating a new photonic neuromorphic chip.

There is absolutely minimal information on their web site so, I am assuming they’re targeting a neuromorphic chip from the CTO’s history of research publications. I could be completely incorrect on this assumption but will continue with it for now.

Neuromorphic AI is in contrast to standard neural network deep learning approaches to AI. We have discussed Neuromorphic computing in a number of posts (e.g. see IBM introduces the SYNAPSE chip from 2011 or University of Manchester fires up the largest neuromorphic chip from 2018, more if interested, just search for “Neuromorphic”). Neuromorphic approaches to AI although had early promise, have been losing the technological arms race, as standard AI DL using GPUs, implementing neural networks, seems to have vastly more adherents both in academia as well as industry.

On the other hand, simulating actual neurons, as in neuromorphic computing, has the potential to be a significant breakthrough. But only if it can be orders of magnitude faster, cheaper or somehow better than standard AI DL neural network processing.

In one of the news reports, it mentioned that Luminous Computing has working silicon. Assuming their approach has significant AI training advantages, will it also require a neuromorphic chip to perform AI inferencing?

Photonics from MIT

The MIT group states, that based on their simulation of the photonics chip, they can perform AI DL using neural network training with much less energy. In their paper, they mention that their matrix multiplication is performed passively by optical interference alone. In this case, both the input and the multiply and accumulate function (MAC) are done using photonics.

AI DL neural network training takes multiple iterations of matrix multiplications and accumulate (MAC) functions. A standard CPU takes about 20 pj/MAC (10**-12 pico-Joules/MAC) and a GPU takes about 1 pj/MAC. Whereas in simulation they were able to show their photonic chip should be able to perform 10 fJ/MAC (10**-15 femto-Joules/MAC) This would put the MIT photonic chip design power consumption per MAC ~1000X better than CPUs and ~100X better than GPUs.

The paper goes on to say that theoretically, they could perform at 50 to 100 zJ/MAC (10**-21 zepto-Joules/MAC), which would add another factor of 1000 to their power advantages.

In addition, MIT’s photonic chip design is being constructed using “free [space] optical components” and should be able to scale much better than “nano photonic implementations”. Nano photonics requires waveguides and optical couplers whereas free-space photonics uses lasers beams traveling through space.

Their paper claims that nano-photonics can only support ~1000 node neural networks but with their free-space photonic solution, they believe they can support a 1,000,000 node neural network. Not sure how this would work in a chip design with lasers visible, being routed via micro-mirrors and other optical components, around a chip.

Nothing I could find indicates that the MIT team has working silicon nor have spun out a startup with seed funding.

Energy use for AI DL modeling

There have been many news reports noting that training a AI DL neural network is 5X more carbon polluting than a driving a car. This statement is based on the third article, a single report, Energy and policy considerations for deep learning in NLP .

One can argue with the numbers. Also, the fact that the car has historically consumed gas, a non-renewable energy source and the GPU could theoretically use solar, wind or hydro renewable power for its energy needs also needs to be stated. Moreover, one trains a AI DL (NLP) model once with all the data and then may subsequently train it on new data that comes in, but it gets used a gazillion times. Most adults (at least in the US) have a car and drive it often, multiplying automobile energy impacts by 100s of millions in the US alone.

But the fact is that AI DL training, that iterates multiple times over 1000s to 1,000,000s of data items, takes energy and lots of it. How that energy is produced matter to global warming. And anything that has the potential to reduce this energy consumption should be welcomed.

~~~~

Photonics may be ready for prime time but only if they can significantly improve AI training .

Photo Credit(s): Logo from Luminous Computing website.

Figure from Neuromorphic computing mimics important brain feature Phys.org article

Screen grab of Figure 1 from the Large-Scale Optical Neural Networks Based on Photoelectric Multiplication paper.

Screen grab of Figure 1 from the Energy and Policy Considerations for Deep Learning in NLP paper.

All that AI DL training data comes from us

Read a couple of articles the past few weeks that highlighted something that not many of us are aware of, most of the data used to train AI deep learning (DL) models comes from us.

That is through our ignorance or tacit acceptation of licenses for apps that we use every day and for just walking around/interacting with the world.

The article in Atlantic, The AI supply chain runs on ignorance, talks about Ever, a picture sharing app (like Flickr), where users opted in to its facial recognition software to tag people in pictures. Ever also used that (tagged by machine or person) data to train its facial recognition software which it sells to government agencies throughout the world.

The second article, in Engadget , Colorado College students were secretly used to train AI facial recognition (software), talks about a group using a telephoto security camera than was pointed at a high traffic area on campus. The data obtained was used to help train an AI DL model to identify facial characteristics from far away.

The article went on to say that gathering photos from people in public places is not against the law. The study was also cleared by the school. The database was not released until after the students graduated but it did have information about the time and date the photos were taken.

But that’s nothing…

The same thing applies to video sharing and photo animation models, podcasting and text speaking models, blogging and written word generation models, etc. All this data is just lying around the web, freely available for any AI DL data engineer to grab and use to train their models. The article which included the image below talks about a new dataset of millions of webpages.

From an OpenAI paper on better language models showing the accuracy of some AI DL models “trained on a new dataset of millions of webpages called WebText.”

,Google photo search is scanning the web and has access to any photo posted to use for training data. Facebook, IG, and others have millions of photos that people are posting online every day, many of which are tagged, with information identifying people in the photos. I’m sure some where there’s a clause in a license agreement that says your photos, when posted on our app, no longer belong to you alone.

As security cameras become more pervasive, camera data will readily be used to train even more advanced facial recognition models without your say so, approval or even appreciation that it is happening. And this is in the first world, with data privacy and identity security protections paramount, imagine how the rest of the world’s data will be used.

With AI DL models, it’s all about the data. Yes much of it is messy and has to be cleaned up, massaged and sometimes annotated to be useful for DL training. But the origins of that training data are typically not disclosed to the AI data engineers nor the people that created it.

We all thought China would have a lead in AI DL because of their unfettered access to data, but the west has its own way to gain unconstrained access to vast amounts of data. And we are living through it today.

Yes AI DL models have the potential to drastically help the world, humanity and government do good things better. But a dark side to AI DL models also exist to help bad actors, organizations and even some government agencies do evil.

Caveat usor (May the user beware)

~~~~

Comments?

Photo Credit(s): “Still Watching You” by jhcrow is licensed under CC BY-NC 2.0 

Computational Photography Homework 1 Results.” by kscottz is licensed under CC BY-NC 2.0 

From Language models are unsupervised multi-task learners OpenAI research paper

Intel’s new DL Boost for DL AI inferencing

I was at a TechFieldDay Extra with Intel Data Centric Innovation Conference last week in San Francisco. It was a lavish affair with many industry analysts in attendance besides the TFDx crew.

At the event Intel announced a number of new products including the availability of their next generation scaleable Xeon processor chips, new Optane DC PM (DIMM) and software, new Ethernet (800) NIC cards, new FPGA line (10nm) and DL (deep learning inferencing) boost functionality.

But first please take our new poll:

I was most interested in the DL Boost and Optane DC PM solutions. For this post I focus on DL Boost.

DL Boost for DL inferencing on Xeon

Intel’s DL Boost technology provides a new integer 8 bit precision (INT 8) matrix multiply & summation instruction which can be used to speed up DL inferencing operations. As those who have been following along with my AI-DL-machine learning (ML) blog posts (latest being Learning Machine Learning part 3), probably know, deep learning machine learning that processes data to create a neural network made up with a number of layers and a number of nodes each of which represents a floating point weight used to transform inputs into outputs.

All DL AI projects involve at least two phases: model training and model inferencing (prediction, classification, AI result, etc.). Although both of these activities involve matrix calculations, model training involves a lot more of these compute intensive operations than inferencing. In fact, while training typically is done on GPUs or other special purpose compute hardware (TPU, IPUs, etc.) inferencing can typically be done on standard off the shelf CPUs.

Historically. inferencing used floating point matrix multiplication and summation functionality ,taking input from sensors, logs, photos, etc. and performing the model logic to create an output.

Intel believes (with industry analyst agreement) that over time, 50% or more of the DL AI workload is going to involve inferencing. Hence, the focus on this end of the AI workload, at least for now.

For example, although speech recognition AI can take a long time to process audio recordings and use reinforcement learning to train a recognition model. But, once trained, you could use that recognition AI model in anything from smart speakers, to speech to text dictation machines, to voice response systems, etc. In all of these the recognition model is passed a voice recording (or voice in real time) and processes these to create a text version of the speech.

But all of this has historically been done in floating point (FP) 32 (bit precision) or FP 16. Google’s TPU is capable of doing this with less precision, but to my knowledge, up to this point, it’s always been floating point.

What is DL Boost

What Intel has done with DL Boost is to create a new X86 instruction which can perform an integer (INT) 8 (bit precision) matrix multiplication and summation with less cycles than what it took before. Intel believes if customers were to modify their trained AI neural network models to move from FP 32 (or 16) to INT 8, they could perform inferencing much faster on Xeon Cascade Lake CPUs, than they could before and not have to rely on GPUs for this activity at all.

Yes, this does require hand optimization of trained AI neural network. Some of this may be automated, but not all. Intel claims the precision loss, if done properly, is less than a few percent and it’s impact on AI inferencing correctness is negligible at best.

At the moment, for all the DL modeling I have done, i have never looked at the trained model’s weights leaving this to TensorFlow/Keras to manage for me. But I’m not creating production level DL AI systems (yet). So, I don’t know what it would take to modify my AI models to use INT 8 nor what level of degradation in correctness would ensue. But I also don’t have Cascade Lake Xeon CPUs available.

Some potential problems here:

  1. Manual activity to hand tune the INT 8 neural network is not going to be that popular, except for those organizations where inferencing requires GPUs.
  2. Most production DL AI models, undergo some form of personalization for a user or implementation instance which would require a further FP to INT conversion for each user/implementation.
  3. Most production DL AI models also undergo periodic retraining to fine tune the model with the latest data that has been accumulated. This would also require further FP to INT conversion after each training cycle.

In the end, there’s an advantage for production AI inferencing, for models that don’t require substantial retraining/personalization as they don’t change that often. And there’s a definite cost advantage to using DL Boost INT 8, for those AI inferencing that must use GPUs today to perform in real time or under other performance constraints.

But hand converting neural networks, reminds me of creating assembly code for modules that can impact performance. This is normally reserved for only a select modules or functionality that executud a lot. However, DL models are much more monolithic and by definition, less modular. Identifying which models (or model layers) within a production DL AI solution that are performance sensitive and hand optimizing them to work on CPUs rather than GPUs, seems like a hard task.

It would be better from my perspective to create a single FP 16 matrix multiplication instruction. Alternatively, create some software that would automatically convert any DL AI model (or model layer) from FP to INT 8. That way DL Boost optimization would be just another step in the model training process and could be automatically generated to see if A) it loses too much sensitivity and B) if it’s worthwhile using CPU inferencing.

~~~~

Comments?

Learning machine learning – part 3

Image of the cover of the book Deep Learning with Python

Decided to take the plunge and purchase the Deep Learning with Python book and see what it has to offer. In prior posts (see Learning machine learning – part 1 & part 2) we were working with the cloud tutorials. This Part one is based on the book

It has a great introduction into deep learning which is a subset of machine learning. After what I know today, the Microsoft Azure session was more on traditional (statistical) machine learning and not deep learning.

 

Installing deep learning

In order to use the book, you need access to Keras, Python, Jupyter and a Keras backend (TensorFlow, Microsoft CNTK or Theanno).

I decided not to use any cloud solutions and rather install Python, Jupiter, TensorFlow and Keras on my MacBook. Although it probably would have been much easier (and more costly) to use any cloud solution.

I followed the directions on the installing TensorFlow website for the PIP install (you have to install a “virtual environment” and “PIP” first). The MacBook didn’t have a NVIDIA GPU so I needed to install the CPU version of TensorFlow.

But I had the hardest time running any of the book examples. Whenever I changed any command cell in a Jupyter notebook with Keras functionality in them (like adding a space to the end of an “import Keras” command line), it would throw a (module not found) error.

After days of web searching for what path is used for Jupyter notebook-iPython/Python imports (sys.path and PYTHONPATH) and where I should be importing Keras from (it’s not “~/ .keras”), I got nowhere closer to running anything.

I finally saw that I could directly install Keras (again, when I installed Tensorflow, it installed Keras as well) into my VENV. After I did that, everything worked. (I probably have one too many Keras environments, but who cares).

Finally getting the environment correct, I could now execute any command cells in a Jupyter notebook (with Keras functionality properly, well most of them anyways).

Jupyter notebooks for dummies…

It took me a while to figure out that the way you run a Jupyter notebook server is by issuing the command “jupyter notebook” (nowhere in the command’s help file, but can be found in Jupyter tutorials). That’s when I started to see the problems in the installation section above with my Keras installation.

Understanding Jupyter notebooks is non-trivial. Yes, I know it’s an interactive code and documentation environment. It’s sort of like BASIC on steroids with WORD functionality built in/escapeable into at any time.

First thing to understand is that when you open up a jupyter notebook, you haven’t executed anything yet. YES there are output lines in the notebook you just opened but NO, they aren’t from executing them under your client-server environment.

The output lines you see in the notebook is output from someone else’s execution run. So while they may look like they worked fine but they haven’t executed in your installation environment yet..

Also, when executing Jupyter notebook command cells, pay special attention to the In [?]: that’s shown to the right of every command cell.

When the ‘?’ is a number, like In:[12] that tells you what sequence (12th in the sequence) that (multi-line command cell) has been executed in and when the ‘?’ a “*”, like In[*], it says that the Jupyter notebook server is executing that command cell. 

Some command cells generate Out [?]: lines and others do not. So can’t use this to tell if something’s been executed or not. The only way to tell if some command cell has been executed is by seeing the In [n]: integer as n be incremented from the last command cell you executed. Of course you can execute command cells out of sequence if you wish.

Jupyter notebook coding/executing was weird as one who is more used to C, Java, and other coding languages and IDEs. A video tutorial on Jupiter notebooks would probably have helped here, but I couldn’t find one.

Running the examples

You can download all of the books current examples from the book’s website.

The book suggests you add model layers, subtract model layers and change the parameters of the number of nodes in a model as examples for you to try at home.

In general, doing so (once the environment was setup properly) seemed to work as desired. Adding layers didn’t seem to change the accuracy of the models, if anything it degraded it, and deleting layers didn’t help either. Ditto for adding or reducing node counts within a layer.

There’s a bunch of datasets that comes with Keras install used in the examples. Many examples have a first step where you modify this data so as to be more amenable to deep learning modeling.

For example, there’s a IMDB dataset that has film reviews. The film reviews are text files. But deep learning doesn’t work on text strings so you need to convert the text files into lists of integers. You do this by looking up each word in a word dictionary and substituting the index for each word in the review, generating an array (list) of integers.

This is all done through the NumPy package. It’s worth the time to become familiar with Python and probably NumPy. I took the verbal Python tutorial (but did nothing to learn NumPy).

Another example is a real estate prediction model that has 13 different parameters across 500 or so neighborhoods. The parameters are all different, some are distances, some %s, some pricing differences, etc. In order to perform deep learning on them, the example normalizes all of them, using distance from mean, in units of standard deviation.

There are other examples of data transformations as well. It seems that transforming your data into something amenable to deep learning is one part of the magic of deep learning.

Back to the book

Getting through chapter 3 of the book i- fairly straightforward when everything is set up properly. I found a iPad app (Juno) that could be used to connect to the Jupyter Server and it seemed to work once I found the proper command to use to start Jupyter (jupyter notebook –ip=”*”) and the proper Jupyter configuration parameters to use.

The examples are pretty self-documenting so you should be able to try out any of them on your own. The book adds great explanations on machine learning, deep learning and and the overall flow of how to approach a deep learning project.

Once you finish chapter 4 of the book you have all the tools one needs to tackle any deep learning project that you want to attack. You may need to read up on how to transform your data and you will probably be using one of the modeling techniques in one of the examples but it seems easy enough.

The rest of the book’s chapters (which I have yet to complete) deal with deep learning in practice and it’s in these chapters that you can learn some of the art of deep learning data science and model science..

~~~~

I ended up having fun with Jupyter notebooks, once I got them running with the iPad client in one hand and the book in the other. At the end of chapter 4, I startedto see some applications to my consulting business that might be interesting to model.

Using the Mac CPU was fast enough for the examples but I may have to tear down the crypto mine and use it as an AI server for my home network if I plan to tackle something with more data.

Wish me luck…

Learning machine learning – part 1

Saw an article this past week from AWS Re:Invent that they just released their Machine Learning curriculum and materials  free to the public. Google (Cloud Platform and elsewhere) TensorFlow,  (Facebook’s) PyTorch, and Microsoft Azure CNTK frameworks  education is also available and has been for awhile now.

But first please take our new poll:

My money is on PyTorch and Tensorflow as being the two frameworks most likely to succeed. However all the above use many open source facilities and there seems to be a lot of cross breeding across them. Both AWS ML solutions and Microsoft CNTK offer PyTorch and TensorFlow frameworks/APIs as one option among many others.  

AWS Machine Learning

I spent about an hour plus looking over the AWS SageMaker tutorial videos in the developer section of AWS machine learning curriculum. Signing up was fairly easy but I already had an AWS login. You also had to enroll/register for the course on your AWS login  but once that was through, you could access courses.

In the comments on the AWS blog post there were a number of entries indicating broken links and other problems but I didn’t have any issues. Then again, I didn’t start at the beginning, only looked at over one series of courses, and was using the websites one week after they were announced at Re:Invent.

Amazon SageMaker is an overarching framework that can be used to perform machine learning on AWS, all the way from gathering, analyzing and modifying the dataset(s), to training the model, to creating a inference engine available as an endpoint that can be used to perform the inferencing.

Amazon also has special purpose API based tools that allow customers to embed intelligence (inferencing) directly into their application, without needing to perform the ML training. These include:

  • Amazon Recognition which provides image (facial and other tagging) recognition services
  • Amazon Polly which provides text to speech services in multilple languages, and
  • Amazon Lex which provides speech recognition technology (used by Alexa) and together with Polly helps embed conversational interfaces into customer applications.

TensorFlow Machine Learning

In the past I looked over the TensorFlow tutorials and recently rechecked them out. I found them much easier to follow this time.

The Google IO 2018 video on TensorFlowGetting Started With TensorFlow High Level APIs, takes you through a brief introduction to the Colab(oratory),  a GCP solution that uses TensorFlow and how to use Tensorflow Keras, tf.data and TensorFlow Eager Execution to create machine learning models and perform machine learning.

 Keras on TensorFlow seems to be the easiest approach to  use machine learning technologies. The video spends most of the time discussing a Colab Keras code element,  ~9 lines, that loads a image classification dataset, defines a 1 level (one standard layer and one output layer), trains it, validates it and uses it to perform  inferencing.

The video also touches a bit on tf.data and TensorFlow Eager Execution but the main portion discusses the 9 line TensorFlow Keras machine learning example.

Both Colab and AWS Sagemaker use and discuss Jupyter Notebooks. These appear to be an open source approach to documenting and creating a workflow and executing Python code automatically.

GCP Colab is essentially a GCP-Google Drive based Jupyter notebook execution engine. With Colab you create a Jupyter notebook on google drive and interactively execute it under Colab. You can download your Juyiter notebook files and essentially execute them anywhere else that supports TensorFlow (that supports TensorFlow v1.7 or above, with Keras API support).

In the video, the Google IO   instructors (Josh Gordon and Lawrence Moroney) walk you through building a model to recognize handwritten digits and outputs a classification (0..9) of what the handwritten digit represents.

It uses a standard labeled handwriting to digits labeled data set, called the MNIST database of handwritten digits that’s already been broken up into a training set and a validation set. Josh calls this the “Hello World” of machine learning.

The instructor in the video walks you through the (Jupyter Notebook – Eager Execution-Keras) code that inputs the data set (line 2), builds a 1 level (really two layer, one neural net layer and one output layer) neural network model (lines 3-6), trains the model (line 7), tests/validates the model (line 8) and then uses it to perform an inference (line 9).

Josh spends a little time discussing neural networks and model optimizations and some of the other parameters used in the code above. He has a few visualizations of what this all means but for the most part, the code uses a simple way to build a neural net model and some standard optimization techniques for the network.

He then goes on to discuss tf.data which is an API that can be used to create machine learning datasets and provide this data to the neural net for training or inferencing.  Apparently tf.data has a number of nifty features that allow you to take raw data and transform it into something that can be used to feed neural nets. For example, separating the data into batches, shuffling (randomizing) the batches of data, pre-fetching it so as to not starve the GPU matrix multipliers, etc.

Then it goes into how machine learning is different than regular coding. And show how TensorFlow Eager Execution is really just like Python execution. They go through another example (larger) of machine learning, this one distinguishes between cats and dogs. While they use an open source Python IDE ,  PyCharm, to test and walk through their TF Eager Execution code, setting breakpoints and examining data along the way.

At the end of the video they show a link to a Google crash course on TensorFlow machine learning and they refer to a book Deep Learning with Python by Francois Chollet. They also mention a browser version of TensorFlow which uses Java Script and  your browser to develop, train and perform inferences using TensorFlow Keras machine learning.

~~~~

Never got around to Microsoft’s Azure training other than previewing some websites but plan to look over that soon.

I would have to say that the Google IO session on using TensorFlow high level APIs was a lot more enjoyable (~40 minutes) than the AWS multiple tutorial videos (>>40 minutes) that I watched to learn about SageMaker.

Not a fair comparison as one was a Google IO intro session on TensorFlow high level APIs and the other was a series of actual training videos on Amazon SageMaker and the AWS services you can use to take advantage of it.

But the GCP session left me thinking I can handle learning more and using machine learning (via TensorFlow, Keras, Eager Execution, & tf.data) to actually do something while the SageMaker sessions left me thinking, how much AWS facilities and AWS infrastructure services,  I would need to understand and use to ever get to actually developing a machine learning model.

I suppose one was more of an (AWS SageMaker) infrastructure tutorial  and the other was more of an intro into machine learning using TensorFlow wherever you wanted to execute it.

I think I’m almost ready to start creating and feeding a TensorFlow model with my handwriting and seeing if it can properly interpret it into searchable text. If it can do that, I would be a happy camper

Comments…

Photo credits: 

Screenshos from AWS Sagemaker series of tutorial video 1, 2, 3, 4 & 5, you may need a signin to view them

Screenshots from the Getting Started with TensorFlow High Level APIs YouTube video 

New GraphCore GC2 chips with 2PFlop performance in a Dell Server

I was at Dell EMC Analyst summit this past week and at the show they had a series of sessions describing some of Dell Venture capital investments. One of the sessions was about GraphCore, a UK design firm that’s working on a new AI chip.

But first please take our new poll:

Their new GC2 chip is now out and available for customers to use. The new chip offers unprecedented performance for AI NN computations.

Hardware

GraphCore’s new Colossus GC2 chip holds 1216 IPU-Cores™. Each IPU runs at 100GFlops and is capable of running 7 threads. The GC2 chip supports 300MB of memory, with an aggregate of 30TB/s of memory bandwidth.  Each IPU supports low precision floating point arithmetic in completely parallel/concrrent execution. The GC2 chip has 23.6B transistors.

Each GC2 chip supports 80 IPU-Links™ to connect to other GC2 chips with 2.5tbps of chip to chip bandwidth. Further, the chip includes a PCIe Gen 4 x16 link (31.5GB/s) to host processors. And each chip supports up to 8TB/s IPU-Exchange™ on the chip bandwidth for inter chip, IPU to IPU communications.

The GC2 chip is available on a PCIe accelerator board that includes 2 GC2 chips. It’s also available in a Dell server configuration with 8 of their PCIe accelerator boards. In the server, with 2 GC2 chips  per board, it has ~19.5K IPUs with ~2.0PFlops in total of IPU processing power.

Software

GC2 IPUs support GraphCore’s Poplar® software and API’s that allows users to code in many of their favorite AI framework, such as PyTorch and TensorFlow.

At the NIPS 2017 conference GraphCore showed some AI ResNet-50, DeepBench LSTM RNN, and DeepVoice WaveNet performance benchmark results with their GC2 accelerator cards..

The chart above shows DeepBench LSTMN RNN runs comparing their  GC2 accelerator card against an Nvidia P100 GPU board (longer is better).

DeepBench is intended to support a set of workloads that mimic or simulate typical deep neural net types of operations and is used to compare NN hardware systems. The chart above compares DeepBench RNN inference operations with  GC2 accelerator card  vs. Nvidia P100 cards at three levels of response times (<2msec, <5msec. and <7msec.).

As can be seen in the chart, the GraphCore GC2 accelerator card performed significantly (from 182X to 242X) better than the Nvidia accelerator card executing NN inferencing at <5msec and <7msec latency. And was able to perform ~42K Inferences at <2msec latency where Nvidia P100 was unable to do at all.

~~~~
The GC2 chip, accelerator card and Dell EMC servers that run them look to be a significant advance in AI NN computations. We didn’t see any technical spec’s for the server but we assume it comes in a 4U configuration and uses less power than 8 GPUs.

However, at the moment, the servers are sold out. No information on the GC2 accelerator cards but our guess is that they are sold out as well, and probably ditto for the chips. Dell didn’t quote us any pricing on the servers, so its hard to know whether we could afford one, even if they weren’t sold out.

Who wouldn’t want to own a 4U server with 2PFlops performance for their AI apps?

Comments?

Photo Credit(s): Photos taken during Dell EMC Analyst Summit GraphCore presentation

Photos from GraphCore NIPS 2017 presentations