Learning machine learning – part 1

Saw an article this past week from AWS Re:Invent saying that they just released their Machine Learning curriculum and materials free to the public. Education on Google’s TensorFlow (on Cloud Platform and elsewhere), Facebook’s PyTorch, and Microsoft Azure’s CNTK frameworks is also available and has been for a while now.

My money is on PyTorch and TensorFlow as being the two frameworks most likely to succeed. However, all of the above use many open source facilities and there seems to be a lot of cross-breeding across them. Both AWS ML solutions and Microsoft CNTK offer PyTorch and TensorFlow frameworks/APIs as one option among many others.

AWS Machine Learning

I spent a bit over an hour looking over the AWS SageMaker tutorial videos in the developer section of the AWS machine learning curriculum. Signing up was fairly easy, but I already had an AWS login. You also had to enroll/register for the course with your AWS login, but once that went through, you could access the courses.

In the comments on the AWS blog post there were a number of entries indicating broken links and other problems, but I didn’t have any issues. Then again, I didn’t start at the beginning, only looked over one series of courses, and was using the websites one week after they were announced at Re:Invent.

Amazon SageMaker is an overarching framework that can be used to perform machine learning on AWS, all the way from gathering, analyzing and modifying the dataset(s), to training the model, to creating an inference engine available as an endpoint that can be used to perform the inferencing.

Amazon also has special purpose API based tools that allow customers to embed intelligence (inferencing) directly into their application, without needing to perform the ML training. These include:

  • Amazon Rekognition, which provides image recognition services (facial and other tagging),
  • Amazon Polly, which provides text to speech services in multiple languages, and
  • Amazon Lex, which provides speech recognition technology (used by Alexa) and, together with Polly, helps embed conversational interfaces into customer applications.

TensorFlow Machine Learning

In the past I looked over the TensorFlow tutorials and recently checked them out again. I found them much easier to follow this time.

The Google IO 2018 video on TensorFlow, Getting Started With TensorFlow High Level APIs, takes you through a brief introduction to Colab(oratory), a GCP solution that uses TensorFlow, and shows how to use TensorFlow Keras, tf.data and TensorFlow Eager Execution to create machine learning models and perform machine learning.

Keras on TensorFlow seems to be the easiest approach to using machine learning technologies. The video spends most of the time discussing a Colab Keras code element, ~9 lines, that loads an image classification dataset, defines a 1 level neural network (one standard layer and one output layer), trains it, validates it and uses it to perform inferencing.

The video also touches a bit on tf.data and TensorFlow Eager Execution but the main portion discusses the 9 line TensorFlow Keras machine learning example.

Both Colab and AWS SageMaker use and discuss Jupyter Notebooks. These appear to be an open source approach to documenting and creating a workflow and executing Python code automatically.

GCP Colab is essentially a GCP-Google Drive based Jupyter notebook execution engine. With Colab you create a Jupyter notebook on Google Drive and interactively execute it under Colab. You can download your Jupyter notebook files and essentially execute them anywhere else that supports TensorFlow (v1.7 or above, with Keras API support).

In the video, the Google IO instructors (Josh Gordon and Laurence Moroney) walk you through building a model that recognizes handwritten digits and outputs a classification (0..9) of what the handwritten digit represents.

It uses a standard labeled data set of handwritten digits, called the MNIST database of handwritten digits, that’s already been broken up into a training set and a validation set. Josh calls this the “Hello World” of machine learning.

The instructor in the video walks you through the (Jupyter Notebook – Eager Execution-Keras) code that inputs the data set (line 2), builds a 1 level (really two layer, one neural net layer and one output layer) neural network model (lines 3-6), trains the model (line 7), tests/validates the model (line 8) and then uses it to perform an inference (line 9).
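To make the "one neural net layer plus one output layer" structure a bit more concrete, here is a minimal plain-Python sketch of a single forward pass through such a network. This is not the Keras code from the video. The hidden layer size (16), the ReLU and softmax activations, the random untrained weights and the random stand-in image are all my assumptions for illustration. The only fixed numbers are 784 inputs (a flattened 28x28 MNIST image) and 10 outputs (digits 0..9).

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] is the weight row for output unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Zero out negative activations (assumed hidden-layer activation)."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_in, n_out):
    """Small random weights and zero biases; a real model would learn these."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
hidden_w, hidden_b = random_layer(rng, 784, 16)   # the one neural net layer
output_w, output_b = random_layer(rng, 16, 10)    # the output layer

image = [rng.random() for _ in range(784)]  # stand-in for a flattened digit image
hidden = relu(dense(image, hidden_w, hidden_b))
probabilities = softmax(dense(hidden, output_w, output_b))
predicted_digit = probabilities.index(max(probabilities))
```

Since the weights are random, the predicted digit is meaningless here. Training, which Keras does for you in one line, is what adjusts those weights so the highest probability lands on the right digit.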

Josh spends a little time discussing neural networks and model optimizations and some of the other parameters used in the code above. He has a few visualizations of what this all means but for the most part, the code uses a simple way to build a neural net model and some standard optimization techniques for the network.

He then goes on to discuss tf.data which is an API that can be used to create machine learning datasets and provide this data to the neural net for training or inferencing.  Apparently tf.data has a number of nifty features that allow you to take raw data and transform it into something that can be used to feed neural nets. For example, separating the data into batches, shuffling (randomizing) the batches of data, pre-fetching it so as to not starve the GPU matrix multipliers, etc.
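To show what shuffling and batching amount to, here is a small plain-Python sketch of that kind of input pipeline. It deliberately does not use the tf.data API. The function names (shuffled, batched, input_pipeline) are made up for illustration, it shuffles individual records before grouping them into batches, and it leaves out pre-fetching entirely.

```python
import random
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffled(records: Sequence[T], seed: int) -> List[T]:
    """Return a randomly reordered copy of the records."""
    out = list(records)
    random.Random(seed).shuffle(out)
    return out


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group records into lists of batch_size; the last batch may be smaller."""
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def input_pipeline(records: Sequence[T], batch_size: int,
                   seed: int = 0) -> Iterator[List[T]]:
    """Shuffle first, then batch, so each batch is a random mix of records."""
    return batched(shuffled(records, seed), batch_size)


# Ten raw records become batches of sizes 4, 4 and 2.
batches = list(input_pipeline(range(10), batch_size=4))
```

Each batch would then be handed to the neural net for one training step. A real pipeline would also prepare the next batch while the current one is being processed.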

Then the video goes into how machine learning is different from regular coding, and shows how TensorFlow Eager Execution is really just like Python execution. They go through another (larger) example of machine learning, this one distinguishing between cats and dogs. They use an open source Python IDE, PyCharm, to test and walk through their TF Eager Execution code, setting breakpoints and examining data along the way.

At the end of the video they show a link to a Google crash course on TensorFlow machine learning and they refer to a book, Deep Learning with Python by Francois Chollet. They also mention a browser version of TensorFlow which uses JavaScript and your browser to develop, train and perform inferences using TensorFlow Keras machine learning.

~~~~

Never got around to Microsoft’s Azure training other than previewing some websites but plan to look over that soon.

I would have to say that the Google IO session on using TensorFlow high level APIs was a lot more enjoyable (~40 minutes) than the AWS multiple tutorial videos (>>40 minutes) that I watched to learn about SageMaker.

Not a fair comparison as one was a Google IO intro session on TensorFlow high level APIs and the other was a series of actual training videos on Amazon SageMaker and the AWS services you can use to take advantage of it.

But the GCP session left me thinking I can handle learning more and using machine learning (via TensorFlow, Keras, Eager Execution, & tf.data) to actually do something, while the SageMaker sessions left me wondering how many AWS facilities and AWS infrastructure services I would need to understand and use to ever get to actually developing a machine learning model.

I suppose one was more of an (AWS SageMaker) infrastructure tutorial  and the other was more of an intro into machine learning using TensorFlow wherever you wanted to execute it.

I think I’m almost ready to start creating and feeding a TensorFlow model with my handwriting and seeing if it can properly interpret it into searchable text. If it can do that, I would be a happy camper.

Comments…

Photo credits: 

Screenshots from the AWS SageMaker series of tutorial videos 1, 2, 3, 4 & 5; you may need a sign-in to view them

Screenshots from the Getting Started with TensorFlow High Level APIs YouTube video 

Google cloud offers SSD storage

Read an article the other day on Google Cloud tests out fast, high I/O SSD drives. I suppose it was only a matter of time before cloud services included SSDs in their I/O mix.

Yet, it doesn’t seem to me to be as simple as adding SSDs to the storage catalog. Enterprise storage vendors have had SSDs arguably since January of 2008 (see my EMC introduced SSDs to DMX dispatch). And although there is certainly a class of applications that can take advantage of SSD low latency/high IOPs, the vast majority of applications don’t seem to require these services.

Storage systems use of SSDs today

That’s why most enterprise storage system vendors support some form of automated storage tiering or flash caching of normal I/O for their high-end storage systems, together with offering just plain old SSDs as data storage. In this more sophisticated solution, customers have the option to assign application data to SSD-only, hybrid SSD-disk, or disk-only storage. In this way the customer gets to decide whether they want some sort of mix or just pure SSD or disk IO to satisfy their application IO requirements.

Storage startups have emerged that take on both the hybrid SSD-disk and all-flash models and add quality of service to the picture. An example of all-flash storage that supplies QoS is SolidFire (learn more about SolidFire in our GreyBeardsOnStorage podcast with Dave Wright). An example that does the same sort of thing for hybrid storage is Fusion IOcontrol (formerly NexGen) storage.

Storage system QoS

In the case of SolidFire, one can limit volumes or volume groups with an IOPs max, a throughput max, and a burst max. The burst is sort of a credit that accrues over time whenever the application doesn’t ask for its maximum IOPs/throughput. The application can then spend that credit to run above its maximums, up to the burst max, for a limited timeframe.
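The burst credit idea can be sketched as a simple limiter. This is only a toy model of the behavior described above, not SolidFire’s actual algorithm. The class name, the one-second accounting interval, the credit cap and all the numbers are my assumptions, and it tracks IOPs only (no throughput max).

```python
from dataclasses import dataclass


@dataclass
class BurstLimiter:
    """Toy per-volume IOPs limiter with banked burst credits (hypothetical)."""
    max_iops: int      # sustained ceiling
    burst_iops: int    # ceiling while credits remain
    credit_cap: int    # most credits that can be banked
    credits: int = 0

    def allow(self, requested_iops: int) -> int:
        """Return the IOPs granted for one interval and update the credits."""
        if requested_iops <= self.max_iops:
            # Unused headroom accrues as credit, up to the cap.
            unused = self.max_iops - requested_iops
            self.credits = min(self.credit_cap, self.credits + unused)
            return requested_iops
        # Above the max: spend credits, never exceeding the burst max.
        wanted_extra = min(requested_iops, self.burst_iops) - self.max_iops
        extra = min(wanted_extra, self.credits)
        self.credits -= extra
        return self.max_iops + extra


limiter = BurstLimiter(max_iops=1000, burst_iops=1500, credit_cap=600)
# One quiet interval banks 600 credits, then three intervals ask for 2000 IOPs.
granted = [limiter.allow(request) for request in (400, 2000, 2000, 2000)]
# granted is [400, 1500, 1100, 1000]
```

The quiet interval pays for one full burst at 1500 IOPs and a partial one at 1100, after which the volume falls back to its 1000 IOPs max.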

QoS capabilities are slowly making their way into enterprise storage systems as well, but it will take some time for the instrumentation and capabilities to be put in place. Still, one can see limited QoS in IBM DS8000 priority IO, NetApp Storage QoS, EMC Unisphere QoS manager for VNX & SMC QoS for VMAX, and HDS SVOS QoS via partitioning. Most of these capabilities control access to or partition cache, backend and frontend resources for host volumes. As such, they are not nearly as sophisticated or as easy to use as what SolidFire and other startups are offering, but they are getting there.

Cloud SSD pricing

Back to the cloud offering. According to the GigaOm article, Google SSD volumes can sustain up to 15K IOPs and they are charging a premium price for this storage ($0.325/GB-month). Apparently Amazon AWS offers high IO EC2 storage as well, with a maximum of 4K IOPs, but charges a premium both for the storage ($0.125/GB-month) and on an IOPs basis ($0.10/IOPS-month). GigaOm had a pricing comparison for 500GB and 2000 IOPs indicating that Google SSD storage would cost $163/month and the AWS provisioned SSD storage would cost $263/month ($62.50 for storage and $200 for the 2000 IOPs).
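The arithmetic behind that comparison is easy to reproduce. Here is a quick sketch using only the prices quoted above. The function names are mine, and the prices are the ones from the article at the time, not current list prices.

```python
def google_ssd_monthly(gb: float, price_per_gb: float = 0.325) -> float:
    """Google SSD: capacity charge only, no per-IOPs fee."""
    return gb * price_per_gb


def aws_provisioned_monthly(gb: float, iops: float,
                            price_per_gb: float = 0.125,
                            price_per_iops: float = 0.10) -> float:
    """AWS provisioned IOPs: capacity charge plus a charge per provisioned IOPs."""
    return gb * price_per_gb + iops * price_per_iops


google_cost = google_ssd_monthly(500)            # about $162.50/month
aws_cost = aws_provisioned_monthly(500, 2000)    # about $62.50 + $200 = $262.50/month

print(f"Google: ${google_cost:.2f}/month, AWS: ${aws_cost:.2f}/month")
```

These come out at roughly $162.50 and $262.50, which the article rounds up to $163 and $263. Note that the Google figure does not change with the IOPs you drive, while the AWS figure grows by $0.10 for every additional provisioned IOPs.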

The fact that you can drive the Google SSD to its limits without incurring any extra cost seems a serious advantage to me and would be very appealing to most enterprise customers.

But where’s latency

It seems to me that after some IOPs level is attained, most mission critical applications are more interested in low latency IO (for more on why low latency matters, see my IO throughput vs. low latency post…). Many storage systems are capable of a maximum of 100,000s of IOPS, but most shops don’t run them that hard, ever. But with proper use of SSDs, most enterprise storage is now clocking IO at sub-msec. latency.

However, I have yet to see any Cloud storage pricing or QoS for that matter that was based on latency guarantees.  I think this is a serious omission.

In any event, SSDs in the cloud are a good thing. Now they just need to offer flash caching, automatic storage tiering and sophisticated QoS. I realize this is partially re-inventing enterprise storage in the cloud, but isn’t that what everyone actually wants, at cloud storage pricing of course?

Comments?