Polarized laser light speeds up data center networks


Read an article the other day, Polarizing the data center, from IEEE Spectrum, on new optical technology that has the potential to boost data center networking speeds by ~7x beyond what they are today. The research was published in a Nature article, Ultrafast spin lasers (paywall), but a previous version of the paper, released on PLOS (Ultrafast spin lasers), is freely available.

It’s still in lab demonstration at this point, but if it does make it into the data center, it has the potential to remove local networking as a bottleneck for application workloads, at least for the foreseeable future.

The new technology is based on polarizing laser light (right or left circular) and using polarization to encode ones and zeros. Today’s optical transceivers seem to use on-off or brightness levels to encode data signals, which requires a lot of power (and therefore cooling) to work. On the other hand, polarizing laser light takes ~7% of the power (and cooling) of the old style of on-off laser light.

How it works

Not sure I understand all the physics, but it appears that if you are able to control the carrier spin within a semiconductor Vertical-Cavity Surface-Emitting Laser (VCSEL), the laser transmutes carrier spin into photon polarization and, by doing so, emits polarized laser light. With appropriate sensors, this polarization can then be detected and decoded.

In addition, due to some physical constraints, modulating (encoding) laser intensity will never be faster than modulating (encoding) carrier spin. This has something to do with cycling the laser on and off vs. switching its polarization. As such, one should be able to transmit more information with polarized laser light than with intensity-modulated laser light.

Moreover, polarization can be done at room temperature. Apparently, VCSELs operating today typically hit 70°C in normal high-speed operation, vs. ~21°C for VCSELs using polarization.

Lab results

In the lab they are using (I believe) mechanical bending in combination with a pulsed laser to create the spin carriers in the VCSELs that polarize the laser light. This is just for demonstration purposes. It’s unclear whether this approach will be usable in a data center application of the technology.

In their lab experiments they were able to demonstrate VCSEL polarization cycles (how quickly they could change polarization) in the 5 ps (picosecond, a trillionth of a second) range. This resulted in something on the order of 214 GHz of polarized light cycles. Somewhere in the PLoS article they mentioned transmitting a random bit string using the technology, not just cycling through 1s and 0s over and over again.

The researchers believe that moving from mechanical bending to photonic crystal or strained quantum well-based VCSELs will allow them to go from signaling at 214 GHz to 1 THz, or ~28X what can be done with laser intensity signaling today.
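A cycle period of T picoseconds corresponds to a switching rate of 1/T, so the 5 ps and 214 GHz figures above are two views of the same thing. Here's a quick Python sketch of that unit arithmetic. The "implied baseline" lines assume the ~28X comparison is made against a single present-day intensity-modulation rate, which the article doesn't spell out.

```python
def frequency_ghz(period_ps: float) -> float:
    """Switching rate in GHz for a polarization-cycle period given in picoseconds."""
    return 1_000.0 / period_ps  # 1 / (period_ps * 1e-12 s), expressed in units of 1e9 Hz


def period_ps(freq_ghz: float) -> float:
    """Inverse: cycle period in picoseconds for a rate given in GHz."""
    return 1_000.0 / freq_ghz


print(f"5 ps cycle    -> {frequency_ghz(5):.0f} GHz")
print(f"214 GHz rate  -> {period_ps(214):.2f} ps per cycle")

# Assumption: if 1 THz is ~28X today's intensity signaling, today's rate is ~1000/28 GHz.
baseline_ghz = 1_000.0 / 28
print(f"implied intensity-modulation baseline: ~{baseline_ghz:.0f} GHz")
print(f"214 GHz vs. that baseline: ~{214 / baseline_ghz:.1f}X")
```

Under that assumption the lab result works out to roughly a 6X gain, in the same ballpark as the ~7x headline figure.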

I don’t know whether the technology will get out of the lab anytime soon, but 1 THz (~1 Tbps) seems like something most IT organizations would want, especially if the price is similar to today’s technology.

The research mentioned this would be more suitable for data center networking than for long-range data transfers. Not sure why, but it could be because 1) it’s still relatively experimental and 2) they have yet to determine distance degradation parameters.

Of course, normal (on-off) signaling technology using VCSELs is not standing still. There’s always the potential to move beyond current physical constraints and boost a technology’s capabilities. Just witness the superparamagnetic barrier in magnetic disk drives over the years. That physical barrier has moved multiple times during my career.

However, nearly an order of magnitude improvement in speed and more than an order of magnitude improvement in power/cooling are hard to come by with mature technology. I see polarized optical fiber networking in the data centers of the future.




For data that never rests, NetApp NDAS

NetApp co-founder Dave Hitz announced he was becoming a NetApp Founder Emeritus at the Storage Field Day (SFD18) show. He gave a great session about what he and his Hitz Foundation have been doing (for one example, see our Archeology meets big data post). He also discussed at length what he felt the storage world (and NetApp) must do to address the opportunities of the new cloud world. But this post isn’t about Dave, it’s about NetApp Data Availability Services, NDAS.

NetApp NDAS, currently in beta but (hopefully) GA later this year, is an AWS Marketplace data orchestration solution that manages primary-to-secondary-to-S3 movement for ONTAP data. Essentially, NetApp Data Availability Services extends ONTAP data lifecycle management to the AWS cloud. But it’s more than just a way to archive ONTAP data.

NDAS orchestrates SnapMirror services across ONTAP systems and AWS. Once your ONTAP data is in S3, NDAS supplies access to that data for authorized AWS applications and services. That way, one can use ONTAP data to provide data analytics, train AI models, and do just about anything you can do with AWS applications today. By using NDAS, customers can extract more value from their ONTAP data.

NDAS is not just copying data to S3 but is also copying ONTAP metadata, catalogues and other information that provides context for that data. By copying ONTAP catalog information, customers and authorized end users can have file level access to ONTAP data residing in S3 objects.

Today, NDAS only supports copying data from secondary ONTAP systems to S3, but a future enhancement will expand this to copy primary ONTAP data to S3.

How does NDAS work

NDAS provisions (your) EC2 instances and middleware to read the data from the secondary systems and copy it to S3 buckets that you provide. After an initial configuration pointing it at your ONTAP secondary storage systems, NDAS autodiscovers all the data available to be copied to the cloud.

NDAS then starts cataloguing your ONTAP data. NDAS EC2 instances run the NDAS copy, view and Google-like search processes.

NDAS search presents a simplified file system view into your ONTAP data copied to S3. That way, customers can identify data that could be used by AI training or data analytics applications running in the cloud.
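To make the catalog idea a bit more concrete, here's a toy Python sketch of what a catalog buys you: one record per file mapping its original ONTAP path to the S3 object holding it, plus a keyword search and a directory listing over those records. Everything here (class, fields, functions, bucket and object names) is hypothetical; it is not NetApp’s API or NDAS’s actual catalog format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: where a file from an ONTAP volume landed in S3."""
    path: str        # original file path on the ONTAP system
    bucket: str      # customer-provided S3 bucket
    object_key: str  # S3 object holding the file's data
    size_bytes: int


def search(catalog: list[CatalogEntry], query: str) -> list[str]:
    """Very loose 'Google-like' search: return paths containing every query term."""
    terms = query.lower().split()
    return sorted(e.path for e in catalog if all(t in e.path.lower() for t in terms))


def list_dir(catalog: list[CatalogEntry], directory: str) -> list[str]:
    """Simplified file-system view: the immediate children of a directory."""
    prefix = directory.rstrip("/") + "/"
    children = {e.path[len(prefix):].split("/", 1)[0] for e in catalog if e.path.startswith(prefix)}
    return sorted(children)


# Illustrative entries only, loosely inspired by the MLB example below.
catalog = [
    CatalogEntry("/vol1/stadium/pitch_speed_2018.csv", "my-ndas-bucket", "obj-0001", 1_200_000),
    CatalogEntry("/vol1/stadium/ball_location_2018.csv", "my-ndas-bucket", "obj-0002", 3_400_000),
    CatalogEntry("/vol1/hr/payroll.xlsx", "my-ndas-bucket", "obj-0003", 80_000),
]
print(list_dir(catalog, "/vol1"))
print(search(catalog, "stadium 2018"))
```

The point of the sketch is that without the catalog, an analyst only sees opaque S3 object keys; with it, they can browse and search by the file paths they already know.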

There’s extensive security to ensure that NDAS is properly authorized to access your ONTAP data. Normal S3 security options also apply, such as encrypting the data at rest in S3. NDAS data is automatically encrypted in flight.

Moreover, NDAS S3 bucket data can be replicated across AWS regions. Also, serverless (Lambda) functionality is fully supported with NDAS S3 buckets.

What can it do with the data

AWS applications can access the data directly through NDAS APIs. Or customers can manually extract the data they want to further process, using the NDAS GUI to identify and copy data of interest. NDAS essentially creates a small app layer that lets users view and access the ONTAP data in S3 as a file system.

One can have different NDAS AMIs operating in different regions for faster access or to support GDPR compliance requirements. Alternatively, a customer could have one NDAS AMI accessing all their secondary ONTAP instances.

NDAS is intended to give data analysts or IT generalists access to ONTAP data. This way, AI training and big data analytics applications, which run easily in the cloud, can access ONTAP data. In this way, customers can more effectively use data that IT has been storing and maintaining since time began.

One NDAS beta customer is an MLB team. Over time, they have instrumented their stadiums to generate lots of data about pitch speed, rotation, ball location as it crosses the plate, etc. The problem is that all this data is siloed in the on-prem or IoT systems that generated it. But the customer wants to use the data to improve players, coaches and the viewer experience, and all that requires tools, applications and software that just aren’t available to run in the data center. With NDAS, all this data is now available to cloud applications.

NDAS is supported by any ONTAP 9.5 or later (FAS, AFF, Cloud ONTAP, ONTAP Select) secondary storage system. ONTAP 9.5 software contains all the services required to support NDAS. This includes the copy-to-cloud APIs, as well as the NDAS proxy, which supplies the secure interface to NDAS operating in the cloud.

NetApp’s NDAS sessions are pretty informative. Anyone interested in finding out more should check out the videos available on the Tech Field Day website; Dave’s session is also worth a view.

For more information on Dave’s session and NDAS check out:

NetApp, Cloudier than ever by Enrico Signoretti (@ESignoretti)

NetApp and the space in between by Dan Frith (@PenguinPunk)



DNA IT, the next revolution

I’ve been writing about DNA computing and storage for quite a while now (see DNA computing and the end of natural evolution, DNA storage and the end of evolution part 2, & Random access DNA object storage system). But in the last few months there’s been a flurry of activity in this space that seems worthy of note.

DNA programming language

First up, A logic programming language for computational nucleic acid devices, a research article in the journal ACS Synthetic Biology. The research describes a new approach to programming DNA computers that’s uniquely designed to mimic the molecular algorithmic capabilities of DNA devices.

The language uses logical statements and predicates (reminds me of Prolog). Indeed, the language was modeled after Prolog, with equational and molecular extensions to represent DNA functionality. As with Prolog, output is a function of declarative predicate logic rather than the control flow and assignment of normal programming languages. Logic programming takes a different mindset and demands an understanding of formal logic.
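For readers who haven’t used Prolog, here’s a minimal Python sketch of the declarative style: you state facts and rules of the form "head holds if every body fact holds", and the engine works out what follows, with no explicit control flow in the program itself. This uses forward chaining over propositional rules, which is simpler than Prolog’s goal-directed resolution with variables, and the strand names are hypothetical, not taken from the paper’s language.

```python
def forward_chain(facts: set[str], rules: list[tuple[str, list[str]]]) -> set[str]:
    """Derive every fact implied by Horn-clause rules (head :- body) by iterating to a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known


# Toy "DNA device" program (hypothetical names): which signals follow from the strands present?
rules = [
    ("duplex_ab", ["strand_a", "strand_b"]),   # duplex_ab :- strand_a, strand_b.
    ("signal_out", ["duplex_ab", "trigger"]),  # signal_out :- duplex_ab, trigger.
    ("reporter_glow", ["signal_out"]),         # reporter_glow :- signal_out.
]
inputs = {"strand_a", "strand_b", "trigger"}
derived = forward_chain(inputs, rules)
print(sorted(derived - inputs))
```

Drop "trigger" from the inputs and nothing beyond duplex_ab is derived; the program never says how to compute that, only what implies what.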

The article talks about applications of DNA computing devices for in vitro (chemical/protein) manufacturing, diagnosis, and therapeutics (operating inside living cells).

DNA storage device

Next up, a recent article in Scientific Reports, Demonstration of end-to-end automation of DNA data storage.

The intent here is to create a fully automated data storage device that uses DNA as its recording media. The current device (seen in the bottom right above) is a benchtop lab prototype that costs $10K and can store 5 bytes of data with error correction.

The system has three hardware modules: synthesis (writing), storage and sequencing (reading). It also includes encoding and decoding software that translates bits to nucleic acid bases and adds error correction. The software also needs to add extra bases to make the strands compatible with the sequencing (reading) process.
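To show what "translating bits to nucleic acid bases" can look like, here’s a minimal Python sketch of the simplest possible mapping, 2 bits per base (00→A, 01→C, 10→G, 11→T). This mapping is an assumption for illustration; the paper’s actual encoding may differ, and real schemes carry less data per base than this because of the extra sequencing bases, error correction, and constraints such as avoiding long runs of the same base.

```python
BASES = "ACGT"  # hypothetical mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Translate bytes into bases, 2 bits per base, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Reverse of encode(): every 4 bases become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    values = [BASES.index(b) for b in strand]
    return bytes(
        (values[i] << 6) | (values[i + 1] << 4) | (values[i + 2] << 2) | values[i + 3]
        for i in range(0, len(values), 4)
    )


strand = encode(b"HELLO")
print(len(strand), strand)
print(decode(strand))
```

At this idealized density, the 5-byte "HELLO" payload needs only 20 bases before any adapters or hash bases are added.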

The storage limits may have something to do with the size of the storage vessel as well as the length of the DNA strand that can be synthesized/sequenced. Error correction is based on a 6-base hash code, a small overhead on a 5-byte payload. The system’s write-to-read-back time is ~21 hrs.

The device creates many copies of the DNA (data) strand. The 5-byte (“HELLO”) string took 4 micrograms of liquid and yielded 3,469 DNA strands, 1,973 of which aligned properly to their adapter sequence. Of those properly aligned DNA strands, 30 had extractable payload regions, of which 1 was correct and the other 29 were corrupted.

This is a very poor BER (bit error rate). For comparison, LTO-7/8 has a BER of 1 in 10^19 bits, and enterprise disk has a BER of 1 in 10^15 bits. This DNA storage device has an effective error rate of 3469:1 (only one of 3,469 strands read back correctly), i.e., ~99.9% of all strands written were lost.

To get a better understanding of the BER, they stored a 100-base (~12-byte) data payload. Of the 25,592 strands created, 286 aligned properly; of those, 251 were corrupted, 11 had invalid hashes, 8 were corrupted but correctable (valid hashes, invalid data) and 16 were perfect reads. So 25,592 strands yielded 24 proper reads, a ~1K:1 BER (not entirely correct, because the correctable strands actually had bit errors, but we can give them that).
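The arithmetic behind those two ratios is easy to check with a few lines of Python using only the counts quoted above. Note these are strand-level ratios (strands written per usable read), not a true per-bit error rate, so they aren’t strictly comparable to the tape and disk BER figures.

```python
def strand_yield(total: int, usable: int) -> tuple[float, float]:
    """Return (strands written per usable read, fraction of strands unusable)."""
    return total / usable, 1 - usable / total


# "HELLO" run: 3,469 strands sequenced, 1 correct payload.
hello_ratio, hello_lost = strand_yield(3469, 1)

# 100-base run: 286 aligned = 251 corrupted + 11 bad hash + 8 correctable + 16 perfect.
aligned = 251 + 11 + 8 + 16
assert aligned == 286  # the breakdown adds up to the aligned count
big_ratio, big_lost = strand_yield(25592, 8 + 16)

print(f"HELLO:    {hello_ratio:,.0f}:1, {hello_lost:.2%} of strands unusable")
print(f"100-base: {big_ratio:,.0f}:1, {big_lost:.2%} of strands unusable")
```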

DNA computer architecture

Last up, an IEEE Spectrum article, DNA computer shows programmable chemical machines are possible, discussing Caltech research reported in a Nature article, Diverse and robust molecular algorithms using reprogrammable DNA self-assembly (paywall). This DNA computer system is made of just DNA and salt water. It computes algorithms on 6 bits of input using DNA logic gates.

The Caltech team created 2-input, 2-output Boolean gates out of DNA sequences; five of these gates are connected to form a computation layer. It supports 6 input and 6 output bits. You can also stack multiple computational layers, where the output of one layer is fed in as input to the layer on top of it.

One key feature is that the DNA computer self-assembles the computational layers. They use a seed layer as a starter DNA strand, then the input (mixed inside a vial) attaches to this seed layer, and then the computational layers attach one by one until the output is generated.
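The layered-gate idea can be mimicked in a few lines of Python. This is a hypothetical sketch: the gate, (a AND b, a OR b), and the alternating neighbor pairing are my choices for illustration, not the Caltech tile layout. With that gate, each layer is one round of odd-even transposition sort, so six layers are enough to sort six input bits.

```python
from typing import Callable

Gate = Callable[[int, int], tuple[int, int]]


def sort_gate(a: int, b: int) -> tuple[int, int]:
    """2-input, 2-output gate: (a AND b, a OR b) pushes 1s toward the right."""
    return a & b, a | b


def run_layers(bits: list[int], gate: Gate, layers: int) -> list[int]:
    """Feed each layer's outputs into the next, alternating which neighbors are paired."""
    state = list(bits)
    for layer in range(layers):
        start = layer % 2  # even layers pair (0,1),(2,3),...; odd layers pair (1,2),(3,4),...
        for i in range(start, len(state) - 1, 2):
            state[i], state[i + 1] = gate(state[i], state[i + 1])
    return state


print(run_layers([1, 0, 1, 0, 0, 1], sort_gate, layers=6))
```

Swapping in a different 2-in/2-out gate changes the "program" while the layering stays the same, which is roughly the reprogrammability the article describes.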

Each computational layer is made up of DNA computational tiles that attach together, sort of like a circuit. They were able to create a 355-instruction set for their DNA computer. In comparison, the IBM 360 had a one-byte op code (at most 256 instructions).

They have a compiler that lets researchers write a software algorithm and translates that code into DNA circuit tiles, computational layers and, ultimately, a DNA computer.

According to the article, it takes 1-2 hours to grow the computational DNA crystal and another day or so for the computation to complete.

An interesting approach to DNA computation, but it’s unclear if they have any branching mechanisms in their “instruction set”. And 6-bit input/output seems a bit limiting. However, by creating Boolean gates with DNA, they could recreate any type of electronic computer that exists today.


Put it all together and someday you could have a DNA compute server and storage.

One thing that’s missing is a (packet-switched or token ring) network for transferring data between cells (and maybe into and out of DNA storage). They could probably use some sort of vascular (network) system with a way to transfer data from inside a cell to the network and into another cell.

That way they could gang a number of DNA compute servers (cells) together and maybe create a cellular automata machine.

The future of computation looks wetter now.


Better core allocation for congested web apps

Read an article in ScienceDaily (Achieving greater efficiency for fast datacenter operations) today about research done at MIT CSAIL, to be presented next week at NSDI’19, on Shenango, a new algorithm to allocate idle CPU cores to process latency-sensitive transaction workloads. The paper is to be presented on February 27th. (I may update this with more details on Shenango after the paper is published.)

It appears that for many web-scale applications, response time is driven mostly by tail latencies (the slowest service determines web page response). These 10K-100K server environments have always had to over-provision CPU cores to reduce service tail latency. This has led to 100s to 1000s of cores mostly sitting idle (but powered on) much of the time.
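The "slowest service determines page response" point is easy to quantify with a back-of-the-envelope calculation. Assuming, for illustration, that each back-end call is independently slow 1% of the time, the chance that a page fanning out to n calls waits on at least one slow call is 1 - 0.99^n:

```python
def prob_page_hits_tail(p_slow: float, fanout: int) -> float:
    """Chance that at least one of `fanout` parallel service calls is slow.

    Assumes each call is independently slow with probability p_slow; the page
    waits for the slowest call, so one slow call makes the whole page slow.
    """
    return 1 - (1 - p_slow) ** fanout


for fanout in (1, 10, 100):
    print(fanout, f"{prob_page_hits_tail(0.01, fanout):.1%}")
```

With a fan-out of 100, roughly 63% of page loads would wait on at least one slow call, which is why operators keep spare cores around to keep service tails short.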

There have been some solutions that try to make better use of idle cores, but their core allocation responsiveness has been in the milliseconds. With 10s-100s of threads making up a web service, allocating CPU resources in milliseconds was too slow.

Arachne, a core aware thread scheduler

One approach to better core allocation uses Arachne: Core Aware Thread Management, out of Stanford.

With Arachne, threads are assigned to an application and each is given a priority. Arachne attempts to schedule them in priority order across an array of cores at its disposal.

Arachne’s Core Arbiter code assigns application threads to cores and runs under Linux at the user level. Some of its timings seem pretty fast. In the paper cited above, Arachne was able to schedule a thread to a core in under 300 ns.

Under Arachne, there are two sets of cores (and applications): managed and unmanaged. Unmanaged cores run normal (non-Arachne, unmanaged) applications and threads. Managed applications use Arachne to assign cores.

Arachne uses a Linux construct called cpusets, a collection of cores and memory banks, to allocate resources to run application threads. Cores and memory banks move between managed and unmanaged based on applications being run. Arachne assumes that managed apps have higher priority than unmanaged apps.

That is, when Arachne starts, all cores are in the unmanaged set, and the Core Arbiter executes there as well. As applications are scheduled to run, the Arbiter grabs cpusets from unmanaged applications or a free pool and assigns them to run application threads. When the application completes, the cpusets are returned to the unmanaged pool.

Arachne allocates cores based on a priority scheme with 8 levels. The highest priority managed applications/threads get cpusets first, lower priority managed application threads next, and unmanaged applications last.
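Here’s a toy Python model of the allocation flow described above: all cores start unmanaged, managed apps are served in priority order (8 levels, 0 assumed highest here), whatever is left stays with unmanaged apps, and a finished app’s cores go back to the unmanaged pool. The class and method names are hypothetical; this is not Arachne’s actual C++ API, and it ignores memory banks and real cpusets entirely.

```python
from dataclasses import dataclass, field

NUM_PRIORITIES = 8  # Arachne is described as using 8 priority levels; 0 = highest here


@dataclass
class CoreArbiterSketch:
    """Toy model of moving cores between an unmanaged pool and managed apps."""
    unmanaged: set[int]
    managed: dict[str, set[int]] = field(default_factory=dict)

    def request(self, demands: dict[str, tuple[int, int]]) -> None:
        """demands maps app name -> (priority, cores wanted)."""
        # Serve higher-priority (lower number) apps first; ties broken by name for determinism.
        for app, (prio, wanted) in sorted(demands.items(), key=lambda kv: (kv[1][0], kv[0])):
            if not 0 <= prio < NUM_PRIORITIES:
                raise ValueError(f"priority must be in [0, {NUM_PRIORITIES})")
            held = self.managed.setdefault(app, set())
            while len(held) < wanted and self.unmanaged:
                held.add(self.unmanaged.pop())

    def release(self, app: str) -> None:
        """Return all of an app's cores to the unmanaged pool when it completes."""
        self.unmanaged |= self.managed.pop(app, set())


arbiter = CoreArbiterSketch(unmanaged=set(range(8)))
arbiter.request({"web": (0, 5), "batch": (3, 6)})
print({app: len(cores) for app, cores in arbiter.managed.items()}, "unmanaged:", len(arbiter.unmanaged))
arbiter.release("web")
print("unmanaged after web completes:", len(arbiter.unmanaged))
```

In this example the high-priority web app gets all 5 cores it asked for, the batch app gets the remaining 3, and nothing is left for unmanaged work until the web app finishes.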

There’s a set of APIs that applications must use to request cores and free them when no longer in use. Arachne seems pretty general purpose, and the fact that it operates with both normal (unmanaged) Linux applications and (Arachne) managed applications is appealing.

Shenango core allocation

Untitled by johnwilson1969 (cc) (from Flickr)

Not much technical information on Shenango was available as we published this post, but there is some information in the MIT/ScienceDaily piece and some in the Arachne paper.

It appears that Shenango detects applications suffering from high tail latency by interfacing with the network stack and checking whether packets have been waiting to be processed. It does this every 5 µs, and if a packet has been waiting since the last check, the application is considered congested (i.e., likely to have tail latency problems) and a candidate for more cores.
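Here’s a minimal Python sketch of that congestion check as I understand it from the description above: at each check, an app is flagged if the oldest packet in its queue arrived before the previous check. The queue layout, function name and timings are assumptions for illustration, not Shenango’s IOKernel code.

```python
from collections import deque

TICK_US = 5  # check interval described in the article


def congested_apps(queues: dict[str, deque], last_check_us: float) -> list[str]:
    """Return apps whose oldest queued packet arrived before the previous check.

    Each queue holds (packet_id, arrival_time_us) tuples in arrival order, so the
    head of the queue is the packet that has been waiting longest.
    """
    return sorted(app for app, q in queues.items() if q and q[0][1] <= last_check_us)


now_us = 10.0
queues = {
    "memcached": deque([(1, 2.0), (2, 6.0)]),  # packet 1 has waited since before the last check
    "web": deque([(3, 7.5)]),                  # arrived after the last check: not congested yet
    "idle": deque(),                           # nothing queued
}
print(congested_apps(queues, last_check_us=now_us - TICK_US))
```

Only apps that actually have stale packets get flagged, which matches the post’s point that core allocations change only when an application is congested.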

It seems to do the same for computational processes that have been waiting for some service response. Shenango implements an IOKernel that handles core allocation to apps.

Shenango apps use an API to indicate when they are and are not processing time-sensitive services. If they are not, their cores can be released to more time-sensitive apps that are encountering congestion.

Presumably Shenango does not execute at the user level. It’s unclear whether it can operate with both normal (Linux) and Shenango-managed applications, and it also appears to be tied tightly to the network stack. Whether any of this matters to web-scale application users/developers is subject to debate.

However, the fact that it only alters core allocations when applications are congested seems a nice feature.


The Arachne paper said it “improved SLO MemCached by 37% and reduced tail latency by 10X”. The only metric available in the Shenango discussion was that they increased typical web-scale server CPU core allocation from 60% to 100%.

If Shenango or Arachne can reduce over-provisioning of CPU cores and memory, it could lead to significant energy and server savings, especially for customers running 10K servers or more.

IT in space

Read an article last week about all the startup activity that’s taking place in space systems and infrastructure (see: As rocket companies proliferate … new tech emerges leading to a new space race). This is a consequence of cheap(er) launch systems from SpaceX, Blue Origin, Rocket Lab and others.

SpaceBelt, storage in space

One startup that caught my eye was SpaceBelt from Cloud Constellation Corporation, which is planning to put petabytes (4X the Library of Congress) of data storage in a constellation of LEO satellites.

The LEO storage pool will be populated by multiple nodes (satellites), with a set of geosynchronous access points to the LEO storage pool. Customers use ground-based secure terminals to talk with geosynchronous access satellites, which communicate with the LEO storage nodes to access data.

Their main selling points appear to be data security and availability. The only way to access the data is through secured satellite downlinks/uplinks, and even then you only get to the geosynchronous satellites. From there, those satellites access the LEO storage cloud directly. Customers can’t access the storage cloud without going through the secured terminals and the geosynchronous layer first.

The problem with terrestrial data is that it is prone to security threats as well as natural disasters that take out a data center or a region. But with all your data residing in a space cloud, such concerns shouldn’t be a problem. (However, gaining access to your ground stations is a whole different story.)

AWS and Lockheed-Martin supply new ground station service

The other company of interest is not a startup but a link up between Amazon and Lockheed Martin (see: Amazon-Lockheed Martin …) that supplies a new cloud based, satellite ground station as a service offering. The new service will use Lockheed Martin ground stations.

Currently, the service is limited to S-Band and antennas located in Denver, but plans are to expand to X-Band and locations throughout the world. The plan is to have ground stations located close to AWS data centers, so data center customers can have high-speed access to satellite data.

There are other startups in the ground station as a service space, but none with the resources of Amazon-Lockheed. All of this competition is just getting off the ground, but a few have been leasing idle ground station resources to customers. The AWS service already has a few big customers, like DigitalGlobe.

One thing we have learned, is that the appeal of cloud services is as much about the ecosystem that surrounds it, as the service offering itself. So having satellite ground stations as a service is good, but having these services, tied directly into other public cloud computing infrastructure, is much much better. Google, Microsoft, IBM are you listening?

Data centers in space

Why stop at storage? Wouldn’t it be better to support both storage and computation in space? That way, access latencies wouldn’t be a concern. When terrestrial disasters occur, it’s not just data at risk. Ditto for security threats.

Having whole data centers in space would represent a whole new stratum of cloud computing. IT could also implement space-native applications.

If Microsoft can run a data center under the oceans, I see no reason they couldn’t do so in orbit. Especially when human flight returns to NASA/SpaceX. Just imagine admins and service techs as astronauts.

And yet, security and availability aren’t the only threats one has to deal with. What happens to the space cloud when war breaks out and satellite killers are set loose?

Yes, space infrastructure is not subject to terrestrial disasters or internet-based security risks, but besides war there are other problems, such as solar storms and space debris clouds.

In the end, it’s important to have multiple, non-overlapping risk profiles for your IT infrastructure. That is, each IT deployment may be subject to one set of risks, but that set should be disjoint from the risks of another IT deployment option. IT in space, subject to solar storms, space debris, and satellite killers, is a nice complement to terrestrial cloud data centers, subject to natural disasters, internet security risks, and other earth-based, man-made disasters.

On the other hand, a large solar storm like the one in 1859 could knock out every data system on earth or in orbit. As for under the sea, it probably depends on how deep the data center is submerged!!
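The "non-overlapping risk profiles" idea boils down to set intersection: the risks every deployment shares are the ones that could take out all your copies at once. Here’s a tiny Python illustration using the risk categories from the paragraphs above; treating risks as simple labels is, of course, a big simplification.

```python
terrestrial = {"natural disaster", "internet security attack", "man-made disaster"}
orbital = {"solar storm", "space debris", "satellite killer"}


def shared_risks(*deployments: set[str]) -> set[str]:
    """Risks every deployment is exposed to, i.e. events that could take out all copies."""
    return set.intersection(*deployments)


print(shared_risks(terrestrial, orbital))  # disjoint profiles: nothing in common

# A Carrington-class (1859-style) solar storm threatens ground systems too,
# so it becomes a common-mode risk across both deployments.
print(shared_risks(terrestrial | {"solar storm"}, orbital))
```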

Photo Credit(s): Screen shots from SpaceBelt youtube video (c) SpaceBelt

Screen shots from AWS Ground Station as a Service sign up page (c) Amazon-Lockheed

Screen shots from Microsoft’s Under the sea news feature (c) Microsoft

Decoding deep learning

Read an article in Quanta Magazine (A new approach to understanding how machines think) about what Google’s team has been doing to interpret deep learning models. They have created TCAV (Testing with Concept Activation Vectors), a software tool that can be used to interrogate deep learning models to determine how sensitive they are to features of interest.

Essentially, TCAV provides a way to exercise a deep learning model using a select set of test data (that isolates a feature of interest) and to determine how sensitive the deep learning model is to that feature.

What TCAV can do for DL models

For example, in my experiments with deep learning, I’ve trained a model to predict the popularity of a (RayOnStorage) blog post based on its title. But I was also intending to do the same based on content attributes such as text length, heading count, image count, link count, etc. In the end, I was hoping to come up with some idea of the popularity of a post based on these attributes. But what I really wanted to know was how each of these parameters (or features) impacted blog post (predicted and actual) popularity.

With TCAV, you essentially select examples, such as posts that have or show the parameter of interest (e.g., blog posts with a high number of images). Once you have your example set, you use TCAV to feed the samples to the model, and it generates a number between 0 and 1 that tells you how sensitive the model is to the feature in the example set.

So, in my blog example above, it might show that the blog popularity prediction DL model has a 0.2 sensitivity to the number of images in a post. In the example shown in the graphic, the base model interprets images, and TCAV is used to determine how important stripes are to deciding whether an image has a zebra in it.

How TCAV actually works

The use of TCAV is a bit technical, but essentially it feeds the example data set into the model, along with a random set of data without the feature of interest, and isolates the differences in the model’s (neural net node) activations between the random data and the example data.

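TCAV uses a second, simple machine learning model to interrogate the first. It trains a linear classifier to separate the model’s internal activations for the concept examples from those for the random examples; the direction orthogonal to that classifier’s decision boundary is the concept activation vector (CAV). It then checks, for inputs of a given class, how often nudging the activations along the CAV increases the model’s score for that class. The paper goes into much more detail if you’re interested, but in the end TCAV comes up with a single number (the fraction of inputs where the score goes up) that measures that sensitivity.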
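Here’s a toy, pure-Python sketch of that scoring step. Assumptions, loudly: the "model" is a made-up class score sum(w_i * tanh(a_i)) over four activations, the concept, random and class activations are synthetic, and the CAV is approximated by the difference of mean activations instead of the linear classifier TCAV actually trains. The real tool works on TensorFlow models; see the GitHub project for that.

```python
import math
import random


def mean_vector(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def concept_direction(concept_acts, random_acts):
    """Simplified CAV: difference of mean activations (TCAV proper fits a linear classifier)."""
    return [c - r for c, r in zip(mean_vector(concept_acts), mean_vector(random_acts))]


def toy_logit_grad(act, weights):
    """Gradient of the toy class score sum(w_i * tanh(a_i)) with respect to activations a."""
    return [w * (1 - math.tanh(a) ** 2) for w, a in zip(weights, act)]


def tcav_score(class_acts, cav, weights):
    """Fraction of class examples whose score rises when activations move along the CAV."""
    hits = sum(
        1 for act in class_acts
        if sum(g * c for g, c in zip(toy_logit_grad(act, weights), cav)) > 0
    )
    return hits / len(class_acts)


rng = random.Random(0)
dim = 4
# Hypothetical "stripes" concept: examples that light up activation 0.
concept = [[2 + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(dim - 1)] for _ in range(50)]
randoms = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
zebras = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]

cav = concept_direction(concept, randoms)
print(tcav_score(zebras, cav, weights=[1.0, 0.0, 0.0, 0.0]))   # toy zebra score relies on activation 0
print(tcav_score(zebras, cav, weights=[-1.0, 0.0, 0.0, 0.0]))  # toy score that is suppressed by it
```

A score near 1 means the class score almost always goes up when the "stripes" direction is strengthened; near 0 means it almost always goes down.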


TCAV is available as an open source tool (see GitHub TCAV project page) and works with Google TensorFlow frameworks. TCAV was originally developed to work with image classification models but can work with other models as well.

If you’re running TensorFlow already, adding TCAV appears easy enough (check out the project’s readme page for more info). On the TCAV project page, there’s a Jupyter notebook (Run TCAV in the GitHub directory) that explains it in more detail.

Can’t wait to try it out on my blog popularity prediction model.


Photo Credit(s): From Neural Networks, Multiple Outputs from caesar harda (Flickr)

From the Testing with Concept Activation Vectors paper

From The face of a robot with human-like features, Penn State

Scratch 3.0 is out

I’ve written on Scratch before (see my 10 years of Scratch and still counting post). It’s essentially an object-oriented, visual programming language for kids. Nonetheless, it is pretty sophisticated. The team at MIT just released Scratch 3.0, with a number of new extensions and updates to make it easier to work with.

Google also has a visual object-oriented programming tool, called Blockly. I’ve used a variant of Blockly to program an Android phone-based robot controller. It’s OK, but Blockly lacks a good collaboration mode, and editing large Blockly code modules is not as easy as it should be.

On the other hand, Scratch is made for collaboration. They have a web page with 1000s of collaborations listed; there seems to be a bit for everyone on the list. And they have a number of starter Scratch projects that anyone can tackle to earn coding cards that will gently introduce you to Scratch and coding.

Using Scratch

When I first ran across Scratch, I used it to create sounds based on key combinations. Then I moved to animating sprites (characters you can draw yourself or pick from the many they provide). Then I moved to animating planes, then groups of planes, then created a game where one plane would be followed by others. And then I added a way for one plane to shoot another, and so on.

It didn’t take me very long to get to a point where I had fleets of planes moving around the screen fighting each other. I haven’t done anything big with Scratch before but I’ve done a number of mini games/animations with my kids and it was fun to toy with.

Used to be you had to download and run Scratch locally on your PC/Mac. With later versions, they have Scratch Desktops that one can download for Windows and MacOS.

Alternatively, one could also use the web based version. In this way you can easily run it in any web browser.

The new desktop is more like a visual IDE than the old one I’m used to, and looks exactly like the web version. The first Scratch I used presented itself as a table top screen with various Scratch tools surrounding it. I’m sure it makes things easier for beginning coders not to be presented with a whole world of Scratch tools right off the bat, and just to have a sprite to play with. I suspect that all these tools are now buried in Scratch Tutorials.

Scratch 3.0 comes with a number of extensions

One of the extensions allows you to program LEGO Robotics, another provides a way to interact with a Bluetooth micro:bit controller, and another allows you to use your webcam to animate objects based on vision detection. There are plenty more and I’m sure this isn’t the end of them. (NB: Scratch team, you need one for FIRST robotics.)

I just added a few for sounds and the text-to-speech extension. It’s really easy to have Scratch 3.0 read out a text string for you. I suppose there would be a way to input a text file and have Scratch read it for you, but I didn’t get that far with it.


I am a strong supporter of everyone learning how to code and solutions like MIT’s Scratch (and Google’s Blockly) are a great way to understand coding without having to deal with the pain/semantics of compilers, APIs or function libraries etc.

Just start coding and have fun. It’s amazing what one can accomplish. That’s what Scratch was made to do. Enjoy.

Learning machine learning – part 3


Decided to take the plunge, purchase the Deep Learning with Python book and see what it has to offer. In prior posts (see Learning machine learning – part 1 & part 2) we were working with the cloud tutorials. This part is based on the book.

It has a great introduction to deep learning, which is a subset of machine learning. From what I know today, the Microsoft Azure session was more about traditional (statistical) machine learning than deep learning.


Installing deep learning

In order to use the book, you need access to Keras, Python, Jupyter and a Keras backend (TensorFlow, Microsoft CNTK or Theano).

I decided not to use any cloud solutions and instead installed Python, Jupyter, TensorFlow and Keras on my MacBook, although it probably would have been much easier (and more costly) to use a cloud solution.

I followed the directions on the TensorFlow install website for the pip install (you have to install a “virtual environment” and “pip” first). The MacBook didn’t have an NVIDIA GPU, so I needed to install the CPU version of TensorFlow.

But I had the hardest time running any of the book examples. Whenever I changed any command cell with Keras functionality in a Jupyter notebook (even just adding a space to the end of an “import keras” line), it would throw a “module not found” error.

After days of searching the web for which path Jupyter notebook (IPython/Python) uses for imports (sys.path and PYTHONPATH) and where I should be importing Keras from (it’s not “~/.keras”), I was no closer to running anything.

I finally realized I could install Keras directly into my venv (even though installing TensorFlow had already installed Keras as well). After I did that, everything worked. (I probably have one too many Keras environments now, but who cares.)
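A “module not found” error like this usually means the notebook kernel is running a different Python interpreter than the one Keras was installed into. The sketch below is a hypothetical diagnostic (the function name `describe_environment` is mine, not from the book or Jupyter): it reports which interpreter is running, whether it is inside a virtual environment (detected by comparing `sys.prefix` with `sys.base_prefix`), and where a module would be imported from, using only the standard library. Running it once in a notebook cell and once from a terminal with the venv activated should show whether the two agree.

```python
import importlib.util
import sys


def describe_environment(module_name="keras"):
    """Report the running interpreter and where module_name would be imported from."""
    # Inside a venv, sys.prefix points at the venv; sys.base_prefix points at the base install.
    in_venv = sys.prefix != sys.base_prefix
    # find_spec returns None when a top-level module can't be found on sys.path.
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "in_venv": in_venv,
        "module_origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = describe_environment()
    for key, value in info.items():
        print(f"{key}: {value}")
    if info["module_origin"] is None:
        # Installing with this interpreter's pip targets the same environment the kernel uses.
        print("keras is not importable here; try: python -m pip install keras")
```

If the `python` path printed in the notebook differs from the one in the terminal, installing into the notebook’s interpreter (or registering the venv as a Jupyter kernel) is the likely fix.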

With the environment finally correct, I could execute command cells with Keras functionality in a Jupyter notebook properly (well, most of them anyway).

Jupyter notebooks for dummies…

It took me a while to figure out that you run a Jupyter notebook server by issuing the command “jupyter notebook” (it’s nowhere in the command’s help file, but it can be found in Jupyter tutorials). That’s when I started to see the Keras installation problems described above.

Understanding Jupyter notebooks is non-trivial. Yes, I know it’s an interactive code and documentation environment. It’s sort of like BASIC on steroids, with Word-style document editing built in and available at any time.

The first thing to understand is that when you open a Jupyter notebook, you haven’t executed anything yet. YES, there are output lines in the notebook you just opened, but NO, they aren’t from running it in your own client-server environment.

The output lines you see in the notebook are from someone else’s run. So while the cells may look like they worked fine, they haven’t been executed in your installation environment yet.

Also, when executing Jupyter notebook command cells, pay special attention to the In [?]: that’s shown to the left of every command cell.

When the ‘?’ is a number, like In [12]:, it tells you the order in which that (possibly multi-line) command cell was executed, in this case 12th in the sequence. When the ‘?’ is a “*”, like In [*]:, the Jupyter notebook server is executing that command cell (or has it queued to execute).

Some command cells generate Out [?]: lines and others do not, so you can’t use those to tell whether something has been executed. The only way to tell if a command cell has been executed is to see its In [n]: number increment past that of the last command cell you executed. Of course, you can execute command cells out of sequence if you wish.
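The reason a freshly opened notebook shows In [n]: numbers and outputs is that an .ipynb file is just JSON: each code cell stores the execution count and outputs from the last time someone ran and saved it. Here is a minimal sketch, assuming the standard nbformat 4 layout (a top-level `cells` list whose code cells carry `execution_count` and `outputs`); the helper names and the `hypothetical_example.ipynb` filename are illustrative only.

```python
import json


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_notebook(nb):
    """List each code cell's saved execution count and whether it carries saved output."""
    summary = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no In [n]: prompt
        summary.append({
            "cell": index,
            # None is what the notebook displays as an empty In [ ]: prompt.
            "saved_count": cell.get("execution_count"),
            "has_saved_output": bool(cell.get("outputs")),
        })
    return summary
```

For example, `summarize_notebook(load_notebook("hypothetical_example.ipynb"))` would list the counts and outputs saved by whoever last ran the file. None of them reflect your own environment until you re-run the cells.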

Coding and executing in Jupyter notebooks felt weird to someone more used to C, Java, and other languages and IDEs. A video tutorial on Jupyter notebooks would probably have helped here, but I couldn’t find one.

Running the examples

You can download all of the book’s current examples from the book’s website.

The book suggests adding model layers, removing model layers, and changing the number of nodes in a layer as exercises to try at home.

In general, once the environment was set up properly, doing so worked as expected. Adding layers didn’t seem to improve the models’ accuracy (if anything, it degraded it), and deleting layers didn’t help either. Ditto for adding or reducing node counts within a layer.
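One way to see what these experiments change is to count the trainable parameters. A fully connected (Dense) layer has one weight per input/unit pair plus one bias per unit. The sketch below is a stdlib-only calculator (the function name is hypothetical), assuming a layout like the book’s IMDB example as I recall it: 10,000 inputs, two 16-unit hidden layers, and one output unit.

```python
def dense_param_count(input_dim, layer_units):
    """Count trainable parameters in a stack of fully connected (Dense) layers.

    Each layer has fan_in * units weights plus units biases.
    Returns (per-layer counts, total).
    """
    per_layer = []
    fan_in = input_dim
    for units in layer_units:
        per_layer.append(fan_in * units + units)  # weights + biases
        fan_in = units  # this layer's outputs feed the next layer
    return per_layer, sum(per_layer)


print(dense_param_count(10000, [16, 16, 1]))      # ([160016, 272, 17], 160305)
print(dense_param_count(10000, [16, 16, 16, 1]))  # one extra hidden layer
print(dense_param_count(10000, [32, 32, 1]))      # twice the nodes per hidden layer
```

Doubling the hidden nodes roughly doubles the parameter count (to 321,121), yet more capacity doesn’t guarantee better accuracy: a bigger model can simply fit the training data more closely (overfit) without generalizing any better.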

A number of the datasets used in the examples come with the Keras install. Many examples start by transforming this data to make it more amenable to deep learning.

For example, there’s an IMDB dataset of film reviews. The reviews are text, but deep learning doesn’t work on text strings, so you need to convert each review into a list of integers. You do this by looking up each word in a word dictionary and substituting that word’s index, generating an array (list) of integers.
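Here is a minimal pure-Python sketch of that word-to-integer step, followed by the fixed-length “multi-hot” vector that I understand the book’s IMDB example builds from those integers (it does this with NumPy; plain lists are used here so the sketch has no dependencies). The tiny vocabulary, the helper names, and the choice to reserve index 0 for unknown words are all assumptions for illustration. As far as I know, the copy of IMDB that ships with Keras already comes encoded as integer lists like these.

```python
from collections import Counter


def build_word_index(texts):
    """Map each distinct word to an integer, most frequent words first (hypothetical helper)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    # Index 0 is reserved for words not in the dictionary; known words start at 1.
    return {word: i + 1 for i, word in enumerate(ordered)}


def encode(text, word_index):
    """Replace each word with its dictionary index (0 if the word is unknown)."""
    return [word_index.get(word, 0) for word in text.lower().split()]


def multi_hot(sequence, dimension):
    """Turn a list of word indices into a fixed-length 0/1 vector."""
    vector = [0.0] * dimension
    for index in sequence:
        if index < dimension:  # words beyond the vocabulary cap are dropped
            vector[index] = 1.0
    return vector


reviews = ["a great movie", "a terrible movie", "great acting"]
word_index = build_word_index(reviews)
print(encode("A great movie", word_index))  # [1, 2, 3]
print(multi_hot(encode("A great movie", word_index), 6))
```

The multi-hot step matters because every review becomes a vector of the same length, which is what a Dense input layer expects.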

This is all done through the NumPy package. It’s worth the time to become familiar with Python and probably NumPy. I took the verbal Python tutorial (but did nothing to learn NumPy).

Another example is a real estate prediction model that has 13 different parameters across 500 or so neighborhoods. The parameters are all on different scales: some are distances, some percentages, some prices, etc. To perform deep learning on them, the example normalizes each parameter as its distance from the mean in units of standard deviation (subtract the mean, then divide by the standard deviation).
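That normalization can be sketched in a few lines of standard-library Python. As I understand the book’s example, the mean and standard deviation are computed on the training data only and then reused to scale the test data, so the model never sees statistics from data it is evaluated on. The function name, the toy numbers, and the guard for constant columns are my own assumptions; population standard deviation (`pstdev`) is used because it matches NumPy’s default `std()`.

```python
from statistics import fmean, pstdev


def standardize_columns(train_rows, test_rows):
    """Scale each feature column to (x - mean) / std using training-set statistics only."""
    columns = list(zip(*train_rows))  # transpose rows into per-feature columns
    means = [fmean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

    return scale(train_rows), scale(test_rows)


# Toy data: two features on very different scales (e.g. a distance and a price).
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]
train_scaled, test_scaled = standardize_columns(train, test)
print(test_scaled)  # second feature lands about 2.45 standard deviations above the mean
```

After scaling, every training column has mean 0 and standard deviation 1, so no single parameter dominates just because its units produce bigger numbers.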

There are other examples of data transformations as well. It seems that transforming your data into something amenable to deep learning is one part of the magic of deep learning.

Back to the book

Getting through chapter 3 of the book is fairly straightforward when everything is set up properly. I found an iPad app (Juno) that could connect to the Jupyter server, and it seemed to work once I found the proper command to start Jupyter (jupyter notebook --ip="*") and the proper Jupyter configuration parameters to use.

The examples are pretty self-documenting, so you should be able to try out any of them on your own. The book adds great explanations of machine learning, deep learning and the overall flow of how to approach a deep learning project.

Once you finish chapter 4 of the book, you have all the tools you need to tackle any deep learning project you want to attack. You may need to read up on how to transform your data, and you will probably be using one of the modeling techniques from the examples, but it seems easy enough.

The rest of the book’s chapters (which I have yet to complete) deal with deep learning in practice, and it’s in these chapters that you can learn some of the art of deep learning data science and model science.


I ended up having fun with Jupyter notebooks once I got them running, with the iPad client in one hand and the book in the other. At the end of chapter 4, I started to see some applications to my consulting business that might be interesting to model.

Using the Mac CPU was fast enough for the examples but I may have to tear down the crypto mine and use it as an AI server for my home network if I plan to tackle something with more data.

Wish me luck…