Magnonics for configurable electronics

Read an article today in ScienceDaily on [a] New way to write magnetic info … that discusses research done at Imperial College London that used a magnetic force microscope (a small magnetic probe) to write magnetic fields onto a dense array of nanowires.

Frustrated metamaterials needed

The original research is written up in a Nature article, Realization of ground state in artificial kagome spin ice via topological defect driven magnetic writing (paywall). I’m unclear on what all that means, but the paper’s abstract discusses geometrically frustrated magnetic metamaterials. This is where the physical size or geometry of the materials at the nanometer scale restricts or limits the magnetic states the material can exhibit.

Magnetic storage deals with magnetic material, but magnetic materials in close (nm) proximity to one another have a number of unique interactions, and nanowire-based, geometrically frustrated magnetic metamaterials can be magnetized to different magnetic moments. Both can be exploited for other uses: these interactions and magnetic moments could be combined to provide electronic circuitry and data storage.

I believe the research provides a proof point that such materials can be written in close proximity to one another using a magnetic force microscope.

Why it’s important

The key is the potential to create magnonic circuitry based on the pattern of moments written into an array of nanowires. In doing so, one could, in principle, fabricate any electrical circuit. It’s almost like photolithography but without fabs, chemicals, or laser scanners.

At first I thought this could be a denser storage device, but the potential is much greater if electronic circuitry could be constructed without having to fabricate semiconductors. It would seem ideal for testing out circuitry before manufacturing. And ultimately if it could be scaled up, the manufacture/fabrication of electronic circuitry itself could be done using these techniques.

Speed, endurance, write limits?

There was no information in the public article about the speed of writing the “frustrated magnetic metamaterials”. But an atomic force microscope can scan 150×150 micrometers in several minutes. If we (generously, since most chips are far smaller) assume a chip size of 150×150 mm, then writing it would take 1E6 scans times several minutes each, or ~2K days. With multiple scanning force microscopes operating concurrently, we could cut this down by a factor of 10 or 100, and maybe someday 1000. Two days to write any electronic circuit on the order of today’s 23nm devices with nanowires and magnetic force microscopes would be a significant advance.
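The arithmetic above can be checked with a few lines of Python. This is a back-of-envelope sketch: the 3-minute scan time (standing in for “several minutes”), the no-overlap tiling and the perfectly parallel microscopes are all assumptions, not figures from the research.

```python
# Back-of-envelope magnetic-force-microscope write-time estimate.
# Assumptions (not from the paper): each 150x150 um scan takes 3 minutes,
# scans tile the chip with no overlap, and microscopes work fully in parallel.

SCAN_SIDE_UM = 150.0       # side of one microscope scan field, micrometers
CHIP_SIDE_MM = 150.0       # side of the (generous) square chip, millimeters
MINUTES_PER_SCAN = 3.0     # "several minutes" taken as 3


def scans_needed(chip_side_mm: float, scan_side_um: float) -> float:
    """Number of scan fields needed to tile a square chip."""
    per_side = (chip_side_mm * 1000.0) / scan_side_um
    return per_side ** 2


def write_days(n_microscopes: int = 1) -> float:
    """Days to write the whole chip with n microscopes working concurrently."""
    minutes = scans_needed(CHIP_SIDE_MM, SCAN_SIDE_UM) * MINUTES_PER_SCAN
    return minutes / n_microscopes / (60 * 24)


for n in (1, 10, 100, 1000):
    print(f"{n:>5} microscope(s): {write_days(n):8.1f} days")
```

Under these assumptions one microscope needs about 2,083 days and 1,000 microscopes about 2 days, in line with the estimates above.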

Also there was no mention of endurance, write limits or other characteristics we have learned to love with Flash storage. But the assumption is that it can be written multiple times and that the pattern stays around for some amount of time.

How magnetics generate electronic circuits

Neither the Wikipedia page, the public article, nor the paywalled article’s abstract describes how magnonics can supply electronic circuitry. However, both the abstract and the public article discuss applications for this new technology in hardware-based neural networks using arrays of densely packed nanowires.

Presumably, by writing different magnetic patterns into these nanowire metamaterials, such patterns could be used to simulate hardware-connected neurons. This means the magnetic information must be rewritable so that the network can be trained. Also, such magnetic circuits could be constructed to: a) create different paths for electrons to flow through the material; b) restrict or enhance this electronic flow; and c) integrate across a number of inputs to determine how electronic flow proceeds out of a simulated neuron.

If magnonics can do all that, it’s very similar to the electronic gates in today’s CPUs, GPUs and other electronic circuitry. Maybe it cannot simulate every gate or electronic device found in today’s CPUs, but it’s a step in the right direction. And magnonics is relatively new. The transistor is about 70 years old and the integrated circuit is almost 60 years old. So in time, magnonics could very well become the next generation of chip technology.
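To make the “integrate across a number of inputs” idea concrete, here is a minimal threshold-unit sketch in Python. It is a hypothetical abstraction: the weights stand in for a written magnetic pattern, and nothing here models how the Imperial College nanowire arrays actually behave. It shows why a rewritable weighted-threshold element resembles a logic gate: changing only the weights and threshold turns the same unit into AND, OR, NOT or NAND.

```python
from typing import Sequence


def threshold_unit(inputs: Sequence[int], weights: Sequence[float],
                   threshold: float) -> int:
    """Fire (return 1) when the weighted sum of 0/1 inputs reaches threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Each "gate" is the same unit with a different (rewritable) weight pattern.
def and_gate(a: int, b: int) -> int:
    return threshold_unit([a, b], [1.0, 1.0], 2.0)


def or_gate(a: int, b: int) -> int:
    return threshold_unit([a, b], [1.0, 1.0], 1.0)


def not_gate(a: int) -> int:
    # A negative (inhibiting) weight with a zero threshold inverts the input.
    return threshold_unit([a], [-1.0], 0.0)


def nand_gate(a: int, b: int) -> int:
    return threshold_unit([a, b], [-1.0, -1.0], -1.0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND", and_gate(a, b), "OR", or_gate(a, b),
              "NAND", nand_gate(a, b))
```

Since NAND alone is functionally complete, any Boolean circuit could in principle be built from such units; whether magnonic hardware can realize them reliably is the open question.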

Writing speed is a problem. Maybe if they spun the nanowire array around the magnetic force microscope…


Photo Credits:  Real space observation of emergent magnetic monopoles … Nature article

Realization of ground state in artificial kagome spin ice via topological defect driven magnetic writing, Nature article


Scratch file use in HPC @ORNL, a statistical analysis

Attended SC17 (Supercomputing Conference) this past week and received a copy of the accompanying research proceedings. There are a number of interesting papers in the proceedings, and one I found very interesting was Scientific User Behavior and Data Sharing Trends in a Peta Scale File System by Seung-Hwan Lim, et al. from Oak Ridge National Laboratory (ORNL), on the use of files at the Oak Ridge Leadership Computing Facility (OLCF).

The paper statistically describes the use of scratch files in a multi-PB file system (Lustre) at OLCF from January 2015 to August 2016. The OLCF supports over 32PB of storage with a peak aggregate bandwidth of over 1TB/s. Spider II, the current Lustre file system, consists of 288 Lustre Object Storage Servers, all interconnected and connected to all the supercomputing cluster servers via an InfiniBand network. Spider II supports all scratch storage requirements for active/queued jobs for Titan (#4 in the Top 500 list of supercomputer clusters worldwide) and other clusters at ORNL.

ORNL uses an HPSS (High Performance Storage System) archive for permanent storage but uses the Spider II file system for all scratch files generated and used during supercomputing applications.  ORNL is expecting Spider III (2018-2023) to host 10 billion files.

Scratch files are purged from Spider II after 90 days of no access. The paper is based on analysis of metadata captured during the scratch purge process over 500 days.
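A purge pass like this is, at heart, a metadata walk. Below is a minimal Python sketch of finding purge candidates by last-access time, assuming a POSIX file system that records atime (mount options such as noatime or relatime change how atime is updated). The function name and structure are hypothetical; this is not ORNL’s purge tool, and purging a billion-file Lustre system would need far more scalable metadata scanning.

```python
import os
import time
from typing import Iterator, Optional, Tuple

PURGE_AFTER_DAYS = 90        # purge window from the text
SECONDS_PER_DAY = 86_400


def purge_candidates(root: str, now: Optional[float] = None,
                     days: int = PURGE_AFTER_DAYS) -> Iterator[Tuple[str, float]]:
    """Yield (path, days_since_access) for files not accessed in `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # vanished or unreadable; skip it
            if st.st_atime < cutoff:
                yield path, (now - st.st_atime) / SECONDS_PER_DAY
```

Collecting (path, age) records before deleting anything is one way a purge process could also produce the kind of metadata the paper analyzed.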

The paper displays a number of statistics and metrics on the use of Spider II:

  • Fewer than 3% of projects have a directory depth >15 and most projects have a shallow (<10) directory depth, although the maximum recorded directory depth was 432.
  • A project typically has 10X the files of an individual researcher: the median researcher has 2,000 files and the median project has 20,000 files.
  • Storage system performance is actively managed by many projects. For instance, 20 out of 35 science domains manually managed their Lustre cluster configuration to improve throughput.
  • File count continues to grow and reached a peak of 1B files during the time being analyzed.
  • On average, only 3% of files were accessed read-only, 10% were updated (read-write) and 76% were untouched during a given week. However, the median and maximum file ages were 138 and 214 days respectively, which means some scratch files continue to be accessed over the course of 200+ days.
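A weekly breakdown like this can be approximated from per-file timestamps. Here is a minimal sketch, assuming a metadata snapshot taken at the end of each week that holds each file’s atime and mtime; the classification rules are my assumptions, since the paper’s exact method isn’t described here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WEEK_SECONDS = 7 * 86_400


def classify(atime: float, mtime: float, week_start: float) -> str:
    """Classify a file's activity in [week_start, week_start + 1 week).

    Assumes timestamps come from a snapshot taken at the end of that week.
    """
    week_end = week_start + WEEK_SECONDS
    if week_start <= mtime < week_end:
        return "updated"      # modified (or created) during the week
    if week_start <= atime < week_end:
        return "read-only"    # read, but not modified, during the week
    return "untouched"


def weekly_mix(files: Iterable[Tuple[float, float]],
               week_start: float) -> Dict[str, float]:
    """Percentage of files in each category, from (atime, mtime) pairs."""
    counts = Counter(classify(a, m, week_start) for a, m in files)
    total = sum(counts.values()) or 1
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical (atime, mtime) pairs, in seconds relative to the week's start.
sample = [(100, 50), (100, -10), (-5, -10), (-3600, -7200)]
print(weekly_mix(sample, week_start=0))
```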

There was more information in the paper, but one item missing was statistics on scratch file size distribution.

Nonetheless, it paints an interesting picture of scratch file use in HPC application/supercluster environments today.


Crowdresearch, crowdsourced academic research

Read an article in Stanford Research, Crowdsourced research gives experience to global participants, that discussed an effort by Stanford and other top tier research institutions to bring global participation into academic research. The process is discussed more fully in a scientific paper (PDF here) by researchers from Stanford, MIT Media Lab, Cornell Tech and UC Santa Cruz.

They chose three projects:

  • An HCI (human computer interaction) project to design, engineer and build a new paid crowdsourcing marketplace (like Amazon’s Mechanical Turk).
  • A visual image recognition project to improve on current visual classification techniques/algorithms.
  • A data science project to design and build the world’s largest wisdom of the crowds experiment.

Why crowdsource academic research?

The intent of crowdsourced research is to provide top tier academic research experience to people who have no access to top research organizations.

Participating universities obtain more technically diverse researchers, larger research teams, larger research projects, and a geographically dispersed research community.

Collaborators gain valuable academic research experience, research community contacts, potential authorship of research papers and potential recommendation letters (for future work or academic placement).

How does crowdresearch work?

It’s almost like open source and agile development applied to academic research. The work week starts with the principal investigator (PI) and research assistants (RAs) going over last week’s milestone deliverables to decide which to pursue further. Crowdresearch uses Reddit-like posting and up/down voting to determine which milestone deliverables are most important. The PI and RAs review this prioritized list and select a few to continue investigating over the next week.
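Reddit-like prioritization can be sketched in a few lines of Python. This assumes a simple net-vote score (upvotes minus downvotes) with ties broken by raw upvotes; Reddit’s own ranking and the crowdresearch platform’s may well be more involved, and the deliverable names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deliverable:
    title: str
    up: int = 0
    down: int = 0

    @property
    def score(self) -> int:
        """Net votes: upvotes minus downvotes."""
        return self.up - self.down


def prioritize(items: List[Deliverable], top_n: int = 3) -> List[Deliverable]:
    """Top deliverables by net score, ties broken by total upvotes."""
    return sorted(items, key=lambda d: (d.score, d.up), reverse=True)[:top_n]


this_week = [
    Deliverable("task-pricing prototype", up=42, down=5),
    Deliverable("worker reputation study", up=30, down=2),
    Deliverable("UI mockups", up=12, down=9),
    Deliverable("fraud detection idea", up=28, down=0),
]
for d in prioritize(this_week, top_n=2):
    print(d.title, d.score)
```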

The PI holds an hour-long video conference (using Google Hangouts On Air, a YouTube live stream service). All collaborators can view the stream, but only a select few are on camera. The PI and the researchers responsible for the past week’s most important milestones discuss their findings, and the rest of the team can participate over Slack. The video conference is archived and available to be watched offline.

At the end of the meeting, the PI identifies next week’s milestones and, potentially, directly responsible investigators (DRIs) to work on them.

The DRIs and other collaborators choose how to apportion the work for the next week and work commences. Collaboration can be fostered and monitored via Slack and if necessary, more Google live stream meetings.

If collaborators need help understanding some technology, technique, or tool, the PI, RAs or DRIs can provide a mini video course on the topic or point to other information to get the researchers up to speed. Collaborators can ask questions and receive answers through Slack.

When it’s time to write the paper, they used Google Docs with change tracking to manage the writing process.

The team also maintained a Wiki on the overall project to help new and current members get up to speed on what’s going on. The Wiki would also list the week’s milestones, video archives, project history/information, milestone deliverables, etc.

At the end of the week, researchers and DRIs would supply a mini post to describe their work and link to their milestone deliverables so that everyone could review their results.

Who gets credit for crowdresearch?

Each week, everyone on the project is allocated 100 credits, which they apportion among the other participants based on the week’s activities. These credits drive a PageRank-style credit assignment algorithm that determines an aggregate credit score for each researcher on the project.
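Below is a minimal PageRank-style sketch of turning weekly credit allocations into aggregate scores. It assumes credit flows like links in PageRank with the commonly used 0.85 damping factor; it illustrates the general technique, not the paper’s algorithm, and includes none of the paper’s defenses against credit rings. Participant names are hypothetical.

```python
from typing import Dict


def credit_rank(alloc: Dict[str, Dict[str, float]], damping: float = 0.85,
                iters: int = 100) -> Dict[str, float]:
    """PageRank-style credit scores (summing to 1) from weekly allocations.

    alloc[giver][receiver] = credits the giver assigned to the receiver.
    """
    people = sorted(set(alloc) | {r for given in alloc.values() for r in given})
    n = len(people)
    rank = {p: 1.0 / n for p in people}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in people}
        for giver in people:
            # Ignore any self-allocation; credits go to *other* participants.
            given = {r: c for r, c in alloc.get(giver, {}).items() if r != giver}
            total = sum(given.values())
            if total == 0:
                # A participant who allocated nothing spreads rank evenly.
                for p in people:
                    new[p] += damping * rank[giver] / n
            else:
                for receiver, credits in given.items():
                    new[receiver] += damping * rank[giver] * credits / total
        rank = new
    return rank


# Hypothetical participants; each splits 100 credits among the others.
week = {
    "ana": {"bo": 60, "cy": 40},
    "bo": {"ana": 70, "cy": 30},
    "cy": {"ana": 80, "bo": 20},
    "dee": {"ana": 50, "bo": 50},
}
for person, score in sorted(credit_rank(week).items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.3f}")
```

Credit from a well-credited participant is worth more than credit from one nobody credits, which is the point of using PageRank rather than a simple sum.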

Check out the paper linked above for more information on the credit algorithm. They tried to defeat (credit) link rings and other obvious approaches to stealing credit.

At the end of the project, the PI, DRIs and RAs determine a credit clip level for paper authorship. Paper authors are listed in credit order and the remaining, non-author collaborators are listed in an acknowledgements section of the paper.
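The authorship cut itself is simple once aggregate scores exist. A minimal sketch with a hypothetical clip level and hypothetical scores:

```python
from typing import Dict, List, Tuple


def split_credits(scores: Dict[str, float],
                  clip: float) -> Tuple[List[str], List[str]]:
    """Return (authors, acknowledged), each in descending credit order."""
    ordered = sorted(scores, key=lambda p: scores[p], reverse=True)
    authors = [p for p in ordered if scores[p] >= clip]
    acknowledged = [p for p in ordered if scores[p] < clip]
    return authors, acknowledged


# Hypothetical aggregate credit scores and clip level.
authors, thanks = split_credits({"ana": 0.40, "bo": 0.30, "cy": 0.25, "dee": 0.05},
                                clip=0.20)
print("Authors:", ", ".join(authors))
print("Acknowledgements:", ", ".join(thanks))
```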

The PIs can also use the credit level to determine how strong a recommendation letter to provide for researchers.

Tools for crowdresearch

The tools needed to collaborate on crowdresearch are cheap and readily available to anyone.

  • Google Docs, Hangouts, Gmail are all freely available, although you may need to purchase more Drive space to host the work on the project.
  • Wiki software is freely available as well from multiple sources, including MediaWiki (the software behind Wikipedia).
  • Slack is readily available for a low cost, but other open source alternatives exist, if that’s a problem.
  • GitHub code repositories are also readily available at a reasonable cost, but there may be alternatives that use Google Drive storage for the repo.
  • Web hosting is needed to host the online Wiki, media and other assets.

Initial projects were chosen in computer science, so beyond the above tools, they could depend on open source. Other projects will need to consider how much experimental apparatus is required, how to fund these apparatus purchases, and how globally dispersed researchers can best make use of them.

My crowdresearch projects

Here are some potential commercial crowdresearch projects, where we could use aggregate credit scores and perhaps other measures of participation to apportion revenue, if any:

  • NVMe storage system using a lightweight storage server supporting NVMe over Fabrics access to hybrid NVMe SSD – capacity disk storage.
  • Proof of Stake (PoS) Ethereum pooling software using Linux servers to create a pool for PoS ETH mining.
  • Bipedal, dual armed, dual handed, five-fingered assisted care robot to supply assistance and care to elders and disabled people throughout the world.

And some non-commercial projects, where we would use aggregate credit scores to apportion attribution and any potential remuneration:

  • A fully (100%?) mechanical rover able to survive, rove around, perform scientific analysis, receive/transmit data and, possibly, effect repairs within extreme environments such as the surface of Venus, Jupiter and the Chernobyl/Fukushima Daiichi reactor cores.
  • Zero-propellant interplanetary tug able to rapidly transport rovers, satellites, probes, etc. to any place within the solar system and deploy them properly.
  • A Venusian manned base habitat including the design, build process and ongoing support for the initial habitat and any expansion over time, such that the habitat can last 25 years.

Collaborators anywhere in the world interested in any of these projects, please let me know here via comments. Please supply some way to contact you and any skills you have, or are interested in developing, that could help the project(s).

I would be glad to take on the PI role for the most popular project(s), if I get sufficient response (no idea what this would be). And I’d be happy to purchase the Drive, GitHub, Slack and web hosting accounts needed to start up and carry the most popular project(s) to fruition. And if there are any PIs with more domain experience interested in taking on any of these projects, do let me know.


Picture Credit(s): Crowd by Espen Sundve;

Videoblogger Video Conference by Markus Sandy;

Researchers Night 2014 by Department of Computer Science, NTNU;

A steampunk Venusian rover

Read an article last week in The Engineer, “Designing a mechanical rover to explore … Venus”, about a group at JPL, led by Jonathan Sauder, that is working on a mechanical rover to study Venus.

Venus has a surface temperature of ~470°C, hot enough to melt lead, which will fry most electronics in seconds. Moreover, the Venusian surface is under a lot of pressure, roughly equivalent to a mile under water or ~90X the air pressure at Earth’s surface (from NASA Venus in depth). Extreme conditions for any rover.

Going mobile

Sauder and his team were brainstorming mechanical rovers that operate similarly to Theo Jansen’s Strandbeest, which walks using wind energy alone. (Check out the video of the Strandbeest walking.)

Jansen had told Sauder’s team that his devices work much better on smooth surfaces and that uneven, beach like surfaces presented problems.

So, Sauder’s team started looking at using tracks instead of legs/feet, sort of like a World War 1 tank, that could operate upside down as well as right side up.

Rather than sails (as on the Strandbeest), they plan to use multiple vertical axis wind turbines, called Savonius rotors, located inside the tank to generate energy and store it in springs for future use.
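For a feel for how much energy such rotors could collect, the standard wind power relation P = ½ρAv³Cp can be evaluated in Python. Every number here is an illustrative assumption (a Venus surface air density of roughly 65 kg/m³, a 0.5 m/s breeze, a 0.25 m² rotor and a Savonius power coefficient of 0.15), not a JPL design value.

```python
# Rough wind-power estimate for a Savonius rotor: P = 0.5 * rho * A * v**3 * Cp.
# All numbers below are illustrative assumptions, not JPL design values.


def wind_power_watts(rho: float, area_m2: float, wind_ms: float, cp: float) -> float:
    """Mechanical power captured by a rotor with swept area `area_m2`."""
    return 0.5 * rho * area_m2 * wind_ms ** 3 * cp


RHO_VENUS_SURFACE = 65.0     # kg/m^3, approximate surface atmospheric density
RHO_EARTH_SEA_LEVEL = 1.225  # kg/m^3, standard sea-level air
AREA = 0.5 * 0.5             # m^2, hypothetical 0.5 m tall x 0.5 m wide rotor
WIND = 0.5                   # m/s, assumed slow surface wind
CP = 0.15                    # assumed power coefficient for a Savonius rotor

venus = wind_power_watts(RHO_VENUS_SURFACE, AREA, WIND, CP)
earth = wind_power_watts(RHO_EARTH_SEA_LEVEL, AREA, WIND, CP)
print(f"Venus: {venus:.3f} W, Earth: {earth:.4f} W, ratio {venus / earth:.0f}x")
```

Power scales linearly with air density but with the cube of wind speed, so the dense Venusian air helps (about 53X Earth sea-level air here) while slow surface winds keep the output small, which is one reason banking energy in springs makes sense.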

Getting data

They’re not planning to ditch electronics altogether, but they need to minimize the rover’s reliance on electronics.

There are some electronics based on silicon carbide and gallium nitride that can operate at 450C, but they have a very low level of integration at this time, only about 100 transistors per chip. They could use these to add electronic processing and control to their mechanical rover.

Solar panels, which can also operate at 450C, can supply electricity to the high temperature electronics.

But to get information off the rover and back to Earth, they plan to use a highly radio-reflective spot on the rover and a mechanical shutter mechanism. An orbiting satellite would send radio pulses and record whether or not the spot reflects them; by opening and closing the shutter, the rover could send Morse code to the satellite. The orbiting satellite could record this information and then transmit it to Earth.
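The shutter scheme amounts to on/off keying in fixed time slots. Here is a minimal Python sketch of the rover side, using standard International Morse timing (dot = 1 slot, dash = 3, a 1-slot gap between elements, 3 between letters, 7 between words), where 1 means the shutter is open and the spot reflects. The slot length (one radar pulse interval) and the supported character set are assumptions.

```python
# 1 = shutter open (bright radar return), 0 = shutter closed (dark).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def to_shutter_slots(message: str) -> str:
    """Encode a message as shutter states, one character per time slot."""
    words = []
    for word in message.upper().split():
        letters = []
        for ch in word:
            code = MORSE[ch]  # raises KeyError for unsupported characters
            # dot -> 1 open slot, dash -> 3 open slots, 1 closed slot between
            letters.append("0".join("1" if s == "." else "111" for s in code))
        words.append("000".join(letters))   # 3 closed slots between letters
    return "0000000".join(words)            # 7 closed slots between words


print(to_shutter_slots("SOS"))
```

The satellite side just records a bright or dark return for each slot and applies the same timing rules in reverse.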

The rover will make use of simple chemical reactions to measure soil, rock and atmospheric chemistry. Soil and rocks suitable for analysis can be scooped up, drilled out and moved to the analysis chamber(s) via mechanical devices. Wind speed and direction can be sensed with simple mechanical devices.

In order to avoid obstacles while roving around the planet, they plan to use a mechanical probe out the front (and back?) of the rover, with control systems attached to it to steer around obstacles. This way the rover can cover more of the planet’s surface.

Such a mechanical rover with high temperature electronics might also be suitable for other worlds in the solar system: Mercury for sure, but the moons of the Jovian planets also have extreme pressure environments.

And such an electro-mechanical rover might also work well to probe volcanoes on Earth, although temperatures there are 700 to 1200C, ~1.5 to 2.5X Venus. Maybe such a rover could be used in highly radioactive environments to record information and send it back to personnel outside the environment, or even effect some preprogrammed repairs. Ocean vents could be another place where such a rover might work well.

Possible improvements

Mechanical probes would need to move vertically and swing horizontally to be effective, and would necessarily have to poke outside the tank’s envelope to sense obstacles ahead.

Sonar could work better. Sounds or clicks could be produced mechanically, and their reflections could also be received mechanically (a mic is just a mechanical transducer). At the pressures on Venus, sound should travel far.
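Mechanical sonar would rely on the usual echo-ranging relation, distance = speed of sound × round-trip time ÷ 2. A small Python sketch, assuming a round ~410 m/s speed of sound at the Venusian surface (an assumed value; the real figure varies with temperature and composition):

```python
# Echo ranging: distance = speed_of_sound * round_trip_time / 2.
VENUS_SOUND_SPEED_MS = 410.0  # m/s, assumed round value for the surface


def echo_range_m(round_trip_s: float,
                 sound_speed_ms: float = VENUS_SOUND_SPEED_MS) -> float:
    """Distance to an obstacle from the click-to-echo round-trip time."""
    return sound_speed_ms * round_trip_s / 2.0


def round_trip_s(distance_m: float,
                 sound_speed_ms: float = VENUS_SOUND_SPEED_MS) -> float:
    """Round-trip time a mechanical timer would need to resolve."""
    return 2.0 * distance_m / sound_speed_ms


print(f"Echo after 50 ms -> obstacle at {echo_range_m(0.05):.2f} m")
print(f"Obstacle at 2 m -> echo after {round_trip_s(2.0) * 1000:.1f} ms")
```

A 2 m obstacle returns its echo in under 10 ms, so a mechanical timer would need millisecond-scale resolution, a real design constraint for an all-mechanical sonar.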

Morse code was designed to efficiently send alphanumerics and not much else. It would seem that another codec could be designed to send scientific information faster. And if one mechanical spot is good, multiple spots would be better, assuming the satellite could distinguish multiple radio-reflective spots located close to one another on the rover.
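To see why a different codec could be faster, compare the time slots needed to send one sensor reading, say 1023, as Morse digits versus a 10-bit binary word (2¹⁰ = 1024 values). This sketch assumes one bit per slot for binary and ignores the framing and synchronization a real link would need.

```python
MORSE_DIGITS = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def morse_slots(number: int) -> int:
    """Time slots to send a non-negative integer as Morse digits."""
    digits = str(number)
    total = 0
    for i, d in enumerate(digits):
        code = MORSE_DIGITS[d]
        total += sum(1 if s == "." else 3 for s in code)  # element on-times
        total += len(code) - 1                            # gaps inside a digit
        if i < len(digits) - 1:
            total += 3                                    # gap between digits
    return total


def binary_slots(bits: int) -> int:
    """Time slots for a fixed-width binary word: one slot per bit."""
    return bits


def multi_spot_slots(bits: int, spots: int) -> int:
    """Slots when `spots` reflectors each carry one bit per slot."""
    return -(-bits // spots)  # ceiling division


print("Morse:", morse_slots(1023), "slots")
print("Binary, 1 spot:", binary_slots(10), "slots")
print("Binary, 4 spots:", multi_spot_slots(10, 4), "slots")
```

Under these assumptions the Morse digits take 73 slots versus 10 for binary, and four reflective spots would cut the binary word to 3 slots.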

Radio works, but why not use infrared? If there were some way to read an infrared signal from the rover, it could convey more information per pass.

For instance, an infrared photo of the rover’s bottom or top, using a flat surface, could encode information in cold and hot spots located across that surface.

This could work at whatever infrared resolution is available from the satellite orbiting overhead and would send much more information per orbital pass.

In fact, such an infrared surface readout might allow the rover to send B&W pictures up to the satellite. Sonar could provide a mechanism to record a (sound) picture of the environment being scanned. The infrared information could be encoded across the surface via pipes of cool and hot liquids, sort of like core memory of old.
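Encoding data as hot and cold spots is essentially laying bits out on a grid. A minimal Python sketch, assuming row-major bit order, zero (cold) padding and a square grid, all assumptions about a hypothetical signaling surface:

```python
import math
from typing import List


def bytes_to_grid(data: bytes) -> List[List[int]]:
    """Lay out data bits row-major (MSB first) in the smallest square grid."""
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    side = math.isqrt(len(bits) - 1) + 1 if bits else 1
    bits += [0] * (side * side - len(bits))  # pad with cold cells
    return [bits[r * side:(r + 1) * side] for r in range(side)]


def grid_to_bytes(grid: List[List[int]], n_bytes: int) -> bytes:
    """Read cells back row-major and reassemble the first n_bytes bytes."""
    bits = [cell for row in grid for cell in row][: n_bytes * 8]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


grid = bytes_to_grid(b"Venus")
for row in grid:
    print("".join("#" if cell else "." for cell in row))  # '#' = hot, '.' = cold
```

A satellite camera able to resolve an N×N grid of cells would receive N² bits per image, e.g. 4,096 bits for a 64×64 grid.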

What about steam power? With 450C there ought to be more than enough heat to boil some liquid and have it cool via expansion. Cool liquid could be used to cool electronics, chemical and solar devices. And as the high temperatures on Venus seem constant, steam power and liquid cooling would be available all the time, eliminating any need for springs to hold energy.

And the cooling liquid from steam engines could be used to support an infrared signaling mechanism.

Still not sure why we need any electronics. A suitably configured, shrunken analytical engine could provide the rudimentary information processing necessary to work the shutter or other transmitter mechanisms, initiate, read out and store mechanical/chemical/sonar sensor data, and control the other items on the rover.

And with a suitably complex analytical engine there might be some way to mechanically program it with various modes using something like punched tape or cards. Such a device could be used to hold and load information for separate programs in minimal space and could also be used to store information for later transmission, supplying a 100% mechanical storage device.

Going 100% mechanical could also lead to a longer-lived rover on a planet like Venus than one using some electronics and mostly mechanical devices. Mechanical devices can fail, but their failure modes are normally less catastrophic and well understood. Perhaps with sufficient mechanical redundancy and attention to tribology, such a 100% mechanical rover could last an awfully long time without any maintenance, like Swiss watches.


Photo Credit(s): World War One tank – mark 1 by Photos of the Past

Vintage Philmor morse code practice … by Joe Haupt

Accompanied by an instructor… by vy pham;

Core memory more detail by Kenneth Moore;

Model of the Analytical Engine By Bruno Barral (ByB), CC BY-SA 2.5;

Punched tape by Rositslav Lisovy

Steam locomotives by Jim Phillips

A tale of two storage companies – NetApp and Vantara (HDS-Insight Grp-Pentaho)

It was the worst of times. Industry changes had been gathering for almost a decade and by this time were starting to hurt.

The cloud was taking over all new business and some of the old. Flash’s performance was making high performance easy and reducing storage requirements commensurately. Software-defined storage was displacing low and midrange storage, which was fine for margins but injurious to revenues.

Both companies held user events in Las Vegas this past month: NetApp Insight 2017 last week and the Hitachi NEXT2017 conference two weeks ago.

As both companies respond to industry trends, they provide an interesting comparison of companies in transition.

Company role

  • NetApp’s underlying theme is changing the world with data, and they want to help companies do this.
  • Vantara’s philosophy is that data and processing are ultimately moving into the Internet of Things (IoT), and they want to be wherever the data takes them.

Hitachi Vantara is a brand new company that combines Hitachi Data Systems, Hitachi Insight Group and Pentaho (an analytics acquisition) into one organization to go after the IoT market. Pentaho will continue as a separate brand/subsidiary, but HDS and Insight Group cease to exist as separate companies/subsidiaries and are now inside Vantara.

NetApp sees transitions occurring in the way IT conducts business but ultimately, a continuing and ongoing role for IT. NetApp’s ultimate role is as a data service provider to IT.

Customer problem

  • Vantara believes the main customer issue is the need to digitize the business. Because competition is emerging everywhere, the only way for a company to succeed against this interminable onslaught is to digitize everything. That is, digitize your manufacturing/service production, sales, marketing, maintenance, any and all customer touch points, across your whole value chain, and do it as rapidly as possible. If you don’t, your competition will.
  • NetApp sees customers today have three potential concerns: 1) how to modernize current infrastructure; 2) how to take advantage of (hybrid) cloud; and 3) how to build out the next generation data center. Modernization is needed to free capital and expense from traditional IT for use in Hybrid cloud and next generation data centers. Most organizations have all three going on concurrently.

Vantara sees the threat of startups, regional operators and more advanced digitized competitors as existential for today’s companies. The only way to keep your business alive under these onslaughts is to optimize your value delivery. And to do that, you have to digitize every step in that path.

NetApp views the threat to IT as coming from LoB/shadow IT applications born and grown in the cloud, or from other groups creating next gen applications using capabilities outside of IT.

Product direction

  • NetApp is looking mostly towards the cloud. At their conference they announced a new Azure NFS service powered by NetApp. They already had two cloud offerings: Cloud ONTAP, software defined storage in the cloud, and NPS, a co-lo hardware offering directly attached to public clouds (Azure & AWS).
  • Vantara is looking towards IoT. At their conference they announced Lumada 2.0, an Industrial IoT (IIoT) product framework using plenty of Hitachi software functionality and intended to bring data and analytics under one software umbrella.

NetApp is following a path laid down years ago when they devised the data fabric. Now they are integrating and implementing the data fabric across their whole product line, with the ultimate goal that wherever your data goes, the data fabric will be there to help you with it.

Vantara is broadening their focus from IT products and solutions to IoT. It’s not so much abandoning present day IT as looking forward to the day when present day IT is just one cog in an ever expanding, completely integrated digital entity that the new organization becomes.

Both had other announcements: NetApp announced ONTAP 9.3, Active IQ (AI applied to predictive service) and FlexPod SF ([H]CI with SolidFire storage), and Vantara announced a new IoT turnkey appliance running Lumada and a smart data center (IoT) solution.

Who’s right?

They both are.

Digitization is the future; the sooner organizations realize and embrace this, the better for their long term health. Digitization will happen with or without organizations and when it does, it will result in a significant re-ordering of today’s competitive landscape. IoT is one component of organizational digitization, specifically outside of IT data centers, but using IT resources.

In the meantime, IT must become more effective and efficient. This means it has to modernize to free up resources to support (hybrid) cloud applications and supply the infrastructure needed for next gen applications.

One could argue that Vantara is positioning themselves for the long term and NetApp is positioning themselves for the short term. But that denies the possibility that IT will have a role in digitization. In the end both are correct and both can succeed if they deliver on their promise.



Two paths to better software

Read an article last week in The Atlantic, The coming software apocalypse, about some of the problems in how we develop software today.

Most software development today is editing text files. Some of these text files have 1,000s of lines and are connected to other text files with 1,000s of more lines which are connected to other text files with 1,000s of lines, etc. Pretty soon you have millions of lines of code all interacting with one another.

The problem

Been there, done that, and it’s not pretty. We even spent some time trying to reduce the code bloat by macro-izing some of it; that reduced the lines of code but just made it harder to understand.

The problem is much worse now that we have software everywhere you look, from the escalators and elevators you take between floors, to the cars you drive around town, to the trains and airplanes you travel in between cities.

All of these literally have millions of lines of code controlling them, and there are more each year. How can they all possibly be correct?

Well, you can test the s&*t out of them. But you can’t cover every path of a million line program in a lifetime, or ten, of testing. And even if you could, changing a single line could generate another 100K or more paths to test. So testing was never a true answer.
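The path-explosion argument can be made concrete. In a simplified model with n independent two-way branches in sequence, every combination of outcomes is a distinct path, so there are 2ⁿ paths. Real programs have loops and correlated branches, so this only illustrates the growth rate.

```python
from itertools import product
from typing import Iterator, Tuple


def count_paths(n_branches: int) -> int:
    """Paths through n sequential, independent two-way branches."""
    return 2 ** n_branches


def enumerate_paths(n_branches: int) -> Iterator[Tuple[bool, ...]]:
    """Each path as a tuple of branch outcomes (True = 'then' side taken)."""
    return product((True, False), repeat=n_branches)


for n in (10, 20, 40, 60):
    print(f"{n:>3} branches -> {count_paths(n):,} paths")
```

Adding one more independent branch doubles the count; going from 17 to 18 branches adds 131,072 paths, on the order of the 100K mentioned above.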

Two solutions

The article talks about two approaches that have some merit to solve the real problem.

  • Model based development, a new development and coding environment. In this approach you’re not so much coding as playing with a model of the behavior you’re looking for. Say you were coding robot control logic: rather than editing 1000s of lines of Java text, you work with a model of your robot and its environment on half the screen, and on the other half, model parameters (dials, sliders, arrow keys, etc.) and logic (sequences) that you manipulate to make the robot do what it needs to do. Sort of like Scratch on steroids (see my post on 10 years of Scratch), with the sprite being whatever you need to code for, be it a jet engine, automobile, elevator, whatever. The playground would be a realtime/real life simulation of the entity under control of the code, and you would code by setting parameters and defining sequences. But the feedback would be immediate!
  • TLA+, a formal design verification approach. Formal methods have been around since the early 70s. They are used to rigorously specify the design of some code or a whole system. The idea is that if you can specify a provably correct design, then the code (derived from that design) has the potential to be more correct. Yes, there’s still the translation from design to code that’s error prone, but the likelihood is that these errors will be smaller in scope than those from a design that’s wrong.

Model based development

One can find model based development already in Apple’s new application development language, Swift, in the ANSYS SCADE Suite, based on Esterel Technologies, and in the Light Table software development environment.

I have never used any of them, but they all look interesting. Esterel was developed for safety critical, real-time aerospace applications. Light Table was a Kickstarter project started by a leading engineer of Microsoft’s Visual Studio, the leading IDE. Apple Swift was developed to make it much easier to develop iOS apps.


TLA+ takes a bit of getting used to. All formal methods depend on advanced mathematics and sophisticated logic, and require an adequate understanding of these to use properly. TLA+ was developed by Leslie Lamport and stands for Temporal Logic of Actions.

TLA+ specifications identify the set of all correct system actions. I would call it a formal pseudocode.

There’s apparently a video course, a hyperbook and a book on the language. It’s being used at AWS and for Microsoft Xbox and Azure. (See the Wikipedia TLA+ article for more information.)

There’s also the PlusCal algorithm (specification) language, which is translated into a TLA+ specification that can then be checked by the automated TLC model checker. There’s also TLAPS, a TLA+ proof system, although it doesn’t support all of the TLA+ primitives. The TLA+ Toolbox brings these and other tools together to make TLA+ easier to use.
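To show what a model checker like TLC does at its core, here is a toy explicit-state checker in Python (not TLA+, and vastly simpler than TLC). It explores every reachable state of a hypothetical two-process lock breadth-first and checks a mutual-exclusion invariant in each state, so any counterexample it reports is a shortest one. In the non-atomic version a process sees the lock free and grabs it a step later, which leaves room for a race.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Tuple[str, str, int]  # (process 0 pc, process 1 pc, lock: 0 free / 1 held)


def next_states(s: State, atomic: bool) -> Iterable[State]:
    """All states reachable in one step by either process."""
    pcs, lock = [s[0], s[1]], s[2]
    for i in (0, 1):
        new_pcs, new_lock = list(pcs), lock
        pc = pcs[i]
        if pc == "idle":
            new_pcs[i] = "check"
        elif pc == "check" and lock == 0:
            if atomic:
                new_pcs[i], new_lock = "critical", 1  # test-and-set in one step
            else:
                new_pcs[i] = "set"  # saw the lock free, will grab it next step
        elif pc == "set":
            new_pcs[i], new_lock = "critical", 1
        elif pc == "critical":
            new_pcs[i], new_lock = "idle", 0
        else:
            continue  # waiting in "check" while the lock is held: no move
        yield (new_pcs[0], new_pcs[1], new_lock)


def mutual_exclusion(s: State) -> bool:
    """Invariant: the two processes are never both in the critical section."""
    return not (s[0] == "critical" and s[1] == "critical")


def model_check(init: State, successors: Callable[[State], Iterable[State]],
                invariant: Callable[[State], bool]) -> Optional[List[State]]:
    """Breadth-first search of every reachable state.

    Returns a shortest trace to an invariant violation, or None if none exists.
    """
    parent: Dict[State, Optional[State]] = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace: List[State] = []
            cur: Optional[State] = s
            while cur is not None:
                trace.append(cur)
                cur = parent[cur]
            return trace[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None


INIT: State = ("idle", "idle", 0)
for atomic in (False, True):
    trace = model_check(INIT, lambda s: next_states(s, atomic), mutual_exclusion)
    print("atomic" if atomic else "non-atomic", "->",
          "invariant holds" if trace is None else trace)
```

The non-atomic model yields a 7-state trace ending with both processes in the critical section, while the atomic test-and-set model passes. That kind of interleaving bug is hard to hit by testing and easy to find by exhaustive state exploration.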


We dabbled in formal specification methods on our million+ line storage system at a former employer. It worked well and cleaned up an integrity-critical area of the product. Alas, we didn’t expand its use to other areas of the product and it sort of fell out of favor. But it worked when and where we applied it.

Of course, this was before today’s automated formal methods, but even manual specification precision can help you think out what a design has to do to be correct.

I have no doubt that TLA+ formal methods, model based development approaches and more will be required to truly vanquish the coming software apocalypse.

At least until artificial intelligence starts developing all our code for us.


Photo Credits: Six easy pieces of quantitatively analyzing open source, SAP Research;

Spaghetti code still existed,;

How to write apps with Swift, MacWorld;

Modeling the dining philosophers problem in TLA+, Metadata blog


Compressing information through the information bottleneck during deep learning

Read an article in Quanta Magazine (New theory cracks open the black box of deep learning) about a talk (see 18: Information Theory of Deep Learning, YouTube video) given a month or so ago by Professor Naftali (Tali) Tishby on his theory that all deep learning convolutional neural networks (CNN) exhibit an “information bottleneck” during deep learning. This information bottleneck results in compressing the information present in, for example, an image, and working only with the relevant information.

The Professor and his researchers used a simple AI problem (like recognizing a dog) and trained a deep learning CNN to perform this task. At the start of the training process the CNN nodes at the top were all connected to the next layer, and those were all connected to the next layer and so on until you got to the output layer.

Essentially, the researchers found that during the deep learning process, the CNN went from recognizing all features of an image to, over time, recognizing (processing?) only the relevant features of an image once successfully trained.
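For reference, the information bottleneck idea predates the talk: in Tishby’s earlier work it is posed as finding a compressed representation T of the input X that keeps as much information as possible about the relevant variable Y (the label). The symbols X, T, Y and the trade-off parameter β below are the conventional ones, not names taken from the article or the talk:

```latex
\min_{p(t \mid x)} \; \mathcal{L} = I(X;T) - \beta \, I(T;Y), \qquad \beta > 0
```

Minimizing I(X;T) squeezes irrelevant input detail out of the representation, while the β I(T;Y) term rewards keeping whatever predicts the label; a larger β keeps more detail.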

Limits of deep learning CNNs

In his talk, the Professor identifies two modes of operation of a deep learning CNN: the encoder layers and the decoder layers. The encoder function identifies relevant information in the input and the decoder function takes this relevant information and maps it to an output.

This view results in two statistics that can characterize any deep learning CNN:

  • Sample complexity, which relates to the mutual information between the input and the last hidden layer of the encoder function, and
  • Accuracy or generalization error, which relates to the mutual information between the last hidden layer and the desired output (label) of the decoder function.

Here, mutual information measures how much of the uncertainty about an input is removed when you know an output that is based on that input. (See the talk for a more formal explanation.)
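A minimal sketch of that definition for two discrete variables, I(X;Y) = H(X) − H(X|Y), computed directly from a joint probability table. The function name and the table layout are illustrative assumptions; estimating mutual information for real network layers needs binning or other estimators, which this sketch does not attempt.

```python
import math
from typing import Sequence


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """I(X;Y) in bits, given joint[x][y] = P(X=x, Y=y) (entries sum to 1)."""
    px = [sum(row) for row in joint]            # marginal P(X)
    py = [sum(col) for col in zip(*joint)]      # marginal P(Y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                         # 0 * log 0 is taken as 0
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


copy = [[0.5, 0.0], [0.0, 0.5]]             # Y always equals X (a fair bit)
independent = [[0.25, 0.25], [0.25, 0.25]]  # Y tells you nothing about X
print(mutual_information(copy))         # 1.0: all uncertainty about X removed
print(mutual_information(independent))  # 0.0: no uncertainty removed
```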

The Professor states that any complex deep learning CNN can be characterized by these two statistics, where sample complexity determines the number of samples required and accuracy determines the precision with which the deep learning CNN can properly interpret those samples. The deep black line in the chart represents the limits of accuracy achievable at some number of training events, with some number of hidden layers and some sample set.

What happens during deep learning

Moreover, the Professor shows that an interesting characteristic of all CNNs is that they converge in accuracy over time, and that this convergence differs based mostly on the number of layers, sample size and training count used.

In the chart, the top row shows three CNNs with different amounts of training data (5%, 40% and 80% of the total). The chart shows the end result and the trace of learning within each CNN over the same number of epochs (training cycles). More training data generates more accurate results.

The Professor calls the epochs after the farthest-right point of the traces (where a trace essentially starts moving up and to the left in the chart) the compression phase of deep learning.
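To make “the farthest-right point of the trace” concrete: assuming you already have per-epoch estimates of one layer’s encoder information I(X;T) (the horizontal axis) and decoder information I(T;Y) (the vertical axis), the compression phase starts at the epoch where I(X;T) peaks, after which the trace heads left. The function name and the numbers below are hypothetical, for illustration only.

```python
from typing import List, Tuple


def compression_start(trace: List[Tuple[float, float]]) -> int:
    """Return the epoch index where I(X;T) is largest, i.e. the farthest-right
    point of an information-plane trace. trace[e] = (I(X;T), I(T;Y)) at epoch e."""
    if not trace:
        raise ValueError("empty trace")
    return max(range(len(trace)), key=lambda e: trace[e][0])


# Hypothetical per-epoch values, in bits.
trace = [(1.0, 0.10), (2.0, 0.40), (3.0, 0.60), (2.6, 0.65), (2.1, 0.70)]
start = compression_start(trace)
print(f"compression phase starts after epoch {start}")  # epoch 2
```

Real traces are noisy, so in practice one would probably smooth them before picking the peak; the sketch ignores that.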

Statistics of deep learning process

The Professor goes on to characterize the deep learning process by calculating the mean and variance of each layer’s connection weights.

In the chart he shows a standard “Eiffel Tower” neural network with 6 hidden layers, each with fewer neurons (nodes) than the previous layer (12 nodes, 10 nodes, 7 nodes, etc.). What he plots is the mean and variance of the weights between layers (red lines are the mean and variance of the weights for arcs [connections] between nodes in layer 1 and nodes in layer 2, blue lines the mean and variance of weights for arcs between layers 2 and 3, purple lines the mean and variance of weights for arcs between layers 3 and 4, etc.).

He shows that at the start of training the (randomly assigned) weights for each layer have a normalized mean that is higher than their normalized variance. He calls this a high signal-to-noise phase (I would say the opposite: it’s low signal to noise, more noise than signal). But as training proceeds (over more epochs), there comes a point where the layer mean drops below its variance and the signal-to-noise ratio changes dramatically. After that point, the mean weights and variances of the group of layers start to diverge, or move apart.

He calls the phase (epochs) after the point where the weight means drop below their variances the compression phase of deep CNN training.
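A minimal sketch of the mean-versus-spread comparison for a single layer. It assumes you have collected several samples of that layer’s values (for example, one per mini-batch) and compares the norm of the per-connection mean with the norm of the per-connection standard deviation (the square root of the variance). The paper behind the talk, as I understand it, computes these statistics on the weight gradients, so treat the choice of input as an assumption; the function names and sample data are hypothetical.

```python
import math
from typing import Sequence


def layer_snr(samples: Sequence[Sequence[float]]) -> float:
    """Norm of the per-connection mean divided by the norm of the per-connection
    standard deviation, across samples of one layer's values."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    stds = [math.sqrt(sum((s[d] - means[d]) ** 2 for s in samples) / n)
            for d in range(dims)]
    mean_norm = math.sqrt(sum(m * m for m in means))
    std_norm = math.sqrt(sum(v * v for v in stds))
    return math.inf if std_norm == 0 else mean_norm / std_norm


def in_compression_phase(samples: Sequence[Sequence[float]]) -> bool:
    """True once the spread (noise) outweighs the mean (signal)."""
    return layer_snr(samples) < 1.0


# Hypothetical samples for one layer with two connections.
early = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]    # consistent: mean dominates
late = [[1.0, -1.0], [-1.0, 1.0], [0.1, 0.1]]   # noisy: spread dominates
print(layer_snr(early), in_compression_phase(early))
print(layer_snr(late), in_compression_phase(late))
```

Tracking this ratio per layer over epochs, and noting when it falls below 1, is one way to locate the transition the Professor describes.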

The Professor suggests that every complex deep learning CNN looks the same during training if you perform these calculations. He shows similar charts for other deep learning CNNs used on different problems, and they all exhibit some point where their means drop below their variances, after which the means and variances of the different layers start to differentiate.

Do layer counts and sample size matter?

It turns out that the more hidden layers you have, the sooner (with less training) the compression phase begins. This chart shows the same problem with different hidden layer counts. One can see in the traces that not only is accuracy improved with more layers, but the network also reaches the compression phase more quickly.

Using his sample complexity and accuracy statistics, the Professor has also shown that there are limits to the accuracy any deep learning CNN can achieve, as a function of layer count, sample size and training event count.


As far as I know, the Professor and his team are the first to try to characterize and understand what happens during deep learning. In doing so, he has shown that the number of layers and the number of samples can be used to predict the speed of learning, and ultimately how accurate any deep learning CNN can be.


VMworld2017’s forecast, cloudy with a high chance of containers

Attended VMworld2017 this past week in Vegas and aside from all the parties there was a lot of news, mostly for public cloud users.

In talking with analysts and others at the show, it seems VMware has recently discovered that they can’t fight the cloud, so they’d better join it. Earlier this year VMware divested its vCloud Air business to OVH, which removed its own competing cloud offering. Now VMware is on a different tack, figuring out how best to work with today’s public cloud providers and implementing that.

Last year VMware announced an agreement with IBM to supply vCloud Air services on IBM’s SoftLayer public cloud. This year, VMware ramps up other public cloud offerings with VMware Cloud on AWS and PKS (Pivotal Container Services) on vSphere.

First up, VMware on the (AWS) cloud

You may recall that earlier this year VMware showed a tech preview of vSphere running in AWS. At VMworld2017 they took the wraps off this service and made it real. At first it’s only available in the AWS US West region, but they plan to roll it out to the rest of the US soon and to the rest of the world after that.

VMware Cloud on AWS is vSphere, vCenter, NSX, and vSAN running on top of AWS Elastic Compute Cloud (EC2) services. Essentially, any VM that you run on-prem can be run on AWS using VMware Cloud on AWS.

The AWS EC2 machines you run VMware on are BIG: 2 CPUs, 36 cores (72 hyperthreads), 512GiB of memory and a local (SSD) cache of 3.6TB with 10.7TB of raw capacity. VMware Cloud on AWS requires four EC2 instances to run. There was no information about the networking capabilities, but I assume HIGH SPEED.

The cost for the service is high, but you are paying for 7x24x365 AWS EC2 services. For a 3-year “reservation”, it will cost $109.4K/host, which comes out to about $3K/month/host over 36 months. VMware claims that on a 3-year TCO basis this would be cheaper than running an equivalent configuration on-prem.

You can also contract for VMware Cloud on AWS on an hourly basis. You do have to have a VMware login and VMware credits (?) to do so, so it’s certainly not as simple as just having a credit card and an AWS login. The cost for this is $8.361/hour/host. This seems awfully high, but there’s no direct comparison with other EC2 machine configurations. The closest may be an EC2 x1.16xlarge, with 64 vCPUs (hyperthread equivalents), 976GiB of DRAM and one 1,920GB SSD, which lists for $6.669/hour: close, but not a complete match.
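Some back-of-the-envelope arithmetic using the prices quoted above, assuming a host runs 7x24 for exactly 36 months (8,760 hours a year, no leap days) and ignoring data transfer charges, support and any discounts. The variable names are just for illustration.

```python
HOURS_3YR = 24 * 365 * 3      # 26,280 hours, ignoring leap days

reserved_3yr = 109_400        # $/host for a 3-year "reservation" (quoted above)
on_demand_hourly = 8.361      # $/hour/host, hourly contract (quoted above)
x1_hourly = 6.669             # $/hour, EC2 x1.16xlarge list price (quoted above)
min_hosts = 4                 # EC2 instances required (quoted above)

reserved_monthly = reserved_3yr / 36
reserved_effective_hourly = reserved_3yr / HOURS_3YR
on_demand_3yr = on_demand_hourly * HOURS_3YR

print(f"reserved: ${reserved_monthly:,.0f}/month/host, "
      f"${reserved_effective_hourly:.2f}/hour/host")
print(f"hourly for 3 years: ${on_demand_3yr:,.0f}/host "
      f"({on_demand_3yr / reserved_3yr:.1f}x the reservation)")
print(f"minimum {min_hosts}-host cluster, 3-year reservation: "
      f"${min_hosts * reserved_3yr:,.0f}")
print(f"hourly premium over x1.16xlarge: {on_demand_hourly / x1_hourly - 1:.0%}")
```

On these numbers, running a host continuously on the hourly plan costs roughly twice the 3-year reservation (about $219.7K vs. $109.4K), and the reservation works out to about $4.16/hour/host. The x1.16xlarge premium of roughly 25% is not an apples-to-apples comparison, since the configurations differ.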

You are running a VMware service on AWS so the billing is done through VMware. And any data you move in or out of the cloud will be billed (through VMware) at whatever AWS would charge for the data egress/import.

It seems that if you “connect” your VMware Cloud on AWS to your on-prem vSphere cluster (through stretched layer 2 NSX networking and possibly other means), you can vMotion VMs from on-prem to AWS and back again. A behind-the-scenes Storage vMotion also happens to get the data to AWS so that the VMs can operate properly.

VMware vCenter offers a dashboard of sorts to tell admins whether a particular VM is a good candidate to move to AWS or not. This is based on the VM’s connections to other VMs and maybe the amount of data that would need to be moved.

Next, (PKS) containers and more (GCP) cloud

VMware together with Pivotal and Google Cloud announced a tech preview of the Pivotal Container Service (PKS) on vSphere. The new service implements Pivotal Kubo, or Kubernetes container orchestration with BOSH HA infrastructure management, on top of vSphere. PKS also comes with Harbor, a secure, enterprise-class container registry from VMware.

This would allow a development team to develop a container micro-services application completely within a VMware environment and run it under vSphere. This seems tailor-made for cloud developers.

Kubernetes has worker and master nodes, each of which would run as a VM on vSphere. Inside worker nodes, Kubernetes runs Pods, which have one or more tightly coupled containers that enclose an application and share context.

I was talking with the vSphere team and they had been spending a lot of time making vSphere native services available to PKS. This means that you can use NSX networking and vSAN, VVOLs or VMDK storage for your container (persistent) storage.

Not exactly sure where DevOps fits into PKS on vSphere, but my assumption is that you could run Puppet, Chef or, if you’re up to the challenge, vRA to automate application rollout.

There was specific talk of having PKS run on AWS, probably within VMware Cloud on AWS in the future.

Of course, PKS containers that run on vSphere are completely compatible with GKE (Google Container Engine), which runs on Google Cloud Platform.

No information on VMware PKS pricing as of yet.

Where lies Photon and VIC (VMware Integrated Containers)

You may recall that VMware announced Photon last year, which was an open source container framework, and Photon OS, which was an OS for Photon containers. This still exists as an open source project and is still being developed, but there was nary a word about Photon this year.

VIC still exists. VIC can run a container as a VM but is not a real container orchestration engine. Yes, you could potentially run Docker Swarm as a VM, or a number of containers as separate VMs under VIC, but this is not the same as having a fully integrated container orchestration and management service layer in vSphere. That’s where PKS fits in.


Although timelines weren’t discussed, a number of conversations led me to believe that VMware Cloud on AWS would be rolled out to other public cloud providers (read Azure and GCP). How long it would take to roll out to other AWS regions around the world was also not discussed. VMware Cloud would really make sense to run on GCP, but Azure might be a bit of a stretch.

Similarly, PKS already seems headed for VMware Cloud on AWS and is already available in native form as GKE on GCP. But Azure already has a native Kubernetes container service. And there was no discussion as to whether PKS would be made available on IBM SoftLayer or OVH vCloud Air.

Stay tuned, more to come as VMware finds its true path to the cloud.