Weight Agnostic Neural Networks (WANNs)

Read an article the other day (Neural Networks Can Drive Without [weight] Learning) about a new form of deep learning neural network (NN) that is not dependent on the weights assigned to network nodes. The new NN is called WANN (Weight Agnostic NN). There’s also a scientific paper (on Github, Weight Agnostic Neural Networks) that describes WANNs in more detail.

How WANNs differ from normal NN

If I understand them properly, WANNs are trained, but instead of adjusting weights during training, WANN network architectures (nodes and connections) are modified and optimized to perform well against the training data.

Indeed, most NNs start out by assigning random weights to all network nodes, and these weights are then adjusted through the training cycle until the NN performs well on the training data. But NNs such as these have a structure (# nodes/layer, # layers, connectivity type, etc.) defined by the researcher, which is stable and unchanging during a training-validation cycle. If the NN model is not accurate enough, the researcher has two choices: find better data or change the model’s structure. WANNs start and end with changing the model’s structure.

WANNs start out with a set of NN architectures (# nodes/layer, # layers, connection types, etc.). Each NN architecture is evaluated against the training data with a single shared randomized weight. That shared weight is altered (randomly) for each training pass and the model is evaluated for accuracy.

At the end of a WANN training pass you have a set of evaluation metrics for each model structure. The resultant WANNs are then ordered by performance and complexity. The highest performing networks are then used to create a new population (set) of WANN architectures to be tested, and the process iterates from there. This would presumably continue until you have reached a plateau of accuracy statistics across a number of shared randomized weights. And this would be the WANN model used for the application.
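The loop described above can be sketched in a few lines of Python. This is only a toy under my own assumptions, not the paper's method: an "architecture" here is just a list of hidden-node activation functions, the task is fitting y = |x|, and the names (ACTIVATIONS, SHARED_WEIGHTS, evaluate, mutate, search) are all hypothetical. The point is the shape of the process: score each architecture over several shared weights, rank by performance and then complexity, and breed the next population from the best.

```python
import math
import random

# Hypothetical menu of node activations an architecture can pick from.
ACTIVATIONS = {
    "linear": lambda z: z,
    "relu": lambda z: max(0.0, z),
    "abs": abs,
    "tanh": math.tanh,
    "sin": math.sin,
}
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
# Toy training data: learn y = |x| on a small grid.
DATA = [(x / 4.0, abs(x / 4.0)) for x in range(-8, 9)]


def forward(arch, w, x):
    # Every connection uses the same shared weight w.
    return w * sum(ACTIVATIONS[name](w * x) for name in arch)


def evaluate(arch):
    # Mean squared error, averaged over all shared weight values.
    total = 0.0
    for w in SHARED_WEIGHTS:
        total += sum((forward(arch, w, x) - y) ** 2 for x, y in DATA) / len(DATA)
    return total / len(SHARED_WEIGHTS)


def mutate(arch, rng):
    # Change the structure only: add, swap or drop a hidden node.
    arch = list(arch)
    op = rng.choice(["add", "swap", "drop"])
    if op == "add" or not arch:
        arch.append(rng.choice(list(ACTIVATIONS)))
    elif op == "swap":
        arch[rng.randrange(len(arch))] = rng.choice(list(ACTIVATIONS))
    elif len(arch) > 1:
        arch.pop(rng.randrange(len(arch)))
    return arch


def search(generations=30, pop_size=20, keep=5, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(list(ACTIVATIONS))] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Order by performance first, then by complexity (node count).
        ranked = sorted(population, key=lambda a: (evaluate(a), len(a)))
        history.append(evaluate(ranked[0]))
        elites = ranked[:keep]
        # The best architectures seed the next population.
        children = [mutate(rng.choice(elites), rng) for _ in range(pop_size - keep)]
        population = elites + children
    best = min(population, key=lambda a: (evaluate(a), len(a)))
    return best, history
```

Calling search() returns the best architecture found plus the best error per generation. Because the elites are carried over unchanged, that error can only stay flat or improve, which is the "plateau" one would watch for.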

Why WANN?

For a normal NN, each node weight would be adjusted automatically and independently at the end of each training batch. There would, of course, be a large number of batches, causing each weight in the NN nodes to be altered (via floating point arithmetic). So the math would be floating point arithmetic * # nodes * # layers * # training batches * # training passes (or epochs).

WANNs avoid this inner loop math altogether. Instead they would need to test a model on a number of shared random weights. This would presumably be done after a complete training pass (each epoch). And even if you had the same number of WANN models as nodes in a normal NN, the computations would be much less, something on the order of # models * # epochs (each training pass [or epoch] could conceivably test a different shared random weight).
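To make that rough counting concrete, here's a minimal sketch that takes the estimates above at face value. The function names and sample sizes are mine (hypothetical). It only counts weight updates versus model evaluations, and it ignores the forward-pass work that every WANN evaluation still needs, so treat it as back-of-the-envelope arithmetic rather than a real cost model.

```python
def weight_update_ops(nodes_per_layer, layers, batches_per_epoch, epochs):
    # Rough count of weight adjustments for a normally trained NN.
    return nodes_per_layer * layers * batches_per_epoch * epochs


def wann_evaluations(models, epochs):
    # Rough count of model evaluations for a WANN search,
    # assuming one shared weight is tested per model per epoch.
    return models * epochs


# Assumed sizes, for illustration only.
normal = weight_update_ops(nodes_per_layer=100, layers=5,
                           batches_per_epoch=1000, epochs=10)
wann = wann_evaluations(models=500, epochs=10)
print(normal, wann, normal // wann)
```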

Another advantage of WANNs is that they result in simpler, less complex NN models (# nodes, # layers, # of connections, etc.) than normal DL NNs. Simpler NN models could be very useful for IoT applications, where computational power and storage are limited.

The main disadvantage of WANNs is that they aren’t as accurate as normal (weight adjusted) NNs. However, once you have a WANN, you can always elect to re-train it in the normal fashion by adjusting weights to gain more accuracy. And the result would likely come much closer to the accuracy of a more complex NN model that was trained from the start by altering weights.

WANNs are more like nature

Humans and other mammals (probably avian, aquatic, etc. as well) seem to be born with certain innate abilities (visual, perceptive, mobility) and with certain habits such as nursing, facial mimicking, hunger-feeding, etc. Presumably these innate abilities and habits are hardwired neuron networks that don’t depend on environmental learning, something that they are all born with.

Conceivably WANNs could be considered similar to these hardwired (unlearned) neuron networks. WANNs could be used in a similar fashion to embed certain innate habits and abilities into robots or other automation that could be further trained through interactions with their environment.

~~~~

The Github paper has an online WANN model widget with a slider where you can alter a shared random weight and see its impact on the operation of the widget. Playing with this, the only weight that seems to have a significant impact on the actions of the widget is zero…

Photo Credit(s): “Neural Connections In the Human Brain” by Image Editor is licensed under CC BY-NC-ND 2.0 

Internet of Tires

Read an article a couple of weeks back (An internet of tires?… IEEE Spectrum) and can’t seem to get it out of my head. Pirelli, a European tire manufacturer, was demonstrating a smart tire or, as they call it, their new Cyber Tyre.

The Cyber Tyre includes accelerometer(s) in its rubber that can be used to sense the pavement/road surface conditions. Cyber Tyre can communicate surface conditions to the car and, using the car’s 5G, to other cars (of the same make) to tell them of problems with surface adhesion (hydroplaning, ice, other traction issues).

Presumably the accelerometers in the Cyber Tyre measure acceleration changes of individual tires as they rotate. Any rapid acceleration change could potentially be used to determine whether the car has lost traction and why.
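As a purely illustrative sketch of that idea (I have no knowledge of Pirelli's actual algorithm), one could flag the samples where consecutive accelerometer readings jump by more than some threshold. The function name, the sample values and the threshold below are all hypothetical.

```python
def rapid_changes(samples, threshold):
    # Return the indexes where the reading jumped by more than
    # `threshold` compared with the previous sample.
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]


# Made-up readings (in g) from one tire; the spike at index 3
# stands in for a sudden slip.
readings = [1.0, 1.1, 1.0, 3.5, 1.2]
print(rapid_changes(readings, threshold=1.0))
```

A real system would presumably also need to tell a pothole from ice from hydroplaning, which is where the "and why" gets hard.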

They tested the new tires out at a (1/3rd mile) test track on top of a Fiat factory, using Audi A8 automobiles and 5G. Unclear why this had to wait for 5G, but it’s possible that using 5G, the Cyber Tyre and the car could log and transmit such information back to the manufacturer of the car or tire.

Accelerometers have become dirt cheap over the last decade as smart phones have taken off. So, it was only a matter of time before they found use in new and interesting applications and the Cyber Tyre is just the latest.

Internet of Vehicles

Presumably the car, with Cyber Tyres on it, communicates road hazard information to other cars using 5G and vehicle to vehicle (V2V) communication protocols or perhaps to municipal or state authorities. This way highway signage could display hazardous conditions ahead.

Audi has a website devoted to Car to X communications. Audi has embedded cellular communications, cameras and other sensors in certain vehicles (A4, A5 & Q7); these are used to identify (recognize) signage, hazards, and other information and to communicate this data to other Audi vehicles. This way, owning an Audi would plug you into this information flow.

Pirelli’s Cyber Car Concept

Prior to the Cyber Tyre, Pirelli introduced a Cyber Car concept that is supposedly rolling out this year. This version has tyres with real time pressure, temperature, (static) vertical load and a Tyre ID. Pirelli has been working with car manufacturers to roll out Cyber Car functionality.

The Tyre ID seems to be a file that can include anything that the tyre or automobile manufacturer wants. It sort of reminds me of blockchain data blocks that could be used to validate tyre manufacturing provenance.

The vertical load sensor seems more important to car and tire manufacturers than consumers. But for electric car owners, knowing car weight could help determine current battery load and thereby more precisely know how much charge is left in a battery.

Pirelli uses a proprietary algorithm to determine tread wear. This makes use of the other tyre sensors to predict wear and perhaps uses an AI DL algorithm to do this.

~~~

ABS has been around for decades now and tire pressure sensors for over 10 years or so. My latest car has enough sensors to pretty much drive itself on the highway but not quite park itself as of yet. So it was only a matter of time before something like smart tires would show up.

But given their integration with car electronics systems, it would seem that this would only make sense for new cars that included a full set of Cyber Tyres. That is, until all tire AND car manufacturers agree on a standard protocol to communicate such information. When that happens, consumers could choose any tire manufacturer and obtain similar if not the same functionality from them.

I suppose someone had to be first to identify just what could be done with the electronics available today. Pirelli just happens to be it for now in the tire industry.

I just don’t want to have to upgrade tires every 24 months. And, if I have to wait a long time for my car to boot up and establish communications with my tires, I may just take a (dumb) bike.

Made in space

Read an article in IEEE Spectrum recently titled, 4 Products it makes sense to manufacture in space. The 4 products identified in the article include:

1) Metal alloys – because of micro-gravity, the mixture of metals that go into metal alloys should be much more even and as a result, should create a purer mixture of the metal alloy at the end of the process.

2) Fibre optical cables – the article says, ZBLAN, which is a heavy-metal fluoride glass fibre could have 1/10th the signal loss of current cable but is hard to manufacture on earth due to micro-crystal formation. Apparently, when manufactured (mixed-drawn) in micro-gravity, there’s less of this defect in the glass.

3) Printed human organs – the problem with printing biological organs (hearts, lungs, livers, etc.) is that they require scaffolding for the cells to adhere to, which needs to be biodegradable and in the form of whatever organ is needed. However, in micro-gravity there should be less of a need for any scaffolding.

4) Artificial meat – similar to human organs above, by being able to build (3D print) biological products, one could create a steak or other cuts of meat via biological 3D printing.

Problems with space manufacture

One problem with manufacturing metal alloys and fibre optic cable in space, is the immense heat required. Glass melts at 1400C, metals anywhere from 650C to 3400C. Getting rid of all that heat in space could present a significant problem. Not to mention the vessels required to hold molten materials weigh a lot.

And metal and glass manufacturing processes can also create waste, such as hot metal/glass particulates that settle on the floor on earth, but who knows where in space. To manufacture metal or glass on ISS would require a very heat tolerant, protected environment or capsule, lots of power to provide heat and radiator surfaces to release said heat.

And of course, delivering raw materials for metals and glass to space (LEO) would cost a lot (SpaceX $2.7K/kg, Atlas V $13.2K/kg). As such, the business case for metal alloy manufacturing in space doesn’t appear positive.
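To put those launch prices in perspective, here's a small sketch of the arithmetic. The per-kg prices are the ones quoted above; the 1,000 kg payload and the function and dictionary names are my own assumptions for illustration.

```python
# Launch prices per kg to LEO, as quoted in the text (US dollars).
PRICE_PER_KG = {"SpaceX": 2_700, "Atlas V": 13_200}


def launch_cost(mass_kg, provider):
    # Cost of lifting `mass_kg` of raw material to LEO.
    return mass_kg * PRICE_PER_KG[provider]


# Assumed payload: one metric ton of raw material.
for provider in PRICE_PER_KG:
    print(provider, launch_cost(1_000, provider))
```

So a single ton of feedstock runs from roughly $2.7M to $13.2M in launch costs alone, before any furnace, power or radiator hardware goes up.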

But given the reduced product weight and potentially higher prices one can charge for the product, fibre optical glass may make business sense. Especially if you could get by with 1/10th the glass because it has 1/10th the signal loss.

And if you don’t have to ship raw materials from earth (using the moon or asteroids instead), it would improve both business cases. That is, assuming raw material discovery and shipping costs are 1/6th or less as much as shipping from earth.

As for organs, as they can’t be manufactured on earth (yet), it could be the “killer app” for made in space. But it’s sort of a race against time. Doing this in space may be a lot easier today, but more research is going on to create organs on earth than in space. And eventually, manufacturing these on earth could be a lot cheaper and just as effective.

But I don’t see a business case for meat in space unless it’s to support making food for astronauts on ISS. Even then, it might be cheaper to just ship them some steak.

Products hard to make in space

I would think anything that doesn’t require gravity to work, should be easier to produce in space.

But that eliminates distillation, e.g., fossil fuel refining, fermentation, and many other chemical distillation processes (see Wikipedia article on Distillation).

But gravity is also used in depositing and holding multiple layers onto one another. So manufacturing paper, magnetic/optical disk platters, magnetic tapes, or any other product built up layer by layer, may not be suitable for space manufacture.

Not sure about semiconductors, as deposition steps can make use of chemical vapors. And that seems to require gravity. But it’s conceivable that in the absence of gravity, chemicals may still adhere to the wafer surface, as it’s an easier location to combine with than other surfaces in the chamber. On the other hand, they may just as likely retain their mixture in the vapor.

Growing extremely pure silicon ingots may be something better done in space. However, it may suffer from the same problems as metal alloy manufacturing. Given the need for extreme purity and the price paid for pure silicon, I would think this would be something to research ahead of metal alloys.

For further research

But in the end, if and when we become a spacefaring people, we will need to manufacture everything in space. We will also need to grow or find raw materials more easily than by shipping them from the earth.

So, some research ought to be directed at how to perform distillation and multi-layer product manufacturing in space/micro-gravity. Such processes could potentially be done in a centrifuge, if they truly can’t be done without gravity.
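For a sense of what a centrifuge would have to do, the standard relation is a = ω² r: the acceleration felt at radius r depends on the square of the spin rate. Here's a small sketch that turns that into revolutions per minute. The function name and the example radius are my own (hypothetical) choices.

```python
import math

G_EARTH = 9.80665  # standard gravity, m/s^2


def rpm_for_gravity(radius_m, g_fraction=1.0):
    # From a = omega^2 * r, solve for omega (rad/s), then convert to rpm.
    omega = math.sqrt(g_fraction * G_EARTH / radius_m)
    return omega * 60.0 / (2.0 * math.pi)


# A 1 m radius centrifuge needs roughly 30 rpm to simulate 1 g.
print(round(rpm_for_gravity(1.0), 1))
```

Smaller centrifuges have to spin faster for the same effect, and the simulated gravity varies along the radius, which may matter for something like a distillation column.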

It’s also unclear how to boil any liquid in 0g or micro-g without convection (see Bizarre Boiling NASA Science article). According to the article, it creates one big bubble that stays where it is formed. Providing some way to extract this bubble in place would seem difficult. Boiling liquids in a centrifuge may work.

In any case, I’m sure the ISS crew would be more than happy to do any research necessary to figure out how to brew beer, let alone, distill vodka in space.

Data analysis of history

Read an article the other day in The Guardian (History as a giant data set: how analyzing the past could save the future), which talks about this new discipline called cliodynamics (see wikipedia cliodynamics article). There was a Nature article (in 2012), Human Cycles: History as Science, which described cliodynamics in a bit more detail.

Cliodynamics uses mathematical systems theory on historical data to predict what will happen in the future for society. According to The Guardian and Nature articles, the originator of cliodynamics, Peter Turchin, predicted in 2010 that the world would change dramatically for the worse over the coming decade, with violence peaking in 2020.

What is cliodynamics

Cliodynamics depends on vast databases of historical data that have been amassed over the last decade or so. For instance, the Seshat Global History Databank (started in 2011, has 3 datasets: moralizing gods, axial age history [8th to 3rd cent. BCE], & social complexity), International Institute of Social History (est. 1935, in 2013 re-organized their collection to focus on data, has 33 dataverses ranging from data on apprenticeships, prices and wage history, strike history of various countries and time periods, etc.), and Google NGRAM viewer (started in 2010, provides keyword statistics on Google Books).

Cliodynamics uses the information from databases like the above to devise a mathematical model of the history of the world. From their mathematical model, cliodynamics researchers have discerned patterns or cycles in human endeavors that have persisted over centuries.

Cliodynamic cycles

Two cycles of interest come to mind:

  • Secular cycle – this plays out over 2-3 centuries and starts out with a new egalitarian society that has low levels of inequality, where the supply and demand for labor are roughly equal. Over time, as population grows, the supply of labor outstrips demand and inequality increases. Elites then start to battle one another, and war and political instability result in a new, more equal society, re-starting the cycle.
  • Fathers and sons cycle – this plays out over 50 years and starts when the (fathers) generation responds violently to social injustice and the next (sons) generation resigns itself to injustice (or hopefully resolves it), until the next (fathers) generation sees injustice again and erupts violently, re-starting the cycle.

It’s this last cycle that Turchin predicted to peak again in 2020, the last one peaking in 1970 and the ones before that peaking in 1920 and 1870.
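The arithmetic behind that prediction is just a fixed 50 year period. A tiny sketch (the function name is mine, and this is only the period arithmetic, not Turchin's model):

```python
def cycle_peaks(first_peak, period_years, count):
    # Peak years of a cycle that repeats every `period_years`.
    return [first_peak + i * period_years for i in range(count)]


print(cycle_peaks(1870, 50, 4))  # [1870, 1920, 1970, 2020]
```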

We’ve seen such theories before. In the 19th and 20th centuries there were plenty of historical theorists. Probably the most prominent was Marx but there were others as well.

The problem with cliodynamics, good data

Sparsity and accuracy of data has always been a problem with historical study. Much information is lost through natural or manmade disasters and much of what’s left is biased. Nonetheless, more and more data is being amassed of a historical nature every day, most of it quantitative and suitable to analysis.

Historical data, where available, can be assessed scientifically and analyzed using current tools such as data analytics, machine learning, & deep learning to ascertain trends and make predictions. And the more data available, the more accurate these analyses and predictions can become. Cliodynamics pre-dates many of these tools, but that’s no excuse for not taking advantage of them.

~~~~

As for 2020, AI, automation and globalization have led and will lead to more job disruption. Inequality is also on the rise, at least throughout much of the west. And then there’s Brexit, USA elections and general mid-east turmoil that all seem to be on the horizon.

Stay tuned, 2020 seems only months away.

Photo Credits:

From Key Historic Figures of WW1 article, Mansell/Getty Images, (c) ThoughtCo

Anti War March (1968 Chicago) By David Wilson , CC BY 2.0, Link

Eleven times Americans have marched on Washington, (1920, Washington DC) (c) Smithsonian Magazine

Cambrian Explosion of AI DL app’s in industry and the world

I was at the NetApp Insight conference last week and recorded a podcast (see: GreyBeards Podcast) on what NetApp is doing in the AI DL (Deep Learning) space. On the podcast, we talked about a number of verticals that were deploying AI DL right now and using it to improve outcomes.

It was only in 2012 that AI DL broke out and pretty much conquered the speech recognition contest by improving recognition accuracy by leaps and bounds. Prior to that, improvements had been very small and incremental at best. Here we are, just 7 years later, and AI DL models are proliferating across industry and every other sector of the world economy.

DL applications in the real world

At the show, we talked about AI DL models being used in healthcare (radiological image analysis, cell counts for infection assessments), automotive (self driving cars), financial services (fraud detection), and retail (predicting how make up would look on someone).

And early this year, at HPE Discover, they discussed a new technique to share training data but still keep it private. In this case, they use block chain technology to publish and share a DL neural network model’s weights and other hyper parameters trained for some real world purpose.

Customers download and use the model in their day to day activities but record the data that their model analyzes and its predictions. They use this data to update (re-train) their DL neural net. They then publish their new neural net model weights and other parameters to all the other customers. Each customer of the model does the same, updating (re-training) their DL neural net.

At some point an owner or global model arbitrator takes all these individual model updates, aggregates the neural net weights into a new neural net model and publishes the new model. And then the process starts over again. In this way, training data is never revealed and stays secure and private, but DL model updates that result from re-training the model with secured private data would be available to any customer.
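The aggregation step might look something like the sketch below. How HPE's arbitrator actually combines updates wasn't described, so plain element-wise averaging of each customer's weights is purely my assumption, and the function name and sample numbers are hypothetical.

```python
def aggregate_weights(customer_updates):
    # Element-wise average of the weight lists published by each customer.
    # Only weights are shared; the private training data never leaves
    # the customer.
    if not customer_updates:
        raise ValueError("need at least one customer update")
    size = len(customer_updates[0])
    if any(len(update) != size for update in customer_updates):
        raise ValueError("all updates must have the same number of weights")
    count = len(customer_updates)
    return [sum(update[i] for update in customer_updates) / count
            for i in range(size)]


# Two customers publish re-trained weights for the same small model.
print(aggregate_weights([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

A real arbitrator might weight each customer by how much data they trained on, or reject updates that look suspicious, but the basic flow of publish, aggregate and re-publish would be the same.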

Recently, there’s been a slew of articles across many different organizations that show how AI DL is being adopted to work in different areas:

And that’s just a sample of the last few weeks of papers of AI DL activity.

Next Steps

All it takes is data, that can be quantified and classified. With data and classifications in hand, anyone can train a DL model that performs that classification. It doesn’t require GPU farms, decent CPUs are up to the task for TB of data.

But if you want better prediction/classification accuracy, you will need more data, which means longer AI DL training runs. So at some point, maybe at >100TB of data, or if you do AI DL training a lot, you may want that GPU farm.

The Deep Learning with Python book (my favorite) has a number of examples such as sentiment analysis of text, median real estate pricing predictions, and generating text that looks like an author’s work, with maybe a dozen more that one can use to understand AI DL technology. But it’s not rocket science; I believe any qualified programmer could do it, with some serious study.

So the real question is what are you doing with your data to make use of AI DL models now?

I suppose the other question ought to be, how can you collect more data and classification information, to train more AI DL models?

~~~~

It’s great to be in the storage business.

Quantum computing NNs

As many who have been following our blog know, AI, Machine Learning (ML) and Deep Learning (DL) (e.g. see our Learning machine learning – part 3, & Industrial revolution deep learning & NVIDIA’s 3U supercomputer, AI reaches a crossroads posts) have become much more mainstream. DL has become the preferred AI approach for pattern recognition, classification, and prediction, and has applicability beyond that.

One problem with DL has been its energy costs. There have been some approaches to address this, but none have been entirely successful (e.g. see Intel new DL Boost, New GraphCore GC2 chips, AI processing at the edge posts) just yet. At one time neuromorphic hardware was the answer but I’ve become disillusioned with that technology over time (see Are neuromorphic chips a dead end post).

This past week we learned of a whole new approach, something called a Quantum Convolutional NN or QCNN (see PhysOrg Introducing QCNN, pre-print of Quantum CNNs, presentation deck on QCNNs, Nature QCNN paper paywall).

Some of you may not know that convolutional neural networks (ConvNets) are the latest in a long line of DL architectures focused on recognizing patterns and classification of data in sequence. DL ConvNets can be used to recognize speech, classify photo segments, analyze ticker tapes, etc.
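For those who haven't seen one, the core ConvNet operation is sliding a small filter (kernel) along the data and taking a weighted sum at each position. Here's a minimal one dimensional sketch in plain Python. The function name and the sample numbers are mine, and like most DL libraries it doesn't flip the kernel (so strictly speaking it's a cross-correlation).

```python
def conv1d(signal, kernel):
    # Slide the kernel along the signal and take a weighted sum
    # at each position (no padding, stride of 1).
    width = len(kernel)
    return [
        sum(signal[start + k] * kernel[k] for k in range(width))
        for start in range(len(signal) - width + 1)
    ]


# A [1, 0, -1] kernel responds to how fast the sequence is changing.
print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
```

In a real ConvNet the kernel values are the weights learned in training, and many such filters are stacked in layers.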

But why quantum computing

First off, quantum computing (QC) is a new leading edge technology targeted at solving very hard problems, like cracking Public Key encryption keys (integer factoring, which is in NP but isn’t known to be NP Complete). It’s also often discussed in connection with NP Complete (wikipedia) problems, such as the traveling salesperson problem and assembling an optimum Bitcoin block (see List of NP complete problems, wikipedia), although whether QC can solve those efficiently is still an open question.

QC utilizes quantum mechanical properties of the universe to attack these problems without resorting to brute force searches, such as going down every path in the traveling salesperson problem (see our QC programming and QC at our doorsteps posts).

At the moment, IBM, Google, Intel and others are all working on QC and trying to scale it up, by increasing the number of Qubits (quantum bits) their systems support. The more qubits, the more quantum storage you have, and the more sophisticated the problems one can solve. Current qubit counts include: 72 qubits for Google, 42 for Intel, and 50 for IBM. Apparently not all qubits are alike, and they don’t last very long, ~100 microseconds (see Timeline of QC, wikipedia).
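One rough way to see why qubit counts matter: describing the state of n qubits on a classical machine takes 2^n complex amplitudes. A quick sketch of that arithmetic, using the qubit counts quoted above (the function name is mine):

```python
def state_dimension(qubits):
    # Number of complex amplitudes needed to describe `qubits` qubits.
    return 2 ** qubits


for vendor, qubits in [("Intel", 42), ("IBM", 50), ("Google", 72)]:
    print(vendor, qubits, state_dimension(qubits))
```

Every added qubit doubles the count, which is why simulating even a few dozen qubits classically gets out of hand quickly.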

What’s a QCNN?

What’s new is the use of quantum computing circuits to create ConvNets. Essentially the researchers have created a way to apply AI DL (ConvNet) techniques to quantum computing data (qubits).

Apparently there are QC [qubit] phases that need to be recognized, and what better way to do that than to use DL ConvNets. The only problem is that performing DL on QC data with today’s tools would require reading out the phase (itself a pattern recognition problem), converting it to digital data, and then processing it via CPU/GPU DL ConvNets, a classic chicken or egg problem. But with QCNNs, one has a DL ConvNet entirely implemented in QC.

DL ConvNets are typically optimized for a specific problem, varying layer counts, nodes/layer, node connectivity, etc. QCNNs match this and also come in various sizes. Above is a QCNN circuit, optimized to recognize the phase (joining?) of two sets of symmetrically-protected topology numbers (SPT, see pre-print article).

I won’t go into the QC technology used in any detail (as I barely understand it), but the researchers have come up with a way to map DL ConvNets into QC circuitry. Assuming this all works, one can then use QC to perform DL pattern recognition on qubit data.

~~~~

Comments?

Your Rx on blockchain

We have been discussing blockchain technology for a while now (e.g., see our posts Etherium enters the enterprise, Blockchains go mainstream, and our podcast Discussing blockchains with Donna Dillenberger). And we were at VMworld 2019 where there was brief mention of Project Concord and VMware’s blockchain use for supply chain management.

But recently there was an article in Science Daily about the use of blockchain technology to improve prescriptions. This was a summary of a research paper on Cryptopharmaceuticals (paper behind paywall).

How cryptopharmaceuticals could work

Essentially the intent is to use a medical blockchain to fight counterfeit drugs. They have a proof of concept iOS & Android MedBlockChain app that would show how it could work, but it’s just a sample of some of its functionality, and doesn’t use an external blockchain.

The MedBlockChain app would create a platform and have at least two sides to it.

  • One side would be the pharmaceutical manufacturers, which would use the blockchain to add or check in medications that they manufacture, in an immutable fashion. So the block chain would essentially have an unfalsifiable record of each pill or batch of pills that was ever manufactured by pharmaceutical companies around the world.
  • The other side would be used by a person taking a medication. Here they could check-out or use the app to see if the medication they are taking was manufactured by a certified supplier of the drug. Presumably there would be a QR code or something similar, that could be read off the medicine package or pill itself. The app would scan the QR code and then use MedBlockChain to look up the provenance of the medication to see if it’s valid or not (a fraudulent copy).
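The two sides above can be sketched with a toy hash chain. To be clear, this is not the MedBlockChain app's code or API; the class and method names (MedLedger, check_in, check_out, is_intact) and the sample QR codes are all hypothetical, and a real blockchain would add distributed consensus, signatures and much more. It only shows why a check-in record is hard to falsify after the fact.

```python
import hashlib
import json


def block_hash(index, prev_hash, record):
    # Hash the block's contents together with the previous block's hash.
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MedLedger:
    def __init__(self):
        self.blocks = []

    def check_in(self, code, manufacturer, drug):
        # Manufacturer side: append a record for a pill or batch.
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        record = {"code": code, "manufacturer": manufacturer, "drug": drug}
        index = len(self.blocks)
        self.blocks.append({
            "index": index,
            "prev": prev_hash,
            "record": record,
            "hash": block_hash(index, prev_hash, record),
        })

    def is_intact(self):
        # Recompute every hash; any edited record breaks the chain.
        prev_hash = "0" * 64
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev"] != prev_hash:
                return False
            if block["hash"] != block_hash(i, prev_hash, block["record"]):
                return False
            prev_hash = block["hash"]
        return True

    def check_out(self, code):
        # Patient side: look up a scanned code; None means no valid record.
        if not self.is_intact():
            return None
        for block in self.blocks:
            if block["record"]["code"] == code:
                return block["record"]
        return None


ledger = MedLedger()
ledger.check_in("QR-001", "Acme Pharma", "ExampleDrug 10mg")
print(ledger.check_out("QR-001"))  # the manufacturer's record
print(ledger.check_out("QR-999"))  # None, no such medication was checked in
```

Note that a lookup like this only proves the code was registered. A counterfeiter could still copy a valid QR code onto a fake package, so a real system would presumably need to track each code's use as well.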

The example MedBlockChain app also has more medical information that could be made available on the block chain such as test results, body measurements, vitals, etc. These could all be stored immutably in the MedBlockChain and provided to medical practitioners. How such medical (HIPAA controlled) personal information would be properly secured and only supplied in plaintext to appropriate personnel is another matter.

~~~~

Cryptopharmaceuticals and the MedBlockChain remind me of IBM’s blockchain providing diamond provenance and other supply chain services, only in this case applied to medications. Diamond provenance makes sense because of its high cost but drugs seem a harder market to make to me.

I was going to say that such a market may not exist in first world countries. But then I saw a wikipedia article on counterfeit medicines (bad steroids and cancer medicines with no active ingredients). It appears that counterfeit/fraudulent medications are a problem wherever you may live.

Then of course, the price of medications seems to be going up. So maybe, it could start as a provenance tool for expensive medications and build a market from there.

How to convince manufacturers and the buying public to use the blockchain is another matter. It’s sort of a chicken and egg thing. You need the manufacturers to use it for the medications, pills or batches that they manufacture. Doing so adds overhead, time and additional expense, and they would need to add a QR code or something similar to every pill, pen or other drug delivery device.

Then maybe you could get consumers and medical practitioners administering drugs to start using it to validate expensive meds. Starting with expensive medications could potentially build the infrastructure, consumer/medical practitioner and pharmaceutical company buy in that would kick start the MedBlockChain. Once started there, it could work its way down to more widely used medications.

Comments?

Shedding light on all optical neural networks

Read a couple of articles in the past week or so on all optical neural networks (see All optical neural network (NN) closes performance gap with electronic NN and New design advances optical neural networks that compute at the speed of light using engineered matter).

All optical NN solutions operate faster and use less energy for inferencing than standard all electronic ones. However, in reality they are more of a hybrid solution, as they depend on the use of standard ML DL to train a NN. They then use 3D printing and other lithographic processes to create a series of diffraction layers for an all optical NN that matches the trained NN.

The latest paper (see: Class-specific Differential Detection in Diffractive Optical Neural Networks Improves Inference Accuracy) describes a significant advance beyond the original solution (see: All-Optical Machine Learning Using Diffractive Deep Neural Networks, Ozcan’s original paper).

How (all optical) Diffractive Deep NNs (DDNNs) work for inferencing

In the original Ozcan discussion, a DDNN consists of a coherent light source (laser), an image, a bunch of refractive and reflective diffraction layers and photo detectors. Each neural network node is represented by a point (pixel?) on a diffractive layer. Node to node connections are represented by light paths moving through the diffractive layer(s).

In Ozcan’s paper, the light flowing through the diffraction layer is modified and passed on to the next diffraction layer. This passing of the light through the diffraction layer is equivalent to the mathematical weight (neural network node FP multiplier) in the trained NN.

The previous challenge had been how to fabricate diffraction layers, which took a lot of hand work. But with the advent of 3D printing and other lithographic techniques, creating a diffraction layer is nowadays relatively easy to do.

In DDNN inferencing, one exposes (via a coherent beam of light) the first diffraction layer to the input image data, then that image is transformed into a different light pattern which is sent down to the next layer. At some point the last diffraction layer converts the light hitting it into classification patterns, which are then detected by photo detectors. Alternatively, the classification pattern can be sent down an all optical computational path (see our Photonic computing sees the light of day post and Photonic FPGAs on the horizon post) to perform some function.
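Here's a very simplified numerical sketch of that inference path, under my own assumptions rather than the paper's actual optics: each diffraction layer is modeled as a per-pixel phase shift, the light travelling between layers is modeled as a fixed complex mixing matrix, and the photo detectors simply measure intensity, with the brightest detector giving the class. All names and numbers are hypothetical.

```python
import cmath


def apply_layer(field, phases, propagation):
    # Each pixel of the layer shifts the phase of the light passing through.
    modulated = [a * cmath.exp(1j * p) for a, p in zip(field, phases)]
    # Light then spreads to the next layer; this is modeled here as a
    # fixed matrix (an assumption, real diffraction is more involved).
    return [
        sum(row[j] * modulated[j] for j in range(len(modulated)))
        for row in propagation
    ]


def classify(field, layers, propagation):
    for phases in layers:
        field = apply_layer(field, phases, propagation)
    # Photo detectors see intensity, the squared magnitude of the field.
    intensities = [abs(a) ** 2 for a in field]
    return intensities.index(max(intensities)), intensities


# Trivial check case: no mixing between pixels (identity propagation),
# so the brightest input pixel stays the brightest at the detectors.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
label, _ = classify([0.1, 0.9, 0.2], [[0.3, 1.0, 2.0]], identity)
print(label)  # 1
```

In a real DDNN the phase values would come from the trained NN and be frozen into the fabricated layers, which is why inferencing needs no arithmetic at all once the layers exist.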

In the original paper, they showed results of a DDNN for a completely connected, 5 layer NN, with 0.2M neurons and 8B connections in total. They also showed results from a sparsely connected, 5 layer NN, with 0.45M neurons and <0.1B connections.

Note that there are significant power advantages in exposing an image to a series of diffraction gratings and detecting the classification using a photo detector vs. an all electronic NN, which takes an image, uses photo detectors to convert it into an electrical (pixel series) signal and then processes it through NN layers, performing FP arithmetic at each layer node until one reaches the classification layer.

Furthermore, the DDNN operates at the speed of light. The all electronic network seems to operate at FP arithmetic speeds X number of layers. And that is only if it could all be done in parallel (with GPUs and 1000s of computational engines). If it can’t be done in parallel, one would need to add another factor X the number of nodes in each layer. Let’s just say this is much slower than the speed of light.

Improving DDNN accuracy

The team at UCLA and elsewhere took on the task to improve DDNN accuracy by using more of the optical technology and techniques available to them.

In the new approach they split the image optical data path to create a positive and negative classifier. They then use a differential classifier engine as the last step to determine the image’s classification.
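A sketch of what that last differential step might look like: each class gets a positive and a negative detector reading, and the class with the largest normalized difference wins. The normalized difference formula, the function name and the sample readings are my assumptions for illustration; check the paper for the exact scoring they use.

```python
def differential_classify(positive, negative):
    # Score each class by the normalized difference between its positive
    # and negative detector intensities, then pick the highest score.
    scores = []
    for pos, neg in zip(positive, negative):
        total = pos + neg
        scores.append((pos - neg) / total if total > 0 else 0.0)
    return scores.index(max(scores)), scores


# Made-up detector intensities for a two class problem.
label, scores = differential_classify([0.5, 0.2], [0.1, 0.4])
print(label, scores)
```

The appeal is that a class can now be voted against as well as for, something a single detector measuring (non-negative) light intensity can't express on its own.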

It turns out that the new DDNN performed much better than the original DDNN on standard MNIST, Fashion MNIST and another standard AI benchmark.

DDNN inferencing advantages, disadvantages and use cases

Besides the obvious power efficiencies and speed efficiencies of optical DDNN vs. electronic NNs for inferencing, there are a few other advantages:

  • All optical data paths are less noisy – In an electronic inferencing path, each transformation of an image to a pixel file will add some signal loss. In an all optical inferencing engine, this would be eliminated.
  • Smaller inferencing engine – In an electronic inferencing engine one needs CPUs, memory, GPUs, PCIe busses, networking and all the power and cooling to make it work. For an all optical DDNN, one needs a laser, diffraction layers and a set of photo detectors. Yes there’s some electronics involved but not nearly as much as an all electronic NN. And an all electronic NN with 0.5M nodes, and 5 layers with 0.1B connections would take a lot of memory and compute to support. Their DDNN to perform this task took up about 9 cm (3.6″) squared by ~3 to 5 cm (1.2″-2.0″) deep.

But there are some problems with the technology.

  • No re-training or training support – there’s almost no way to re-train the optical DDNN without re-fabricating the DDNN diffraction layers. I suppose additional layers could be added on top of or below the bottom layers, sort of like a corrective lens. Also, if perhaps there was some sort of way to (chemically) develop diffraction layers during training steps then it could provide an all optical DL data flow.
  • No support for non-optical classifications – there’s much more to ML DL NN functionality than optical classification. Perhaps if there were some way to transform non-optical data into optical images then DDNNs could have a broader applicability.

The technology could be very useful in any camera, lidar, sighting scope, telescope image and satellite image classification activities. It could also potentially be used in heads up displays to identify items of interest in the optical field.

It would also seem easy to adapt DDNN technology to classify analog sensor data as well. It might also lend itself to be used in space, at depth and other extreme environments where all electronic NN gear might not survive for very long.

Comments?

Photo Credit(s):

Figure 1 from All-Optical Machine Learning Using Diffractive Deep Neural Networks

Figure 2 from All-Optical Machine Learning Using Diffractive Deep Neural Networks

Figure 2 from Class-specific Differential Detection in Diffractive Optical Neural Networks Improves Inference Accuracy

Figure 3 from Class-specific Differential Detection in Diffractive Optical Neural Networks Improves Inference Accuracy