Weight Agnostic Neural Networks (WANNs)

I read an article the other day (Neural Networks Can Drive Without [weight] Learning) about a new form of deep learning neural network (NN) that does not depend on tuning the weights assigned to network connections. The new NN is called a WANN (Weight Agnostic NN). There’s also a scientific paper (on GitHub, Weight Agnostic Neural Networks) that describes WANNs in more detail.

How WANNs differ from normal NNs

If I understand them properly, WANNs are still trained, but instead of adjusting weights during training, a WANN’s architecture (its nodes and connections) is modified and optimized to perform well against the training data.

Indeed, most NNs start out by assigning random weights throughout the network, and these weights are then adjusted over the training cycle until the NN performs well on the training data. But NNs such as these have a structure (# nodes/layer, # layers, connectivity type, etc.) defined by the researcher that is stable and unchanging during a training-validation cycle. If the NN model is not accurate enough, the researcher has two choices: find better data or change the model’s structure. WANNs start and end with changing the model’s structure.

WANNs start out with a set of NN architectures (# nodes/layer, # layers, connection types, etc.). Each NN architecture is evaluated against the training data with a single randomized weight that is shared by every connection. That shared weight is altered (randomly) for each training pass and the model is evaluated for accuracy.
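To make the shared-weight idea concrete, here is a minimal sketch in plain Python. It is not the paper's code: the `forward` and `mean_score` helpers, the connection-mask representation, the tanh activation, the toy samples, and the three weight values are all assumptions chosen for illustration. It scores two tiny fixed architectures, each with every active connection set to the same weight, and averages the result over several weight values.

```python
import math

def forward(mask_layers, shared_weight, inputs):
    """Feed-forward pass where every active connection uses the same weight.

    mask_layers is a list of layers; each layer is a list of rows, and
    row[i] is 1 if input i feeds that output node, else 0 (hypothetical format).
    """
    activations = list(inputs)
    for mask in mask_layers:
        activations = [
            math.tanh(sum(shared_weight * m * a for m, a in zip(row, activations)))
            for row in mask
        ]
    return activations

def mean_score(mask_layers, weights, samples):
    """Average negative squared error over several shared-weight values."""
    total = 0.0
    for w in weights:
        for x, target in samples:
            out = forward(mask_layers, w, x)[0]
            total -= (out - target) ** 2
    return total / (len(weights) * len(samples))

# Toy data (assumed): the target follows the sign of the first input only.
samples = [([1.0, -1.0], 1.0), ([-1.0, 1.0], -1.0)]
weights = [0.5, 1.0, 2.0]  # assumed shared-weight values

arch_a = [[[1, 0]]]  # output wired to input 0
arch_b = [[[0, 1]]]  # output wired to input 1

print("architecture A:", mean_score(arch_a, weights, samples))
print("architecture B:", mean_score(arch_b, weights, samples))
```

Only the wiring differs between the two architectures, so any difference in score comes from structure rather than from tuned weights. The weights here are all positive to keep the toy example easy to read.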

At the end of a WANN training pass you have a set of evaluation metrics for each model structure. The resultant WANNs are then ranked by performance and complexity. The highest performing networks are then used to create a new population (set) of WANN architectures to be tested, and the process iterates from there. This would presumably continue until accuracy plateaus across a number of shared randomized weights. The best architecture at that point would be the WANN model used for the application.
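The rank-and-repopulate loop described above can be sketched roughly as follows. This is a toy under stated assumptions, not the search used in the paper: an "architecture" here is just a tuple of 0/1 flags saying which inputs connect to a single tanh output, mutation flips one flag, the top half of each generation survives, and the names `score`, `rank_key`, `mutate`, `search` and the constant `SHARED_WEIGHTS` are all hypothetical.

```python
import math
import random

SHARED_WEIGHTS = [0.5, 1.0, 2.0]  # assumed values, for illustration only

def score(mask, samples):
    """Mean negative squared error of a one-output net over all shared weights."""
    total = 0.0
    for w in SHARED_WEIGHTS:
        for x, target in samples:
            out = math.tanh(sum(w * m * xi for m, xi in zip(mask, x)))
            total -= (out - target) ** 2
    return total / (len(SHARED_WEIGHTS) * len(samples))

def rank_key(mask, samples):
    # Sort by performance first (higher score wins), then by fewer connections.
    return (-score(mask, samples), sum(mask))

def mutate(mask, rng):
    """Flip one randomly chosen connection on or off."""
    i = rng.randrange(len(mask))
    child = list(mask)
    child[i] = 1 - child[i]
    return tuple(child)

def search(samples, n_inputs, pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_inputs)) for _ in range(pop_size)
    ]
    history = []  # best score seen in each generation
    for _ in range(generations):
        population.sort(key=lambda m: rank_key(m, samples))
        history.append(score(population[0], samples))
        parents = population[: pop_size // 2]  # best half survives unchanged
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    population.sort(key=lambda m: rank_key(m, samples))
    return population[0], history

# Toy data (assumed): the target depends only on the first of three inputs.
samples = [
    ((1.0, 1.0, -1.0), 1.0),
    ((-1.0, 1.0, 1.0), -1.0),
    ((1.0, -1.0, 1.0), 1.0),
    ((-1.0, -1.0, -1.0), -1.0),
]
best, history = search(samples, n_inputs=3)
print("best mask:", best)
print("best score per generation:", history)
```

Because the best half is carried over unchanged, the best score can never get worse from one generation to the next, which is one simple way to watch for the plateau mentioned above.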

Why WANN?

For a normal NN, each weight would be adjusted automatically and independently at the end of each training batch. There would, of course, be a large number of batches, causing each weight in the NN to be altered (via floating point arithmetic) many times. So the math would be on the order of floating point arithmetic * # nodes * # layers * # training batches * # training passes (or epochs).

WANNs avoid this inner loop math altogether. Instead they need to test each model on a number of shared random weights. This would presumably be done after a complete training pass (each epoch). And even if you had the same number of WANN models as nodes in a normal NN, there would be far fewer computations, something on the order of # models * # epochs (each training pass [or epoch] could conceivably test a different shared random weight).
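Taking these rough estimates at face value, the two counts can be compared with a few lines of Python. The function names and every number below are made up for illustration, and "# nodes" is read here as nodes per layer. Note that the WANN count tallies evaluations only; each evaluation still has to run the network over the data, which this back-of-the-envelope comparison ignores.

```python
def weight_update_count(nodes_per_layer, layers, batches_per_epoch, epochs):
    """Rough count of weight adjustments for a conventionally trained NN."""
    return nodes_per_layer * layers * batches_per_epoch * epochs

def wann_evaluation_count(models, epochs):
    """Rough count of shared-weight evaluations for a WANN search."""
    return models * epochs

# Hypothetical sizes: 100 nodes/layer, 5 layers, 1,000 batches/epoch, 10 epochs.
normal = weight_update_count(100, 5, 1000, 10)

# Same number of WANN models as nodes in the normal NN (100 * 5 = 500).
wann = wann_evaluation_count(500, 10)

print("normal NN weight updates:", normal)
print("WANN evaluations:", wann)
print("ratio:", normal // wann)
```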

Another advantage of WANNs is that they result in simpler, less complex NN models (# nodes, # layers, # of connections, etc.) than normal DL NNs. Simpler NN models could be very useful for IoT applications, where computational power and storage are limited.

The main disadvantage of WANNs is that they aren’t as accurate as normal (weight-adjusted) NNs. However, once you have a WANN, you can always elect to re-train it in the normal fashion by adjusting weights to gain more accuracy. Doing so would likely bring its accuracy much closer to that of a more complex NN model that was trained from the start by altering weights.

WANNs are more like nature

Humans and other mammals (probably birds, aquatic animals, etc. as well) seem to be born with certain innate abilities (visual, perceptive, mobility) and with certain habits such as nursing, facial mimicking, hunger-feeding, etc. Presumably these innate abilities and habits are hardwired neuron networks that don’t depend on environmental learning, something that they are all born with.

Conceivably, WANNs could be considered similar to these hardwired (unlearned) neuron networks. WANNs could be used in a similar fashion to embed certain innate habits and abilities into robots or other automation, which could then be further trained through interactions with their environment.


The GitHub paper has an online WANN model widget with a slider where you can alter the shared random weight and see its impact on the operation of the widget. Playing with this, the only weight that seems to have a significant impact on the actions of the widget is zero…

Photo Credit(s): “Neural Connections In the Human Brain” by Image Editor is licensed under CC BY-NC-ND 2.0 
