I attended HPE Discover conference in Vegas this past week and among all their product announcements, there was a panel discussion on something called Swarm Intelligence. But it was really about collaborative learning.
Swarm Intelligence at HPE is a way for multiple organizations/edge devices to train a model collaboratively. They end up using their own data to train local models but then share their models (actually model node weights) with one another.
In this fashion, if say one hospital specializes in the detection and treatment of pneumonia and another in TB, they could both train a shared model on their respective sets of data. But during training, they share their model weights between them and after some number of training iterations, have a single model than supports detection of both.
How does swarm intelligence work?
To make swarm intelligence work:
- All parties have to reach consensus on model hyper parameters, i.e, type of model (CNN, RNN, LSM, etc.), number of nodes per layer, number of layers, levels of connections between nodes, etc. So there’s a single model architecture to be trained across all the organizations.
- All organization training data needs to be the same type, (e.g., X-rays).
- After each model training session all model weights have to be shared with each other
- All organizations have to decide on the method used to merge or combine the model weights (e.g. averaging). .
In the end, after N number of training epochs, their combined model would be essentially cross trained on each organizations data. But no one shared any data!
Why attempt swarm intelligence
HPE believes swarm intelligence would be a way to not have to transmit all that edge data to a central repository, but there’s other advantages:
- A combined model could be trained on more data than any single organization could provide.
- A combined model would have less organizational bias.
There’s one other possibility, but it’s unclear whether this is legally valid or not, but a combined model could be trained on data that it didn’t have legal access to.
One problem with the edge is the vast amount of data there
It turns out that a self driving car could generate 4TB of data/day of driving. Moving 4TB a day from all the cars in say a major metropolitan area (4 million people with ~1 million cars of which 20% are on the road each week day) could represent as much as 200K*4TB or 200EB of data/day.
There is not enough bandwidth in a fully 5G world to move that amount of data each day wirelessly and probably not enough bandwidth to move that amount of data over wire.
But if each car were to train its own (self-driving) model each day on its own data and then share that training model of say 1024 nodes with 1024 layers, it would represent 1M node weights ( floating point numbers), or ~16MB of data, if done effectively, one could have a cities worth of training data to train your self driving car models.
The allure of swarm intelligence/collaborative learning is high. It seems a small cost to reach consensus on the model hyper-parameters, collaborative learning methodology and synchronized training epochs to create a model trained on multiple organization/edge device training data.
HPE discussed using private blockchains to coordinate the sharing of model training across organizations or edge devices and use the block chain to compensate organizations for the use of their trained models. Certainly this could work well with edge devices but it seems an unnecessary complication for collaborative organizations.
Nonetheless, swarm intelligence may just be one way to address some of serious problems with deep learning today.