Distributed computing – Silverton Consulting

Blockchain Compute cloud

Posted on January 26, 2024January 26, 2024 by Ray in Cloud services, Distributed computing

Over the past year or so I’ve been hearing a lot about a new use of blockchain technology to deploy a compute cloud.

In the old days, mining crypto would reward you for doing the work. But over time, it’s become harder to mine and to make money from crypto. Specialized hardware took over more of this activity making it much less profitable for the rest of us

But with the emergence of crypto distributed compute clouds, this maybe changing. Akash is a relatively popular one, but I read an article in ScienceDaily about the use of the Golem network to implement a search for earth’s chemistry precursors that led to life (see: Chemists use the blockchain to simulate … the origins of life) which was was describing a CHEM open access paper Emergence of metabolic-like cycles in blockchain-orchestrated reaction networks.

The science

The science was intended to simulate chemical reactions based on chemicals available to primitive earth to determine which reaction chain(s) could lead to life. They programmed the set of reactions and the chemicals available (water, methane, & ammonia) to early earth and intended to let this run and generate all possible reaction cycles.

The researchers realized that doing this much computation would require more compute power than available to them. So they decided to deploy the computations across a distributed compute cloud. They chose the Golem Network to do their computations. Their computations ultimately resulted a reaction cycle database they called the Network of Early Life (NOEL) (see: NOEL Network).

Once the distributed cloud compute was in operation they used it to come up with 11B reaction cycles of which ~5B would “entail no incompatibilities or selectivity conflicts”. They then used these to construct a series of metabolic network 100K larger than ever produced before as depicted in NOEL.

Using NOEL, the team was able to discover some standard metabolic pathways (reaction cycles) and a limited set that produced simple sugars and amino acids could emerge from the chemicals available to primitive earth.

But they also found about a 100 reaction cycles that involved self-replicating molecules (molecules that could create additional copies of themselves). Self replicating molecules is also believed to be a requirement for the origin of life.

It turned out that the work to construct NOEL on the Golem network took 400 machines, over 20K cores and two months to do the calculations. The cost to them was 82K GLMs (at ~0.21 GLM/USD this would be $17.2K). The team estimated it would have required a top of the line AMD 256 core server about 6 months to compute which would have cost substantially more to purchase and of course running it for 6 months would cost even more.

The team chose Golem, because the work only needed to be available in the form of docker containers, didn’t require the central work server to be online constantly, automatically matched the compute with cloud resource, and managed it all using a cryptographically secure and distributed interface.

Distributive compute cloud

The science is interesting but what’s more interesting (to me) is it was done using a crypto distributed computing cloud.

Looking at the Golem network statistics they show ~510 compute providers with about 5000 cores available of which 50-100 providers supplied computing use to the cloud over the past 4 hrs (26Jan2024: 1600 MDT). That doesn’t seem like a lot of providers but each could have multiple servers running compute.

The Golem network provides a relatively straightforward tutorial on how to set up a server to supply compute to the network. There are some tricks (port forwarding, screen/tmux deployment) but it all seems pretty straight forward (probably something even I could do in an hour or so).

And when you start supplying compute to Golem mainnet, you earn GLMs which are a cryptocurrency (ERC20 coin of ETH). So one should easily be able to convert GLM to ETH and whatever currency you desire.

Many former crypto miners have idle servers that could be put to use providing resources to distributed compute clouds. And if I thought doing so might help some (under resourced organization) produce real scientific research, I might be even more tempted to do so.

~~~~

So if you’ve got some servers sitting idle in your (home) office. This weekend, fire them back up, install the Golem provider software and run the Golem network. Who knows by doing so, you just might help some researcher someplace change the world.

Picture credit(s):

From the CHEM Emergence of metabolic-like cycles in blockchain-orchestrated reaction networks article
From the Golem stats page

FAST(HARD) or Slow(soft)AGI takeoff – AGI Part 6

Posted on October 27, 2022October 27, 2022 by Ray in AGI, Cognitive computing, Distributed computing, Forecasting, Scenario planning, Strategic Inflection Points

I was listening to a podcast a couple of weeks back and the person being interviewed made a comment that he didn’t believe that AGI would have a fast (hard) take off rather it would be slow (soft). Here’s the podcast John Carmack interviewed by Lex Fridman).

Hard vs. soft takeoff

A hard (fast) takeoff implies a relatively quick transition (seconds, hours, days, or months) between AGI levels of intelligence and super AGI levels of intelligence. A soft (slow) takeoff implies it would take a long time (years, decades, centuries) to go from AGI to super AGI.

We’ve been talking about AGI for a while now and if you want to see more about our thoughts on the topic, check out our AGI posts (in most recent order: AGI part 5, part 4, part 3 (ish), part (2), part (1), and part (0)).

The real problem is that many believe that any AGI that reaches super-intelligence will have drastic consequences for the earth and especially, for humanity. However, this is whole other debate.

The view is that a slow AGI takeoff might (?) allow sufficient time to imbue any and all (super) AGI with enough safeguards to eliminate or minimize any existential threat to humanity and life on earth (see part (1) linked above).

A fast take off won’t give humanity enough time to head off this problem and will likely result in an humanity ending and possibly, earth destroying event.

Hard vs Soft takeoff – the debate

I had always considered AGI would have a hard take off but Carmack seemed to think otherwise. His main reason is that current large transformer models (closest thing to AGI we have at the moment) are massive and take lots of special purpose (GPU/TPU/IPU) compute, lots of other compute and gobs and gobs of data to train on. Unclear what the requirements are to perform inferencing but suffice it to say it should be less.

And once AGI levels of intelligence were achieved, it would take a long time to acquire any additional regular or special purpose hardware, in secret, required to reach super AGI.

So, to just be MECE (mutually exclusive and completely exhaustive) on the topic, the reasons researchers and other have posited to show that AGI will have a soft takeoff, include:

AI hardware for training and inferencing AGI is specialized, costly, and acquisition of more will be hard to keep secret and as such, will take a long time to accomplish;
AI software algorithmic complexity needed to build better AGI systems is significantly hard (it’s taken 70yrs for humanity to reach todays much less than AGI intelligent systems) and will become exponentially harder to go beyond AGI level systems. This additional complexity will delay any take off;
Data availability to train AGI is humongous, hard to gather, find, & annotate properly. Finding good annotated data to go beyond AGI will be hard and will take a long time to obtain;
Human government and bureaucracy will slow it down and/or restrict any significant progress made in super AGI;
Human evolution took Ms of years to go from chimp levels of intelligence to human levels of intelligence, why would electronic evolution be 6-9 orders of magnitude faster.
AGI technology is taking off but the level of intelligence are relatively minor and specialized today. One could say that modern AI has been really going since the 1990s so we are 30yrs in and today have almost good AI chatbots today and AI agents that can summarize passages/articles, generate text from prompts or create art works from text. If it takes another 30 yrs to get to AGI, it should provide sufficient time to build in capabilities to limit super-AGI hard take off.

I suppose it’s best to take these one at a time.

Hardware acquisition difficulty – I suppose the easiest way for an intelligent agent to acquire additional hardware would be to crack cloud security and just take it. Other ways may be to obtain stolen credit card information and use these to (il)legally purchase more compute. Another approach is to optimize the current AGI algorithms to run better within the same AGI HW envelope, creating super AGI that doesn’t need any more hardware at all.
Software complexity growing – There’s no doubt that AGI software will be complex (although the podcast linked to above, is sub-titled that “AGI software will be simple”). But any sub-AGI agent that can change it’s code to become better or closer to AGI, should be able to figure out how not to stop at AGI levels of intelligence and just continue optimizating until it reaches some wall. i
Data acquisition/annotation will be hard – I tend to think the internet is the answer to any data limitations that might be present to an AGI agent. Plus, I’ve always questioned if Wikipedia and some select other databases wouldn’t be all an AGI would need to train on to attain super AGI. Current transformer models are trained on Wikipedia dumps and other data scraped from the internet. So there’s really two answers to this question, once internet access is available it’s unclear that there would be need for anymore data. And, with the data available to current transformers, it’s unclear that this isn’t already more than enough to reach super AGI
Human bureaucracy will prohibit it: Sadly this is the easiest to defeat. 1) there are roque governments and actors around the world with more than sufficient resources to do this on their own. And no agency, UN or otherwise, will be able to stop them. 2) unlike nuclear, the technology to do AI (AGI) is widely available to business and governments, all AI research is widely published (mostly open access nowadays) and if anything colleges/universities around the world are teaching the next round of AI scientists to take this on. 3) the benefits for being first are significant and is driving a weapons (AGI) race between organizations, companies, and countries to be first to get there.
Human evolution took Millions of years, why would electronic be 6-9 orders of magnitude faster – electronic computation takes microseconds to nanoseconds to perform operations and humans probably 0.1 sec, or so. Electronics is already 5 to 8 orders of magnitude faster than humans today. Yes the human brain is more than one CPU core (each neuron would be considered a computational element). But there are 64 core CPUs/4096 CORE GPUs out there today and probably one could consider similar in nature if taken in the aggregate (across a hyperscaler lets say). So, just using the speed ups above it should take anywhere from 1/1000 of a year to 1 year to cover the same computational evolution as human evolution covered between the chimp and human and accordingly between AGI and AGIx2 (ish).
AGI technology is taking a long time to reach, which should provide sufficient time to build in safeguards – Similar to the discussion on human bureaucracy above, with so many actors taking this on and the advantages of even a single AGI (across clusters of agents) would be significant, my guess is that the desire to be first will obviate any thoughts on putting in safeguards.

Other considerations for super AGI takeoff

Once you have one AGI trained why wouldn’t some organization, company or country deploy multiple agents. Moreover, inferencing takes orders of magnitude less computational power than training. So with 1/100-1/1000th the infrastructure, one could have a single AGI. But the real question is wouldn’t a 100- or 1000-AGis represent super intelligence?

Yes and no, 100 humans doesn’t represent super intelligence and a 1000 even less so. But humans have other desires, it’s unclear that 100 humans super focused on one task wouldn’t represent super intelligence (on that task).

Interior view of a data center with equipment

What can be done to slow AGI takeoff today

Baring something on the order of Nuclear Proliferation treaties/protocols, putting all GPUs/TPUs/IPUs on weapons export limitations AND restricting as secret, any and all AI research, nothing easily comes to mind. Of course Nuclear Proliferation isn’t looking that good at the moment, but whatever it’s current state, it has delayed proliferation over time.

One could spend time and effort slowing technology progress down. Such as by reducing next generation CPU/GPU/IPU compute cores , limiting compute speedups, reduce funding for AI research, putting a compute tax, etc. All of which, if done across the technological landscape and the whole world, could give humanity more time to build in AGI safeguards. But doing so would adversely impact all technological advancement, in healthcare, business, government, etc. And given the proliferation of current technology and the state actors working on increasing capabilities to create more, it would be hard to envision slowing technological advancement down much, if at all.

It’s almost like putting a tax on slide rules or making their granularity larger.

It could be that super AGI would independently perceive itself benignly, and only provide benefit to humanity and the earth. But, my guess is that given the number of bad actors intent on controlling the world, even if this were true, they would try to (re-)direct it to harm segments of humanity/society. And once unleashed, it would be hard to stop.

The only real solution to AGI in bad actor hands, is to educate all of humanity to value all humans and to cherish the environment we all live in as sacred. This would eliminate bad actors,

It sounds so naive, but in reality, it’s the only thing, I believe, the only way we can truly hope to get us through this AGI technological existential crisis.

Just like nuclear, we as a society will keep running into technological existential crisis’s like this. Heading all these off, with a better more all inclusive, more all embracing, and less combative humanity could help all of them.

Comments?

Picture Credits:

Slide rule from wikipedia article

The problem with smarter robots

Posted on October 13, 2021October 15, 2021 by Ray in Artificial Intelligence, Deep Learning, Distributed computing, Neural network, Reinforcement Learning, Robots

Read an article the other week about how Deepmind (at Google) is approaching the training of robotics using simulation, reinforcement learning, elastic weights, knowledge distillation and progressive learning.

It seems relatively easy to train a robot to handle some task like grabbing or walking. But doing so can take an awfully long time. If you want to try to train a robot to grab something and put it someplace. You can have it start out making some random movements of its arm, wrist and fingers (if they have such things) and then use reinforcement learning to help it improve its movements over time.

But if each grab attempt takes 10 seconds, using reinforcement learning may take 10,000 attempts before it starts to make any significant progress and perhaps another 20,000-50,000 more to get expert at it. Let’s see 60K *10 seconds is 10,000 minutes or ~170 hours. And that’s just one object pick and place. But then maybe you would like to grab different parts and maybe place them in different locations. All these combinations start adding up.

And of course doing 1000s of movements will wear out gears, motors, mechanisms etc. If only this could all be done in electronic simulations. Then assuming the simulations are accurate enough the whole thing could be done in a matter of hours without wearing anything out. Enter robot simulators such as NVIDIA Isaac Sim, OpenAI RoboSchool/PyBullet

But the problems with simulation are …

Simulations are getting more accurate but at some point their accuracy defeats its purpose because the real world is always noisy, windy and not as deterministic as any simulation. One researcher said you could conceivable have a two armed robot be trained to throw all of a cell phones components up into the air and they will all land in their proper places, proper orientations. But in the real world this could never actually happen, or if it did, it could only happen once.

Hurricane Ike - 2008/09/12 - 21:26 UTC by CoreBurn (cc) (from Flickr) — Hurricane Ike – 2008/09/12 – 21:26 UTC by CoreBurn (cc) (from Flickr)

Weather researchers have been dealing with this problem in spades for a long time. There appears to be a fundamental limit to how far in advance we can predict weather and it’s due to the accuracy with which sensors operate and the complexity of feedback loops between the atmosphere, oceans, landforms, etc. So at a fundamental level, simulations can never be completely accurate. But they can be better.

Today’s weather simulations we see on TV/radio use models that average a number of distinct simulations, where sensor information has been slightly and randomly modified. Something similar could be done for robotic simulation environments, to make them more realistic.

But there are other problems with training robots to do lots of tasks.

Forget me not…

AI deep learning and reinforcement learning algorithms are great when charged with learning a single task, but having it learn multiple tasks is much harder to do. Because each task requires its own deep neural network (DNN) and if you train a DNN on one task and then try to train in on a another task, it forgets all the learnings from the original task. Researchers call this catastrophic forgetting.

One way researchers have dealt with this problem is to effectively freeze certain DNN nodes from having their weights changed during subsequent training rounds and leave others flexible or changeable. One can see this when one trains an image recognition DNN to classify different objects by importing a well trained object classifier and freezing all of it’s layers except the top one or two and then training these layers to classify new objects.

This works well but you have effectively changed the DNN to forget the original object classification training and replaced it with a new one. One solution to this approach is to have multiple passes of training, after each one, certain nodes and connections (of importance to that particular task) are selectively frozen. This works well for a limited number of different tasks but over time all nodes become frozen which means that no more learning can take place. Researchers call this approach to the catastrophic forgetting problem elastic weights.

One way to get around the all nodes frozen issue in elastic weights is to have multiple NNs. One which is trained on a specific task and whose weights are frozen and then a DNN that exists alongside this one with it’s own initialized set of weights. But which uses the original DNN as part of the new DNN inputs. This effectively includes and incorporates all the previously learned knowledge into the new, combined DNN. This is called Progressive Neural Networks.

In this fashion one progressive DNN can be sequentially trained on any number of tasks each of which ends up providing input to all subsequent task training activity. Such a progressive network never forgets and can use previously learned knowledge on new tasks.

The problem with progressive DNNs is a proliferation of DNN column. one for each trained task. However there are a couple of approaches to shrinking an ensemble of DNN like progressive training creates into one that is simpler and just as effective. One way is to perturb weights in DNN nodes and see how model prediction accuracy is impacted on all its tasks. If accuracy isn’t impacted that much, then that node and all its connections could be deleted from the model with minimal impact on model accuracy.

Another approach is to use one DNN to train another. Sort of like a teacher-student. This is called Knowledge Distilation. Where one DNN is a large network (the teacher) and a smaller (student) network that is trained to mimic the teacher DNN to achieve similar accuracy. This is done by training the smaller student network to match the predictions/classifications of the larger one.

Google researchers have shown that knowledge distillation works best when the gap in the sizes of the two networks (teacher and student) aren’t that large. They have solved this problem by introducing an intermediate step (called teachers assistent). They train this TA first then use the TA to train the student.

In the above graphic, when using a teacher of size 110 and a student of size 8 the resulting accuracy suffers but if one uses an intermediate DNN, with a size 20 the resultant accuracy of the student is much closer to the teacher..

~~~~

So with realistic simulation we can train a robot to do any specific task, all using only compute resources. And using progressive DNN training, a robot could conceivably be trained to do any number of tasks. And with appropriate knowledge distillation one can reduce the DNN from progressive training into something much smaller (<10%) than the original DNN.

Want a personal robot that can clean up around your place, do the wash, cook your food and do anything else needed. You know what to do.

from Deepmind Progressive Neural Networks article
from Deepmind Progressive Neural Networks article
from Deepmind Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

CTERA, Cloud NAS on steroids

Posted on August 13, 2021August 13, 2021 by Ray in Cloud services, Cloud storage, data access, Data consistency, Data security, Distributed computing, File Storage, Mobile computing, Object storage, storage scalability, Strategic Inflection Points

We attended SFD22 last week and one of the presenters was CTERA, (for more information please see SFD22 videos of their session) discussing their enterprise class, cloud NAS solution.

We’ve heard a lot about cloud NAS systems lately (see our/listen to our GreyBeards on Storage podcast with LucidLink from last month). Cloud NAS systems provide a NAS (SMB, NFS, and S3 object storage) front-end system that uses the cloud or onprem object storage to hold customer data which is accessed through the use of (virtual or hardware) caching appliances.

These differ from file synch and share in that Cloud NAS systems

Don’t copy lots or all customer data to user devices, the only data that resides locally is metadata and the user’s or site’s working set (of files).
Do cache working set data locally to provide faster access
Do provide NFS, SMB and S3 access along with user drive, mobile app, API and web based access to customer data.
Do provide multiple options to host user data in multiple clouds or on prem
Do allow for some levels of collaboration on the same files

Although admittedly, the boundary lines between synch and share and Cloud NAS are starting to blur.

CTERA is a software defined solution. But, they also offer a whole gaggle of hardware options for edge filers, ranging from smart phone sized, 1TB flash cache for home office user to a multi-RU media edge server with 128TB of hybrid disk-SSD solution for 8K video editing.

They have HC100 edge filers, X-Series HCI edge servers, branch in a box, edge and Media edge filers. These later systems have specialized support for MacOS and Adobe suite systems. For their HCI edge systems they support Nutanix, Simplicity, HyperFlex and VxRail systems.

CTERA edge filers/servers can be clustered together to provide higher performance and HA. This way customers can scale-out their filers to supply whatever levels of IO performance they need. And CTERA allows customers to segregate (file workloads/directories) to be serviced by specific edge filer devices to minimize noisy neighbor performance problems.

CTERA supports a number of ways to access cloud NAS data:

Through (virtual or real) edge filers which present NFS, SMB or S3 access protocols
Through the use of CTERA Drive on MacOS or Windows desktop/laptop devices
Through a mobile device app for IOS or Android
Through their web portal
Through their API

CTERA uses a, HA, dual redundant, Portal service which is a cloud (or on prem) service that provides CTERA metadata database, edge filer/server management and other services, such as web access, cloud drive end points, mobile apps, API, etc.

CTERA uses S3 or Azure compatible object storage for its backend, source of truth repository to hold customer file data. CTERA currently supports 36 on-prem and in cloud object storage services. Customers can have their data in multiple object storage repositories. Customer files are mapped one to one to objects.

CTERA offers global dedupe, virus scanning, policy based scheduled snapshots and end to end encryption of customer data. Encryption keys can be held in the Portals or in a KMIP service that’s connected to the Portals.

CTERA has impressive data security support. As mentioned above end-to-end data encryption but they also support dark sites, zero-trust authentication and are DISA (Defense Information Systems Agency) certified.

Customer data can also be pinned to edge filers, Moreover, specific customer (director/sub-directorydirectories) data can be hosted on specific buckets so that data can:

Stay within specified geographies,
Support multi-cloud services to eliminate vendor lock-in

CTERA file locking is what I would call hybrid. They offer strict consistency for file locking within sites but eventual consistency for file locking across sites. There are performance tradeoffs for strict consistency, so by using a hybrid approach, they offer most of what the world needs from file locking without incurring the performance overhead of strict consistency across sites. For another way to do support hybrid file locking consistency check out LucidLink’s approach (see the GreyBeards podcast with LucidLink above).

At the end of their session Aron Brand got up and took us into a deep dive on select portions of their system software. One thing I noticed is that the portal is NOT in the data path. Once the edge filers want to access a file, the Portal provides the credential verification and points the filer(s) to the appropriate object and the filers take off from there.

CTERA’s customer list is very impressive. It seems that many (50 of WW F500) large enterprises are customers of theirs. Some of the more prominent include GE, McDonalds, US Navy, and the US Air Force.

Oh and besides supporting potentially 1000s of sites, 100K users in the same name space, and they also have intrinsic support for multi-tenancy and offer cloud data migration services. For example, one can use Portal services to migrate cloud data from one cloud object storage provider to another.

They also mentioned they are working on supplying K8S container access to CTERA’s global file system data.

There’s a lot to like in CTERA. We hadn’t heard of them before but they seem focused on enterprise’s with lots of sites, boatloads of users and massive amounts of data. It seems like our kind of storage system.

Comments?

Swarm learning for distributed & confidential machine learning

Posted on June 2, 2021June 2, 2021 by Ray in Artificial Intelligence, Data security, Deep Learning, Distributed computing, Information economy, Machine Learning, Neural network, R&D measures, Strategic Inflection Points

Read an article the other week about researchers in Germany working with a form of distributed machine learning they called swarm learning (see: AI with swarm intelligence: a novel technology for cooperative analysis …) which was reporting on a Nature magazine article (see: Swarm Learning for decentralized and confidential clinical machine learning).

The problem of shared machine learning is particularly accute with medical data. Many countries specifically call out patient medical information as data that can’t be shared between organizations (even within country) unless specifically authorized by a patient.

So these organizations and others are turning to use distributed machine learning as a way to 1) protect data across nodes and 2) provide accurate predictions that uses all the data even though portions of that data aren’t visible. There are two forms of distributed machine learning that I’m aware of federated and now swarm learning.

The main advantages of federated and swarm learning is that the data can be kept in the hospital, medical lab or facility without having to be revealed outside that privileged domain BUT the [machine] learning that’s derived from that data can be shared with other organizations and used in aggregate, to increase the prediction/classification model accuracy across all locations.

How distributed machine learning works

Distributed machine learning starts with a common model that all nodes will download and use to share learnings. At some agreed to time (across the learning network), all the nodes use their latest data to re-train the common model and share new training results (essentially weights used in the neural network layers) with all other members of the learning network.

Shared learnings would be encrypted with TLS plus some form of homomorphic encryption that allowed for calculations over the encrypted data.

In both federated and swarm learning, the sharing mechanism was facilitated by a privileged block chain (apparently Etherium for swarm). All learning nodes would use this blockchain to share learnings and download any updates to the common model after sharing.

Federated vs. Swarm learning

The main difference between federated and swarm learning is that with federated learning there is a central authority that updates the model(s) and with swarm learning that processing is replaced by a smart contract executing within the blockchain. Updating model(s) is done by each node updating the blockchain with shared data and then once all updates are in, it triggers a smart contract to execute some Etherium VM code which aggregates all the learnings and constructs a new model (or at least new weights for the model). Thus no node is responsible for updating the model, it’s all embedded into a smart contract within the Etherium block chain. .

Buthow does the swarm (or smart contract) update the common model’s weights. The Nature article states that they used either a straight average or a weighted average (weighted by “weight” of a node [we assume this is a function of the node’s re-training dataset size]) to update all parameters of the common model(s).

Testing Swarm vs. Centralized vs. Individual (node) model learning

In the Nature paper, the researchers compared a central model, where all data is available to retrain the models, with one utilizing swarm learning. To perform the comparison, they had all nodes contribute 20% of their test data to a central repository, which ran the common swarm updated model against this data to compute an accuracy metric for the swarm. The resulting accuracy of the central vs swarm learning comparison look identical.

They also ran the comparison of each individual node (just using the common model and then retraining it over time without sharing this information to the swarm versus using the swarm learning approach. In this comparison the swarm learning approach alway seemed to have as good as if not better accuracy and much narrower dispersion.

In the Nature paper, the researchers used swarm learning to manage the machine learning model predictions for detecting COVID19, Leukemia, Tuberculosis, and other lung diseases. All of these used public data, which included PBMC (peripheral blood mono-nuclear cells) transcription data, whole blood transcription data, and X-ray images.

Swarm learning also provides the ability to onboard new nodes in the network. Which would supply the common model and it’s current weights to the new node and add it to the shared learning smart contract.

The code for the swarm learning can be downloaded from HPE (requires an HPE passport login [it’s free]). The code for the models and data processing used in the paper are available from github. All this seems relatively straight forward, one could use the HPE Swarm Learning Library to facilitate doing this or code it up oneself.

Photo Credit(s):

From Nature magazine article Swarm Learning for decentralized and confidential clinical machine learning
From Nature magazine article Swarm Learning for decentralized and confidential clinical machine learning
From Nature magazine article Swarm Learning for decentralized and confidential clinical machine learning
From Nature magazine article Swarm Learning for decentralized and confidential clinical machine learning

The birth of biocomputing (on paper)

Posted on March 26, 2021March 26, 2021 by Ray in Distributed computing, Mobile computing, Processing performance, Strategic Inflection Points, System effectiveness

Read an article this past week discussing how researchers in Barcelona Spain have constructed a biological computing device on paper (see Biocomputer built with cells printed on paper). Their research was written up in a Nature Article (see 2D printed multi-cellular devices performing digital or analog computations).

We’ve written about DNA computing and storage before (see DNA IT …, DNA Computing… posts and our GBoS podcast on DNA storage…). But this technology takes all that to another level.

The challenges with biological computing previously had been how to perform the input processing and output within a single cell or when using multiple cells for computations, how to wire the cells together to provide the combinational logic required for the circuit.

The researchers in Spain seemed to have solved the wiring problems by using diffusion across a porous surface (like paper) to create a carrier signal (wire equivalent) and having cell groups at different locations along this diffusion path either enhance or block that diffusion, amplify/reduce that diffusion or transform that diffusion into something different

Analog (combinatorial circuitry types of) computation for this biocomputer are performed based on the location of sets of cells along this carrier signal. So spatial positioning is central to the device and the computation it performs. Not unlike digital or combinatorial circuitry, different computations can be performed just by altering the position along the wire (carrier signal) that gates (cells) are placed.

Their process seems to start with designing multiple cell groups to provide the processing desired, i.e., enhancing, blocking, transforming of the diffusion along the carrier signal, etc. Once they have the cells required to transform the diffusion process along the carrier signal, they then determine the spatial layout for the cells to be used in the logical circuit to perform the computation desired. Then they create a stamp which has wells (or indentations) which can be filled in with the cells required for the computation. Then they fill these wells with cells and nutrients for their operation and then stamp the circuit onto a porous surface.

The carrier signal the research team uses is a small molecule, the bacterial 3OC6HSL acyl homoserine lactone (AHL) which seems to be naturally used in a sort of biologic quorum sensing. And the computational cells produce an enzyme that enhances or degrades the AHL flow along the carrier signal. The AHL diffuses across the paper and encounters these computational cells along the way and compute whatever it is that’s required to be computed. At some point a cell transforms AHL levels to something externally available

They created:

Source cells (Sn) that take a substance as input (say mercury) and converts this into AHL
.Gate cells (M) that provide a switch on the solution of AHL difusing across the substrate.
Carrier reporter cells (CR) which can be used to report on concentrations of AHL.

The CR cells produce green florescent reporter proteins (GFP). Moreover, each gate cell expresses red florescent reporter proteins (RFP) as well for sort of a diagnostic tap into its individual activity.

Mapping of a general transistor architecture on a cellular printed pattern obtained using a stamping template. Similar to the transistor architecture, the cellular pattern is composed of three main components: source (S₁ cells), gate (M⁻ cells) that responds to external inputs and a drain (CR cells) as the final output responding to the presence of the carrying signal (CS). b Stamping template used to create the circuit made of PLA with a layer of synthetic fibre (green). Cellular inks (yellow) are in their corresponding containers. Before stamping, the synthetic fibre is soaked with the different cell types. Finally, the stamping template is pressed against the paper surface, depositing all cells. c Circuit response. In the absence of external input, i.e. arabinose, the CS encoded in the production of AHL molecules by S₁ cells diffuses along the surface, inducing GFP expression in reporter cells CR. In the presence of 10⁻³ M arabinose (Ara), the modulatory element M⁻_ara produces the AHL cleaving enzyme Aiia, which degrades the CS. Error bars are the standard deviation (SD) of three independent experiments. Data are presented as mean values ± SD. Experiments are performed on paper strips. The average fold change is 5.6x. d Photography of the device. Source data are provided as a Source Data file.

Using S, M and CR cells they are able to create any type of gate needed. This includes OR, AND, NOR and XNOR gates and just about any truth table needed. With this level of logic they could potentially implement any analog circuit on a piece of paper (if it was big enough).

a Schematic representation of the multi-branch implementation of a truth table. bImplementation of different logic gates. A schematic representation of the cells used in each paper strip and their corresponding distance points is given (Left). Gates with two sources of S₁ (OR and XNOR gates) are circuits carrying two branches, while the other gates (NOR and AND gates) can be implemented with just one branch. Input concentrations are Ara = 10⁻³ M and aTc = 10⁻⁶ M. M⁺_aTc and M⁻_aTc are, respectively, positive and negative modulatory cells responding to aTc. M⁺_ara and M⁻_ara are, respectively, positive and negative modulatory cells responding to arabinose. S₁ cells produce AHL constitutively and CR are the reporter cells. Error bars are the standard deviation (SD) of three independent experiments. The average fold change has been obtained from the mean of ON and OFF states from each circuit. OR gate 14.31x, AND gate 6.21x, NOR gate 6.58x, XNOR gate 5.6x. Source data are provided as a Source Data file.

As we learn in circuits class, any digital logic can be reduced to one of a few gates, such as NAND or NOR.

As an example of uses of the biocomputing, they implemented a mercury level sensing device. Once the device is dipped in a solution with mercury, the device will display a number of green florescent dots indicating the mercury levels of the solution

The bio-logical computer can be stamped onto any surface that supports agent diffusion, even flexible surfaces such as paper. The process can create a single use bio-logic computer, sort of smart litmus paper that could be used once and then recycled.

The computational cells stay “alive” during operation by metabolizing nutrients they were stamped with. As the biocomputer uses biological cells and paper (or any flexible diffusible substrate) as variable inputs and cells can be reproduced ad-infinitum for almost no cost, biocomputers like this can be made very inexpensively and once designed (and the input cells and stamp created) they can be manufactured like a printing press churns out magazines.

~~~~

Now I’d like to see some sort of biological clock capability that could be used to transform this combinatorial logic into digital logic. And then combine all this with DNA based storage and I think we have all the parts needed for a biological, ARM/RISC V/POWER/X86 based server.

And a capacitor would be a nice addition, then maybe they could design a DRAM device.

Its one off nature, or single use will be a problem. But maybe we can figure out a way to feed all the S, M, and CR cells that make up all the gates (and storage) for the device. Sort of supplying biological power (food) to the device so that it could perform computations continuously.

Ok, maybe it will be glacially slow (as diffusion takes time). We could potentially speed it up by optimizing the diffusion/enzymatic processes. But it will never be the speed of modern computers.

However, it can be made very cheap, and very height dense. Just imagine a stack of these devices 40in tall that would potentially consist of 4000-8000 or more processing elements with immense amounts of storage. And slowness may not be as much of a problem.

Now if we could just figure out how to plug it into an ethernet network, then we’d have something.

Photo credit(s):

2 Bit alu from Wikipedia
Figures 1 & 3 from Nature article 2D printed multi-cellular devices performing digital and analog computation

Storageless data!?

Posted on March 10, 2021March 18, 2021 by Ray in data access, data mobility, Data QoS, Distributed computing, File Storage, Strategic Inflection Points

I (virtually) attended SFD21 earlier this year and a company called Hammerspace presented discussing their vision for storageless data (see videos of their session at SFD21).

We’ve talked them before but now they have something to offer the enterprise – data mobility or storageless data.

The white board after David Flynn’s session at SFD8

In essence, customers want to be able to run their workloads wherever it makes the most sense, on prem, in private cloud, and in the public cloud among other places. Historically, it’s been relatively painless to transfer an application’s binary from one to another data center, to a managed service provider or to the public cloud.

And with VMware Cloud Foundation, Kubernetes, Docker and Linux operating everywhere, the runtime environment and other OS services that applications depend on are pretty much available in any of those locations. So now customers have 2 out of 3, what’s left?

It’s all about the Data

Data can take a very long time to move around a data center, let alone across the web between locations. MBs and even GBs of data may be relatively painless to move, but TBs of data can be take days, and moving PBs of data is suicidal.

For instance, when we signed up for a globally accessible file synch and share storage service, I probably had 75GB or so of data I wanted managed. It took literally several days of time to upload this. Yes, I didn’t have data center class internet access, but even that might have only sped this up 2-5X. Ok, now try this with 1TB or more and it’s pretty much going to take days, and you can easily multiple that by 10 to do a PB or more. And that’s if it happens to continue to perform the transfer without disruption.

So what’s Hammerspace storageless data got to do with any of this.

Hammerspace’s idea

It’s been sort of a ground truth of storage, since I’ve been in the industry (40+ years now), that not all random IO data is accessed at the same frequency. That is, some data is accessed a lot and other data accessed hardly at all. That’s why DRAM caching of data can be so important to a host or storage system.

Similarly for sequential access, if you can get the first blocks of data to the host and then stream the rest in time, a storage system can appear to read fast.

Now I won’t go into all the tricks of doing good data caching, (the secret sauce to every vendor’s enterprise storage), but if you can appear to cache data well, you don’t actually need to transfer all the data associated with an application to a location it’s running in, you can appear as if all the data is there, when actually only some of it is present.

Essentially, Hammerspace creates a global file system for your data, across any locations you wish to use it, with great caching, optimized data transfer and with real storage behind it. Servers running your applications mount a Hammerspace file system/share that stitches together all the file storage behind it, across all the locations it’s operating in.

An application request goes to Hammerspace and if the data is not present there, Hammerspace goes and fetches and caches blocks of data as fast as it can. This will let the application start performing IO while the rest of the data is being cached and if allowed, moved to the new location.

Storage can be not managed by Hammerspace, read-write managed by Hammerspace or read-only managed by Hammerspace. For customers who want the whole Hammerspace storageless data functionality they would use read-write mode. For those who just want to access data elsewhere read-only would suffice. Customers who want to continue to access data directly but want read access globally, would use the read-only mode.

Once read-write storage is assigned to Hammerspace grabs all the file metadata information on the storage system. Once this process completes, customers no longer access this file data directly, but rather must access it through Hammerspace. At that point, this data is essentially storageless and can be accessed wherever Hammerspace services are available.

How does Hammerspace do it

Behind the scenes is a lot of technology. Some of which is discussed in the SFD21 sessions (see video’s above). Hammerspace is not in the data path but rather in the control path of data access. But it does orchestrate data movement, and it does route data IO requests from an application to where the data (currently) resides.

Hammerspace also supports Service Level Objectives (SLOs) for performance, geolocation, security, data protection options,, etc. These can be used to keep data in particular regions, to encrypt data (using KMIP), ensure high performance, high data availability, etc.

Hammerspace can manage data across 32 separate sites. It takes a couple of hours to deploy. per site. Each site has a Hammerspace metadata service with standalone access to all data within that site. For example, standalone access could be used, in the event of a network loss.

At the moment, they support eventual consistency and don’t support a global lock service. Rather, Hammerspace uses a conflict resolution service in the event data is overwritten by two or more applications. For any file that was being updated in two or more locations, that file would be flagged as in conflict, Hammerspace would provide snapshots of the various versions of the file(s) and it would require some sort of manual intervention to resolve the conflict. Each location would have (temporary) access to the data it had written directly, but at some point the conflict would need resolution.

They also support NFS and SMB file access for the front end and use object storage services for backend data. Data is copied on demand to the local site’s storage when accessed based on the SLO policies in effect for it. During data movement it is copied up, temporarily into objects on AWS, Microsoft Azure, or GCP, and then copied down to the location it’s being moved to. I believe this temporary object data is encrypted and compressed. Hammerspace support KMIP key providers.

Pricing for Hammerspace is on a managed capacity basis. But anyone can use Hammerspace for up to 10TB for free. Hammerspace is available in AWS marketplace for configuration there.

~~~~

Well it’s been a long time coming, but it appears to be here. Any customers wanting hybrid-cloud operations or global access to their data would be remiss to not check out Hammerspace.

[Edited after posting, The Eds.]

The rise of MinIO object storage

Posted on February 3, 2021February 3, 2021 by Ray in Cloud storage, Crowdsourcing, Distributed computing, IoT, NVMe storage, Object storage, Storage performance, Strategic Inflection Points

MinIO presented at SFD21 a couple of weeks back (see videos here). They had a great session, as always with Jonathan and AB leading the charge. We’ve had a couple of GreyBeardsOnStorage podcasts with AB as well (listen and see GreyBeards talk open source S3… and GreyBeards talk Data Persistence …). We first talked with MinIO last year at SFD 19 where AB made a great impression on the bloggers (see videos here)

Their customers run the gamut from startups to F500. AB said that ~58% of the F500 have MinIO installed and over 8% of the F500 have added capacity over the last year. AB said they have a big presence in Finance, e.g., the 10 largest banks run MinIO, also the auto and Space/Defense sectors have adopted their product.

One reason for the later two sectors (auto & space/defense) is the size of MinIO’s binary, 50MB. And my guess for why the rest of those customers have adopted MinIO is because it’s S3 API compatible, it’s open source, and it’s relatively inexpensive.

Object storage trends

Customers running in the cloud have a love-hate relationship with object storage, they love that it scales but hate what it costs. There are numerous on prem object storage alternatives from traditional and non-traditional storage vendors, but most are deployed on appliances.

With appliances, customers have to order, wait for delivery, rack-configure-set up and after maybe weeks to months finally they have object storage on prem. But with MinIO a purely software, open source solution, it can be tried by merely downloading a couple of (Docker) containers and deployed/activated in under an hour..

As mentioned above, MinIO is API compatible with AWS S3 which helps with adoption. Moreover, now that it’s an integral part of VMware (see their new Data Persistence Platform), it can be enabled in seconds on your standard enterprise VMware cluster with Tanzu.

The other trend is that the edge needs storage, and lots of it. The main drivers of massive edge storage requirement are TelCos deploying 5G and auto industry’s self-driving cars. But this is just a start, industrial IoT will be generating reams of sensor log data at the edge, it will need to be stored somewhere. And what better place to store all this data, but on object storage. Furthermore, all this is driving more adoption of object storage, with MinIO picking up the lion’s share of deployments.

In addition, MinIO recently ported their software to run on ARM. AB said this was to support the expanding hobbyist and developers community driving edge innovation.

And then there was Kubernetes. Everyone in the industry (with the possible exception of Google) is surprised by the adoption of K8S. Google essentially gifted ~$1Bs in R&D on how to scale apps to the world of IT, and now any startup, anywhere, can scale with as well as Google can. And scaling is the “killer app” for the SW industry.

But performance isn’t bad either

Jonathan made mention of MinIO performance (see MinIO 24 node disk and MinIo 32 node NVMe SSD reports) benchmarks. Their disk data shows avg read and write performance of 16.3GB/s and 9.4GB/s, respectively and their NVMe SSD average read and write performance of 183.2GB/s and 171.3GB/s, respectively. The disk numbers are very good for object storage, but the SSD numbers are spectacular.

It turns out that modern, cloud native apps don’t need quick access to data as much as high data throughput. Modern apps have moved to a processing data in memory rather than off of storage, which means they move (large) chunks of data to memory and crunch on it there, and then spit it back out to storage This type of operating mode seems to scale better (in the cloud at least) than having a high priced storage system servicing a blizzard of IO requests from everywhere.

Other vendors had offered SSD object storage before but it never took off. But nowadays, with NVMe SSDs, MinIO is seeing starting to see healthcare, finance, and any AI/ML workloads all deploying NVMe SSD object storage. Yes for large storage repositories, (object storage’s traditional strongpoint), ie, 5PB to 100PB, disk can’t be beat but where blistering high throughput, is needed, NVMe SSD object storage is the way to go.

Open source vs. open core

AB mentioned that MinIO business model is 100% open source vs. many other vendors that use open source but whose business model is open core. The distinction is that open core vendors use open source as base functionality and then build proprietary, charged for, software features/functions on top of this.

But open source vendors, like MinIO offer all their functionality under an open source license (Apache SM License V2.0, GNU AGPL v3 Open Source license and other FOSS licenses), but if you want to use it commercially, build products with it embedded inside, or have enterprise class support, one purchases a commercial license.

As presented at SFD21, but their website home page has updated numbers reflected below

The pure open source model has some natural advantages:

It’s a great lead gen solution because anyone, worldwide, 7X24X365 can download the software and start using it, (see Docker Hub or MinIO’s download page
It’s a great hiring pool. Anyone, who has contributed to the MinIO open source is potentially a great technical hire. MinIO stats says they have 685 contributors, 19 in just the last month for MinIO base code (see MinIO’s GitHub repo).
It’s a great development organization. With ~20 commits a weekover the last year, there’s a lot going on to add functionality/fix bugs. But that’s the new world of software development. Given all this activity, release frequencies increase, ~4 releases a month ((see GitHub repo insights above).
It’s a great testing pool with, ~480M Docker Pulls (using a Docker container to run a standard, already configured MinIO server, mc, console, etc.) and ~18K enterprises running their solution, that’s an awful lot of users. With open source a lot of eye’s or contributors make all problems visible, but what’s more typical, from my perspective, is the more users that deploy your product, the more bugs they find.

Indeed, with the VMware’s Data Persistence Platform, Tanzu customers can use MinIO’s object storage at the click of a button (or three).

Of course, open source has downsides too. Anyone can access packages directly (from GitHub repo and elsewhere) and use your software. And of course, they can clone, fork and modify your source code, to add any functionality they want to it. Historically, open source subscription licensing models don’t generate as much revenues as appliance purchases do. And finally, open source, because it’s created by geeks, is typically difficult to deploy, configure, and use.

But can they meet the requirements of an Enterprise world

Because most open source is difficult to use, the enterprise has generally shied away from it. But that’s where there’s been a lot of changes to MinIO.

MinIO always had a “mc” (minio [admin] client) that offered a number of administrative services via an API, programmatically controlled interface. but they have recently come out with a GUI offering, the minIO console, which has a similarly functionality to their mc APU. They demoed the console on their SFD21 sessions (see videos above).

Supporting 18K enterprise users, even if only 8% are using it a lot, can be a challenge, but supporting almost a half a billion docker pulls (even if only 1/4th of these is a complete minIO deployment) can be hell on earth. The surprising thing is that MinIO’s commercial license promises customers direct-to-engineer support.

At their SFD21 sessions, AB stated they were getting ~2.7 new (tickets) problems a day. I assume these are what’s just coming in from commercial licensed users and not the general public (using their open source licensed offerings). AB said their average resolution time for these tickets was under 15 minutes.

Enter SubNet, the MinIO Subscription Network and their secret (not open source?) weapon to scale enterprise class support. Their direct-to-engineer support model involves a much, more collaborative approach to solving customer problems then you typical enterprise support with level 1, 2 & 3 support engineers. They demoed SubNet briefly at SFD21, but it could deserve a much longer discussion/demostration.

What little we saw (at SFD21) was that it looked almost like slack-PM dialog between customer and engineer but with unlimited downloads and realtime interaction.

MinIO also supports a very active Slack discussion group with ~11K users. Here anyone can ask a question and it will get answered by anyone. MinIO’s Slack has 2 channels: (Ggeneral and GitHub for notifications). It seems like MinIO is using Slack as a crowdsourced level 1 support.

But in the long run, to continue to offer “direct-to-engineer” levels of support, may require adding a whole lot more engineers. But AB seems prepared to do just that.

~~~~

MinIO is an interesting open source S3 API compatible, object storage solution that seems to run just about anywhere, is freely deployable with enterprise class support available (at a price) and has high throughput performance. What’s not to like.