Blockchain Compute cloud

Over the past year or so I’ve been hearing a lot about a new use of blockchain technology to deploy a compute cloud.

In the old days, mining crypto rewarded you for doing the work. But over time it has become harder to mine and to make money from crypto, as specialized hardware took over more of this activity, making it much less profitable for the rest of us.

But with the emergence of crypto distributed compute clouds, this may be changing. Akash is a relatively popular one, but I read an article in ScienceDaily about the use of the Golem Network to implement a search for the chemical precursors to life on early earth (see: Chemists use the blockchain to simulate … the origins of life), which described a CHEM open access paper, Emergence of metabolic-like cycles in blockchain-orchestrated reaction networks.

The science

The science was intended to simulate chemical reactions based on the chemicals available on primitive earth, to determine which reaction chain(s) could lead to life. They programmed in the set of reactions and the chemicals available on early earth (water, methane, & ammonia) and intended to let this run and generate all possible reaction cycles.

The researchers realized that doing this much computation would require more compute power than was available to them. So they decided to deploy the computations across a distributed compute cloud. They chose the Golem Network to do their computations. Their computations ultimately resulted in a reaction cycle database they called the Network of Early Life (NOEL) (see: NOEL Network).

Once the distributed compute cloud was in operation, they used it to come up with 11B reaction cycles, of which ~5B would “entail no incompatibilities or selectivity conflicts”. They then used these to construct a metabolic network 100K times larger than any produced before, as depicted in NOEL.

Using NOEL, the team was able to discover some standard metabolic pathways (reaction cycles), and found that a limited set producing simple sugars and amino acids could emerge from the chemicals available on primitive earth.

But they also found about 100 reaction cycles that involved self-replicating molecules (molecules that could create additional copies of themselves). Self-replication is also believed to be a requirement for the origin of life.

It turned out that the work to construct NOEL on the Golem Network took 400 machines, over 20K cores and two months to do the calculations. The cost to them was 82K GLM (at ~$0.21/GLM, this would be ~$17.2K). The team estimated it would have required a top-of-the-line AMD 256-core server about 6 months to compute, which would have cost substantially more to purchase, and of course running it for 6 months would cost even more.
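As a quick sanity check on those numbers (a back-of-the-envelope sketch; the ~$0.21/GLM exchange rate is the approximate figure from the article, not a live price):

```python
# Back-of-the-envelope check on the Golem run cost quoted above.
# Assumption: ~0.21 USD per GLM, the approximate rate cited in the article.
glm_spent = 82_000
usd_per_glm = 0.21

golem_cost_usd = glm_spent * usd_per_glm
print(f"Golem run cost: ${golem_cost_usd:,.0f}")  # roughly $17.2K
```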

The team chose Golem because the work only needed to be available in the form of docker containers, it didn’t require the central work server to be online constantly, it automatically matched the compute with cloud resources, and it managed it all using a cryptographically secure and distributed interface.

Distributed compute cloud

The science is interesting but what’s more interesting (to me) is it was done using a crypto distributed computing cloud.

Looking at the Golem network statistics, they show ~510 compute providers with about 5000 cores available, of which 50-100 providers supplied compute to the cloud over the past 4 hrs (26Jan2024: 1600 MDT). That doesn’t seem like a lot of providers, but each could have multiple servers running compute.

The Golem network provides a relatively straightforward tutorial on how to set up a server to supply compute to the network. There are some tricks (port forwarding, screen/tmux deployment) but it all seems pretty straightforward (probably something even I could do in an hour or so).

And when you start supplying compute to the Golem mainnet, you earn GLM, a cryptocurrency (an ERC-20 token on Ethereum). So one should easily be able to convert GLM to ETH and whatever currency you desire.

Many former crypto miners have idle servers that could be put to use providing resources to distributed compute clouds. And if I thought doing so might help some (under resourced organization) produce real scientific research, I might be even more tempted to do so.

~~~~

So if you’ve got some servers sitting idle in your (home) office, fire them back up this weekend, install the Golem provider software and run them on the Golem network. Who knows, by doing so you just might help some researcher someplace change the world.

Picture credit(s):

DeepMind takes on Geometry, AGI part-9

Read an article in MIT Tech Review (Google DeepMind’s new AI systems can solve complex geometry problems) about AlphaGeometry, a new AI tool from DeepMind that can solve geometry problems. The article was referring to a Nature article (Solving olympiad geometry without human demonstrations) describing the technology.

DeepMind has tested AlphaGeometry on International Mathematics Olympiad (IMO) geometry problems and has shown that it is capable of performing expert-level geometry proofs.

There are a number of interesting capabilities DeepMind used in AlphaGeometry. But the ones of most interest from my perspective are:

  1. How they generated their (synthetic) data to train their solution.
  2. Their use of a generative AI LLM, which is prompted with a plane geometry figure and a theorem to prove, and generates proof steps and, if needed, auxiliary constructions.
  3. The use of a deduction rule engine (DD) plus an algebraic rule engine (AR), which when combined into a symbolic engine (DD+AR) can exhaustively generate all the proofs that can be derived from a figure.

First the data

The DeepMind team came up with a set of rules or actions that could be used to generate new figures. Once this list was created, they could randomly select one of these actions, with some points, to create a figure.

Some examples of actions (given 3 points A, B and C):

  • Construct X such that XA is parallel to BC
  • Construct X such that XA is perpendicular to BC
  • Construct X such that XA=BC

There are sets of actions for 4 points and for 2 points, and actions that just use the 3 points to create figures such as (isosceles, equilateral) triangles, circles, parallelograms, etc.

With such actions one can start out with 2 random points on a plane and create figures of arbitrary complexity. They used this to generate millions of figures.
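To make the sampling idea concrete, here is a toy sketch; the action strings and the point-naming scheme are my own illustration, not DeepMind's actual action list:

```python
import random

# Toy sketch of the synthetic-figure sampler described above: start from two
# points and repeatedly apply a randomly chosen construction action to
# randomly chosen existing points. Action descriptions are illustrative only.
ACTIONS_3PT = [
    "construct X such that XA is parallel to BC",
    "construct X such that XA is perpendicular to BC",
    "construct X such that XA = BC",
]

def random_figure(num_steps, seed=None):
    """Grow a figure of arbitrary complexity from 2 starting points."""
    rng = random.Random(seed)
    points = ["P0", "P1"]
    steps = []
    for _ in range(num_steps):
        # pick three existing points (with replacement while points are few)
        a, b, c = (rng.choice(points) for _ in range(3))
        action = rng.choice(ACTIONS_3PT)
        new_pt = f"P{len(points)}"
        steps.append(f"{new_pt}: {action} with A={a}, B={b}, C={c}")
        points.append(new_pt)
    return points, steps

pts, steps = random_figure(5, seed=42)
```

Each added point can participate in later actions, which is what lets the figures grow arbitrarily complex.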

They then used their DD+AR symbolic engine to recursively and exhaustively deduce a set of all possible premises based on that figure. Once they had this set, they could select one of these premises as a conclusion and trace back through the set of all those other premises to find those which were used to prove that conclusion.
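A toy version of this "deduce everything, then trace back" idea can be sketched as follows; the rule representation here is my own simplification, not the actual DD+AR engine:

```python
# Toy sketch of exhaustive forward chaining plus proof traceback, in the
# spirit of the DD+AR process described above. Rules are functions that map
# the current fact set to a list of (parent_facts, new_fact) derivations.
def closure(premises, rules):
    """Exhaustively derive all facts; remember which facts proved which."""
    known = set(premises)
    derivation = {}  # fact -> (rule_name, tuple_of_parent_facts)
    changed = True
    while changed:
        changed = False
        for name, rule in rules:
            for parents, fact in rule(known):
                if fact not in known:
                    known.add(fact)
                    derivation[fact] = (name, parents)
                    changed = True
    return known, derivation

def traceback(conclusion, derivation):
    """Walk back from a chosen conclusion to the steps that produced it."""
    steps = []
    def visit(fact):
        if fact in derivation:
            name, parents = derivation[fact]
            for p in parents:
                visit(p)
            step = (name, parents, fact)
            if step not in steps:
                steps.append(step)
    visit(conclusion)
    return steps

# Demo rule: transitivity over "eq" facts of the form ("eq", x, y).
def eq_trans(known):
    out = []
    for f1 in known:
        for f2 in known:
            if f1[0] == f2[0] == "eq" and f1[2] == f2[1]:
                out.append(((f1, f2), ("eq", f1[1], f2[2])))
    return out

facts, deriv = closure({("eq", "a", "b"), ("eq", "b", "c")},
                       [("trans", eq_trans)])
proof = traceback(("eq", "a", "c"), deriv)
```

Any derived fact can serve as the "conclusion", and the traceback recovers exactly the premises and steps that were needed to reach it.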

With this done, they had a data item which included a figure, premises derived from that figure, proof steps, and a conclusion based on that figure, i.e. ([figure], premises, proof steps, conclusion), or as the paper uses it, (premises, conclusion, proof steps). This could be transformed into a text sequence of <premises> <conclusion> <proof steps>. They generated 100M of these (premises, conclusion, proof steps) text sequences.
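The serialization step might look something like the following; the delimiter tokens and statement syntax are my guess at the flavor, not the paper's exact tokenization:

```python
# Sketch of flattening one (premises, conclusion, proof steps) datum into a
# single training string. Token format and geometry syntax are illustrative.
def to_sequence(premises, conclusion, proof_steps):
    return (
        "<premises> " + " ; ".join(premises)
        + " <conclusion> " + conclusion
        + " <proof_steps> " + " ; ".join(proof_steps)
    )

seq = to_sequence(
    ["AB = CD", "AB || CD"],
    "ABCD is a parallelogram",
    ["AB = CD and AB || CD imply ABCD is a parallelogram"],
)
```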

They then trained their LLM to take premises and a conclusion as a prompt and generate proof steps as a result.

The challenge with geometry and other mathematical domains is that one often has to add auxiliary constructions (lines, points, angles, etc.) to prove some theorem about a figure.

(Auxiliary constructions in Red)

The team at DeepMind were able to take all the 100M <premises> <conclusion> <proof steps> sequences they had and select only those that involved auxiliary constructions in their proof steps. This came down to 9M text sequences, which they used to fine-tune the LLM so that it could generate possible auxiliary constructions for any figure and theorem.
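The filtering step is conceptually simple. A toy version, using my own representation in which "the proof uses a point the premises never mention" stands in for "the proof involves an auxiliary construction":

```python
# Toy filter in the spirit of the fine-tuning data selection described above:
# keep only data items whose proof steps introduce a point not present in the
# premises (i.e. an auxiliary construction). Representation is illustrative.
def has_auxiliary(premise_points, proof_points):
    """True if the proof uses any point absent from the premises."""
    return bool(set(proof_points) - set(premise_points))

data = [
    ({"A", "B", "C"}, {"A", "B", "C"}),        # proof uses no new points
    ({"A", "B", "C"}, {"A", "B", "C", "D"}),   # D is an auxiliary point
]
fine_tune_set = [d for d in data if has_auxiliary(*d)]
```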

AlphaGeometry in action

The combination of (DD+AR) and trained LLM (for auxiliary constructions) is AlphaGeometry.

AlphaGeometry’s proof process looks like this:

  • Take the problem statement (figure, conclusion [theorem to prove]),
  • Generate all possible premises from that figure,
  • If it has come up with the conclusion (theorem to prove), trace back and generate the proof steps,
  • If not, use the LLM to add an auxiliary construction to the figure and recurse.

In reality AlphaGeometry generates up to 512 of the best auxiliary constructions (out of an infinite set) for the current figure, and uses each of these 512 new figures to do an exhaustive premise generation (via DD+AR) to see if any of them solves the problem statement.
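Schematically, the search loop above might look like this; `deduce_all` stands in for the DD+AR engine and `propose_constructions` for the fine-tuned LLM, both of which are assumed stubs here (only the 512-wide beam comes from the text):

```python
# Schematic of the AlphaGeometry-style search loop described above.
# `deduce_all(figure)` -> (facts, derivation) is a stand-in for DD+AR;
# `propose_constructions(figure, conclusion)` is a stand-in for the LLM.
def solve(figure, conclusion, deduce_all, propose_constructions,
          beam=512, max_depth=3):
    if max_depth < 0:
        return None
    facts, derivation = deduce_all(figure)
    if conclusion in facts:
        return derivation  # proof steps can be traced back from here
    # Try the best candidate auxiliary constructions, one at a time.
    for construction in propose_constructions(figure, conclusion)[:beam]:
        result = solve(figure + [construction], conclusion, deduce_all,
                       propose_constructions, beam, max_depth - 1)
        if result is not None:
            return result
    return None
```

With real DD+AR and LLM components plugged in, each auxiliary construction triggers a fresh exhaustive deduction pass, which is exactly the alternation the article describes.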

Please read the Nature article for more information on AlphaGeometry.

~~~~

IMHO what’s new here is their use of synthetic data to generate millions of new training items, fine-tuning their LLM to produce auxiliary constructions, combining DD and AR in their symbolic engine, and then using both DD+AR and the LLM to prove the theorem.

But what’s even more important here is that a combination of methods such as a symbolic engine and an LLM points the way forward to creating domain-specific intelligent agents. One supposes that, with enough intelligent agents combined to work in tandem, one could construct an AGI ensemble that masters a number of domains.

Picture Credit(s):