Strategy – Silverton Consulting

Life hacks: living (phone) life in B&W

Posted on March 5, 2026 by Ray in Decision making

I’ve been running an experiment with my phone. I’ve changed it to display in B&W only. Sometimes I change it back to view a picture or video but for the most part it stays in B&W.

My phone use has been going through the roof these past few years. Occasionally, mostly on weekends, I’ve racked up 6+ hrs of screen time a day, and I thought there’s got to be something I can do to reduce this.

I’ve tried setting time limits but that’s too easy to ignore. I’ve abandoned most social media, FB, Twitter (X), Reddit, BlueSky etc. [Ok LinkedIn still counts but it’s less a true social media platform these days than just keeping up with connections, but I can see it’s moving in that direction…] But doing all that just seems to move my phone activity elsewhere.

I hate commercials so watching Youtube with commercials on my phone is painful, so that’s not a major problem, for me at least.

But the damn web, news sites, emails, Slack channels, etc., it’s all just waiting there for you to pick up your phone and start scrolling…

So, I read an article (I think in the NYT), a while back, that suggested trying out changing your phone to display B&W only. It’s relatively easy to change from B&W to full color and back again. [On both the iPhone and Android it’s under accessibility controls.] So I decided to try this.

Not sure how long it’s been but it’s been at least a couple of months by now. My best guess is it drops my screen time by about 30% a day on average. Not much, I know, but going from 6 hours to roughly 4 is a meaningful amount. And even after a couple of months, it’s still working.

Now, it’s got me wondering if doing this with TVs, desktop displays, tablets, magazines, et al, would have the same impact.

Tried to change my desktop display to B&W but couldn’t find any way to do this. I may have to do it at the laptop settings level.

I don’t think they make B&W TVs anymore but maybe there’s some out there that can be made to display in only B&W. Wonder if this would change my binge watching.

Magazines are typically full color but maybe I could move back to newsletters or newspapers and view/print them out in B&W. I remember when USA Today debuted as a color newspaper and it was very popular (for as long as newspapers still had a monetization model that worked in print). Most print newspapers are dead these days. Surprisingly, B&W WSJ still persists…hmm

One wonders what I do with the 30% of my phone time that moving it to display only B&W freed up for me. I must say it’s most probably spent reading (B&W) books and, rarely, spending more time in nature. Is this a good trade-off?

I think yes, altho even more time in nature would be better.

Comments?

What R/R tracks can tell us about AI deployment

Posted on February 18, 2026 by Ray in Market dynamics, Strategic Inflection Points

I saw this chart the other night in a history class I’m taking. No idea where it was sourced from but I found it more intriguing than the discussion going on.

There’s an awful lot of R/R track in Europe, Eastern US, India, Japan and Eastern China. Not much elsewhere. There are vast spaces of emptiness in northern Canada, northeastern Russia, northern and central Africa, northern and southern South America, northern and central Australia, and others.

The question that comes to mind is why the open space? Yeah, mountainous regions could present a problem but the Alps didn’t seem to inhibit R/R track laying in Europe. Tundra and deserts may be a problem. But still South America, Upper Canada, Russia and elsewhere outside Africa don’t fit that pattern. Population density may be a factor, but Africa and China don’t fit that pattern.

And then I thought there’s a technological change that happened during the 20th Century that made R/R not as necessary to economic development. Namely, the advance of the automobile, tractor trailers, highways/roadways, etc. But these didn’t really take off until after the 1950s. Arguably there were at least 100yrs of R/R dominance in transportation, between 1850 and 1950.

Many of the open spaces were actively fought over, e.g. Africa, South America, Asia, etc., and attempts were made to develop them throughout the 19th century. But they still never got the density of railways that the advanced economies had. And what explains the difference between the less dense and the higher density portions of the USA? And the discrepancy between India’s track density and the lack of density in mainland China?

Re the US, I can only think that subsidies ran dry after a while, which curtailed R/R track construction. But the vast majority of track laid in the Eastern US was not subsidized by the government (IMHO), so why is it so dense there?

I suppose timing could account for some of this variance. R/R track was laid to support transport of goods and people. The relative sparsity of population of the Western US (at least during 1850-1950) may have had an impact on the R/R track laid down.

I believe two main factors combine to dictate how that R/R track map looks today:

The availability of capital for infrastructure development
The economic need to support/improve transport to market for industrial goods and agriculture

Claude Sonnet 4.6 created map of long haul fibre connections over the world, using the prompt “Can you find or create a world map, showing where current long haul fibre links exist today”

But none of that tells us why China’s interior doesn’t have a dense R/R track network. My guess is that although the population, industry and (agricultural) production was high in China during 1850-1950, capital and a centralized authority to protect property was missing at the time.

Great Britain had the money (during 1850-1925, at least) and used it to develop the R/R network in India, but didn’t do the same in Australia. Why? Probably because the need wasn’t as great: Australia didn’t have the population, agricultural production or industrial production of India.

What does all this mean for AI

R/R technology was economically essential for much of the 100yrs between 1850-1950. If we assume that AI will fill a necessary economic niche 2020-?, as essential as the R/R, we should see similar developments driving how AI is deployed.

Ultimately, we should see AI data centers be deployed mainly in support of industrial and agricultural production and services in areas where capital is available and can be deployed. AI adoption will likely not occur or be deployed as much in areas that are less advanced economically, mainly because of lack of capital and legal infrastructure to protect it.

IMHO, AI and AI datacenter deployments will probably look similar to the R/R track map above with some minor changes. It will follow the money, economic need and legal structures needed to support it.

In today’s world, literally awash in capital searching for investment, capital shouldn’t be a limiting factor. And legal infrastructure protecting property is almost universal throughout the world these days.

However economic activity and the need to support it is widely variable and dispersed in today’s world. Some regions have migrated away from manufacturing to services, others have undergone a serious manufacturing build out, but all need agriculture to sustain populations.

All that may make the AI deployment map diverge a bit from the R/R track map above. As a result, we may see a broader spread for AI deployments than for the R/R track of yesterday.

I believe the main deployments for AI data centers will be throughout the coasts of USA, with lots in the Eastern states, some in the midwest, lots in Europe, lots in China, India, Japan, Korea, Israel, Taiwan, and Australia/NZ maybe, with spots in South East Asia, spots in Africa, and spots in South America.

Claude Sonnet 4.6 created map using prompt “now can you find or create a similar map showing the current and proposed AI data centers as dots on a world map”

I suppose similar maps could be used to display electricity generation and transport, telephone lines, and long haul fibre connections (tried that above but it wasn’t as useful as I thought). If I’m correct they should all look similar to the above with minor changes based on when the technology was economically essential.

Comments?

Hammerspace and the Open Flash Platform at #AIIFD3

Posted on September 19, 2025 by Ray in AI storage needs, Ethernet, File Storage, Storage density, Storage performance, Strategic Inflection Points

Was at AI Infrastructure Field Day 3 (AIIFD3) last week in CA and Hammerspace presented (videos here). Molly and Floyd talked about their solution and some of their recent MLCommons performance results, but Kurt discussed the Open Flash Platform (OFP) Consortium, announced last July, which they and partners have been working on.

OFP currently has 6 partners, ranging from Hammerspace (storage software supplier), SK Hynix (NAND and SSDs) and the Linux Foundation, among others, to end users (Los Alamos National Laboratory), computational storage (ScaleFlux) and AI solution providers (Xsight).

As I understand it, the OFP is pushing to become a standard adopted by the Open Compute Project (OCP).

OFP is an attempt to redefine NAS as we know it. Hammerspace has been on this journey for a long time with their software only solution but technology is now at a place where it’s time to tackle hardware changes to NAS that would enable even better performance and throughput for AI and other data intensive workloads.

Some of the technology changes driving the need for a different approach to NAS storage:

NAND capacities are going through the roof. Accessing all that capacity in an effective and performant way requires re-architecting the storage stack.
Compute is becoming more widespread and ubiquitous. Everything seems to have more and more compute capability, and it’s causing a rethink as to how to take advantage of all this ubiquitous compute to better address IT (and AI) performance needs.
AI bandwidth and performance requirements are extreme and are only becoming more so.
Power has become a limiting factor in many AI deployments.

Hammerspace has addressed much of this from a software perspective with their Linux standards efforts to implement Parallel File System (PFS) and FlexFiles in the Linux kernel and in NFS standards as NFSv4.2. PFS and FlexFiles allow Hammerspace to offer very high file bandwidth and data mobility that can’t be supplied any other way.

So it’s time to see what can be done in hardware to make this even better. Enter OFP.

OFP, NAS storage reborn

The idea is to come up with a new packaging of an NFS (v3) server that’s all storage with high amounts of networking and enough compute to serve the storage. Effectively they are putting a DPU (compute-intensive networking card) with one 800Gbps Ethernet connection in front of a train (or toboggan) of NVMe SSDs and calling this a sled.

Their first version, using U.2 NVMe SSDs, offers 1PB of capacity with 800Gbps of networking in a 3.5″ X 1.75″ form factor. They would load an NFSv3 Linux-based storage server onto the DPU, run it there along with the networking stack (and more), and give it access to all this storage capacity in what is essentially an NFSv3 (relatively dumb) storage sled.

Package 6 of these together with a couple of power supplies and now you have 6PB raw capacity in 1RU, with 4.8Tbps of bandwidth, consuming 0.6 kW of power (presumably this is power consumption at idle).

You will no doubt note that the sled, as configured above, does not allow for hot (or even cold) drive replacement. So when drives fail, the NFSv3 code would need to recover from them and take them out of service, so that over time the sled could still be used even though some SSDs have failed.

In the future, moving from U.2 SSDs to E2(E) NVMe SSDs in the storage sled quadruples the capacity while staying in the same power envelope and supplying the same bandwidth. Again the SSDs are not intended to be (hot or cold) swappable, so drive failure would need to be handled by software. With E2(E) SSDs in a sled and 6 of these in a 1RU, one would have 24PB of storage capacity.

Presumably, OFP sleds could be hot swapped when enough SSDs in a sled fail.

And of course QLC capacities are not standing still so another doubling of these capacities could easily be possible within the next couple of years (imagine 48PB in a single RU, boggles the mind).
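To sanity check the capacity and bandwidth arithmetic above, here is a minimal Python sketch. The Sled class and rack_unit_totals function are hypothetical names of my own, not anything from OFP or Hammerspace. The per-sled numbers are the ones quoted above as I understood them, and the 4X (E2) and 2X (QLC) multipliers are the assumptions already stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sled:
    """Hypothetical model of one OFP sled (names are illustrative only)."""
    capacity_pb: int       # raw capacity per sled, in PB
    bandwidth_gbps: int    # Ethernet bandwidth per sled, in Gbps


def rack_unit_totals(sled: Sled, sleds_per_ru: int = 6) -> tuple[int, float]:
    """Return (raw capacity in PB, bandwidth in Tbps) for one 1RU enclosure."""
    capacity_pb = sled.capacity_pb * sleds_per_ru
    bandwidth_tbps = sled.bandwidth_gbps * sleds_per_ru / 1000
    return capacity_pb, bandwidth_tbps


# First version: U.2 SSDs, 1PB and 800Gbps per sled (figures from the text).
u2_sled = Sled(capacity_pb=1, bandwidth_gbps=800)
# Assumed 4X capacity with E2 SSDs, same bandwidth.
e2_sled = Sled(capacity_pb=u2_sled.capacity_pb * 4, bandwidth_gbps=800)
# Assumed further 2X from future QLC capacity growth.
future_sled = Sled(capacity_pb=e2_sled.capacity_pb * 2, bandwidth_gbps=800)

for name, sled in [("U.2", u2_sled), ("E2", e2_sled), ("E2 + QLC doubling", future_sled)]:
    pb, tbps = rack_unit_totals(sled)
    print(f"{name}: {pb}PB raw and {tbps}Tbps per RU")
```

Run as is, this should reproduce the 6PB, 24PB and 48PB per RU figures, with bandwidth staying at 4.8Tbps throughout, since only the SSD capacity changes between generations.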

The NAS software one runs in the OFP sled could be any NFSv3 server software but Hammerspace has their own, called DSX. And when you combine DSX servers with lots of capacity and lots of networking bandwidth, Hammerspace’s NFSv4.2 PFS and FlexFiles can really fly.

And with the power and space efficiency as well as extreme bandwidth available, it could be a winning formula for the AI environments, in contrast to scale-out NAS which is the current alternative.

~~~~

But it seems to me any organization (hyperscalers, are you listening?) with intense storage capacity and storage bandwidth needs would be very interested in the OFP for their own environment.

Comments?

The curse of Scale & AGI

Posted on August 13, 2025 by Ray in AGI, Reinforcement Learning, Strategic Inflection Points

For the past 1/2 decade or more, new generation foundation models have all become significantly (10X or more) larger in parameters than their last versions. The presumption being that more parameters will always lead to better models, better inferences, more users, etc. This has been primarily driven by compute scaling: more compute thrown at training results in bigger models.

But the problem is that at some point any process reaches saturation, or a point of diminishing returns, where throwing more (of anything) at it only makes it marginally better, or at least not better commensurate with the additional cost. It’s unclear if we are there yet with foundation models, but my guess is we are reaching it rapidly.

It’s interesting that ChatGPT-5 seems to have the same number of parameters as ChatGPT-4 (~1.8T).

Not being an active user of foundation models, I can’t really tell if …-5 is much better than …-4, but consensus seems to be that they are not improving as much as they used to.

There are probably a number of reasons why this could be the case. The data wall for one. The power and cooling cost of exponentially increasing AI model size is impacting not just training costs but inferencing costs as well. But the end of the scaling advantage may be another.

Don’t get me wrong, if it wasn’t for compute scaling we wouldn’t have the AI we have today. NN training processes were invented in the 50s of last century, but they didn’t have the compute power to use them at the time. It wasn’t until this century that computation caught up.

As more compute power became available, those old compute bound techniques proved to be the lynchpin for DNN training and we are still riding that curve today, up to a point.

It’s just that speeding up and doing the same old DNN training will lose effectiveness at some point, if not today, then tomorrow.

I’ve seen it myself in some rudimentary models I have trained. At some point adding nodes, layers, training epochs, etc., just doesn’t always result in better models. They often get worse.

AGI

And AGI, I believe, will require us to take a different tack than current foundational model DNN training to get right. Call it a hunch. But one can see glimmers of this in the fact that AGI is always just years away.

In order to achieve AGI, for safety reasons, for planetary climate reasons, and because scale is not getting us there anymore, I strongly believe we need to rethink our approach to foundation model training.

I’m no expert but I think what needs to change is more use of (deep) reinforcement learning (DRL), not just the reinforcement learning from human feedback (RLHF) used today for fine tuning foundation models. This would mean using DRL much earlier and more comprehensively, in all phases of foundational model training.

Yes, DRL also consumes compute infrastructure and more “training episodes” for DRL can often lead to better model outcomes, but not always.

DRL training for AGI models

For any reinforcement learning to work, one needs a reward signal that can be used to signal how to optimize the DRL model. So, the real challenge in the use of more DRL for foundation model training is what (or who) supplies that reward signal from some action taken by the DRL model.

Historically, for games reward signals came from the game environment (or model), for robotic motion it can come from physics simulators or movement in the real world.

But any reward signal for AGI foundation models would need much more sophistication than the above.

The easy answer is to create world simulation models. Something that could simulate how the world (in total) would react to an action (or inference) of the foundation model.

But that’s not easy. World simulation models at the fidelity needed to support DRL for AGI foundation models don’t exist, and few if any researchers (AFAIK) are working on getting us there.

But there are some rudimentary baby steps that already exist. Physics engines (or models of real world physical processes) have existed for a long time now and would no doubt be the core of any world simulation model. Nature simulation models exist at least for climate and weather and these could also be incorporated into any world model.

What’s missing would be

Geophysical world simulations that would model how the world would react to actions taken by an AGI model. I’m aware of many petroleum earth based simulations, ditto for plate tectonics, wind, and water movement, but these would all need to be combined into something that provides entire-world geophysical reactions to model actions.
Biospherical world simulations that would model (at least at some level) how the (biological, i.e. animal, plant, fungi, microbe, etc.) natural world would react to actions. Weather models may have some of this, at least with respect to carbon cycles which span human-natural boundaries but we would need a lot more.
Psychological world simulations, or something that would simulate how a person and how a population of humans would react to actions taken by a model. I am unaware of anything available at this level except for a simulation of a baby I saw at SigGraph a couple of years ago. There would need to be a lot more work here to get this up to a level to support AGI training.
Sociological-Political world simulations or something that would model how human society across the world would react to model actions. Again some of these exist, at an even more rudimentary level than financial or weather modeling, and we would need a lot of work to get them to a level of fidelity needed for AGI training.
Financial-Business world simulations that would determine the financial reactions to model actions. Some of these exist for national economies, but would need to be broadened to the world at large and to much finer resolution and granularity to be suitable to support AGI foundational model training.

I am certainly missing one or more critical models that may be needed for true world simulations but these could provide a start. They would need to be combined, of course, in some fashion.

And determining the various reward weights would be non-trivial. It seems to me that each of these simulations could have multiple reward signals for any action. Combining them all may be non-trivial. But those are parameter optimizations, which once we have world models working in unison we can tweak at will.
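To make the idea of combining reward signals a bit more concrete, here is a minimal Python sketch of one simple way it could be done, a weighted average. This is only an illustration of mine: the combined_reward function, the simulator names and the constant rewards are all hypothetical stand-ins, and real world simulations would be vastly more complex than a function returning a number.

```python
from typing import Callable, Dict

# A "simulator" here is just a function from an action to a reward in [-1, 1].
Simulator = Callable[[str], float]


def combined_reward(action: str,
                    simulators: Dict[str, Simulator],
                    weights: Dict[str, float]) -> float:
    """Weighted average of the per-simulation reward signals for one action."""
    total_weight = sum(weights[name] for name in simulators)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[name] * sim(action) for name, sim in simulators.items())
    return weighted / total_weight


# Stub simulators with made-up constant rewards, purely for illustration.
simulators = {
    "geophysical": lambda action: 1.0,   # this action looks good geophysically
    "financial": lambda action: -1.0,    # but bad financially
}
# The weights are the tunable parameters mentioned above (values are arbitrary).
weights = {"geophysical": 3.0, "financial": 1.0}

reward = combined_reward("some action", simulators, weights)
print(reward)
```

Tweaking the weights changes which simulation dominates the overall reward, which is the parameter optimization problem described above. A weighted average is just the simplest choice; nothing says the right combination is linear.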

Then there’s the “action space” for an AGI model. For games and robotic motion, the actions are well defined and finite. For an AGI model, it would seem that the actions are potentially infinite. Even if we limited it to a single domain such as tokenized text strings, the magnitude of such actions would be 10K-10M X anything tried before with DRL. But I still believe it’s doable.

Once we had such a model together, with a decent reward function and had some way to categorize/grasp the infinite actions that could be taken by an AGI, DRL could be used to train an AGI.

Of course this may take a few “billion or trillion” actions/training episodes to get something worthwhile out of it.

But maybe after something like (or 10M X) that we could create a safe and effective AGI.

~~~~

Comments?

Photo Credit(s):

OCP Summit 2024, AMD Hardware Optimizations for power efficient AI, presentation slide
Thomas Jefferson National Accelerator Facility (Jefferson Lab), flickr photo
SigGraph 2024, Beyond the illusion of life, Keynote presentation slide

AGI, SuperIntelligence and “The Last Man”

Posted on May 30, 2025 by Ray in AGI, AI Agents, Cognitive computing, Executive leadership

Nietzsche wrote about the last man in Thus Spoke Zarathustra (see Last Man wikipedia article). There’s much to dislike about Nietzsche’s writing but every once in a while there are gems to be found. (Sorry for the sexist statement, it’s not me, blame Nietzsche).

In Zarathustra, Nietzsche talks of the Last Man in contempt. They no longer struggle in their daily life. They no longer create. They have an easy life filled with leisure and entertainment and no work to speak of.

From AGI to SuperIntelligence

I’ve discussed AGI many times before (I think we are up to AGI part 12, this would be part 13, and ASI (Artificial SuperIntelligence) part 3, this would be 4. But I’m thinking numbering them is not helping anymore): how to get there, the existential risk of getting there, and many other facets of the risks and rewards of AGI. (Ok less on the rewards…).

I’ve also discussed Artificial SuperIntelligence (ASI). This is what we believe can be attained after AGI. If one were to use AGI to improve AI training algorithms, AI hardware and AI inferencing, and to generate massive amounts of new scientific research/political research/economic research, etc., one could use the new data, the better training, inferencing, and AI hardware to create an ASI agent.

The big debate in the industry is how fast can one go from AGI to ASI. I don’t believe there’s any debate in the industry that SuperIntelligence can be obtained eventually.

There are those that believe

It will take many years, 3-5-10(?), to attain SuperIntelligence because of all the infrastructure that has to be put in place to create current LLMs, and the view that AGI will need much more. Thus, build out is years away. If that’s the case it will take more years of infrastructural production, acquisition and data center build out to be ready to train SuperIntelligence after attaining AGI.
It will take just a few years, 1-2-3(?), to achieve SuperIntelligence after AGI. This is because one could use AGI to improve the AI training & inferencing algorithms and drastically increase the utilization of current AI hardware, such that there may be no need for any additional hardware to reach SuperIntelligence. Then the prime determinant of the time it takes to achieve SuperIntelligence is how fast AGI(s) can generate new scientific, medical, sociological, etc. research needed to train SuperIntelligence.

Yes, much scientific, et al research requires experimentation in the real world, (although much can now be done in simulation). But even physical experimentation is being rapidly automated today.

So the time it takes to generate sufficient research to create enough data to train an ASI may be very short. Just consider how fast LLM agents can generate code today to get a feel for what they could do tomorrow for research.

Maybe regulatory bodies could slow this down. But my bet would be that regulatory artifices would turn out to be ineffectual. At best they will drive AGI-ASI training/deployment activity underground which may delay it a couple of years while organizations build up the AI training infrastructure in hiding.

The one serious bottleneck may be AI data centers’ power requirements. But if rogue states can build centrifuges to enrich radioactive materials, intercontinental missiles, biological warfare agents, etc., they can certainly steal/buy/find a way to duplicate AI data center infrastructure components.

Regulatory regimens, at worst, would be completely ignored by state actors and all large commercial enterprises. The first mover advantages of AGI and ASI are too large for any organization to ignore.

What happens when SuperIntelligence is reached

I see one of two possibilities for how the achievement of AGI and SuperIntelligence plays out, with respect to humanity

Humankind Utopia – AGI & ASI agents can do anything that humans can do and do it better, faster, and more efficiently. The question remains what would be left for humanity to do when this is reached. Alright, at the moment, LLM agents are mostly limited to working in the digital domain. But with robotics coming online over the next decade, this will change to add more real world domains to whatever AGI-ASI agents can do.
Humankind Hell – AGI & ASI agents determine that humanity is a pestilence to the Earth and start to cut it back to something that’s less consumptive of Earth resources. Again, although AI agents are restricted to the digital domain today, that won’t last for long, especially as AGI & ASI agents go live. So robots with ASI agents will be the worst aggressor in the history of the world and with the tools at their disposal, they could easily create biological, chemical and other weapons of mass destruction to deploy against humanity.

SuperIntelligence risk and rewards

It’s been obvious to me, SciFi authors and some select AI researchers that there is a sizable risk that a SuperIntelligence, once unleashed, will eliminate, severely restrict or enslave humanity resulting in Humanity’s Hell.

On the other extreme are many corporate CEO/CTOs and other AI researchers who believe that SuperIntelligence will be a Godsend to humankind. Once it arrives and is deployed, humanity will no longer have to do any work it does not want to do. All work will be handed off to robots and their ASI agents, which will perform it at greater speed, with higher quality and with lower cost than can conceivably be done today.

What seems to be happening today with current AI agents is that some white collar work is becoming easier to perform, if not totally eliminated. CEOs see this as an opportunity to reduce workforce size. For example, some CEOs are eliminating HR organizations with the belief that LLM chatbots together with a much smaller group can handle all of what HR was doing before.

And of course as AI agents become more sophisticated this will ensure more workforce reductions. And once AI agents are embodied in robotics, blue collar workforce will also be at risk.

Human Utopia and “The Last Man”

Nietzsche was writing in the late 1800s when technology and automation were just starting to make a difference in the world of work. But the industrial revolution was in full steam and had already had significant impact on the work force.

Nietzsche believed that further industrialization, if it continued (which of course it has), would result in the Last Man.

The Last Man is at the point where technology and automation have taken over all tasks, trades and work, and where the Last Man has no real duties they need to perform other than consume goods and services provided by automation. For the Last Man, being wealthy or poor no longer has any consequence, as they can have anything they could possibly desire.

To Nietzsche, the Last Man is anathema. He believes that true humanity requires struggle, striving and advancement. Once the Last Man is achieved all these will no longer matter, no longer be a part of humanity’s existence and no longer impact one’s lifestyle.

When humanity no longer has to struggle, strive and advance, humanity will lose the very essence that makes humanity human. We will, over time, lose the ability and desire to do any of that, as it all becomes the purview of AGI-ASI.

The Last Man is coming already

Example 1: The Ethiopian Airlines Flight 409 disaster in 2010 (see wikipedia article) is one example in a very technical domain. As I understand it, the flight had just taken off from Beirut, bound for Addis Ababa, when it went into a stall; the pilots did the wrong thing to get out of it and they spiraled into the sea.

The pilot was the most experienced pilot in the airline (logged over 10K flight hrs). The co-pilot was much less experienced. Getting out of a “stall” is rudimentary to flying. In fact, exiting a stall is one of the important skills taught to all pilots, and they need to demonstrate they can get out of a stall before they get their pilot licenses.

The “problem” had been brewing for a while. Ever since aircraft auto-pilots came into service, real live pilots did less and less real flying of airplanes. As a result, these two pilots forgot how to get out of a stall and it caused the accident.

Example 2: Self-driving technology has been rapidly improving over the last decade or so. We often become dependent on its capabilities and when there’s some sort of failure it can be disastrous because we have lost many of our most important driving skills.

In my case, we have a relatively dumb car with what they call “smart cruise control”. You can set it to a speed and the vehicle will retain that speed unless a vehicle in front of you is going slower, then it will slow down to maintain some set distance behind that vehicle.

We were driving along and a truck cut into our lane. This truck had a very high backend profile with no structures where normal vehicles would protrude until you got to its tires. Well the smart cruise control didn’t detect its existence until we were almost underneath the truck bed. We tried to brake but it took too many seconds to get that done and in the end we had to go off the road to save ourselves. We had lost our emergency braking skills and situational awareness skills. Nowadays we don’t drive with cruise control on as much.

A multitude of examples exist that show AI and automation have led to humans becoming less skilled at some activity. And when AI automation doesn’t work properly, bad things happen, because we no longer know how to react properly.

The Last Man, here today, gone tomorrow.

So imagine a life where you are born with everything you could possibly need to succeed. You are educated by the very best automated personal tutors. You are provided an (Amazon and Walmart) X 1000, with unlimited credit. You grow up with everyone else having just the same life as you because all of you have no work to do and have infinite sums and have infinite products to consume.

Life in such a utopia would from some perspective be almost Godlike. But if you take the perspective that humanity needs struggle, needs challenges, needs to strive to better themselves at every stage, such a life would be a disaster.

And that’s what Humanity’s Utopia would look like. Definitely better than Humanity’s Hell but in the end, not sure the difference matters as much.

~~~

I just don’t really see any path forward that’s good for humanity where AGI and SuperIntelligence exist.

Stopping AI development here today seems idiotic; going where we seem to be going seems insane.

Comments?

Picture Credit(s):

Friedrich Nietzsche by Friedrich Hermann Hartmann
ChatGPT logo by User:Random837 – Own work (imitated from File:ChatGPT-Logo-2022.svg), Copyrighted free use
Ethiopian Airline plane by Alastair T. Gardiner, CC BY-SA 4.0

AlphaEvolve, DeepMind’s latest intelligence pipeline

Posted on May 21, 2025 by Ray in AI Agents, Artificial Intelligence, Cognitive computing, Strategic Inflection Points

Read an article the other day from ArsTechnica on AlphaEvolve (Google DeepMind creates .. AI that can invent…), after Google announced and released their AlphaEvolve website and paper.

Essentially they have created a pipeline of AI agents (using GeminiFlash and GeminiPro) that applies genetic/evolutionary techniques to evolve code, for anything really that can be transformed into code, in order to improve or solve something that has code-based evaluation techniques.

Genetic evolution of code has been tried before and essentially it uses various combinatorial techniques (splitting, adding, subtracting, etc.) to modify the code under evolution. The challenge with any such technique is that much of the evolved code is garbage, so you have to have some method to evaluate (quickly?) whether the new code is better or worse than the old code.

That’s where the evaluation code comes into play. It effectively executes the new code and determines a score (could be a scalar or vector) that AlphaEvolve can use to determine if it’s on the right track or not. Also you can have multiple evaluation functions. And as an example you could have some LLM be asked whether the code is simpler/cleaner/easier to understand. That way you could task AlphaEvolve to not only improve the code functionality but also create simpler/cleaner/easier to understand code.
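To make that loop concrete, here is a minimal sketch of evaluator-driven evolution. It is not AlphaEvolve's code: the candidate is a toy list of integers standing in for a program, and `TARGET`, `evaluate`, `mutate` and `evolve` are hypothetical names made up for illustration. The only point is that mutation produces mostly junk and the evaluator's score decides what survives.

```python
import random

TARGET = [3, 1, 4, 1, 5]  # toy stand-in for "ideal code", an assumption of this sketch


def evaluate(candidate):
    """Score a candidate; higher is better and 0 means a perfect match."""
    return -sum(abs(c - t) for c, t in zip(candidate, TARGET))


def mutate(candidate, rng):
    """Return a copy with one position nudged up or down by one."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child


def evolve(start, generations=500, seed=0):
    """Keep a mutation only when the evaluator scores it at least as well."""
    rng = random.Random(seed)
    best, best_score = list(start), evaluate(start)
    for _ in range(generations):
        child = mutate(best, rng)
        score = evaluate(child)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score


best, score = evolve([0, 0, 0, 0, 0])
print(best, score)
```

A second evaluator (say, one that rewards simpler candidates) could be added by returning a tuple of scores instead of a single number.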

AlphaEvolve uses GeminiFlash to generate a multitude of code variations and when that approach loses steam (no longer improving much) it invokes GeminiPro to look at the code in depth to determine strategies to make it better.
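One way to picture the "loses steam" hand-off is a simple plateau check on recent scores. This is a guess at the shape of the idea, not how AlphaEvolve actually decides: `choose_model`, `window` and `min_gain` are hypothetical, and the strings "fast" and "deep" merely stand in for GeminiFlash and GeminiPro.

```python
def choose_model(score_history, window=5, min_gain=0.01):
    """Return 'fast' while scores keep improving, 'deep' once they plateau."""
    if len(score_history) <= window:
        return "fast"  # not enough history yet to call it a plateau
    gain = score_history[-1] - score_history[-1 - window]
    return "deep" if gain < min_gain else "fast"


print(choose_model([1, 2, 3, 4, 5, 6, 7]))  # still improving
print(choose_model([1, 1, 1, 1, 1, 1, 1]))  # flat for a whole window
```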

To use AlphaEvolve you need to supply infrastructure (compute, storage, networking), one or more evaluation algorithms/prompts (in any coding language you choose) and a starting solution (again in any coding language you want).

As part of its process, AlphaEvolve uses a database to record all code modification attempts and their evaluation scores. This database can be used to retrieve prior modifications and take off from there again.
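A record-and-resume database of this kind could be as small as one table. The sketch below uses Python's built-in sqlite3 module; the `attempts` schema and the `open_db`, `record_attempt` and `best_attempt` helpers are my own assumptions, not anything taken from the AlphaEvolve paper.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the attempts database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attempts ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "code TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def record_attempt(conn, code, score, parent_id=None):
    """Store one code modification and its evaluation score."""
    cur = conn.execute(
        "INSERT INTO attempts (parent_id, code, score) VALUES (?, ?, ?)",
        (parent_id, code, score),
    )
    conn.commit()
    return cur.lastrowid


def best_attempt(conn):
    """Return (id, code, score) of the highest scoring attempt so far."""
    return conn.execute(
        "SELECT id, code, score FROM attempts ORDER BY score DESC, id ASC LIMIT 1"
    ).fetchone()


conn = open_db()
first = record_attempt(conn, "return random.choice(products)", 0.4)
record_attempt(conn, "return min(products, key=cost)", 0.7, parent_id=first)
print(best_attempt(conn))
```

Restarting a run then just means reading `best_attempt` (or any earlier row) and evolving from that code again.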

Results

AlphaEvolve has been tasked with historical math problems that involve geometric constructions, as well as improvements to computing algorithms and full stack code.

For instance, the paper discusses how AlphaEvolve improved their Google Cloud (Borg) compute scheduling algorithm, recovering on average 0.7% of Google’s compute resources worldwide.

It also found a kernel improvement which led to Gemini training speedup. It found a simpler logic footprint for a TPU chip function.

It found a faster algorithm for 4x4 complex matrix multiplication. It improved the best known lower bound for the kissing number problem in 11 dimensions (a geometric construction). And there were probably 50 or more other mathematical problems, coding algorithm improvements, etc.

It didn’t improve or solve everything it was tasked to do, but it did manage to improve on or solve ~20% or so of the starting solutions it was given.

How to use it

The nice thing about AlphaEvolve is that one can have it work with a whole code repo and have it only evolve a set of sections of code in that repo. All the code to be improved is marked with

#EVOLVE-BLOCK START and
#EVOLVE-BLOCK END.

This would be embedded in the starting solution. Presumably this would be in any comment format for the coding language being used.
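As a rough illustration of how a tool might find those regions, here is a small sketch that pulls out the text between the markers. It uses the marker spelling given in this post, and `evolve_blocks` and `SAMPLE` are hypothetical names of mine; AlphaEvolve's real parsing is not described here.

```python
START = "#EVOLVE-BLOCK START"
END = "#EVOLVE-BLOCK END"


def evolve_blocks(source):
    """Return the code found between each START/END marker pair."""
    blocks, current = [], None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == START:
            current = []  # begin collecting a new block
        elif stripped == END and current is not None:
            blocks.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    return blocks


SAMPLE = """import random

def pick(products):
    #EVOLVE-BLOCK START
    return random.choice(products)
    #EVOLVE-BLOCK END
"""

print(evolve_blocks(SAMPLE))
```

Everything outside the markers (imports, the function signature) is left alone, which is what lets the rest of the repo stay fixed while only the marked code evolves.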

And it’s important to note that the starting solution could be very rudimentary, and with the proper evaluation algorithms could still be used to solve or improve any algorithm.

For example, suppose you were interested in optimizing a factory production line by picking which component/finished product to manufacture next, and you had, let’s say, some sort of coded factory simulation with some way to examine the factory to evaluate whether it’s working well or not.

Your rudimentary starting algorithm could pick at random from the set of products/components to manufacture that are currently needed and use as evaluation the throughput of your factory, utilization of bottleneck machinery, energy consumption or any other easily code-able evaluation metric of interest, in isolation or combination (that could make use of your factory simulation to come up with evaluation score(s)). Surround the random selection code in #EVOLVE-BLOCK START and #EVOLVE-BLOCK END and let AlphaEvolve come up with a new selection algorithm for your factory.
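Here is what such a starting point might look like, as a toy sketch only. The one-machine "simulation", the `BUILD_MINUTES` table and the `pick_next`, `shortest_first` and `evaluate` names are all invented for illustration; a real factory simulation would be far richer. The random picker is the rudimentary starting solution, and `shortest_first` is just a hand-written example of the kind of rule evolution might land on.

```python
import random

# Hypothetical products: minutes of machine time needed to build one unit.
BUILD_MINUTES = {"gear": 2, "axle": 3, "frame": 5}


def pick_next(needed, rng):
    """Rudimentary starting solution: pick any needed product at random."""
    #EVOLVE-BLOCK START
    return rng.choice(sorted(needed))
    #EVOLVE-BLOCK END


def shortest_first(needed, rng):
    """A hand-written alternative picker, shown only for comparison."""
    return min(needed, key=BUILD_MINUTES.get)


def evaluate(picker, demand, shift_minutes=60, seed=0):
    """Toy one-machine simulation; returns units finished per hour."""
    rng = random.Random(seed)
    remaining = dict(demand)
    clock, finished = 0, 0
    while True:
        needed = [p for p, n in remaining.items() if n > 0]
        if not needed:
            break
        product = picker(needed, rng)
        if clock + BUILD_MINUTES[product] > shift_minutes:
            break  # the chosen job does not fit in the shift
        clock += BUILD_MINUTES[product]
        remaining[product] -= 1
        finished += 1
    return finished * 60 / shift_minutes


demand = {"gear": 10, "axle": 10, "frame": 10}
print(evaluate(pick_next, demand), evaluate(shortest_first, demand))
```

Swapping the return value of `evaluate` for energy use or bottleneck utilization would change what the evolved picker optimizes for, without touching the picker itself.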

After seeing a couple of (10-100-1000) iterations of new graded selection algorithms you could change your evaluation grading algorithms and start over from where you left off to get something even more sophisticated.

Deepmind has created a Jupyter notebook on GitHub with some of AlphaEvolve’s mathematical solutions/improvements in case you want to see more.

They also have an AlphaEvolve early signup site in case you’re interested in trying it out.

~~~~

If I were Deepmind, I could think of probably 10K things to do with AlphaEvolve. I might rank all the functions in GeminiPro/GeminiFlash inference and training by frequency count and take the top 20% of these functions through the AlphaEvolve pipeline. Ditto for Google Cloud services, Google search, Adwords, etc.

But that would be just the start…

….

Photo/Graphic Credit(s):

From DeepMind’s AlphaEvolve Paper
From DeepMind’s AlphaEvolve website
From DeepMind’s AlphaEvolve Paper
From DeepMind’s AlphaEvolve website

Reward is all you need – part 2, AGI part 12, ASI part 3

Posted on April 18, 2025 by Ray in AGI, ASI, Cognitive computing, Reinforcement Learning, Strategic Inflection Points

Read an article today about how current LLM technology is running out of steam as it approaches the equivalent of all current human knowledge. The article is Welcome to the Era of Experience. Apparently it’s a preprint of a chapter in an upcoming book from MIT, Designing an Intelligence. One of the authors is well known for his research in reinforcement learning and is a co-author of the textbook, Reinforcement Learning: An Introduction.

Sometime back before ChatGPT came out there was a paper on reward is enough (see post: For AGI, is reward enough). And at the time it proposed that reinforcement learning with proper reward signals was sufficient to reach AGI.

Since then, attention has become the prominent road to AGI and is evident in all the LLM activity to date (see ArXiv paper: Attention is all you need).

This new paper (and presumably book) suggests that the current AI training technology focused on attention (to current human knowledge) will ultimately reach an impasse, a human wall if you will. Whenever it attains human levels of AGI, or the Humanity Wall, it will be unable to proceed any farther. And at that point, it will track human knowledge generation but go no further.

Now, from my perspective something like this is inherently safer than having something that can surpass human intelligence. But putting my reservations aside, the new paper on the Era of Experience shows a potential road map of sorts to achieve super human intelligence.

Era of attention

In the case of transformers (current LLM technology) they have billion parameter models based on learning what the next token in a sequence should be. There are ancillary models that determine, for instance, tokenization of text streams (multi-dimensional locations for each portion of a word in a paragraph, for instance). Tokenization encodes textual semantics and context, as well as the textual word part being analyzed, into a string of numbers for each token. Essentially, a multi-dimensional address in textual semantic space.

But the big, billion+ parameter models were all essentially trained to predict what the next text token would be based on current context. Similarly, for graphical generation models it went from text tokens to predicting the diffusion pixels of a graphic and other visual artifacts.

But pretty much all of this was based on the underlying technology training approach as outlined in attention is all you need.

The Era of Experience paper suggests that this training approach will ultimately run out of steam. And all of these models will hit the Humanity Wall, where they reach the equivalent of all human knowledge but will be unable to proceed past that point.

Era of Games and Proofs

In an online course I took during Covid on reinforcement learning, level 1 of the course ended up having us code a Reinforcement Learning algorithm to play pong. Mind you this ended up taking me much longer to get right than I had anticipated. But in the end this was essentially training a deep neural network as a value function (predicting whether a move was going to win or lose) to decide which direction to move the paddle based on the ball’s current position and velocity.

For this reinforcement learning algorithm the reward was simply 0 if the game continued, +1 if you won the game, and -1 if you lost (the ball went past your paddle).
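That reward scheme is short enough to write out. The sketch below is my own reconstruction, not the course's code: `pong_reward` assumes the agent defends the right-hand edge of a court of width `court_width`, and `discounted_returns` is the standard way of spreading a sparse win/loss signal back over the earlier moves so a value function has something to learn from.

```python
def pong_reward(ball_x, court_width):
    """Reward for an agent defending the right-hand edge of the court."""
    if ball_x > court_width:
        return -1  # ball went past our paddle: we lost
    if ball_x < 0:
        return 1   # ball went past the opponent: we won
    return 0       # play continues


def discounted_returns(rewards, gamma=0.99):
    """Propagate the final win/loss signal back over the earlier moves."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


print(discounted_returns([0, 0, 1], gamma=0.5))
```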

The authors discuss Deep Mind’s “Alpha-Proof” (more of an explanation of the technology) and Alpha-Geometry2 (also described in the same page) as being examples of super-human thinking capabilities, but only in the domain of mathematical proofs. Alpha-Proof and Alpha-Geometry2 achieved a silver-medal-level score at the prestigious International Mathematical Olympiad for their capabilities.

Alpha-Proof & Alpha-Geometry2 depend on Lean, a formal mathematical description language (similar to coding for mathematics). So a proof request would be converted to Lean code and then handed to Alpha-Proof and Alpha-Geometry2.

Alpha-Proof was originally trained on the sum total of all human generated mathematical proofs, but then used reinforcement learning to generate hundreds of millions more proofs and trained on those, to reach the level of a superhuman mathematical proof generator.

Alpha-Proof is an example of deploying Alpha-Zero RL technologies to different domains. Alpha-Zero already conquered the games of Chess, Shogi and Go with super-human skill.

These achieved super-human levels of skill because human (knowledge) was essentially dropped out of the training loop (very early on) and from then on the algorithm trained itself on self-generated data (game play, mathematical proofs), using a game simulator and reward signal(s) to determine when plays were good or bad.

Era of Experience

But the Era of Experience takes reward signals to a whole other level.

Essentially, in order to create super human intelligence using RL, the reward function needs to become yet another Deep Neural Network or two. And it needs to be trained in a fashion which understands how the world, environment, humans, flora, fauna, etc. react to what a (super human) agent is doing.

Unclear how you tokenize (encode) all those real world, experience signals into something a DNN could be trained on but my guess is their book will delve into some of these topics.

But in addition to the multi-faceted reward DNN(s), in order to do effective RL, one also needs a (high fidelity) real world simulator. This would be used much like internal game play in traditional game-playing RL algorithms, so that the super human agent could generate 100 million agentic scenarios in simulation to determine if they were successful or not, long before it ever attempted activities in the real world.

So there you have it: tokenization for LLM DNNs, and diffusion and text based agentic LLM DNNs, some sort of multi-faceted Reward DNNs (taking input from real and simulated world experience) and multi-faceted World simulator DNNs.

Once you have all that together, and with sufficient time and processing power and after some 100 million or so generated actions in the simulated world, you should have a super human agent that you can unleash on the real world.

~~~~

You may wish to constrain your new super human intelligent agent early on to make sure the world simulation has true fidelity with the real world we live in. But after a suitable safety checkout period, one should have a super human intelligence agent ready to take over all human thought, society advancement, scientific research, etc.

Sounds like fun!!?

Photo/Graphic Credit(s):

From Welcome to the new Era of Experience paper
From DeepMind’s Alpha-Proof webpage.
From DeepMind’s Alpha-Proof webpage.

Benchmarking Agentic AI using Factorio – AGI part 12

Posted on March 13, 2025 by Ray in AGI, AI Agents, Artificial Intelligence, Cognitive computing, Strategic Inflection Points

Yesterday a friend forwarded me something he saw online about a group of researchers who were using the game, Factorio, to benchmark AI Agent solutions (PDF of paper, Github repo).

The premise is that with an effective API for Factorio, AI agents can be tasked with creating various factories for artifacts. The best agents would be able to create the best factories.

Factorio factories can be easily judged by the number of artifacts they produce per time period and the energy use to manufacture those artifacts. They can also be graded based on how many steps it takes to generate those factories.
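Those three grading criteria are easy to put into code. The following is a sketch under my own assumptions, not FLE's actual scoring: `FactoryRun`, its fields and the tie-breaking order in `rank` are hypothetical, chosen only to show throughput, energy per artifact and step count being combined.

```python
from dataclasses import dataclass


@dataclass
class FactoryRun:
    agent: str
    items_produced: int
    game_minutes: float
    energy_used: float  # any consistent unit
    steps_taken: int

    @property
    def items_per_minute(self):
        return self.items_produced / self.game_minutes

    @property
    def energy_per_item(self):
        if self.items_produced == 0:
            return float("inf")  # nothing built, so energy was wasted
        return self.energy_used / self.items_produced


def rank(runs):
    """Best first: higher throughput, then less energy per item, then fewer steps."""
    return sorted(
        runs,
        key=lambda r: (-r.items_per_minute, r.energy_per_item, r.steps_taken),
    )


runs = [
    FactoryRun("agent_a", 600, 60, 1200.0, 40),
    FactoryRun("agent_b", 900, 60, 2700.0, 55),
    FactoryRun("agent_c", 900, 60, 1800.0, 70),
]
print([r.agent for r in rank(runs)])
```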

***Left is Factorio factory progression, middle is AI agent Python code that uses the Factorio API, right is agents submitting programs to the Factorio server and receiving feedback***

The team has created a Factorio framework for using AI agents that create Python code to drive a set of Factorio APIs to build factories to manufacture stuff.

Factorio is a game in which you create and operate factories. From Factorio website: “You will be mining resources, researching technologies, building infrastructure, automating production, and fighting enemies. Use your imagination to design your factory, combine simple elements into ingenious structures, apply management skills to keep it working, and protect it from the creatures who don’t really like you.”

Presumably FLE has disabled the villainy and focused on just crafting and running factories all out.

FLE Results using current AI agents

***FLE Open-play Results***, ***for open-play, models are scored based on production quantities over time***, ***note the chart is log-log***

Factorio, similar to other games, has an inventory of elements/components/machines used to build factories. And some of these elements are hidden until one gains enough experience in the game.

The Factorio Learning Environment (FLE) is a complete framework that can prompt Agentic AI to create factories using Python code and Factorio API calls. The paper goes into great detail in its appendices as to what AI agent prompts look like, the Factorio API and other aspects of running the benchmark.

In the FLE as currently defined there’s “open-play” and “lab-play”.

Open-play is tasked with building a factory as large as the agent wants to create as much product as possible. The open-play winner is the AI agent that creates a factory that can manufacture the most widgets (iron plates) in the time available for the competition.
Lab-play is tasked with building factories for 24 specific items, with limited resources and time, and the winner is the AI agent that is able to build the most of these lab-play factories successfully within the time and resource constraints available.

***FLE Lab-play (select) results – there were 24 tasks in the lab-play list, no agent completed all of them but Claude did the best on the 5 that were completed by most agents***

The team benchmarked 6 frontier LLM agents: Claude 3.5-Sonnet, GPT-4o, GPT-4o-Mini, Deepseek-v3, Gemini-2-Flash, and Llama-3.3-70B-Instruct, using them for both open-play and lab-play.

The overall winner for both open-play and lab-play was Claude 3.5-Sonnet, by a wide margin. In open-play it was able to create a factory to manufacture over 290K iron plates (per game minute, we think) and in lab-play it was able to construct more factories (7 out of 24) than the other AI agents.

***FLE Overall AI Agent Results***

The FLE researchers listed some common failings of AI agents under test:

Most agents lack spatial understanding
Most agents don’t handle or recover from errors well
Most agents don’t have long enough planning horizons
Most agents don’t invest enough effort in research (finding out what new Factorio machines do and how they could be used).

They also mentioned that AI agent coding skills seemed to be a key indicator of FLE success and coding style differed substantially between the agents. The researchers characterized agent (Python) coding styles and determined that Claude used a REPL style with plenty of print statements while GPT-4o used more assertions in its code.

“***Example of an FLE program*** used to create a simple automated iron-ore miner. In step 1 the agent uses a query to find the nearest resources and place a mine. In step 3 the agent uses an assert statement to verify that its action was successful.”

IMHO, as a way to measure AI agent ability to achieve long term and short term goals, at least w.r.t. building factories, this is the best I’ve seen so far.

More FLE Lab-play scenarios

I could see a number of additional lab-play benchmarks for FLE:

One focused on drug/pharmaceuticals manufacturing
One focused on electronics PCB manufacturing
One focused on chip manufacturing
One focused on nano technology/meta-materials manufacturing, etc.

What’s missing from all these benchmarks would be the actual science and research needed to come up with new drugs, new electronics, new meta-materials, that are the end product of Factorio factories. I guess that would require building labs, running scientific experiments and understanding (simulated) results.

Although in the current round of FLE benchmarks, for one AI agent at least (Claude), there seemed to be a lot of research into how to use different Factorio tools and machinery.

Ultimate FLE

If FLE as an AI agent benchmark succeeds, most Agentic AI solutions will start being trained to do better on the benchmark. Doing so should of course lead to better scores by AI agents.

Now people much more familiar with the game than I, say it’s not a great simulation of the real world. There’s only one type of fuel and the boiler is either on or off and numerous other simplifications of the real world are used throughout. And thankfully, for the moment there’s no linkage to actions that impact the real world.

But in reality, simulations like this are all just stepping stones to AI capabilities. And simulations are all just code, so it should not be that hard to increase their fidelity to the real world.

Getting beyond just simulation, to real world factories, is probably the much larger step. This would require physical (not unlimited) inventory of parts, cabling, machines, and belts; real mineral/petroleum deposits; real world physical constraints on where factories could be built, etc. Not to mention the physical automation/robotics that would allow a machine to be selected out of inventory, placed at a specific location inside a factory and connected to power and assembly lines, etc.

~~~~

One common motif in AGI existential crises is that some AGI (agent) will be given the task to build a paperclip factory and will turn the earth into one giant factory, while inadvertently killing all life on the planet, including of course, humankind.

So training AI agents on “open-play” has ominous overtones.

It would be much better, IMHO, if somehow one could add human settlements, plant, animal & sea life, ecosystems, etc. to Factorio, so that there would be natural components that, if ruined/degraded/destroyed, could be used to reduce AI agent scores for the benchmarks.
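In scoring terms, the idea amounts to subtracting a damage penalty from the production score. This is purely a thought-experiment sketch: `penalized_score`, the `damage` categories and the `weight` knob are all hypothetical, and nothing like them exists in Factorio or FLE as far as I know.

```python
def penalized_score(production, damage, weight=1.0):
    """Production score minus a weighted penalty for each damaged natural component."""
    return production - weight * sum(damage.values())


# A factory that made 100 units but degraded a forest and a river along the way.
print(penalized_score(100, {"forest": 10, "river": 5}, weight=2.0))
```

With a large enough `weight`, a paperclip-style factory that flattens everything around it would score worse than a smaller one that leaves its surroundings intact.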

Alas, there doesn’t appear to be anything like this in the current game.

Picture Credit(s):

From Jack Hopkins Factorio Learning Environment (FLE) Github Repo
From Jack Hopkins Factorio Learning Environment (FLE) Github Repo
From Jack Hopkins Factorio Learning Environment (FLE) Github Repo
From Jack Hopkins Factorio Learning Environment (FLE) paper