AGI, SuperIntelligence and “The Last Man”

Nietzsche wrote about the Last Man in Thus Spoke Zarathustra (see the Last Man Wikipedia article). There's much to dislike about Nietzsche's writing, but every once in a while there are gems to be found. (Sorry for the sexist phrasing, it's not me, blame Nietzsche).

In Zarathustra, Nietzsche speaks of the Last Man with contempt. The Last Men no longer struggle in their daily lives. They no longer create. They have an easy life filled with leisure and entertainment, with no work to speak of.

From AGI to SuperIntelligence

I’ve discussed AGI many times before (I think we are up to AGI part 12, which would make this part 13, and ASI (Artificial SuperIntelligence) part 3, which would make this part 4, but I’m thinking numbering them is no longer helping): how to get there, the existential risk of getting there, and many other facets of the risks and rewards of AGI. (Ok, less on the rewards…).

I’ve also discussed Artificial SuperIntelligence (ASI), which is what we believe can be attained after AGI. One could use AGI to improve AI training algorithms, AI hardware and AI inferencing, and use AGI to generate massive amounts of new scientific research, political research, economic research, etc. One could then use the new data, together with the better training, inferencing and AI hardware, to create an ASI agent.

The big debate in the industry is how fast one can go from AGI to ASI. I don’t believe there’s any debate in the industry over whether SuperIntelligence can eventually be attained.

There are those who believe:

  • It will take many (3-5-10?) years to attain SuperIntelligence, because of all the infrastructure that had to be put in place to create current LLMs, and the view that AGI will need much more. Thus, build-out is years away. If that’s the case, it will take more years of infrastructure production, acquisition and data center build-out to be ready to train SuperIntelligence after attaining AGI.
  • It will take just a few (1-2-3?) years to achieve SuperIntelligence after AGI. This is because one could use AGI to improve the AI training & inferencing algorithms and drastically increase the utilization of current AI hardware, such that there may be no need for any additional hardware to reach SuperIntelligence. Then the prime determinant of the time it takes to achieve SuperIntelligence is how fast AGI(s) can generate the new scientific, medical, sociological, etc. research needed to train SuperIntelligence.

Yes, much scientific (et al.) research requires experimentation in the real world (although much can now be done in simulation). But even physical experimentation is being rapidly automated today.

So the time it takes to generate sufficient research to create enough data to train an ASI may be very short. Just consider how fast LLM agents can generate code today to get a feel for what they could do tomorrow for research.

Maybe regulatory bodies could slow this down. But my bet would be that regulatory artifices will turn out to be ineffectual. At best they will drive AGI-ASI training/deployment activity underground, which may delay it a couple of years while organizations build up AI training infrastructure in hiding.

The one serious bottleneck may be AI data centers’ power requirements. But if rogue states can build centrifuges to enrich radioactive materials, intercontinental missiles, biological warfare agents, etc., they can certainly steal, buy or find a way to duplicate AI data center infrastructure components.

Regulatory regimes, at worst, would be completely ignored by state actors and all large commercial enterprises. The first-mover advantages of AGI and ASI are too large for any organization to ignore.

What happens when SuperIntelligence is reached

I see one of two possibilities for how the achievement of AGI and SuperIntelligence plays out with respect to humanity:

  • Humankind Utopia – AGI & ASI agents can do anything that humans can do, and do it better, faster and more efficiently. The question remains what would be left for humanity to do when this is reached. Alright, at the moment, LLM agents are mostly limited to working in the digital domain. But with robotics coming online over the next decade, this will change, adding more real-world domains to whatever AGI-ASI agents can do.
  • Humankind Hell – AGI & ASI agents determine that humanity is a pestilence to the Earth and start to cut it back to something less consumptive of Earth’s resources. Again, although AI agents are restricted to the digital domain today, that won’t last long, especially as AGI & ASI agents go live. Robots with ASI agents could be the worst aggressor in the history of the world, and with the tools at their disposal, they could easily create biological, chemical and other weapons of mass destruction to deploy against humanity.

SuperIntelligence risks and rewards

It’s been obvious to me, to SciFi authors and to some select AI researchers that there is a sizable risk that a SuperIntelligence, once unleashed, will eliminate, severely restrict or enslave humanity, resulting in Humanity’s Hell.

On the other extreme are many corporate CEOs/CTOs and other AI researchers who believe that SuperIntelligence will be a godsend to humankind. Once it arrives and is deployed, humanity will no longer have to do any work it does not want to do. All work will be handed off to robots and their ASI agents, which will perform it at greater speed, with higher quality and at lower cost than can conceivably be done today.

What seems to be happening today with current AI agents is that some white-collar work is becoming easier to perform, if not totally eliminated. CEOs see this as an opportunity to reduce workforce size. For example, some CEOs are eliminating HR organizations in the belief that LLM chatbots, together with a much smaller group, can handle all of what HR was doing before.

And of course, as AI agents become more sophisticated, this will ensure more workforce reductions. And once AI agents are embodied in robotics, the blue-collar workforce will also be at risk.

Human Utopia and “The Last Man”

Nietzsche was writing in the late 1800s, when technology and automation were just starting to make a difference in the world of work. But the industrial revolution was in full steam and had already had a significant impact on the workforce.

Nietzsche believed that further industrialization, if it continued (which of course it has), would result in the Last Man.

The Last Man arrives at the point where technology and automation have taken over all tasks, trades and work, and where the Last Man has no real duties to perform other than consuming the goods and services provided by automation. For the Last Man, being wealthy or poor no longer has any consequence, as everyone can have anything they could possibly desire.

To Nietzsche, the Last Man is anathema. He believed that true humanity requires struggle, striving and advancement. Once the Last Man arrives, all of these will no longer matter, no longer be a part of humanity’s existence and no longer impact one’s lifestyle.

When humanity no longer has to struggle, strive and advance, humanity will lose the very essence that makes humanity human. We will, over time, lose the ability and desire to do any of that, as it all becomes the purview of AGI-ASI.

The Last Man is coming already

Example 1: The Air France Flight 447 disaster of 2009 (see the Wikipedia article) is one example in a very technical domain. As I understand it, the flight was en route to France when it went into a stall; the pilots did the wrong thing to get out of it and spiraled into the sea.

The pilot was the most experienced pilot in the airline (having logged over 10K flight hours). The co-pilot was much less experienced. Getting out of a stall is rudimentary to flying. In fact, exiting a stall is one of the important skills taught to all pilots; they need to demonstrate they can get out of a stall before they get their pilot’s license.

The “problem” had been brewing for a while. Ever since aircraft autopilots came into service, real live pilots have done less and less real flying of airplanes. As a result, these pilots had forgotten how to get out of a stall, and that caused the accident.

Example 2: Self-driving technology has been rapidly improving over the last decade or so. We often become dependent on its capabilities, and when there’s some sort of failure it can be disastrous, because we have lost many of our most important driving skills.

In my case, we have a relatively dumb car with what they call “smart cruise control”. You can set it to a speed and the vehicle will retain that speed, unless a vehicle in front of you is going slower, in which case it will slow down to maintain some set distance behind that vehicle.

We were driving along when a truck cut into our lane. This truck had a very high rear profile, with no structure protruding where a normal vehicle would have one until you got to its tires. The smart cruise control didn’t detect its existence until we were almost underneath the truck bed. We tried to brake, but it took too many seconds, and in the end we had to go off the road to save ourselves. We had lost our emergency braking skills and situational awareness skills. Nowadays we don’t drive with cruise control on as much.

A multitude of examples exist showing that AI and automation have led to humans becoming less skilled at some activity. And when AI automation doesn’t work properly, bad things happen, because we no longer know how to react properly.

The Last Man, here today, gone tomorrow.

So imagine a life where you are born with everything you could possibly need to succeed. You are educated by the very best automated personal tutors. You are provided an (Amazon and Walmart) x 1000, with unlimited credit. You grow up with everyone else having just the same life as you, because all of you have no work to do, infinite sums to spend and infinite products to consume.

Life in such a utopia would, from some perspectives, be almost godlike. But if you take the perspective that humanity needs struggle, needs challenges, needs to strive to better itself at every stage, such a life would be a disaster.

And that’s what Humanity’s Utopia would look like. Definitely better than Humanity’s Hell, but in the end, I’m not sure the difference matters as much.

~~~

I just don’t really see any path forward that’s good for humanity where AGI and SuperIntelligence exist.

Stopping AI development here today seems idiotic; going where we seem to be going seems insane.

Comments?

Picture Credit(s):

AlphaEvolve, DeepMind’s latest intelligence pipeline

Read an article the other day from Ars Technica on AlphaEvolve (Google Deepmind creates .. AI that can invent…), after Google announced and released their AlphaEvolve website and paper.

Essentially, they have created a pipeline of AI agents (using GeminiFlash and GeminiPro) that uses genetic/evolutionary techniques to evolve code for anything, really, that can be transformed into code to improve or solve something that has code-based evaluation techniques.

Genetic evolution of code has been tried before. Essentially, it uses various combinatorial techniques (splitting, adding, subtracting, etc.) to modify the code under evolution. The challenge with any such technique is that much of the evolved code is garbage, so you have to have some method to evaluate (quickly?) whether the new code is better or worse than the old code.

That’s where the evaluation code comes into play. It effectively executes the new code and determines a score (which could be a scalar or a vector) that AlphaEvolve can use to determine if it’s on the right track or not. You can also have multiple evaluation functions. As an example, you could ask some LLM whether the code is simpler/cleaner/easier to understand. That way you could task AlphaEvolve not only to improve the code’s functionality but also to create simpler/cleaner/easier to understand code.
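To make the generate-evaluate-keep loop concrete, here’s a minimal self-contained sketch (my own toy hill-climber, not DeepMind’s actual pipeline; the target function, mutation scheme and scoring are all invented for illustration). It mutates numeric literals in a candidate program and keeps a change only when the evaluation score improves:

```python
import random

def evaluate(src: str) -> float:
    """Execute candidate code and score it (higher is better).
    Toy score: how closely f(x) matches a hidden target, 2*x + 1."""
    ns = {}
    try:
        exec(src, ns)                                   # run the candidate
        f = ns["f"]
        err = sum(abs(f(x) - (2 * x + 1)) for x in range(10))
        return -err                                     # 0 means perfect
    except Exception:
        return float("-inf")                            # broken code scores worst

def mutate(src: str, rng: random.Random) -> str:
    """Crude mutation: nudge one numeric literal up or down by 1."""
    tokens = src.split()
    digit_positions = [i for i, t in enumerate(tokens) if t.isdigit()]
    if not digit_positions:
        return src
    i = rng.choice(digit_positions)
    tokens[i] = str(int(tokens[i]) + rng.choice([-1, 1]))
    return " ".join(tokens)

def evolve(start: str, generations: int = 200, seed: int = 0) -> str:
    """Keep only strict improvements (AlphaEvolve instead keeps a whole
    database of scored candidates to branch from, per the paper)."""
    rng = random.Random(seed)
    best, best_score = start, evaluate(start)
    for _ in range(generations):
        child = mutate(best, rng)
        score = evaluate(child)
        if score > best_score:
            best, best_score = child, score
    return best

best = evolve("def f ( x ) : return 1 * x + 0")
```

A real pipeline swaps the crude `mutate` for LLM-proposed diffs and runs many evaluators in parallel, but the accept-if-better control flow is the same idea.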

AlphaEvolve uses GeminiFlash to generate a multitude of code variations, and when that approach loses steam (is no longer improving much), it invokes GeminiPro to look at the code in depth and determine strategies to make it better.

As discussed above, to use AlphaEvolve you need to supply infrastructure (compute, storage, networking), one or more evaluation algorithms/prompts (in any coding language you choose) and a starting solution (again, in any coding language you want).

As part of its process, AlphaEvolve uses a database to record all code modification attempts and their evaluation scores. This database can be used to retrieve prior modifications and take off from there again.

Results

AlphaEvolve has been tasked with historical math problems that involve geometric constructions, as well as computing algorithm improvements and full-stack coding improvements.

For instance, the paper discusses how AlphaEvolve improved Google’s (Borg) compute scheduling algorithm, recovering on average 0.7% of compute resources across Google’s data centers.

It also found a kernel improvement that led to a Gemini training speedup, and it found a simpler logic footprint for a TPU chip function.

It found a faster algorithm for 4x4 complex matrix multiplication. It found a solution to the kissing number problem in 11 dimensions (a geometric construction). And it tackled probably 50 or more mathematical problems, coding algorithm improvements, etc.

It didn’t improve or solve everything it was tasked with, but it did manage improvements or solutions to ~20% or so of the starting solutions it was given.

How to use it

The nice thing about AlphaEvolve is that one can have it work with a whole code repo and have it evolve only selected sections of code in that repo. All the code to be improved is marked with

#EVOLVE-BLOCK START and
#EVOLVE-BLOCK END.

These markers would be embedded in the starting solution, presumably in whatever comment format the coding language being used supports.
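For instance (a hypothetical Python repo fragment of my own, using the marker format described above), only the body between the markers would be rewritten by the pipeline, while the signature and the rest of the file stay fixed:

```python
def schedule(jobs):
    """Order jobs for a single machine. Only the marked region below
    is subject to evolution; the function signature stays stable so the
    evaluation harness can keep calling it unchanged."""
    #EVOLVE-BLOCK START
    # naive baseline: shortest-job-first by estimated duration
    return sorted(jobs, key=lambda j: j["duration"])
    #EVOLVE-BLOCK END
```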

And it’s important to note that the starting solution can be very rudimentary and, with the proper evaluation algorithms, could still be used to solve or improve almost any algorithm.

For example, suppose you were interested in optimizing a factory production line by picking which component/finished product to manufacture next, and you had some sort of coded factory simulation with some way of examining the factory to evaluate whether it’s working well or not.

Your rudimentary starting algorithm could pick at random from the set of products/components that currently need to be manufactured, and use as evaluation the throughput of your factory, the utilization of bottleneck machinery, energy consumption, or any other easily codeable evaluation metric of interest, in isolation or combination (making use of your factory simulation to come up with evaluation score(s)). Surround the random selection code with #EVOLVE-BLOCK START and #EVOLVE-BLOCK END and let AlphaEvolve come up with a new selection algorithm for your factory.

After seeing a number of (10? 100? 1000?) iterations of new graded selection algorithms, you could change your evaluation/grading algorithms and start over from where you left off to get something even more sophisticated.
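Sketched in Python (a toy stand-in of my own devising: the product names, the changeover penalty and the scoring are all invented, and a real setup would call your actual factory simulation inside `evaluate`):

```python
import random

# Toy stand-in for a real factory simulation (all names hypothetical).
DEMAND = {"gear": 50, "pipe": 20, "circuit": 80}   # units still needed

#EVOLVE-BLOCK START
def pick_next_product(demand):
    """Rudimentary starting policy: choose a still-needed product at random."""
    pending = [p for p, n in demand.items() if n > 0]
    return random.choice(pending) if pending else None
#EVOLVE-BLOCK END

def evaluate(policy, seed=0, ticks=100):
    """Toy evaluation score: units produced in `ticks` time steps, where
    switching to a different product costs one tick of changeover (so an
    evolved batching policy should out-score random picking)."""
    random.seed(seed)
    demand, produced, last, t = dict(DEMAND), 0, None, 0
    while t < ticks:
        product = policy(demand)
        if product is None:
            break                      # nothing left to make
        if product != last:            # changeover penalty
            last, t = product, t + 1
            if t >= ticks:
                break
        demand[product] -= 1
        produced += 1
        t += 1
    return produced

score = evaluate(pick_next_product)
```

AlphaEvolve would then be pointed at the marked block and asked to maximize `evaluate`, with the rest of the file held fixed.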

DeepMind has created a GitHub Jupyter notebook with some of AlphaEvolve’s mathematical solutions/improvements, in case you want to see more.

They also have an AlphaEvolve early-signup site, in case you’re interested in trying it out.

~~~~

If I were DeepMind, I could think of probably 10K things to do with AlphaEvolve. I might rank all the functions in GeminiPro/GeminiFlash inference and training by frequency count and take the top 20% of those functions through the AlphaEvolve pipeline. Ditto for Google Cloud services, Google Search, AdWords, etc.

But that would be just the start…

….

Photo/Graphic Credit(s):

Benchmarking Agentic AI using Factorio – AGI part 12

Yesterday a friend forwarded me something he saw online about a group of researchers who are using the game Factorio to benchmark AI agent solutions (PDF of paper, GitHub repo).

A Factorio plastic bar factory

The premise is that with an effective API for Factorio, AI agents can be tasked with creating various factories for artifacts. The best agents would be able to create the best factories.

Factorio factories can easily be judged by the number of artifacts they produce per time period and the energy used to manufacture those artifacts. They can also be graded on how many steps it takes to generate those factories.

Left: Factorio factory progression; middle: AI agent Python code that uses the Factorio API; right: agents submitting programs to a Factorio server and receiving feedback

The team has created a Factorio framework in which AI agents create Python code that drives a set of Factorio APIs to build factories to manufacture stuff.

Factorio is a game in which you create and operate factories. From Factorio website: “You will be mining resources, researching technologies, building infrastructure, automating production, and fighting enemies. Use your imagination to design your factory, combine simple elements into ingenious structures, apply management skills to keep it working, and protect it from the creatures who don’t really like you.”

Presumably the FLE (Factorio Learning Environment) has disabled the villainy and focuses on just crafting and running factories all out.

FLE Results using current AI agents

FLE open-play results. For open-play, models are scored based on production quantities over time; note the chart is log-log

Factorio, similar to other games, has an inventory of elements/components/machines used to build factories. And some of these elements are hidden until one gains enough experience in the game.

The Factorio Learning Environment (FLE) is a complete framework that can prompt agentic AI to create factories using Python code and Factorio API calls. The paper goes into great detail in its appendices as to what the AI agent prompts look like, the Factorio API and other aspects of running the benchmark.

In the FLE as currently defined there’s “open-play” and “lab-play”.

  • Open-play tasks the agent with building a factory as large as it wants, to create as much product as possible. The open-play winner is the AI agent that creates a factory that can manufacture the most widgets (iron plates) in the time available for the competition.
  • Lab-play tasks the agent with building factories for 24 specific items, under limited resource and time constraints; the winner is the AI agent that is able to build the most of these lab-play factories successfully within the time and resource constraints available.
FLE lab-play (select) results – there were 24 tasks in the lab-play list; no agent completed all of them, but Claude did the best on the 5 that were completed by most agents

The team benchmarked 6 frontier LLM agents: Claude 3.5-Sonnet, GPT-4o, GPT-4o-Mini, Deepseek-v3, Gemini-2-Flash, and Llama-3.3-70B-Instruct, using them for both open-play and lab-play.

The overall winner for both open-play and lab-play was Claude 3.5-Sonnet, by a wide margin. In open-play it was able to create a factory that manufactured over 290K iron plates (per game minute, we think), and for lab-play it was able to construct more factories (7 out of 24) than the other AI agents.

FLE Overall AI Agent Results

The FLE researchers listed some common failings of AI agents under test:

  • Most agents lack spatial understanding
  • Most agents don’t handle or recover from errors well
  • Most agents don’t have long enough planning horizons
  • Most agents don’t invest enough effort in research (finding out what new Factorio machines do and how they could be used).

They also mentioned that AI agent coding skills seemed to be a key indicator of FLE success and coding style differed substantially between the agents. The researchers characterized agent (Python) coding styles and determined that Claude used a REPL style with plenty of print statements while GPT-4o used more assertions in its code.

Example of an FLE program used to create a simple automated iron-ore miner. In step 1 the agent uses a query to find the nearest resources and place a mine. In step 3 the agent uses an assert statement to verify that its action was successful.

IMHO, as a way to measure an AI agent’s ability to achieve long-term and short-term goals, at least w.r.t. building factories, this is the best I’ve seen so far.

More FLE Lab-play scenarios

I could see a number of additional lab-play benchmarks for FLE:

  • One focused on drug/pharmaceuticals manufacturing
  • One focused on electronics PCB manufacturing
  • One focused on chip manufacturing
  • One focused on nanotechnology/meta-materials manufacturing, etc.

What’s missing from all these benchmarks is the actual science and research needed to come up with the new drugs, new electronics and new meta-materials that are the end products of Factorio factories. I guess that would require building labs, running scientific experiments and understanding (simulated) results.

Although, in the current round of FLE benchmarks, for one AI agent at least (Claude), there seemed to be a lot of research into how to use the different Factorio tools and machinery.

Ultimate FLE

If FLE succeeds as an AI agent benchmark, most agentic AI solutions will start being trained to do better on it. Doing so should, of course, lead to better scores by AI agents.

Now, people much more familiar with the game than I say it’s not a great simulation of the real world: there’s only one type of fuel, the boiler is either on or off, and numerous other simplifications of the real world are used throughout. And thankfully, for the moment, there’s no linkage to actions that impact the real world.

But in reality, simulations like this are all just stepping stones to AI capabilities. And simulations are all just code, so it should not be that hard to increase their fidelity to the real world.

Getting beyond simulation to real-world factories is probably the much larger step. This would require a physical (not unlimited) inventory of parts, cabling, machines and belts; real mineral/petroleum deposits; real-world physical constraints on where factories could be built; etc. Not to mention the physical automation/robotics that would allow a machine to be selected out of inventory, placed at a specific location inside a factory and connected to power and assembly lines, etc.

~~~~

One common motif in AGI existential crises is that some AGI (agent) will be given the task of building a paperclip factory and turns the Earth into one giant factory, inadvertently killing all life on the planet, including, of course, humankind.

So training AI agents on “open-play” has ominous overtones.

It would be much better, IMHO, if somehow one could add to Factorio human settlements, plant, animal & sea life, ecosystems, etc., so that there would be natural components which, if ruined/degraded/destroyed, would reduce AI agent scores for the benchmarks.

Alas, there doesn’t appear to be anything like this in the current game.

Picture Credit(s):