Read an article today about how current LLM technology is running out of steam as it approaches the equivalent of all current human knowledge. The article, Welcome to the Era of Experience, is apparently a preprint of a chapter in an upcoming MIT Press book, Designing an Intelligence. One of the authors is well known for his research in reinforcement learning and is a co-author of the textbook Reinforcement Learning: An Introduction.
Some time back, before ChatGPT came out, there was a paper arguing that reward is enough (see post: For AGI, is reward enough). At the time, it proposed that reinforcement learning with proper reward signals was sufficient to reach AGI.
Since then, attention has become the predominant road to AGI, as is evident in all the LLM activity to date (see the arXiv paper: Attention is all you need).
This new paper (and presumably the book) suggests that current AI training technology, focused on attention (to current human knowledge), will ultimately reach an impasse, a human wall if you will. Once it attains human levels of AGI, the Humanity Wall, it will be unable to proceed any farther. At that point, it will track human knowledge generation but go no further.
Now, from my perspective, something like this is inherently safer than having something that can surpass human intelligence. But putting my reservations aside, the new Era of Experience paper shows a potential road map of sorts to achieve super-human intelligence.
Era of Attention

In the case of transformers (current LLM technology), these are billion+ parameter models trained to learn what the next token in a sequence should be. There are ancillary models that handle, for instance, tokenization of text streams: each word fragment is encoded, along with its textual semantics and context, into a string of numbers for each token. Essentially, a multi-dimensional address in textual semantic space.
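To make that concrete, here's a toy sketch of tokenization plus an embedding lookup (the vocabulary, dimensions, and random embedding table below are all made up for illustration, nothing like a real model's):

```python
import numpy as np

# Toy vocabulary: a real tokenizer has ~100K subword entries learned from data.
vocab = {"the": 0, "cat": 1, "sat": 2, "##s": 3, "<unk>": 4}

def tokenize(text: str) -> list[int]:
    """Map each whitespace-separated word to a token id (toy version)."""
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

rng = np.random.default_rng(0)
embed_dim = 8                                  # real models use thousands of dimensions
embedding_table = rng.normal(size=(len(vocab), embed_dim))

token_ids = tokenize("the cat sat")
vectors = embedding_table[token_ids]           # one "multi-dimensional address" per token
print(token_ids)       # [0, 1, 2]
print(vectors.shape)   # (3, 8)
```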
But the big, billion+ parameter models were all essentially trained to predict the next text token based on the current context. Similarly, graphical generation models went from text tokens to predicting, via diffusion, the pixels of an image and other visual artifacts.
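And the next-token training objective itself boils down to something like the following cross-entropy loss (a toy, one-token-of-context model of my own, not an actual transformer):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, embed_dim = 5, 8

# Toy "model": embedding lookup + linear projection to vocabulary logits.
E = rng.normal(size=(vocab_size, embed_dim))
W = rng.normal(size=(embed_dim, vocab_size))

def next_token_loss(token_ids: list[int]) -> float:
    """Average cross-entropy of predicting token t+1 from token t."""
    losses = []
    for t in range(len(token_ids) - 1):
        logits = E[token_ids[t]] @ W
        probs = np.exp(logits - logits.max())       # numerically stable softmax
        probs /= probs.sum()
        losses.append(-np.log(probs[token_ids[t + 1]]))  # penalize wrong guesses
    return float(np.mean(losses))

print(next_token_loss([0, 1, 2, 3]))  # training minimizes this via gradient descent
```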
But pretty much all of this was based on the underlying training approach outlined in Attention is all you need.
The Era of Experience paper suggests that this training approach will ultimately run out of steam, and all of these models will hit the Humanity Wall, where they reach the equivalent of all human knowledge but are unable to proceed past that point.
Era of Games and Proofs
In an online reinforcement learning course I took during Covid, level 1 had us code a reinforcement learning algorithm to play Pong. Mind you, this took me much longer to get right than I had anticipated. But in the end it was essentially training a deep neural network as a value function (predicting whether a move was going to win or lose the game) to decide which direction to move the paddle, based on the ball's current position and velocity.
For this reinforcement learning algorithm, the reward was simply 0 if the game continued, +1 if you won, and -1 if you lost (the ball went past your paddle).
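In sketch form, that reward function and value-driven paddle choice looked something like this (simplified and reconstructed from memory, not the course's actual code):

```python
import numpy as np

def reward(game_over: bool, won: bool) -> int:
    """Pong-style reward: 0 while play continues, +1 for a win, -1 for a loss."""
    if not game_over:
        return 0
    return 1 if won else -1

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=4)   # tiny linear "value function" (the real one was a deep net)

def value(state: np.ndarray) -> float:
    """Estimated eventual outcome (+1 win ... -1 loss) from this state."""
    return float(np.tanh(W @ state))

def pick_action(ball_y: float, ball_vy: float, paddle_y: float) -> str:
    """Compare the value of nudging the paddle up vs. down and pick the better."""
    up   = np.array([ball_y, ball_vy, paddle_y + 0.1, 1.0])
    down = np.array([ball_y, ball_vy, paddle_y - 0.1, 1.0])
    return "up" if value(up) >= value(down) else "down"

print(reward(game_over=False, won=False))                    # 0: rally continues
print(pick_action(ball_y=0.3, ball_vy=-0.05, paddle_y=0.0))
```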

The authors discuss DeepMind's AlphaProof (more an explanation of the technology) and AlphaGeometry 2 (described on the same page) as examples of super-human thinking capabilities, albeit only in the domain of mathematical proofs. Together, AlphaProof and AlphaGeometry 2 achieved silver-medal performance at the prestigious International Mathematical Olympiad.
AlphaProof and AlphaGeometry 2 depend on Lean, a formal mathematical description language (similar to coding for mathematics). So a proof request would be converted to Lean code, and then AlphaProof and AlphaGeometry 2 would search for a formally verified proof of it.
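For a flavor of what "coding for mathematics" looks like, here's a trivial theorem stated and proved in Lean 4 (my own toy example, nothing like an Olympiad problem):

```lean
-- Commutativity of natural-number addition, stated and proved in Lean 4.
-- The proof assistant mechanically checks that this proof is valid.
theorem my_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```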
AlphaProof was originally trained on the sum total of all human-generated mathematical proofs, but then used reinforcement learning to generate hundreds of millions more proofs and trained on those, reaching the level of a super-human mathematical proof generator.
AlphaProof is an example of deploying AlphaZero RL technology in a different domain. AlphaZero had already conquered the games of chess, Shogi, and Go with super-human skill.

These achieved super-human levels of skill because human knowledge was essentially dropped out of the training loop (very early on), and from then on the algorithm trained itself on self-generated data (game play, mathematical proofs), using a game simulator and reward signal(s) to determine when play was good or bad.
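In outline, that self-play loop goes something like the sketch below (all the names here, simulate_game, update_policy, etc., are hypothetical stand-ins, not DeepMind's code):

```python
import random

def simulate_game(policy):
    """Hypothetical simulator: play one game against a copy of yourself,
    returning the (state, action) trajectory and the final reward (+1/-1)."""
    trajectory = [(s, policy(s)) for s in range(10)]   # toy states 0..9
    outcome = random.choice([+1, -1])                  # stand-in for a real game result
    return trajectory, outcome

def update_policy(params, trajectory, outcome):
    """Hypothetical learning step: reinforce actions taken in winning games."""
    for state, action in trajectory:
        params[(state, action)] = params.get((state, action), 0.0) + 0.01 * outcome
    return params

params = {}
policy = lambda s: max((params.get((s, a), 0.0), a) for a in ("up", "down"))[1]

# No human data past this point: the agent learns only from its own games.
for _ in range(1000):
    traj, outcome = simulate_game(policy)
    params = update_policy(params, traj, outcome)
```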
Era of Experience
But the Era of Experience takes reward signals to a whole other level.
Essentially, in order to create super-human intelligence using RL, the reward function needs to become yet another deep neural network (or two), trained in a fashion that understands how the world, the environment, humans, flora, fauna, etc. react to what a (super-human) agent is doing.
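A minimal sketch of what such a learned reward model could look like (the architecture and the observation features are my own invention; the paper doesn't prescribe one):

```python
import numpy as np

rng = np.random.default_rng(3)

class RewardModel:
    """Tiny two-layer network mapping an observation of the world's
    reaction (a feature vector) to a scalar reward estimate."""
    def __init__(self, obs_dim: int, hidden: int = 16):
        self.W1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.W2 = rng.normal(scale=0.1, size=hidden)

    def __call__(self, obs: np.ndarray) -> float:
        h = np.tanh(obs @ self.W1)
        return float(h @ self.W2)

# Hypothetical observation: [human_feedback, environment_change, task_progress, harm_signal]
reward_net = RewardModel(obs_dim=4)
obs = np.array([0.8, -0.1, 0.5, 0.0])
print(reward_net(obs))   # grounded signals in, scalar reward out
```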
It's unclear how you tokenize (encode) all those real-world experience signals into something a DNN could be trained on, but my guess is their book will delve into some of these topics.
But in addition to the multi-faceted reward DNN(s), effective RL also needs a (high-fidelity) real-world simulator. This would be used much like internal game play in traditional game-playing RL algorithms, so that the super-human agent could generate 100 million agentic scenarios in simulation, and determine whether they were successful or not, long before it ever attempted those activities in the real world.
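Putting those pieces together, the simulated-experience training loop might look something like this (a sketch under big assumptions: the WorldSimulator, policy, and reward model here are all hypothetical toys):

```python
import numpy as np

rng = np.random.default_rng(4)

class WorldSimulator:
    """Hypothetical high-fidelity world model: maps (state, action) to next state."""
    def step(self, state, action):
        return np.tanh(state + 0.1 * action + rng.normal(scale=0.01, size=state.shape))

def reward_model(state) -> float:
    """Stand-in for the learned reward DNN from the previous sketch."""
    return float(-np.linalg.norm(state))   # e.g. reward staying near a target state

def rollout(sim, policy, state, horizon=50) -> float:
    """Score one simulated scenario: total reward over a fixed horizon."""
    total = 0.0
    for _ in range(horizon):
        state = sim.step(state, policy(state))
        total += reward_model(state)
    return total

sim = WorldSimulator()
policy = lambda s: -0.5 * s                # toy policy: push the state toward zero
scores = [rollout(sim, policy, rng.normal(size=3)) for _ in range(1000)]
print(np.mean(scores))   # scale this to ~100M scenarios before acting in the real world
```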
So there you have it: tokenization for LLM and diffusion DNNs, text-based agentic LLM DNNs, some sort of multi-faceted reward DNN(s) (taking input from real- and simulated-world experience), and multi-faceted world-simulator DNN(s).
Once you have all that together, with sufficient time and processing power, and after some 100 million or so generated actions in the simulated world, you should have a super-human agent that you can unleash on the real world.
~~~~
You may wish to constrain your new super-human intelligent agent early on, to make sure the world simulation has true fidelity with the real world we live in. But after a suitable safety checkout period, one should have a super-human intelligent agent ready to take over all human thought, societal advancement, scientific research, etc.
Sound like fun!!?
Photo/Graphic Credit(s):
- From the Welcome to the Era of Experience paper.
- From DeepMind's AlphaProof webpage.
- From DeepMind's AlphaProof webpage.