Reward is all you need – part 2, AGI part 12, ASI part 3

Read an article today about how current LLM technology is running out of steam as it approaches the equivalent of all current human knowledge. The article is Welcome to the Era of Experience. Apparently it’s a preprint of a chapter in an upcoming book from MIT, Designing an Intelligence. One of the authors is well known for his research in reinforcement learning and is a co-author of the textbook, Reinforcement Learning: An Introduction.

Some time back, before ChatGPT came out, there was a paper, Reward is enough (see post: For AGI, is reward enough). At the time, it proposed that reinforcement learning with proper reward signals was sufficient to reach AGI.

Since then, attention has become the prominent road to AGI and is evident in all the LLM activity to date (see ArXiv paper: Attention is all you need).

This new paper (and presumably book) suggests that the current AI training technology focused on attention (to current human knowledge) will ultimately reach an impasse, a human wall if you will. Whenever it attains human levels of AGI, or the Humanity Wall, it will be unable to proceed any further. At that point, it will track human knowledge generation but go no further.

Now, from my perspective, something like this is inherently safer than having something that can surpass human intelligence. But putting my reservations aside, the new paper on the Era of Experience shows a potential road map of sorts to achieve super human intelligence.

Era of attention

In the case of transformers (current LLM technology), they have billion-parameter models based on learning what the next token in a sequence should be. There are ancillary models that determine, for instance, tokenization of text streams (multi-dimensional locations for each portion of a word in a paragraph). Tokenization encodes textual semantics and context, as well as the word part being analyzed, into a string of numbers for each token. Essentially, a multi-dimensional address in textual semantic space.

But the big, billion+ parameter models were all essentially trained to predict what the next text token would be based on current context. Similarly, for graphical generation models it went from text tokens to predicting the diffusion pixels of a graphic and other visual artifacts.

But pretty much all of this was based on the underlying technology training approach as outlined in attention is all you need.

The Era of Experience paper suggests that this training approach will ultimately run out of steam and that all of these models will hit the Humanity Wall, where they reach the equivalent of all human knowledge but will be unable to proceed past that point.

Era of Games and Proofs

In an online course I took during Covid on reinforcement learning, level 1 of the course ended up having us code a reinforcement learning algorithm to play Pong. Mind you, this ended up taking me much longer to get right than I had anticipated. But in the end, this was essentially training a deep neural network as a value function (predicting whether a move was going to win or lose) to decide which direction to move the paddle based on the ball's current position and velocity.

For this reinforcement learning algorithm, the reward was simply 0 if the game continued, +1 if you won the game, and -1 if you lost (the ball went past your paddle).
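As a rough illustration of that reward scheme (a minimal sketch, not the course's actual code; the function names, the court coordinates and the discount factor are all assumptions made for illustration), the sparse reward and the way it gets spread back over earlier moves might look something like this:

```python
def pong_reward(ball_x: float, left_edge: float, right_edge: float) -> int:
    """Per-step reward for an agent defending the right edge (hypothetical layout)."""
    if ball_x > right_edge:
        return -1  # the ball went past our paddle: we lost
    if ball_x < left_edge:
        return 1   # the ball went past the opponent's paddle: we won
    return 0       # the rally continues


def discounted_returns(rewards, gamma=0.99):
    """Spread a sparse end-of-game reward back over the earlier steps."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

With `gamma=0.5`, a three-step rally that ends in a win, `discounted_returns([0, 0, 1], gamma=0.5)`, gives `[0.25, 0.5, 1.0]`, so moves closer to the winning point get more of the credit. These returns are the kind of target a value function could be trained to predict.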

The authors discuss DeepMind’s AlphaProof and AlphaGeometry 2 as examples of super-human thinking capabilities, though only in the domain of mathematical proofs. Together, AlphaProof and AlphaGeometry 2 achieved silver-medal-level performance at the prestigious International Mathematical Olympiad.

AlphaProof and AlphaGeometry 2 depend on Lean, a formal mathematical description language (similar to coding for mathematics). So a proof request would be converted to Lean code and then handed to AlphaProof and AlphaGeometry 2 to work on.
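To give a flavor of what "coding for mathematics" means, here is a tiny Lean 4 example. It is only an illustration of the language (the theorem name is made up, and it has nothing to do with the Olympiad problems): the statement is written as a type, and the proof is a term that Lean checks mechanically.

```lean
-- Statement: for all natural numbers a and b, a + b = b + a.
-- Proof: appeal to the library lemma Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Because Lean either accepts or rejects a proof, it can play the same role for mathematics that a game's win/lose rule plays for game-playing RL.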

AlphaProof was originally trained on the sum total of all human-generated mathematical proofs, but then used reinforcement learning to generate hundreds of millions more proofs and trained on those to reach the level of a superhuman mathematical proof generator.

AlphaProof is an example of deploying AlphaZero RL technologies to different domains. AlphaZero already conquered Chess, Shogi and Go with super-human skill.

These achieved super-human levels of skill because human knowledge was essentially dropped out of the training loop (very early on), and from then on the algorithm trained itself on self-generated data (game play, mathematical proofs), using a game simulator and reward signal(s) to determine when plays were good or bad.
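To make the "train on self-generated data" idea concrete, here is a toy sketch. It is emphatically not AlphaZero (which combines deep networks with tree search); it swaps in a plain table of action values and the simple game of Nim (take 1 to 3 stones, whoever takes the last stone wins). All names and parameters are assumptions for illustration. The only thing the learner ever sees is the game simulator and the win/lose signal.

```python
import random
from collections import defaultdict


def train_nim_self_play(episodes=20000, max_pile=10, epsilon=0.1, seed=0):
    """Learn Nim action values purely from the outcomes of self-play games."""
    rng = random.Random(seed)
    q = defaultdict(float)     # (pile, take) -> average outcome for the mover
    visits = defaultdict(int)  # (pile, take) -> number of updates so far

    def legal(pile):
        return [t for t in (1, 2, 3) if t <= pile]

    def choose(pile):
        # Mostly play the best known move, occasionally explore.
        if rng.random() < epsilon:
            return rng.choice(legal(pile))
        return max(legal(pile), key=lambda t: q[(pile, t)])

    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        history = []  # moves in order; the two players alternate
        while pile > 0:
            take = choose(pile)
            history.append((pile, take))
            pile -= take
        # Whoever made the last move took the last stone and wins (+1).
        outcome = 1.0
        for state_action in reversed(history):
            visits[state_action] += 1
            q[state_action] += (outcome - q[state_action]) / visits[state_action]
            outcome = -outcome  # the other player sees the opposite result
    return q


def greedy_move(q, pile):
    """Best known move for a pile size according to the learned values."""
    return max((t for t in (1, 2, 3) if t <= pile), key=lambda t: q[(pile, t)])
```

After training, `greedy_move(q, 3)` should pick 3 (take everything and win), and no human game records were involved at any point. The same shape of loop, with a vastly stronger learner and a harder simulator, is what the game and proof systems above rely on.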

Era of Experience

But the Era of Experience takes reward signals to a whole other level.

Essentially, in order to create super human intelligence using RL, the reward function needs to become yet another Deep Neural Network or two. And it needs to be trained in a fashion that understands how the world, environment, humans, flora, fauna, etc. react to what a (super human) agent is doing.

It's unclear how you tokenize (encode) all those real world experience signals into something a DNN could be trained on, but my guess is their book will delve into some of these topics.

But in addition to the multi-faceted reward DNN(s), in order to do effective RL, one also needs a (high fidelity) real world simulator. This would be used much like internal game play in traditional game-playing RL algorithms, so that the super human agent could generate 100 million agentic scenarios in simulation to determine if they were successful or not, long before it ever attempted activities in the real world.
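The division of labor between a world simulator and a reward model can be sketched in a few lines. This is a deliberately trivial stand-in, with made-up names and a one-number "world", not anything from the paper: candidate plans are played out in the simulator, scored by the reward model, and only the best one would ever be tried for real.

```python
from typing import Callable, List, Sequence

State = float   # toy stand-in for a full world state
Action = float  # toy stand-in for an agent action


def simulate(start: State, plan: Sequence[Action],
             step: Callable[[State, Action], State]) -> State:
    """Roll a plan forward in the (assumed) world simulator."""
    state = start
    for action in plan:
        state = step(state, action)
    return state


def best_plan_in_simulation(start: State,
                            candidate_plans: List[Sequence[Action]],
                            step: Callable[[State, Action], State],
                            reward_model: Callable[[State], float]) -> Sequence[Action]:
    """Score every candidate plan in simulation and return the highest scoring one."""
    return max(candidate_plans,
               key=lambda plan: reward_model(simulate(start, plan, step)))


# Toy usage: the "world" just adds actions up, and reward is closeness to 10.
toy_step = lambda state, action: state + action
toy_reward = lambda state: -abs(state - 10.0)
chosen = best_plan_in_simulation(0.0, [[1.0, 2.0], [5.0, 5.0], [20.0]],
                                 toy_step, toy_reward)
```

Here `chosen` is `[5.0, 5.0]`, the only plan that ends exactly on the target. In the Era of Experience picture, both `toy_step` and `toy_reward` would themselves be large learned models, which is exactly where the fidelity question comes in.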

So there you have it: tokenization for LLM DNNs, diffusion DNNs and text-based agentic LLM DNNs, some sort of multi-faceted reward DNNs (taking input from real and simulated world experience) and multi-faceted world simulator DNNs.

Once you have all that together, and with sufficient time and processing power, and after some 100 million or so generated actions in the simulated world, you should have a super human agent that you can unleash on the real world.

~~~~

You may wish to constrain your new super human intelligent agent early on to make sure the world simulation has true fidelity with the real world we live in. But after a suitable safety checkout period, one should have a super human intelligence agent ready to take over all human thought, society advancement, scientific research, etc.

Sounds like fun!?

SIGGRAPH 2024 Keynote: BabyX – AGI part 11, ASI part 3

SIGGRAPH came back to Colorado, to the Colorado Convention Center, for their 50th anniversary conference. The original SIGGRAPH conference was in Boulder in 1974.

The first SIGGRAPH keynote was a session called Beyond the Illusion of Life, presented by Mark Sagar, Soul Machines Co-Founder and former Chief Science Officer.

The theme of the session was mainly on how AI needs an embodiment to achieve a true breakthrough. Without embodiment, AI is just another secluded machine function and interacting with it will always be divorced from human existence and as such, much harder than interacting with other people.

As an example of embodied AI, Mark presented BabyX, a virtual 12-24 month old infant.

BabyX shows how creating a digital embodiment of a human can lead to faster, easier and more inherently natural human-machine interactions. This is because we, as humans, have evolved to interact with other humans and do this much better and faster than we can interact with machines, chatbots, and other digital simulacra.

With BabyX, they have created an emulation rather than an animation or simulation of a human.

BabyX

BabyX is a virtual infant that interacts with a virtual screen AND real people on the other side of that screen. BabyX simulates a real infant in front of a screen with adult supervision.

BabyX interacts with people using verbal cues, virtual screen images and virtual hands/fingers in real time.

BabyX appears to be actually learning and interacting with different people in real time.

If you check out their video (in link above), one can see just how close the emulation can get.

BabyX’s emulation is based on a digital cognitive architecture that mimics the real brain and includes a memory and learning system, a motor control system, a visual system, etc.

All these systems are distinct computational modules that, in unison, represent the “virtual connectome” of BabyX’s brain emulation. Each of these cognitive systems can be swapped in or out whenever better versions become available.

This cognitive architecture was designed to digitally reconstruct the key components of the brain of an 18-24 month old infant.

As a result, BabyX learns through interactions with its environment, by talking with people and by viewing a screen. With BabyX, they can even simulate hormonal activity, with the end result being the ability to provide real time emotional expression.

With such a cognitive architecture, one could simulate real (virtual) humans interacting with another person, on the other side of a virtual screen.

Soul Machines “virtual” assistants

Soul Machines has taken BabyX research and created AI avatars used for customer support agents, educational assistants and any commercial activity that depends on humans interacting with machines via screens.

It’s unclear just how much of the BabyX cognitive architecture and simulation has made its way into Soul Machines’ Avatars, but they do show similar interactions with a virtual screen and humans, as well as emotional expression.

Soul Machines is in the market of supplying these digital avatars so that companies can provide a better, more human like experience when interacting with AI.

In any case, BabyX was the first time I saw the true embodiment of an AI that uses a cognitive architecture as it is understood today.

AGI?

One can’t help but think that this is a better, or at least potentially a more correct, way to create human level artificial intelligence or AGI. BabyX uses a digital emulation of human memory & learning, behavior, attention, etc. to construct a machine entity that acts and interacts similar to how a human would.

With this sort of emulation, one could see training a digital emulation of a human and, after 20 years or so, ending up with a digital human with human levels of intelligence.

And, of course, once we have re-created a human level intelligence, the (industry) view is that all we need do is focus it on improving (machine) learning algorithms, and maybe (machine) learning hardware, and let it loose to learn all there is to know in the universe. Somewhere along the way, we will have created super general intelligence or ASI.

Thankfully, it turns out that BabyX’s long term memory has been constrained to be temporary and limited. So, we aren’t able to see how a TeenX would actually behave (thank the powers that be).

Sagar mentioned some of the ethical issues in letting BabyX have an indefinite, permanent long term memory.

I’m thinking this won’t stop others from taking this approach on.

Which, in the end, scares the heck out of me.

~~~~
Comments?

The Data Wall – AGI part 11, ASI part 2

Went to a conference the other week (Cloud Field Day 20) and heard a term I hadn’t heard before, the Data Wall. I wasn’t sure what this meant but thought it an interesting concept.

Then later that week, I read an article online, Situational Awareness – The Decade Ahead, by Leopold Aschenbrenner, which talked about the path to AGI. He predicts it will happen in 2027, and ASI in 2030. However, he also discusses many of the obstacles to reaching AGI, and one key roadblock is the Data Wall.

This is a follow on to our long running series on AGI (see AGI part 10 here) and with this we are creating a new series on Artificial Super Intelligence (ASI) and have relabeled an earlier post as ASI part 1.

The Data Wall

LLMs, these days, are being trained on the internet's text, images, video and audio. However, the vast majority of the internet is spam, junk and trash. And because of this, LLMs are rapidly reaching (bad) data saturation. There’s only so much real intelligence to be gained from scraping the internet.

The (LLM) AI industry apparently believes that there has to be a better way to obtain clean, good training data for their LLMs, and if that can be found, true AGI is just a matter of time (and compute power). This current wall of garbage data is prohibiting true progress to AGI and is what is meant by the Data Wall.

Leopold doesn’t go into much detail about solutions to the data wall other than to say that perhaps Deep Reinforcement Learning could help (see below). Given the importance of this bottleneck, every LLM company is trying to solve it. And as a result, any solutions to the Data Wall will end up being proprietary because this enables AGI.

But the real gist of Leopold’s paper is that AGI and its follow on, Artificial Super Intelligence (ASI) will be the key to enabling or retaining national supremacy in the near (the next decade and beyond) future.

And that any and all efforts to achieve this must be kept as a National Top Secret. I think he wants to see something similar to the Manhattan Project be created in the USA, only rather than working to create an atom/hydrogen bomb, it should be focused on AGI and ASI.

The problem is that when AGI and its follow on, ASI, are achieved, they will represent an unimaginable advantage to the country/company that owns them. Such technology, if applied to arms, weapons, and national defense, will be unbeatable in any conflict. And it could conceivably be used to defeat any adversary before a single shot was fired.

The AGI safety issue

In the paper Leopold talks about AGI safety and his proposed solution is to have AGI/ASI agents be focused on crafting the technologies to manage/control this. I see the logic in this and welcome it but feel it’s not sufficient.

I believe (seems to be in the minority these days) that rather than having a few nation states or uber corporations own and control AGI, it should be owned by the world, and be available to all nation states/corporations and ultimately every human on the planet.

My view is the only way to safely pass through the next “existential technological civilizational bottleneck” (e.g., AGI is akin to atomic weapons, genomics, climate change, all of which could potentially end life on earth) is to have many AGIs that can compete effectively with one another. Hopefully such a competition will keep all of them in check and in the end have them be focused on the betterment of all of humanity.

Yes, there will be many bad actors that will take advantage of AGI and any other technology to spread evil, disinformation and societal destruction. But to defeat this, AGI needs to become ubiquitous, everywhere, and in that way these agents can be used to keep the bad actors in check.

And of course keeping the (AGI/ASI) genie in the bottle will be harder and harder as time goes on.

Computational performance is going up 2X every few years, so building a cluster of 10K H200 GPUs, while extremely cost prohibitive today for any but uber corporations and nation states, will in a decade or so be something any average sized corporation could put together in its data center (or use in the cloud). And in another decade or so, it could be built into your own personal basement data center.
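To put rough numbers on that, here is a back-of-the-envelope sketch. The two-year doubling period is purely an assumption ("every few years" could just as easily mean three or four), and the function name is made up for illustration.

```python
def relative_cost(years: float, doubling_period_years: float) -> float:
    """Fraction of today's cost for the same compute after `years`,
    assuming performance per dollar doubles every `doubling_period_years`."""
    return 0.5 ** (years / doubling_period_years)


# Assumed two-year doubling period (an illustration, not a forecast).
after_one_decade = relative_cost(10, 2)   # 1/32 of today's cost
after_two_decades = relative_cost(20, 2)  # 1/1024 of today's cost
```

Under that assumption, the same cluster costs about 3% of today's price in ten years and about 0.1% in twenty. A slower doubling period stretches the timeline but doesn't change the direction.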

The software skills to train an LLM, while they may require a master’s degree or higher today, will be much easier to understand and implement in a decade or so. So that’s not much of a sustainable advantage either.

This only leaves the other bottlenecks to achieving AGI, a key one of which is the Data Wall.

Solving the Data Wall

In order to have as many AGI agents as possible, the world must have an open dialogue on research into solving the Data Wall.

So how can the world generate better data to use to train open source AGIs? I offer a few suggestions below, but by no means is this an exhaustive list. And I’m just an interested (and talented) amateur in all this.

Deep reinforcement learning (DRL)

Leopold mentioned DRL as one viable solution to the data wall in his paper. DRL is a technique that DeepMind used to create super intelligent Atari, Chess and Go players. They essentially programmed an agent to play a game against itself and determine which participant won the game. Once this was ready, they set multiple agents loose to play one another.

Each win would be used to reward the better player, each loss to penalize the worse player. After 10K (or ~10M) games, they ended up with agents that could beat any human player.

Something similar could be used to attack the Data Wall. Have proto-AGI agents interact (play, talk, work) with one another to generate, let’s say more knowledge, more research, more information. And over time, as the agents get smarter, better at this, AGI will emerge.

However, the advantage of Go, Chess, Atari, protein folding, finding optimal datacenter energy usage, coding sorting algorithms, etc. is that there’s a somewhat easy way to determine which of a gaggle of agents has won. For research, this is not so simple.

Let’s say we program/prompt a proto-AGI agent to generate a research paper on some arbitrary topic (How to Improve Machine Learning, perhaps). So it generates a research paper. How does one effectively and inexpensively judge if this is better, worse or the same as another agent’s paper?

I suppose with enough proto-AGI agents one could automatically use “repeatability” of the research as one gauge for research correctness. Have a gaggle of proto-AGIs be prompted to replicate the research and see if that’s possible.
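One way such a "repeatability" gauge might be turned into a number, sketched under heavy assumptions (the function name, the 5% tolerance and the idea that a paper's key result is a single headline figure are all mine, for illustration only):

```python
from typing import Sequence


def replication_score(claimed: float, replications: Sequence[float],
                      tolerance: float = 0.05) -> float:
    """Fraction of independent replication attempts that land within a
    relative tolerance of the result the paper claimed."""
    if not replications:
        return 0.0  # nobody replicated it, so give it no credit
    hits = sum(1 for result in replications
               if abs(result - claimed) <= tolerance * abs(claimed))
    return hits / len(replications)


# A paper claims 0.80 accuracy; four other agents try to reproduce it.
score = replication_score(0.80, [0.81, 0.79, 0.60, 0.82])
```

Here three of the four attempts land within 5% of the claim, so `score` is 0.75. A score like this could serve as a (crude) reward signal, though it only measures whether a result repeats, not whether it matters.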

Alternatively, submit the papers to an “AGI journal” and have real researchers review them (sort of like how Reinforcement Learning from Human Feedback (RLHF) for LLMs works today). The costs for real researchers reviewing AGI generated papers would be high, and of course the amount of research generated would be overwhelming, but perhaps with enough paid and (unpaid) voluntary reviewers, the world could start generating more good (research) data.

Perhaps at one extreme we could create automated labs/manufacturing lines that are under the control of AGI agent(s) and have them create real world products. With some modest funding, perhaps we could place the new products into the marketplace and see if they succeed or not. Market success would be the ultimate decision making authority for such automated product development.

(This latter approach seems to be a perennial AGI concern: tell an AGI agent to make better paper clips and it uses all of the earth's resources to do so.)

Other potential solutions to the Data Wall

There are no doubt other approaches that could be used to validate proto-AGI agent knowledge generation.

  • Human interaction – have an AGI agent be available 7X24 with humans as they interact with the world. Sensors worn by the human would capture all their activities. An AGI agent would periodically ask a human why they did something. Privacy considerations make this a nightmare, but perhaps using surveillance videos and an occasional check-in with the human would suffice.
  • Art, culture and literature – there is so much information embedded in cultural artifacts generated around the world that I believe this could effectively be mined to capture additional knowledge. Unlike the internet this information has been generated by humans at a real economic cost, and as such represents real vetted knowledge.
  • Babies and children – I can’t help but believe that babies and young children can teach us (and proto-AGI agents) an awful lot on how knowledge is generated and validated. It's unclear how to obtain this other than to record everything they do. But maybe it’s sufficient to capture such data from daycare and public playgrounds, with appropriate approvals of course.

There are no doubt others. But finding some that are cheap enough that could be used for open source is a serious consideration.

~~~~

How we get through the next decade will determine the success or failure of AI and perhaps life on earth. I can’t help but think the more the merrier will help us get there.

Comments?