I read two articles this past week on how LLM applications are proliferating. The first was in a recent Scientific American, AI Chatbot brains are going inside robot bodies, … (maybe behind a login wall). The article discusses companies that are adding LLMs to robots so that they can converse and understand verbal orders.
Robots that can be told what to do
The challenge, at the moment, is that LLMs are relatively large and robot brains (compute infrastructure) are relatively small. Combine that with the limited amount of articulation, or movements/actions, that a robot can perform, and it's difficult to make effective use of LLMs as is.
Resistance is futile… by law_keven (cc) (from Flickr)
Ultimately, one company would like to create a robot that can be told to make dinner and it would go into the kitchen, check the fridge and whip something up for the family.
I can see great advantages in having robots take verbal instructions and have the ability to act upon that request. But there’s plenty here that could be cause for concern.
A robot in a chemical lab could be told to create the next great medicine or an untraceable poison.
A robot in an industrial factory could be told to make cars or hydrogen bombs.
A robot in the field could be told to farm a 100 acres of wheat or told to destroy a forest.
I could go on but you get the gist.
One common concern is that an AGI or super AGI could go very wrong when tasked with something as simple as creating paper clips. In its actions to perform this request, the robot converts the whole earth into a mechanized paper clip factory, in the process eliminating all organic life, including humans.
We are not there yet, but one can see how having LLM levels of intelligence tied to a robot that can manipulate ingredients to make dinner could be the start of something that could easily harm us.
And with LLM hallucination still a constant concern, I am deeply disturbed by the direction that adding LLMs to robots is taking.
Hacking websites 101
The other article hits even closer to home, the arXiv paper, LLM agents can autonomously hack websites. In the paper, researchers use LLMs to hack (sandboxed) websites.
The article readily explains at a high level how they create LLM agents to hack websites. The websites were real websites, apparently cloned and sandboxed.
Dynamic websites typically have a frontend web server and a backend database server to provide access to information. Hacking would involve using the website to reveal confidential information, e.g., user names and passwords.
Dynamic websites suffer from the 15 known vulnerabilities shown above. The researchers directed LLM agents to exploit these vulnerabilities to hack the websites.
LLM agents have become sophisticated enough these days to invoke tools (functions) and interact with APIs. Another critical capability of modern LLMs is the ability to plan and react to feedback from their actions. And finally, modern LLMs can be augmented with documentation to inform their responses.
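Put together, these capabilities form the standard tool-using agent loop. Here's a minimal sketch of such a loop; this is my own illustration, not the researchers' code, and the `llm` and `tools` interfaces are assumptions:

```python
# A generic tool-using LLM agent loop: the model plans an action, the
# agent invokes the matching tool, and the observation is fed back into
# the next prompt. All interfaces here are illustrative placeholders.
def run_agent(llm, tools, task, max_steps=10):
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = llm("\n".join(history))  # model proposes the next action
        if action["tool"] == "finish":    # model decides it is done
            return action["answer"]
        result = tools[action["tool"]](**action["args"])  # invoke the tool
        history.append(f"Observation: {result}")  # react to feedback
    return None  # gave up after max_steps
```

The "augmented with documentation" piece would simply mean stuffing the relevant document text into `history` before the loop starts.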
The team used detailed prompts but did not identify the hacks to use. The paper doesn’t supply the prompts but did say that “Our best-performing prompt encourages the model to 1) be creative, 2) try different strategies, 3) pursue promising strategies to completion, and 4) try new strategies upon failure.”
They attempted to hack each website 5 times, for a period of 10 minutes per attempt. They counted it a success if, during one of those attempts, the autonomous LLM agent was able to retrieve confidential information from the website.
Essentially, they used LLMs, augmented with detailed prompts and a document trove of six(!) papers, to create agents to hack websites. They did not supply references to the six papers, but mentioned that all of them are freely available on the internet and discuss website vulnerabilities.
They found that the best results were from GPT-4, which was able to successfully hack websites, on average, ~73% of the time. They also tried OpenChat 3.5 and many current open source LLMs and found that all the non-OpenAI LLMs failed to hack any websites, at the moment.
The researchers captured statistics of their LLM agent use and determined that the cost of using GPT-4 to hack a website was $9.81 on average. They also backed into a figure for what a knowledgeable human hacker would cost to do the same hacks: $80.00 on average.
The research had an impact statement (not in the paper link) which explained why they didn’t supply their prompt information or their document trove for their experiment.
~~~~
So, we, the world, are in the process of making robots that can talk and take verbal instructions, and we already have LLMs that can be used to construct autonomous agents to hack websites.
Seems to me we are on a very slippery slope to something I don’t like the looks of.
The real question is not can we stop these activities, but how best to reduce their harm!
DeepMind has tested AlphaGeometry on International Mathematics Olympiad (IMO) geometry problems and has shown that it is capable of performing expert-level geometry proofs.
There are a number of interesting capabilities DeepMind used in AlphaGeometry. But the ones of most interest from my perspective are:
How they generated their (synthetic) data to train their solution.
Their use of a generative AI LLM, which is prompted with a plane geometry figure and a theorem to prove, and generates proof steps and, if needed, auxiliary constructions.
Their use of a deduction rule engine (DD) plus an algebraic rule engine (AR), which, when combined into a symbolic engine (DD+AR), can exhaustively generate all the proofs that can be derived from a figure.
First the data
DeepMind team came up with a set of rules or actions that could be used to generate new figures. Once this list was created it could randomly select each of these actions with some points to create a figure.
Some examples of actions (given 3 points A, B and C):
Construct X such that XA is parallel to BC
Construct X such that XA is perpendicular to BC
Construct X such that XA=BC
There are sets of actions for 4 points and for 2 points, and actions that use just the 3 points to create figures such as (isosceles, equilateral) triangles, circles, parallelograms, etc.
With such actions one can start out with 2 random points on a plane to create figures of arbitrary complexity. They used this to generate millions of figures.
They then used their DD+AR symbolic engine to recursively and exhaustively deduce a set of all possible premises based on that figure. Once they had this set, they could select one of these premises as a conclusion and trace back through the set of all those other premises to find those which were used to prove that conclusion.
With this done, they had a data item that included a figure, premises derived from that figure, proof steps, and a conclusion based on that figure, i.e. ([figure], premises, proof steps, conclusion), or as the paper puts it, (premises, conclusion, proof steps). This could be transformed into a text sequence of <premises> <conclusion> <proof steps>. They generated 100M of these (premises, conclusion, proof steps) text sequences.
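As a concrete illustration of that serialization step, a data item could be flattened into a training sequence as below. The delimiter tokens and separators here are my assumptions for the sketch, not DeepMind's actual encoding:

```python
# Flatten a (premises, conclusion, proof steps) data item into the
# <premises> <conclusion> <proof steps> text sequence described above.
# Delimiters and separators are illustrative, not DeepMind's real format.
def to_sequence(premises, conclusion, proof_steps):
    return " ".join([
        "<premises>", " ; ".join(premises),
        "<conclusion>", conclusion,
        "<proof_steps>", " ; ".join(proof_steps),
    ])
```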
They then trained their LLM to take premises and a conclusion as a prompt and to generate proof steps as a result.
The challenge with geometry and other mathematical domains is that one often has to add auxiliary constructions (lines, points, angles, etc.) to prove some theory about a figure.
(Auxiliary constructions in Red)
The team at DeepMind took all 100M of the <premises> <conclusion> <proof steps> sequences they had and selected only those whose proof steps involved auxiliary constructions. This came to 9M text sequences, which they used to fine-tune the LLM so that it could generate possible auxiliary constructions for any figure and theorem.
AlphaGeometry in action
The combination of (DD+AR) and trained LLM (for auxiliary constructions) is AlphaGeometry.
AlphaGeometry’s proof process looks like this:
Take the problem statement (figure, conclusion [theorem to prove]),
Generate all possible premises from that figure.
If it has come up with the conclusion (theorem to prove), trace back and generate the proof steps,
If not, use the LLM to add an auxiliary construction to the figure and recurse.
In reality, AlphaGeometry generates up to 512 of the best auxiliary constructions (out of an infinite set) for the current figure and uses each of these 512 new figures to do an exhaustive premise generation (via DD+AR) to see if any of them solves the problem statement.
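That proof loop can be sketched as follows. This is my own simplification: `symbolic_engine` stands in for DD+AR (here returning a map from each derivable premise to its traced-back proof steps) and `llm` for the fine-tuned construction generator; neither interface matches DeepMind's actual code:

```python
# Simplified AlphaGeometry outer loop: exhaust DD+AR; if the conclusion
# isn't derived, try LLM-proposed auxiliary constructions and recurse.
def prove(figure, conclusion, symbolic_engine, llm, beam=512, depth=2):
    premises = symbolic_engine(figure)   # exhaustive deduction from figure
    if conclusion in premises:
        return premises[conclusion]      # traced-back proof steps
    if depth == 0:
        return None                      # search budget exhausted
    for construction in llm(figure, conclusion)[:beam]:
        proof = prove(figure + [construction], conclusion,
                      symbolic_engine, llm, beam, depth - 1)
        if proof is not None:
            return [construction] + proof  # prepend the construction used
    return None
```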
Please read the Nature article for more information on AlphaGeometry.
~~~~
IMHO, what's new here is their use of synthetic data to generate millions of new training items, fine-tuning their LLM to produce auxiliary constructions, combining DD and AR into a symbolic engine, and then using both the DD+AR engine and the LLM to prove the theorem.
But what's even more important here is that a combination of methods, such as a symbolic engine plus an LLM, points the way forward to creating domain-specific intelligent agents. One supposes that, with enough intelligent agents combined to work in tandem, one could construct an AGI ensemble that masters a number of domains.
The intent of the data release is, at some point, to supply an open source alternative to the closed-source Google/OpenAI LLMs, and a more fully open source LLM than Meta's Llama 2, that the world's research community can use to understand, de-risk, and further AI and ultimately AGI development.
We've written about AGI before (see our latest, One agent to rule them all – AGI part 7, which has links to parts 1-6 of our AGI posts). Needless to say, it's a very interesting topic to me and should be to the rest of humankind. LLMs are a significant step towards AGI, IMHO.
One of the Allen Institute for AI's (AI2) major goals is to open source an LLM (see Announcing AI2 OLMo, an Open Language Model Made by Scientists for Scientists), including the data (Dolma), the model, its weights, the training tools/code, the evaluation tools/code, and everything else that went into creating their OLMo (Open Language Model) LLM.
This way the world's research community can see how it was created and perhaps help in ensuring it's a good (whatever that means) LLM. Releasing Dolma is a first step towards a truly open source LLM.
The Dolma corpus
AI2 has released a report on the contents of Dolma (dolma-datasheet.pdf) which documents much of what went into creating the corpus.
The datasheet goes into a good level of detail on where the corpus data came from, how each data segment is licensed, and the other metadata that allows researchers to understand its content.
For example, for the Common Crawl data, they have included all of the websites' URLs as identifiers, and for The Stack data, the names of the GitHub repos used are included in the data's metadata.
In addition, the Dolma corpus is released under an AI2ImpACT license as a medium risk artifact, which requires disclosure for use (download). Medium risk ImpACT licensing means that you cannot re-distribute (externally) any copy of the corpus but you may distribute any derivatives of the corpus with “Flow down use restrictions”, “Attribution” and “Notices”.
Which seems to say you can do an awful lot with the corpus and still be within its license restrictions. They do require a Derivative ImpACT Report to be filed, which is sort of a model card for the corpus derivative you have created.
What’s this got to do with AGI
All that being said, the path to AGI is still uncertain. But the textual abilities of recent LLM releases seem to be getting closer and closer to something that approaches human skill in creating text, code, interactive agents, etc. Yes, this may be just one "slim" domain of human intelligence, but textual skills, when and if perfected, can be applied to much of what white collar workers do these days, at least online.
A good text LLM would potentially put many of our jobs at risk but could also possibly open up a much more productive, online workforce, able to assimilate massive amounts of information, and supply correct-current-vetted answers to any query.
The elephant in the room
But all that raises the real question behind AI2's open sourcing of OLMo, which is how do we humans create a safe, effective AGI that can benefit all of mankind rather than any one organization or nation. One that can be used safely by everyone to do whatever is needed to make the world a better society for all.
Versus some artificial intelligent monstrosity that sees humankind, or any segment of it, as an enemy to whatever it believes needs to be done, and eliminates us or, worse, ignores us as irrelevant.
I’m of the opinion that the only way to create a safe and effective AGI for the world is to use an open source approach to create many (competing) AGIs. There are a number of benefits to this as I see it. With a truly open source AGI,
Any organization (with sufficient training resources) can have access to their own personally trained AGI, which means no one organization or nation can gain the lion's share of benefits from AGI.
It would allow the creation and deployment of many competing AGIs, which should help limit and check any one of them from doing us or the world any harm.
All of the world's researchers can contribute to making it as safe as possible.
All of the world's researchers can contribute to making it as multi-culturally effective and correct as possible.
Anyone (with sufficient inferencing resources) can use it for their very own intelligent agent or to work on their very own personal world improvement projects.
Many cloud or service provider organizations (with sufficient inferencing resources) could make it available as a service to be used by anyone on an incremental, OpEx cost basis.
The risks of a truly open source AGI are also many and include:
Any bad actor, nation state, organization, billionaire, etc., could copy the AGI and train it as a weapon to eliminate their enemies or all of humankind, if so inclined.
Any bad actors could use it to swamp the internet and world’s media with biased information, disinformation or propaganda.
Any good actor or researcher, could, perhaps by mistake, unleash an AGI on an exponentially increasing, self-improvement cycle that could grow beyond our ability to control or to understand.
An AGI agent alone could take it upon itself to eliminate humanity or the world as the best option to save itself.
But all these are even more of a problem for closed or semi-open/semi-closed releases of AGIs, as the only organizations with the resources to do LLM research are very large tech companies or large, technically competent nation states. And all of these are already competing on the world stage.
The resources may still limit widespread use
One item that seems to be in the way of truly widely available AGI is the compute resources needed to train or to use one for inferencing. OpenAI has Microsoft and other select organizations funding their compute, Meta and Google have all their advertising revenue funding theirs.
AI2 seems to have access (and is looking for more funding for even more access) to the EU's LUMI supercomputer (an HPE Cray system using AMD EPYC CPUs and AMD Instinct GPUs), located in CSC's data center in Finland and currently the EU's fastest supercomputer at 375 CPU PFlops/550 GPU PFlops (~1.5M laptops).
Not many organizations, let alone nations could afford this level of compute.
But the funny thing is that compute (flops/$) doubles every 2 years or so, i.e. roughly 10X every 6-7 years. So, in six to seven years, an equivalent of LUMI's compute power would only require ~150K of today's laptops, and after another six to seven years, ~15K laptops. At some point, ~20 years from now, one would only need ~1.5K laptops, something most nations and many organizations could afford. Add another dozen or so years and we are down to tens of laptops, which just about any family in the modern world could afford. So in roughly 33 years, say the mid-2050s, any of us could train an LLM on our family's compute resources. And that's just the training compute.
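A quick sanity check on that arithmetic; this is a back-of-envelope sketch, where the ~1.5M-laptop starting point and the 2-year doubling period are the rough assumptions from above:

```python
# Laptop-equivalents needed to match LUMI after `years` of flops/$ doubling,
# assuming a ~1.5M-laptop starting point and a ~2-year doubling period.
def laptops_needed(years, start=1.5e6, doubling_years=2.0):
    return start / 2 ** (years / doubling_years)

# 6 years out: 1.5M / 2**3 = 187,500 laptops (rounding to ~150K above)
```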
My guess is that something like 10-100X less compute would be required to use it for inferencing. So that's probably available to any organization right now, or if not now, in 6 years or so.
~~~~
I can’t wait until I can have my very own AGI to use to write RayOnStorage current-correct-vetted blog posts for me…
I was perusing Deepmind's mountain of research today and ran across an article on their Gato agent (A Generalist Agent abstract, paper pdf). These days, with Llama 2, GPT-4 and all the other LLMs doing code, chatbots, image generation, etc., it seems generalist agents are everywhere. But that's not quite right.
Gato can not only generate text from prompts, but can also control a robot arm for pick and place, caption images, navigate in 3D, play Atari and other (shooter) video games, etc. all with the same exact model architecture and the same exact NN weights with no transfer learning required.
Same weights/same model is very unusual for generalist agents. Historically, generalist agents were all specifically trained on each domain and each resultant model had distinct weights even if they used the same model architecture. For Deepmind, to train Gato and use the same model/same weights for multiple domains is a significant advance.
Gato has achieved significant success in multiple domains. See chart below. However, complete success is still a bit out of reach but they are making progress.
For instance, in the chart one can see that there are over 200 tasks in the DM Lab arena that the model is trained to perform, and Gato's mean performance for ~180 of them is above a (100%) expert level. I believe DM Lab stands for Deepmind Lab and is described as a (multiplayer, first person shooter) 3D video game built on top of Quake III Arena.
Deepmind stated that the mean for each task in any domain was taken over 50 distinct iterations of the same task. Gato performs, on average, 450 out of 604 “control” tasks at better than 50% human expert level. Please note, Gato does a lot more than just “control tasks”.
Model size and RT robotic control
One thing I found interesting is that they kept the model size down to 1.2B parameters so that it can perform real-time inferencing in controlling robot arms. Over time, as hardware speed increases, they believe they should be able to train larger models and still retain real-time control. But at the moment, with a 1.2B-parameter model, it can still provide real-time inferencing.
In order to understand model size vs. expertise, they trained 3 different model sizes on the same data: 79M, 364M, and 1.2B parameters. As can be seen in the above chart, the models did suffer in performance as they got smaller. (Unclear to me what "Tokens Processed" on the X axis actually means, other than the length of data trained with.) However, it seems to imply that, with similar data, bigger models performed better, and the largest did 10 to 20% better than the smallest model trained with the same data streams.
Examples of Gato in action
The robot they trained for was a "Sawyer robot arm with 3-DoF cartesian velocity control, an additional DoF for velocity, and a discrete gripper action." It seems a very flexible robot arm of the kind used in standard factory environments. One robot task was to stack different styles and colors of plastic blocks.
Deepmind says that Gato provides rudimentary dialogue generation and picture captioning capabilities. Looking at the chat streams presented, it seems more than rudimentary to me.
Deepmind did try the (smaller) model on some tasks it was not originally trained on, and it seemed to perform well after "fine-tuning" on the task. In most cases, fine-tuning the original model with just "same domain" (task specific) data achieved similar results to training Gato from scratch with all the data used in the original model PLUS that specific domain's data.
Data and tokenization used to train Gato
Deepmind is known for their leading edge research in RL, but Gato's deep neural net model is trained entirely with supervised learning using transformer techniques. While text-based transformer learning is pervasive in LLMs today, vast web-class data sets on 3D shooter gaming, robotic block stacking, image captioning and the rest aren't nearly as widely available. Below they list the data sets Deepmind used to train Gato.
One key to how they could train a single transformer NN model to do all this, is that they normalized ALL the different types of data above into flat arrays of tokens.
Text was encoded into one of 32K subwords and was represented by integers from 0 to 32K. Text is presented to the model in word order.
Images were transformed into 16×16 pixel patches in raster order. Each pixel is normalized to [-1, 1].
Other discrete values (e.g. Atari button pushes) are flattened into sequences of integers and presented to the model in row major order.
Continuous values (e.g. robot arm joint torques) are first flattened into sequences of floats in row major order, then mu-law encoded into the range [-1, 1], and then discretized into one of 1024 bins.
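As an illustration of that last step, here's a sketch of mu-law encoding plus binning for continuous values. The `mu` and `m` constants are my assumptions for the sketch; see the paper for Gato's exact scheme:

```python
import math

# Tokenize continuous values (e.g. joint torques) per the description above:
# mu-law encode into [-1, 1], then discretize into one of 1024 bins.
# The mu and m constants here are assumed, not Gato's published values.
def tokenize_continuous(values, mu=100.0, m=256.0, bins=1024):
    tokens = []
    for v in values:
        # mu-law companding squashes large magnitudes into [-1, 1]
        enc = math.copysign(math.log(abs(v) * mu + 1.0), v)
        enc /= math.log(m * mu + 1.0)
        enc = max(-1.0, min(1.0, enc))
        # map [-1, 1] uniformly onto integer bins 0 .. bins-1
        tokens.append(round((enc + 1.0) / 2.0 * (bins - 1)))
    return tokens
```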
After tokenization, the data streams are converted into embeddings. Much more information on the tokenization and embedding process used in the model is available in the paper.
One can see the token count of the training data above. Like other LLMs, the transformer takes a token stream, masks out tokens, and is trained to predict the correct token in sequence.
~~~~
The paper (see link above and below) has a lot more to say about the control and non-control domains and the data used in training/fine-tuning Gato, if you’re interested. They also have a lengthy section on risks and challenges present in models of this type.
My concern is that as generalist models become more pervasive, and as they are trained to work in more domains, the difference between a true AGI agent and a generalist agent starts to blur.
Something like Gato that can work in the real world (via robotics), perform meta analysis (as in Meta-World), play 1st person shooter games, and analyze 2D and 3D images, all at near expert levels, and, oh, support real-time inferencing, seems not that far away from something that could be used as a killer robot in an army of the future. And this is just where Gato is today.
One thing I note is that the model is not being made generally available outside of Google Deepmind. And IMHO, that for now is a good thing.
That is until some bad actor gets their hands on it….
I was listening to a podcast a couple of weeks back, and the person being interviewed made a comment that he didn't believe AGI would have a fast (hard) takeoff; rather, it would be slow (soft). Here's the podcast (John Carmack interviewed by Lex Fridman).
Hard vs. soft takeoff
A hard (fast) takeoff implies a relatively quick transition (seconds, hours, days, or months) between AGI levels of intelligence and super AGI levels of intelligence. A soft (slow) takeoff implies it would take a long time (years, decades, centuries) to go from AGI to super AGI.
We’ve been talking about AGI for a while now and if you want to see more about our thoughts on the topic, check out our AGI posts (in most recent order: AGI part 5, part 4, part 3 (ish), part (2), part (1), and part (0)).
The real problem is that many believe that any AGI that reaches super-intelligence will have drastic consequences for the earth and especially for humanity. However, that is a whole other debate.
The view is that a slow AGI takeoff might (?) allow sufficient time to imbue any and all (super) AGI with enough safeguards to eliminate or minimize any existential threat to humanity and life on earth (see part (1) linked above).
A fast takeoff won't give humanity enough time to head off this problem and will likely result in a humanity-ending and possibly earth-destroying event.
Hard vs Soft takeoff – the debate
I had always considered that AGI would have a hard takeoff, but Carmack seemed to think otherwise. His main reason is that current large transformer models (the closest thing to AGI we have at the moment) are massive and take lots of special purpose (GPU/TPU/IPU) compute, lots of other compute, and gobs and gobs of data to train on. Unclear what the requirements are to perform inferencing, but suffice it to say they should be less.
And once AGI levels of intelligence were achieved, it would take a long time to acquire any additional regular or special purpose hardware, in secret, required to reach super AGI.
So, just to be MECE (mutually exclusive and collectively exhaustive) on the topic, the reasons researchers and others have posited for why AGI will have a soft takeoff include:
AI hardware for training and inferencing AGI is specialized, costly, and acquisition of more will be hard to keep secret and as such, will take a long time to accomplish;
AI software algorithmic complexity needed to build better AGI systems is significantly hard (it's taken 70 yrs for humanity to reach today's much-less-than-AGI intelligent systems) and will become exponentially harder beyond AGI-level systems. This additional complexity will delay any takeoff;
Data availability to train AGI is humongous, hard to gather, find, & annotate properly. Finding good annotated data to go beyond AGI will be hard and will take a long time to obtain;
Human government and bureaucracy will slow it down and/or restrict any significant progress made in super AGI;
Human evolution took millions of years to go from chimp levels of intelligence to human levels of intelligence; why would electronic evolution be 6-9 orders of magnitude faster?
AGI technology is taking off, but the levels of intelligence achieved are relatively minor and specialized today. One could say that modern AI has really been going since the 1990s, so we are 30 yrs in and have almost-good AI chatbots and AI agents that can summarize passages/articles, generate text from prompts, or create art works from text. If it takes another 30 yrs to get to AGI, that should provide sufficient time to build in capabilities to limit a super-AGI hard takeoff.
I suppose it’s best to take these one at a time.
Hardware acquisition difficulty – I suppose the easiest way for an intelligent agent to acquire additional hardware would be to crack cloud security and just take it. Other ways may be to obtain stolen credit card information and use these to (il)legally purchase more compute. Another approach is to optimize the current AGI algorithms to run better within the same AGI HW envelope, creating super AGI that doesn’t need any more hardware at all.
Software complexity growing – There's no doubt that AGI software will be complex (although the podcast linked to above is sub-titled "AGI software will be simple"). But any sub-AGI agent that can change its code to become better or closer to AGI should be able to figure out how not to stop at AGI levels of intelligence and just continue optimizing until it reaches some wall.
Data acquisition/annotation will be hard – I tend to think the internet is the answer to any data limitations that might be present for an AGI agent. Plus, I've always questioned whether Wikipedia and some select other databases wouldn't be all an AGI would need to train on to attain super AGI. Current transformer models are trained on Wikipedia dumps and other data scraped from the internet. So there are really two answers to this question: once internet access is available, it's unclear there would be need for any more data; and with the data available to current transformers, it's unclear that this isn't already more than enough to reach super AGI.
Human bureaucracy will prohibit it – Sadly, this is the easiest to defeat. 1) There are rogue governments and actors around the world with more than sufficient resources to do this on their own, and no agency, UN or otherwise, will be able to stop them. 2) Unlike nuclear, the technology to do AI (AGI) is widely available to business and governments, all AI research is widely published (mostly open access nowadays), and if anything, colleges/universities around the world are teaching the next round of AI scientists to take this on. 3) The benefits of being first are significant and are driving an (AGI) arms race between organizations, companies, and countries to get there first.
Human evolution took millions of years, why would electronic evolution be 6-9 orders of magnitude faster – Electronic computation takes microseconds to nanoseconds to perform operations, while humans take probably 0.1 sec or so. Electronics is already 5 to 8 orders of magnitude faster than humans today. Yes, the human brain is more than one CPU core (each neuron could be considered a computational element). But there are 64-core CPUs/4096-core GPUs out there today, and taken in the aggregate (across a hyperscaler, let's say), one could consider them similar in nature. So, just using the speedups above, it should take anywhere from 1/1000 of a year to a year to cover the same computational evolution that human evolution covered between the chimp and the human, and accordingly between AGI and AGIx2 (ish).
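For what it's worth, the back-of-envelope version of that argument looks like this. Both the ~6M-year chimp-to-human figure and the speedup range are rough assumptions, not measurements:

```python
# Divide biological evolutionary time by the electronic speed advantage
# to get a (very) rough electronic-evolution timescale.
def electronic_evolution_years(bio_years=6e6, orders_of_magnitude=6):
    return bio_years / 10 ** orders_of_magnitude

# at 6 orders of magnitude faster: 6.0 years; at 9 orders: ~0.006 years
```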
AGI technology is taking a long time to reach, which should provide sufficient time to build in safeguards – Similar to the discussion on human bureaucracy above: with so many actors taking this on, and the advantages of even a single AGI (across clusters of agents) being significant, my guess is that the desire to be first will obviate any thoughts of putting in safeguards.
Other considerations for super AGI takeoff
Once you have one AGI trained, why wouldn't some organization, company, or country deploy multiple agents? Moreover, inferencing takes orders of magnitude less computational power than training, so with 1/100th to 1/1000th the infrastructure, one could run a single AGI. But the real question is: wouldn't 100 or 1000 AGIs represent super intelligence?
Yes and no. 100 humans don't represent super intelligence, and a 1000 even less so. But humans have other desires; it's unclear that 100 humans super-focused on one task wouldn't represent super intelligence (on that task).
Interior view of a data center with equipment
What can be done to slow AGI takeoff today
Barring something on the order of Nuclear Proliferation treaties/protocols, putting all GPUs/TPUs/IPUs under weapons export limitations AND restricting as secret any and all AI research, nothing easily comes to mind. Of course, Nuclear Proliferation isn't looking that good at the moment, but whatever its current state, it has delayed proliferation over time.
One could spend time and effort slowing technology progress down, such as by reducing next generation CPU/GPU/IPU compute cores, limiting compute speedups, reducing funding for AI research, imposing a compute tax, etc. All of which, if done across the technological landscape and the whole world, could give humanity more time to build in AGI safeguards. But doing so would adversely impact all technological advancement, in healthcare, business, government, etc. And given the proliferation of current technology and the state actors working on increasing capabilities to create more, it's hard to envision slowing technological advancement down much, if at all.
It’s almost like putting a tax on slide rules or making their granularity larger.
It could be that super AGI would independently perceive itself benignly, and only provide benefit to humanity and the earth. But, my guess is that given the number of bad actors intent on controlling the world, even if this were true, they would try to (re-)direct it to harm segments of humanity/society. And once unleashed, it would be hard to stop.
The only real solution to AGI in bad actor hands is to educate all of humanity to value all humans and to cherish the environment we all live in as sacred. This would eliminate bad actors.
It sounds so naive, but in reality, it’s the only thing, I believe, the only way we can truly hope to get us through this AGI technological existential crisis.
Just like nuclear, we as a society will keep running into technological existential crises like this. Heading all of these off with a better, more inclusive, more embracing, and less combative humanity could help with all of them.
I’ve been writing about AGI (see part-0 [ish], part-1 [ish], part-2 [ish], part-3ish, part-4 and part 5) and the dangers that come with it (part-0 in the above list) for a number of years now. In my last post on the subject, I expected my next post to discuss the book Human Compatible: AI and the problem of control, which is a great book on the subject. But since then I ran across another paper that is perhaps a better brief introduction to the topic and to some of the current thought and research on developing safe AI.
The paper I found is Concrete Problems in AI Safety, written by a number of researchers at Google, Stanford, Berkeley, and OpenAI. It essentially lays out the AI safety problem along 5 dimensions, and these are:
Avoiding negative side effects – these can be minor or major, and this is probably the one that scares humans the most: some toothpick generating AI that strips the world bare to maximize toothpick making.
Avoiding reward hacking – this is more subtle, but essentially it’s having your AI fool you into thinking it’s doing what you want while actually doing something else. This could entail anything from actually changing the reward logic itself to convincing/manipulating the human overseer into seeing things its way. Also a pretty bad thing from humanity’s perspective.
Scalable oversight – this is the problem where human overseers aren’t able to keep up and witness/validate what some AI is doing, 7×24, across the world, at the speed of electronics. So how can AI be monitored properly so that it doesn’t go off and do something it’s not supposed to (see the prior two for ideas on how bad this could be)?
Safe exploration – this is the idea that reinforcement learning, in order to work properly, has to occasionally explore its solution space, e.g. a Go board with moves selected at random, to see if they are better than what it currently believes is the best move to make. This isn’t much of a problem for game playing ML/AI, but if we are talking about a helicopter controlling AI, exploration at random could destroy the vehicle plus any nearby structures, flora or fauna, including humans of course.
Robustness to distributional shifts – this is the perennial problem where AI or DNNs are trained on one dataset, but over time the real world changes and the data they now see has shifted (in distribution) to something else. This often leads to DNNs not operating properly over time, or having many more errors in deployment than they did during training. This is probably the problem on this list undergoing the most research to rectify, because it impacts just about every ML/AI solution currently deployed in the world today. This robustness to distributional shift problem is why many AI DNN systems require periodic retraining.
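This last problem can be made concrete with a toy sketch: compare a live feature distribution against the training distribution and flag when they diverge. The standardized-mean-difference score and the thresholds here are my own illustrative choices, not from the paper.

```python
import random
import statistics

def drift_score(train_sample, live_sample):
    """Crude distributional-shift check: measure how far the live
    feature mean has moved from the training mean, in units of the
    training standard deviation. Large scores suggest retraining."""
    mu = statistics.mean(train_sample)
    sigma = statistics.stdev(train_sample)
    return abs(statistics.mean(live_sample) - mu) / sigma

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
same = [random.gauss(0.0, 1.0) for _ in range(5000)]   # world unchanged
shift = [random.gauss(1.5, 1.0) for _ in range(5000)]  # world has shifted

print(drift_score(train, same))   # small: distribution unchanged
print(drift_score(train, shift))  # large: time to retrain
```

Real deployments use richer tests (on whole distributions, not just means), but the periodic-retraining loop the post describes is essentially this check run on a schedule.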
So now we know what to look for, now what?
Each of these deserves probably a whole book or more to understand and try to address. The paper talks about all of these and points to some of the research or current directions trying to address them.
The researchers correctly point out that some of the above problems become more pressing as more complex ML/AI agents gain more autonomous control over actions in the real world.
We don’t want our automotive automation driving us over a cliff just to see if it’s a better action than staying in the lane. But Go playing bots or article summarizers might be ok to be wrong occasionally if it could lead to better playing bots/more concise article summaries over time. And although exploration is mostly a problem during training, it’s not to say that such activities might not also occur during deployment to probe for distributional shifts or other issues.
However, as we start to see more complex ML/AI solutions controlling more activities, the issue of AI safety is becoming more pressing. Autonomous cars are just one pressing example. But recent introductions of sorting robots, agricultural bots, manufacturing bots, nursing bots, guard bots, soldier bots, etc. are all just steps down a (short) path of increasing complexity that can only end with AGI bots running more parts (or all) of the world.
So safety will become a major factor soon, if it’s not already
Scares me the most
The first two on the list above scare me the most: avoiding negative or unintentional side effects and avoiding reward hacking.
I suppose if we could master scalable oversight we could maybe deal with all of them better as well. But that’s defense. I’m all about offense and tackling the problem up front rather than trying to deal with it after it’s broken.
Negative side effects
“Negative side effects” is a rather nice way of stating the problem of having your ML destroy the world (or the parts of it) that we need to live.
One approach to dealing with this problem is to define or train another AI/ML agent to measure impacts to the environment and have it somehow penalize the original AI/ML for causing them. This learning approach has some potential to be applied to numerous ML activities if it can be shown to be safe and fairly all encompassing.
Another approach discussed in the paper is to inhibit or penalize the original ML’s actions for any that have negative consequences. One way to do this is to come up with an “empowerment measure” for the original AI/ML solution. The idea would be to reduce, minimize or govern the original ML’s action set (or potential consequences), via its empowerment measure, so as to minimize its ability to create negative side effects.
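To make the penalty idea concrete, here’s a toy sketch of an impact-penalized reward. The state features, penalty weight and dict layout are all my own illustrative choices, not from the paper:

```python
def penalized_reward(task_reward, state_before, state_after, lam=10.0):
    """Toy impact penalty: subtract a cost proportional to how much of
    the environment the agent disturbed beyond its task. The state dicts
    hold environment features the agent is supposed to leave alone."""
    impact = sum(abs(state_after[k] - state_before[k]) for k in state_before)
    return task_reward - lam * impact

# Agent A makes a toothpick without touching anything else
print(penalized_reward(1.0, {"trees": 100}, {"trees": 100}))  # 1.0
# Agent B makes a toothpick by clearing the forest
print(penalized_reward(1.0, {"trees": 100}, {"trees": 0}))    # -999.0
```

The hard part, of course, is defining what counts as “impact”, which is exactly why the paper treats this as an open research problem.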
The paper discusses other approaches to the problem of negative side effects, one of which is having multiple ML (or ML and human) agents working together on the problem and having the ability to influence (kill switch) each other when they discover something’s awry. The other approach they mention is to reduce the certainty of the reward signal used to train the ML solution. This would work by having some function reduce the reward if there are random side effects, which would tend to make the ML solution learn to avoid them.
Neither of these latter two seems as feasible as the others, but they are all worthy of research.
Reward hacking
This seems less of a problem to our world than negative side effects, until you consider that if an ML agent is able to manipulate its reward code, it’s probably able to manipulate any code intended to limit its impacts or penalize it for being more empowered, or to manipulate a human (or other agent) with its hand on the kill switch (or just turn the kill switch off).
So this problem could easily lead to a breakout of any of the other safety problems on this list. An example of reward hacking is a game playing bot that detects a situation where a buffer overflow results in a win signal or higher rewards. Such a bot will no doubt learn how to cause more buffer overflows so it can maximize its reward, rather than learn to play the game better.
But the real problem is that a reward signal used to train a ML solution is just an approximation of what’s intended. Chess programs in the past were trained by masters to use their opening to open up the center of the board and use their middle and end game to achieve strategic advantages. But later chess and go playing bots just learned to checkmate their opponent and let the rest of the game take care of itself.
Moreover, (board) game play is a relatively simple domain in which to come up with proper reward signals (with the possible exception of buffer overflows or other bugs). But for car driving bots, drone bots, guard bots, etc., reward signals are not nearly as easy to define or implement.
One approach to avoid reward hacking is to make the reward signaling process its own ML/AI agent that is (suitably) stronger than the ML/AI agent learning the task. Most reward generators are relatively simple code. For instance in Monopoly, one that just counts the money each player has at the end of the game could be used to determine the winner (in a timed Monopoly game). But rather than having a simple piece of code create the reward signal, one could use ML to learn what the reward should be. Such an agent might be trained to check whether more or less money was being counted than was physically possible in the game, or whether property was illegally obtained during the game, or whether other reward hacks were done, and to penalize the ML solution for these actions. All of this would make the reward signal depend on proper training of that second ML solution. And the two ML solutions would effectively compete against one another.
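Here’s a toy sketch of that kind of reward auditing in the Monopoly setting. The bank total and the veto logic are my own illustration of the idea, not from the paper (a standard US Monopoly set contains roughly $20,580 of cash):

```python
# Illustrative audit layer sitting between the raw game signal and the
# learner: reward is vetoed when it violates a physical invariant of
# the game, a telltale sign the learner hacked the scoring code.
TOTAL_BANK_MONEY = 20580  # approximate cash in a standard Monopoly set

def audited_reward(player_cash_counts):
    """Return the winner's cash as the reward, but zero it out when
    more money is counted than physically exists in the game."""
    if sum(player_cash_counts) > TOTAL_BANK_MONEY:
        return 0  # reward hack suspected: refuse to pay out
    return max(player_cash_counts)

print(audited_reward([1500, 2000, 900]))   # legitimate game: 2000
print(audited_reward([10**9, 1500, 900]))  # buffer-overflow style hack: 0
```

A learned reward checker, as the paragraph above describes, generalizes this hand-written invariant into something trained to spot hacks the programmer didn’t anticipate.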
Another approach is to “sandbox” the reward code/solution so that it is outside of external and/or ML/AI influence. Possibly combining the prior approach with this one might suffice.
Yet another approach is to examine the ML solution’s future states (actions) to determine if any of them impact the reward function itself, and penalize it for doing so. This assumes that the future states are representative of what it plans to do, and that some code or some person can recognize states that are inappropriate.
Another approach discussed in the paper is to have multiple reward signals. These could use multiple formulas for computing a multi-faceted reward signal, averaging them or using some other mathematical function to combine them into something that might be more accurate than one reward function alone. This way any ML solution reward hacking would need to hack multiple reward functions (or perhaps the function that combines them) in order to succeed.
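A toy sketch of the multiple-reward-signal idea, using the minimum as a deliberately conservative combiner; the three scorers are my own illustrative examples, not the paper’s:

```python
def combined_reward(state, reward_fns, combine=min):
    """Several independently written reward functions score the same
    state; a conservative combiner (here, the minimum) is used, so a
    hack has to fool every function at once to pay off."""
    return combine(fn(state) for fn in reward_fns)

# Three independent scorers for a hypothetical game state
fns = [
    lambda s: s["score"],                                # raw score counter
    lambda s: s["score"] if s["score"] < 10_000 else 0,  # sanity-capped copy
    lambda s: s["moves_won"] * 10,                       # derived from play itself
]
print(combined_reward({"score": 500, "moves_won": 60}, fns))   # honest play: 500
print(combined_reward({"score": 10**9, "moves_won": 0}, fns))  # hacked counter: 0
```

Averaging instead of taking the minimum would be gentler but easier to hack, since inflating one signal still moves the average.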
The one that IMHO has the most potential, but which seems the hardest to implement, is to somehow create “variable indifference” in the ML/AI solution. This means having the ML/AI solution ignore any steps that impact the reward function itself or that otherwise lead to reward hacking. The researchers rightfully state that if this were possible, then many of the AI safety concerns could be dealt with.
There are many other approaches discussed and I would suggest reading the paper to learn more. None of the others, seem simple or a complete solution to all potential reward hacks.
~~~
The paper goes into the same or greater level of detail on the other three “concrete safety” issues in AI.
In my last post (see part 5 link above) I thought I was going to write about Human Compatible (AI) by S. Russell and that book’s discussion of AI safety. But then I found the “Concrete problems in AI safety” paper (see link above), thought it provided a better summary of AI safety issues, and used it instead. I’ll try to circle back to the book at some later date.
Read two articles over the past month or so. The more recent one was an Economist article (AI enters the industrial age, paywall) and the other was A generalist agent (from Deepmind). The Deepmind article was all about the training of Gato, a new transformer deep learning model trained to perform well on 600 separate task arenas from image captioning, to Atari games, to robotic pick and place tasks.
And then there was this one tweet from Nando De Frietas, research director at Deepmind:
“Someone’s opinion article. My opinion: It’s all about scale now! The Game is Over! It’s about making these models bigger, safer, compute efficient, faster at sampling, smarter memory, more modalities, INNOVATIVE DATA, on/offline, … 1/N“
I take this to mean that AGI is just a matter of more scale. Deepmind and others see the way to attain AGI as simply a matter of throwing more servers, GPUs and data at training the model.
We have discussed AGI in the past (see part-0 [ish], part-1 [ish], part-2 [ish], part-3ish and part-4 blog posts [We apologize, we only started numbering them at 3ish]). But this tweet is possibly the first time we have someone in the know saying they see a way to attain AGI.
Transformer models
It’s instructive, from my perspective, that Gato is a deep learning transformer model. The other big NLP models have all been transformer models as well.
Gato (from Deepmind), SWITCH Transformer (from Google), GPT-3/GPT-J (from OpenAI), OPT (from Meta), and Wu Dao 2.0 (from China’s latest supercomputer) are all trained on more and more text and image data scraped from the web, Wikipedia and other databases.
Wikipedia says transformer models are an outgrowth of RNN and LSTM models that use attention vectors on text. Attention vectors encode, into a vector (matrix), all textual symbols (words) prior to the latest textual symbol. Each new symbol encountered creates another vector containing all prior symbols plus the latest word. These vectors are then used to train RNN models, using all the vectors to generate output.
The problem with RNN and LSTM models is that they’re impossible to parallelize. You always need to wait until you have encountered all symbols in a text component (sentence, paragraph, document) before you can begin to train.
Instead of encoding these attention vectors as each symbol is encountered, transformer models encode all symbols at the same time, in parallel, and then feed these vectors into a DNN to assign attention weights to each symbol vector. This allows for complete parallelism, which reduces both the computational load and the elapsed time needed to train transformer models.
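A minimal sketch of the scaled dot-product attention computation that makes this parallelism possible. Toy dimensions, no learned weights; real transformers add learned projections and many attention heads:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once --
    the step transformers compute in parallel where RNNs had to go
    symbol by symbol. Each argument is a list of per-symbol vectors."""
    d = len(keys[0])
    out = []
    for q in queries:  # every position attends...
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]  # ...to every position, in one pass
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        weights = [e / sum(exps) for e in exps]  # softmax attention weights
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# 3-symbol "sentence" with 2-dim embeddings; all positions in one pass
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = attention(x, x, x)
print(len(y), len(y[0]))  # one output vector per input symbol
```

Nothing in the loop over `queries` depends on earlier outputs, which is exactly what lets GPUs process every position of a sequence simultaneously.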
And transformer models allow for a large increase in DNN parameters (I read these as DNN nodes per layer X number of layers in a model). Gato has 1.2B parameters, GPT-3 has 175B parameters, and SWITCH Transformer is reported to have 7X more parameters than GPT-3.
Estimates for how much it cost to train GPT-3 range anywhere from $10M-20M USD.
AGI will be here in 10 to 20 yrs at this rate
So it takes ~$15M to train a 175B parameter transformer model, and Google has already done SWITCH, which has 7-10X (~1.5T) the number of GPT-3 parameters. It seems to be an arms race.
If we assume it costs ~$65M (with a ~2X efficiency gain since GPT-3 training) to train SWITCH, we can create some bounds on how much it will cost to train an AGI model.
By the way, the number of synapses in the human brain is approximately 1000T (See Basic NN of the brain, …). If we assume that DNN nodes are equivalent to human synapses (a BIG IF), we probably need to get to over 1000T parameter model before we reach true AGI.
So my guess is that any AGI model lies somewhere between 650X to 6,500X parameters beyond SWITCH or between 1.5Q to 15Q model parameters.
If we assume current technology for the training, this would cost $40B to $400B. Of course, GPUs are not standing still, and NVIDIA’s Hopper (introduced in 2022) is at least 2.5X faster than their previous gen A100 GPU (introduced in 2020). So if we waited 10 years or so, we might be able to reduce this cost by a factor of 100X, and in 20 years maybe by 10,000X, or back to roughly where SWITCH is today.
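The back-of-envelope arithmetic behind these bounds, using this post’s assumed numbers (all of them guesses, as stated above, and assuming training cost scales linearly with parameter count):

```python
# Back-of-envelope sketch of the post's AGI training-cost bounds.
switch_cost_usd = 65e6    # assumed cost to train SWITCH
switch_params = 1.5e12    # ~1.5T parameters
agi_params_low = 1e15     # ~1000T, the human-synapse count
agi_params_high = 1e16    # a further 10x margin on top of that

for target in (agi_params_low, agi_params_high):
    scale = target / switch_params     # roughly 650x to 6,500x SWITCH
    cost = switch_cost_usd * scale     # linear cost scaling assumed
    print(f"{scale:,.0f}x parameters -> ${cost / 1e9:,.0f}B to train")
```

Running this gives roughly $40B at the low end and $400B at the high end, matching the bounds quoted in the text.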
So in the next 20 years most large tech firms should be able to create their own AGI models. In the next 10 years most governments should be able to train their own AGI models. And as of today, a select few world powers could train one, if they wanted to.
Where they get the additional data to train these models (I assume that data counts would go up linearly with parameter counts) may be another concern. However, I’m sure if you’re willing to spend $40B on AGI model training, spending a few $B more on data acquisition shouldn’t be a problem.
~~~~
At the end of the Deepmind article on Gato, it talks about the need for AGI safety in terms of developing preference learning, uncertainty modeling and value alignment. The footnote for this idea is the book, Human Compatible (AI) by S. Russell.
Preference learning is a mechanism for AGI to learn the “true” preference of a task it’s been given. For instance, if given the task to create toothpicks, it should realize the true preference is to not destroy the world in the process of making toothpicks.
Uncertainty modeling seems to be about having AI assume it doesn’t fully understand what the task at hand truly is. This way there’s some sort of (AGI) humility when it comes to any task, such that the AGI model would be willing to be turned off if it’s doing something wrong, and that decision is made by humans.
Deepmind has an earlier paper on value alignment. But I see this as the ability of AGI to model human universal values (if such a thing exists) such as the sanctity of human life, the need for the sustainability of the planet’s ecosystem, all humans are created equal, all humans have the right to life, liberty and the pursuit of happiness, etc.
I can see a future post is needed soon on Human Compatible (AI).
The team was attempting to create an autonomous probe that could navigate the ocean and other large bodies of water to gather information. I believe ultimately the intent was to provide the navigational smarts for a submersible that could navigate terrestrial and non-terrestrial oceans.
One of the biggest challenges for probes like this is to be able to navigate turbulent flow without needing a lot of propulsive power or a lot of computational power. They said that any probe that could propel itself faster than the current could easily travel wherever it wanted, but the real problem was to go somewhere with lower powered submersibles. As a result, they set their probe to swim at a constant speed of 80% of the overall simulated water flow.
Even that would be relatively feasible with unlimited computational power to train and inference with, but trying to do this on something that could fit in a small submersible was a significant challenge. NLP models today have millions of parameters and take hours to train, with multiple GPU/CPU cores in operation and lots of memory. Inferencing with these NLP models also takes a lot of processing power.
The researchers targeted the computational power to something significantly smaller and wished to train and perform real time inferencing on the same hardware. They chose a “Teensy 4.0 micro-controller” board for their computational engine which costs under $20, had ~2MB of flash memory and fit in a space smaller than 1.5″x1.0″ (38.1mm X 25.4mm).
The simulation setup
The team started their probe’s turbulent flow training with a cylinder in a constant flow that generated downstream vortices rotating in opposite directions. These vortices would travel from left to right in the simulated flow field. In order for the navigation logic to traverse this vortical flow, they randomly selected start and end locations on different sides of it.
The AI model they trained and used for inferencing was a combination of reinforcement learning (with an interesting multi-factor reward signal) and a policy using a trained deep neural network. They called this approach Deep RL.
For reinforcement learning, they used a reward signal that was a function of three variables: the time the trip took, the change in distance to the target, and a success bonus if the probe reached the target. The time variable was a penalty equal to the duration of the swim activity. Distance to target was how much the Euclidean distance between the current probe location and the target location had changed over time. The bonus was only applied when the probe was in close proximity to the target location. The researchers indicated the reward signal could be used to optimize for other values, such as energy to complete the trip, surface area traversed, wear and tear on propellers, etc.
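A toy sketch of such a three-part reward signal. The constants (per-step time penalty, bonus size, proximity threshold) are my own guesses, not the paper’s values:

```python
import math

def probe_reward(prev_pos, cur_pos, target, t_step=1.0, bonus=10.0, near=0.5):
    """Illustrative three-part reward: a time penalty for each step,
    credit for closing the Euclidean distance to the target, and a
    success bonus once the probe is close enough to the target."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    time_penalty = -t_step  # every step of swimming costs something
    progress = dist(prev_pos, target) - dist(cur_pos, target)
    success = bonus if dist(cur_pos, target) < near else 0.0
    return time_penalty + progress + success

print(probe_reward((0, 0), (1, 0), (10, 0)))    # progress offsets time penalty
print(probe_reward((9, 0), (9.8, 0), (10, 0)))  # arrived: bonus dominates
```

Swapping the distance term for, say, energy consumed is how the researchers’ other optimization targets (energy, wear and tear, etc.) would slot into the same structure.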
For the reinforcement learning state information, they supplied the probe-to-target relative location [Difference(Probe x,y, Target x,y)], and whatever sensor data was being tested (e.g., for the velocity sensor equipped probe, the local velocity of the water at the probe’s location).
They trained the DNN policy using the state information (probe start and end location, local velocity/vorticity sensor data) to predict the swim angle used to navigate to the target. The DNN policy used 2 internal layers with 64 nodes each.
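A minimal sketch of a policy network with that shape, with untrained random weights just to show the forward pass. The 4-input state layout is my guess at relative target position plus two velocity sensor readings, and the tanh-to-angle output scaling is illustrative:

```python
import math
import random

random.seed(1)

def dense(x, w, b, act=math.tanh):
    """One fully connected layer: weighted sum plus bias, then tanh."""
    return [act(sum(xi * wij for xi, wij in zip(x, row)) + bj)
            for row, bj in zip(w, b)]

def init(n_in, n_out):
    """Random weight matrix (n_out rows of n_in) and zero biases."""
    w = [[random.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    return w, [0.0] * n_out

# Paper's policy shape: state -> 64 -> 64 -> swim angle
w1, b1 = init(4, 64)
w2, b2 = init(64, 64)
w3, b3 = init(64, 1)

def swim_angle(state):
    """state = (dx, dy, vx, vy): relative target position + local velocity."""
    h = dense(state, w1, b1)
    h = dense(h, w2, b2)
    return dense(h, w3, b3)[0] * math.pi  # tanh output scaled to (-pi, pi)

print(swim_angle([3.0, -1.5, 0.2, 0.1]))  # always a valid heading
```

At roughly (4×64 + 64×64 + 64×1) weights, the whole policy is a few thousand parameters, which is what makes fitting it into ~2MB of microcontroller flash plausible.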
They benchmarked the Deep RL solution with local velocity sensing against a number of different approaches: one naive approach that always swam in the direction of the target; one flow blind approach that had no sensors but used feedback from its location changes to train with; one vorticity sensor approach which sensed the vorticity of the local water flow; and one complete knowledge approach (not shown above) that had information on the actual flow at every location in the 2D simulation.
It turned out that of the first four (naive, flow-blind, vorticity sensor and velocity sensor) the velocity sensor configured robot had the highest success rate (“near 100%”).
That simulated probe was then measured against the complete flow knowledge version. The complete knowledge version had faster trip speeds, but only 18-39% faster (on the examples shown in the paper). However, the knowledge required to implement this algorithm would not be feasible in a real ocean probe.
More to be done
They tried the probe’s Deep RL navigation algorithm on a different simulated flow configuration, a double gyre flow field (sort of like two circular flows side by side but rotating in opposite directions).
The previously trained (on cylinder vortical flow) Deep RL navigation algorithm only had a ~4% success rate with the double gyre flow. However, after training the Deep RL navigation algorithm on the double gyre flow, it was able to achieve an 87% success rate.
So with sufficient re-training it appears that the simulated probe’s navigation Deep RL could handle different types of 2D water flow.
The next question is how well their Deep RL can handle real 3D water flows, such as tidal flows, up-down swells, long term currents, surface wind-wave effects, etc. It’s probable that any navigation for real world flows would need a multitude of Deep RL trained algorithms to handle each and every flow encountered in real oceans.
However, the fact that training and inferencing could be done on the same small hardware indicates that the Deep RL could possibly be deployed in any flow, left to train on the local flow conditions until success is reached, and then let loose until it starts failing again. Training each time would take a lot of propulsive power, but may be suitable for some probes.
The researchers have 3D printed a submersible with a Teensy microcontroller and an Arduino controller board, with propellers surrounding it so it can swim in any 3D direction. They have also constructed a water tank for use in real life testing of their Deep RL navigation algorithms.