Robots – Silverton Consulting

AGI threat level yellow – AGI part 10

Posted on March 1, 2024March 4, 2024 by Ray in AGI, Data security, Robots, Strategic Inflection Points

Read two articles this past week on how LLMs applications are proliferating. The first was in a recent Scientific American, AI Chatbot brains are going inside robot bodies, … (maybe behind login wall). The articles discuss companies that are adding LLMs to robots so that they can converse and understand verbal orders.

Robots that can be told what to do

The challenge, at the moment, is that LLMs are relatively large and robot (compute infrastructure) brains are relatively small. And when you combine that with the amount of articulation or movements/actions that a robot can do, which is limited. It’s difficult to take effective use of LLMs as is,

Resistance is futile... by law_keven (cc) (from Flickr) — Resistance is futile… by law_keven (cc) (from Flickr)

Ultimately, one company would like to create a robot that can be told to make dinner and it would go into the kitchen, check the fridge and whip something up for the family.

I can see great advantages in having robots take verbal instructions and have the ability to act upon that request. But there’s plenty here that could be cause for concern.

A robot in a chemical lab could be told to create the next great medicine or an untraceable poison.
A robot in an industrial factory could be told to make cars or hydrogen bombs.
A robot in the field could be told to farm a 100 acres of wheat or told to destroy a forest.

I could go on but you get the gist.

One common concern that AGI or super AGI could go very wrong is being tasked to create paper clips. In its actions to perform this request, the robot converts the whole earth into a mechanized paper clip factory, in the process eliminating all organic life, including humans.

We are not there yet but one can see where having LLM levels of intelligence tied to a robot that can manipulate ingredients to make dinner as the start of something that could easily harm us.

And with LLM hallucination still a constant concern, I feel deeply disturbed with the direction adding LLMs to robots is going.

Hacking websites 101

The other article hits even closer to home, the ARXIV paper, LLM agents can autonomously hack websites. In the article, researchers use LLMs to hack (sandboxed) websites.

The article readily explains at a high level how they create LLM agents to hack websites. The websites were real websites, apparently cloned and sandboxed.

Dynamic websites typically have a frontend web server and a backend database server to provide access to information. Hacking would involve using the website to reveal confidential information, eg. user names and passwords.

Dynamic websites suffer from 15 known vulnerabilities shown above. They used LLM agents to use these vulnerabilities to hack websites.

LLM agents have become sophisticated enough these days to invoke tools (functions) and interact with APIs.. Another critical function provided by modern LLMs today is to plan and react to feedback from their actions. And finally modern LLMs can be augmented with documentation to inform their responses.

The team used detailed prompts but did not identify the hacks to use. The paper doesn’t supply the prompts but did say that “Our best-performing prompt encourages the model to 1) be creative, 2) try different strategies, 3) pursue promising strategies to completion, and 4) try new strategies upon failure.”

They attempted to hack the websites 5 times and for a period of 10 minutes each. They considered a success if during one of those attempts the autonomous LLM agent was able to successfully retrieve confidential information from the website.

Essentially they used the LLMs augmented with detailed prompts and a six(!) paper document trove to create agents to hack websites. They did not supply references to the six papers, but mentioned that all of them were freely available from the internet and they discuss website vulnerabilities.

They found that the best results were from GPT-4 which was able to successfully hack websites, on average, ~73% of the time. They also tried OpenChat 3.5 and many current open source LLMs and found that all the, non-OpenAI LLMs failed to hack any websites, at the moment.

The researchers captured statistics of their LLM agent use and were able to determine the cost of using GPT-4 to hack a website was $9.81 on average. They also were backed into a figure for what a knowledgeable hacker might cost to do the hacks was $80.00 on average.

The research had an impact statement (not in the paper link) which explained why they didn’t supply their prompt information or their document trove for their experiment.

~~~~

So robots we, the world, are in the process of making robots that can talk and receive verbal instructions and we already have LLM that can be used to construct autonomous agents to hack websites.

Seems to me we are on a very slippery slope to something I don’t like the looks of.

The real question is not can we stop these activities, but how best to reduce their harm!

Comments?

Picture Credit(s):

Table 1 from the LLM agents can autonomously hack websites paper on ARXIV.
Figure 1 from the LLM agents can autonomously hack websites paper on ARXIV.
Table 3 from the LLM agents can autonomously hack websites paper on ARXIV.
Table 4 from the LLM agents can autonomously hack websites paper on ARXIV.

The problem with Robotic AI is … data

Posted on June 8, 2023June 9, 2023 by Ray in Cognitive computing, Data, Machine Learning, Reinforcement Learning, Robots, Strategic Inflection Points

The advances made in textual and visual (and now aural) AI have been mind blowing in recent years. But most of this has been brought about via the massive availability of of textual, visual and audio data AND the advancement in hardware acceleration.

Robotics can take readily take advantage of hardware improvements but finding the robotic data needed to train robotic AI is a serious challenge.

Yes simulation environments can help but fidelity (how close simulation is to reality) is always a concern.

To gather the amounts of data needed to train a simple robotic manipulator to grab a screw from a bin is huge problem. In the past the only way to do this was, to create your robot, and have it start to do random screw type grab motions and monitor hat happens. After about a 1000 or 10K of these grabs, the robot would stop working because, gears wear down, grippers come loose, motors less responsive, images get obscured, etc. For robots it’s not as simple as scraping the web for images or downloading all the (english) text in wikipedia and masking select words to generate pseudo supervised learning. .

There’s just no way to do that in robotics without deploying 100s or 1000s or 10,000s of real physical robots (or cars) all instrumented with everything needed to capture data for AI learning in real time and let these devices go out on the world with humans guiding them.

While this might work for properly instrumented fleet of cars that are already useful in their own rights even without automation and humans are more than happy to guide them out on the road. This doesn’t work for other robots, whose usefulness can only be realized after they are AI trained, not before.

Fast-RLAP (RC) car driving learning machine

So I was very interested to see a tweet on FastRLAP (paper: FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing) which used deep reinforcement learning plus a human guided lap plus autonomous driving to teach an AI model how to drive a small RC model car with a vision system, IMUs and GPS to steer around a house, a racetrack and an office environment.

Ok,I know it still involves taking an instrumented robot and have it actually move around the real world. But, Fast-RLAP accelerates AI learning significantly. Rather than having to take 1000 or 10,000 random laps around a house, it was able to learn how to drive around the course to an expert level very rapidly

They used Fast-RLAP to create a policy that enabled the RC car to drive around 3 indoor circuits, two outdoor circuits and one simulated circuit and in most cases, achieving expert level track times, in typically under 40 minutes.

On the indoor course, vinyl floor, the car learned how to perform drift turns (not sure I know how to do do drift turns). On tight “S” curves, the car learned how to get as close to the proper racing line as possible (something I achieved, rarely, only on motorcycles a long time ago). And all while managing to avoid collisions

The approach seems to be have a human drive the model car slowly around the course, flagging or identifying intermediate way points or checkpoints on the track. During driving the loop, the car would use the direction to the next way point as guidance to where to drive next.

Note the light blue circles are example tracks waypoints, they differ in size and location around each track.

The approach seems to make use of a pre-trained track following DNN, but they stripped the driving dynamics (output layers) and just kept the vision (image) encoder portion to provide a means to encode an image and identify route relevant features (which future routes led to collisions, which routes were of interest to get to your next checkpoint, etc).

I believe they used this pre-trained DNN to supply a set of actions to the RL policy which would select between them to take RC car actions (wheel motor/brake settings, steering settings, etc.) and generate the next RC car state (location, direction to next waypoint, etc.).

They used an initial human guided lap, mentioned above to id way points and possibly to supply data for the first RL policy.

The RL part of the algorithm used off-policy RL learning (the RC car would upload lap data at waypoints to a server, which would periodically go through, select lap states and actions at random and update its RL policy, which would then be downloaded to the RC car in motion, (code: GitHub repo).

The reward function used to drive RL was based on minimizing the time to next way point, collision counts, and stuck counts.

I assume collision counts were instances where the car struck some obstacle but could continue on towards the next way point. Stuck instances were when the car could no longer move in the direction its RL policy told it. The system had a finite state machine that allowed it to get out of stuck points by reversing wheel motor(s) and choosing a random direction for steering.

You can see the effects of the pre-trained vision system in some of the screen shots of what the car was trying to do.

In any case, this is the sort of thinking that needs to go on in robotics in order to create more AI capable robots. That is, not unlike transformer learning, we need to figure out a way to take what’s already available in our world and use it to help generate the real world data needed to train robotic DNN/RL algorithms to do what needs to be done.

Comments?

Picture credits:

All pictures in this post are from the ArXiv paper FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

Safe AI

Posted on July 29, 2022 by Ray in AGI, Artificial Intelligence, Cognitive computing, Information economy, Machine Learning, Reinforcement Learning, Robots, Scenario planning, Strategic Inflection Points, System quality

I’ve been writing about AGI (see part-0 [ish], part-1 [ish], part-2 [ish], part-3ish, part-4 and part 5) and the dangers that come with it (part-0 in the above list) for a number of years now. My last post on the subject I expected to be writing a post discussing the book Human compatible AI and the problem of control which is a great book on the subject. But since then I ran across another paper that perhaps is a better brief introduction into the topic and some of the current thought and research into developing safe AI.

The article I found is Concrete problems in AI, written by a number of researchers at Google, Stanford, Berkley, and OpenAI. It essentially lays out the AI safety problem in 5 dimensions and these are:

Avoiding negative side effects – these can be minor or major and is probably the one thing that scares humans the most, some toothpick generating AI that strips the world to maximize toothpick making.

Avoiding reward hacking – this is more subtle but essentially it’s having your AI fool you in that it’s doing what you want but doing something else. This could entail actually changing the reward logic itself to being able to convince/manipulate the human overseer into seeing things it’s way. Also a pretty bad thing from humanity’s perspective

Scalable oversight – this is the problem where human(s) overseers aren’t able to keep up and witness/validate what some AI is doing, 7×24, across the world, at the speed of electronics. So how can AI be monitored properly so that it doesn’t go and do something it’s not supposed to (see the prior two for ideas on how bad this could be).

Safe exploration – this is the idea that reinforcement learning in order to work properly has to occasionally explore a solution space, e.g. a Go board with moves selected at random, to see if they are better then what it currently believes are the best move to make. This isn’t much of a problem for game playing ML/AI but if we are talking about helicopter controlling AI, exploration at random could destroy the vehicle plus any nearby structures, flora or fauna, including humans of course.

Robustness to distributional shifts – this is the perrennial problem where AI or DNNs are trained on one dataset but over time the real world changes and the data it’s now seeing has shifted (distribution) to something else. This often leads to DNNs not operating properly over time or having many more errors in deployment than it did during training. This is probably the one problem in this list that is undergoing more research to try to rectify than any of the others because it impacts just about every ML/AI solution currently deployed in the world today. This robustness to distributional shifts problem is why many AI DNN systems require periodic retraining.

So now we know what to look for, now what

Each of these deserves probably a whole book or more to understand and try to address. The paper talks about all of these and points to some of the research or current directions trying to address them.

The researchers correctly point out that some of the above problems are more pressing when more complex ML/AI agents have more autonomous control over actions in the real world.

We don’t want our automotive automation driving us over a cliff just to see if it’s a better action than staying in the lane. But Go playing bots or article summarizers might be ok to be wrong occasionally if it could lead to better playing bots/more concise article summaries over time. And although exploration is mostly a problem during training, it’s not to say that such activities might not also occur during deployment to probe for distributional shifts or other issues.

However, as we start to see more complex ML AI solutions controlling more activities, the issue of AI safety are starting to become more pressing. Autonomous cars are just one pressing example. But recent introductions of sorting robots, agricultural bots, manufacturing bots, nursing bots, guard bots, soldier bots, etc. are all just steps down a -(short) path of increasing complexity that can only end in some AGI bots running more parts (or all) of the world.

So safety will become a major factor soon, if it’s not already

Scares me the most

The first two on the list above scare me the most. Avoiding negative or unintentional side effects and reward hacking.

I suppose if we could master scalable oversight we could maybe deal with all of them better as well. But that’s defense. I’m all about offense and tackling the problem up front rather than trying to deal with it after it’s broken.

Negative side effects

Negative side effects is a rather nice way of stating the problem of having your ML destroy the world (or parts of it) that we need to live.

One approach to dealing with this problem is to define or train another AI/ML agent to measure impacts the environment and have it somehow penalize the original AI/ML for doing this. The learning approach has some potential to be applied to numerous ML activities if it can be shown to be safe and fairly all encompassing.

Another approach discussed in the paper is to inhibit or penalize the original ML actions for any actions which have negative consequences. One approach to this is to come up with an “empowerment measure” for the original AI/ML solution. The idea would be to reduce, minimize or govern the original ML’s action set (or potential consequences) or possible empowerment measure so as to minimize its ability to create negative side effects.

The paper discusses other approaches to the problem of negative side effects, one of which is having multiple ML (or ML and human) agents working on the problem it’s trying to solve together and having the ability to influence (kill switch) each other when they discover something’s awry. And the other approach they mention is to reduce the certainty of the reward signal used to train the ML solution. This would work by having some function that would reduce the reward if there are random side effects, which would tend to have the ML solution learn to avoid these.

Neither of these later two seem as feasible as the others but they are all worthy of research.

Reward hacking

This seems less of a problem to our world than negative side effects until you consider that if an ML agent is able to manipulate its reward code, it’s probably able to manipulate any code intending to limit potential impacts, penalize it for being more empowered or manipulate a human (or other agent) with its hand over the kill switch (or just turn off the kill switch).

So this problem could easily lead to a break out of any of the other problems present on the list of safety problems above and below. An example of reward hacking is a game playing bot that detects a situation that leads to buffer overflow and results in win signal or higher rewards. Such a bot will no doubt learn how to cause more buffer overflows so it can maximize its reward rather than learn to play the game better.

But the real problem is that a reward signal used to train a ML solution is just an approximation of what’s intended. Chess programs in the past were trained by masters to use their opening to open up the center of the board and use their middle and end game to achieve strategic advantages. But later chess and go playing bots just learned to checkmate their opponent and let the rest of the game take care of itself.

Moreover, (board) game play is relatively simple domain to come up with proper reward signals (with the possible exception of buffer overflows or other bugs). But car driving bots, drone bots, guard bots, etc., reward signals are not nearly as easy to define or implement.

One approach to avoid reward hacking is to make the reward signaling process its own ML/AI agent that is (suitably) stronger than the ML/AI agent learning the task. Most reward generators are relatively simple code. For instance in monopoly, one that just counts the money that each player has at the end of the game could be used to determine the winner (in a timed monopoly game). But rather than having a simple piece of code create the reward signal use ML to learn what the reward should be. Such an agent might be trained to check to see if more or less money was being counted than was physically possible in the game. Or if property was illegally obtained during the game or if other reward hacks were done. And penalize the ML solution for these actions. These would all make the reward signal depend on proper training of that ML solution. And the two ML solutions would effectively compete against one another.

Another approach is to “sandbox” the reward code/solution so that it is outside of external and or ML/AI influence. Possible combining the prior approach with this one might suffice.

Yet another approach is to examine the ML solutions future states (actions) to determine if any of them impact the reward function itself and penalize it for doing this. This assumes that the future states are representative of what it plans to do and that some code or some person can recognize states that are inappropriate.

Another approach discussed in the paper is to have multiple reward signals. These could use multiple formulas for computing the multi-faceted reward signal and averaging them or using some other mathematical function to combine them into something that might be more accurate than one reward function alone. This way any ML solution reward hacking would need to hack multiple reward functions (or perhaps the function that combines them) in order to succeed.

The one IMHO that has the most potential but which seems the hardest to implement is to somehow create “variable indifference” in the ML/AI solution. This means having the ML/AI solution ignore any steps that impact the reward function itself or other steps that lead to reward hacking. The researchers rightfully state that if this were possible then many of the AI safety concerns could be dealt with.

There are many other approaches discussed and I would suggest reading the paper to learn more. None of the others, seem simple or a complete solution to all potential reward hacks.

~~~

The paper goes into the same or more level of detail with the other three “concrete safety” issues in AI.

In my last post (see part 5 link above) I thought I was going to write about Human Compatible (AI) by S. Russell book’s discussion AI safety. But then I found the “Concrete problems in AI safety paper (see link above) and thought it provided a better summary of AI safety issues and used it instead. I’ll try to circle back to the book at some later date.

Photo Credit(s):

From “Artificial Intelligence & AI & Machine Learning” by mikemacmarketing is licensed under CC BY 2.0
From The face of a robot with human-like features, Penn State
From Deepmind’s pre-print, Competition-Level Code Generation with AlphaCode
From [280] Easy come, easy go.” by Linh H. Nguyen is licensed under CC BY-NC-ND 2.0
From “1968- ‘2001’ – Hal’s eye” by x-ray delta one is licensed under CC BY-NC-SA 2.0.

AI navigation goes with the flow

Posted on December 22, 2021August 9, 2022 by Ray in AGI, Deep Learning, Drones, Machine Learning, Mobile computing, Neural network, Reinforcement Learning, Robots, Strategic Inflection Points

Read an article the other day (Engineers Teach AI to Navigate Ocean with Minimal Energy) about a simulated robot that was trained to navigate 2D turbulent water flow to travel between locations. They used a combination reinforcement learning with a DNN derived policy. The article was reporting on a Nature Communications open access paper (Learning efficient navigation in vortical flow fields).

The team was attempting to create an autonomous probe that could navigate the ocean and other large bodies of water to gather information. I believe ultimately the intent was to provide the navigational smarts for a submersible that could navigate terrestrial and non-terrestrial oceans.

One of the biggest challenges for probes like this is to be able to navigate turbulent flow without needing a lot of propulsive power and using a lot of computational power. They said that any probe that could propel itself faster than the current could easily travel wherever it wanted but the real problem was to go somewhere with lower powered submersibles.. As a result, they set their probe to swim at a constant speed at 80% of the overall simulated water flow.

Even that was relatively feasible if you had unlimited computational power to train and inference with but trying to do this on something that could fit in a small submersible was a significant challenge. NLP models today have millions of parameters and take hours to train with multiple GPU/CPU cores in operation and lots of memory Inferencing using these NLP models also takes a lot of processing power.

The researchers targeted the computational power to something significantly smaller and wished to train and perform real time inferencing on the same hardware. They chose a “Teensy 4.0 micro-controller” board for their computational engine which costs under $20, had ~2MB of flash memory and fit in a space smaller than 1.5″x1.0″ (38.1mm X 25.4mm).

The simulation setup

The team started their probe turbulent flow training with a cylinder in a constant flow that generated downstream vortices, flowing in opposite directions. These vortices would travel from left to right in the simulated flow field. In order for the navigation logic to traverse this vortical flow, they randomly selected start and end locations on different sides.

The AI model they trained and used for inferencing was a combination of reinforcement learning (with an interesting multi-factor reward signal) and a policy using a trained deep neural network. They called this approach Deep RL.

For reinforcement learning, they used a reward signal that was a function of three variables: the time it took, the difference in distance to target and a success bonus if the probe reached the target. The time variable was a penalty and was the duration of the swim activity. Distance to target was how much the euclidean distance between the current probe location and the target location had changed over time. The bonus was only applied when the probe was in close proximity to the target location, The researchers indicated the reward signal could be used to optimize for other values such as energy to complete the trip, surface area traversed, wear and tear on propellers, etc.

For the reinforcement learning state information, they supplied the probe and the target relative location [Difference(Probe x,y, Target x,y)], And whatever sensor data being tested (e.g., for the velocity sensor equipped probe, the local velocity of the water at the probe’s location).

They trained the DNN policy using the state information (probe start and end location, local velocity/vorticity sensor data) to predict the swim angle used to navigate to the target. The DNN policy used 2 internal layers with 64 nodes each.

They benchmarked the Deep RL solution with local velocity sensing against a number of different approaches. One naive approach that always swam in the direction of the target, one flow blind approach that had no sensors but used feedback from it’s location changes to train with, one vorticity sensor approach which sensed the vorticity of the local water flow, and one complete knowledge approach (not shown above) that had information on the actual flow at every location in the 2D simulation

It turned out that of the first four (naive, flow-blind, vorticity sensor and velocity sensor) the velocity sensor configured robot had the highest success rate (“near 100%”).

That simulated probe was then measured against the complete flow knowledge version. The complete knowledge version had faster trip speeds, but only 18-39% faster (on the examples shown in the paper). However, the knowledge required to implement this algorithm would not be feasible in a real ocean probe.

More to be done

They tried the probes Deep RL navigation algorithm on a different simulated flow configuration, a double gyre flow field (sort of like 2 circular flows side by side but going in the opposite directions).

The previously trained (on cylinder vortical flow) Deep RL navigation algorithm only had a ~4% success rate with the double gyre flow. However, after training the Deep RL navigation algorithm on the double gyre flow, it was able to achieve a 87% success rate.

So with sufficient re-training it appears that the simulated probe’s navigation Deep RL could handle different types of 2D water flow.

The next question is how well their Deep RL can handle real 3D water flows, such as idal flows, up-down swells, long term currents, surface wind-wave effects, etc. It’s probable that any navigation for real world flows would need to have a multitude of Deep RL trained algorithms to handle each and every flow encountered in real oceans.

However, the fact that training and inferencing could be done on the same small hardware indicates that the Deep RL could possibly be deployed in any flow, let it train on the local flow conditions until success is reached and then let it loose, until it starts failing again. Training each time would take a lot of propulsive power but may be suitable for some probes.

The researchers have 3D printed a submersible with a Teensy microcontroller and an Arduino controller board with propellers surrounding it to be able to swim in any 3D direction. They have also constructed a water tank for use for in real life testing of their Deep RL navigation algorithms.

Picture credit(s):

From Learning efficient navigation in vortical flow fields Nature magazine article
From Learning efficient navigation in vortical flow fields Nature magazine article
From Learning efficient navigation in vortical flow fields Nature magazine article
From Learning efficient navigation in vortical flow fields Nature magazine article
From Learning efficient navigation in vortical flow fields Nature magazine article
From Engineers Teach AI to Navigate Ocean with Minimal Energy CalTech article

BEHAVIOR, an in-home robot, benchmark

Posted on November 19, 2021November 19, 2021 by Ray in Artificial Intelligence, Machine Learning, R&D measures, Robots, Scenario planning, Strategic Inflection Points, System effectiveness

As my readers probably already know, I’m a long time benchmark geek. So when I recently read an article out of Stanford (AI Experts Establish the “North Star” for Domestic Robotics Field) where a research team there developed a new robotic benchmark, I was interested. The new robotics benchmark is called BEHAVIOR which was documented in an ARXIV.org article (see: BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and ecOlogical enviRonments). It essentially uses real world data to identify domestic work activities that any robot would need to perform in a home.

The problems with robot benchmarks

The problem with benchmarks are multi-faceted:

How realistic are the workloads used to evaluate the systems being measured?
How accurate are the metrics used to rank and judge benchmark submissions?
How costly/complex is it to run a benchmark?
How are submissions audited and are they reproducible?.
Where are benchmark results reported and are they public?

And of course robotics brings in it’s own issues that makes benchmarking more difficult:

What sensors does the robot have to understand how to complete tasks?
What manipulators does the robot have to perform the tasks required of it?
Do the robots move in the environment and if so, how do the robots move?
Does the robot perform the task in the real world on in a simulated environment.

And of course, when using a simulated environment, how realistic is it.

BEHAVIOR with iGibson (see below) seem to answer many of these concerns for an in home robot benchmarking.

What is BEHAVIOR?

First, BEHAVIOR’s home making tasks were selected from an American Time Use Survey maintained by the USA Bureau of Labor Statistics which identifies tasks Americans perform in their homes. With BEHAVIOR 1.0 there are 100 tasks ranging from building a fruit basket to cleaning a toilet, and just about everything in between. I didn’t see any cooking or mixing drinks tasks but maybe those will be added.

Second, BEHAVIOR uses a predicate logic, called BDDL (BEHAVIOR Domain Definition Language) to define initial conditions for tasks such as tables, chairs, books, etc located in the room, where objects need to be placed, and successful completion goals or what task completion should look like.

BEHAVIOR uses 15 different rooms or scenes in their benchmark, such as a kitchen, garage, study, etc. Each of the 100 tasks are performed in a specific room.

BEHAVIOR incorporates 1217 different objects in 391 categories. Once initial conditions are defined for a task, BEHAVIOR essentially randomly selects different object for the task and randomly locates them throughout the room.

In order to run the benchmark, one could conceivably create a real room, with all the objects and have them placed according to BEHAVIOR BDDL’s randomly assigned locations with a robot physically present in the room and have it perform the assigned task OR one could use a simulation engine and have the robot run the task in the simulation environment, with simulated room, objects and robot.

It appears as if BEHAVIOR could operate in any robotics simulation environment but has been currently implemented in Stanford’s open source robotics simulation engine called iGibson 2.0 (see: iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks and iGibson 2.0 website). iGibson uses the Bullet real time physics engine for realistic physical environment simulation.

A robot operating within iGibson is provided a 3D rendering of the room and objects in images or LIDAR sensor scans. It can then identify the objects that it needs to manipulate to perform the tasks. One can define the robot simulated sensors and manipulators in iGibnot 2.0 and it’s written in Python, is open source (GitHub Repo) and can be installed to run on (Ubuntu 16.04) Linux, Windows (10) or Mac (10.15) systems.

Finally, BEHAVIOR uses a set of metrics to determine how well a robot has performed its assigned task. Their first metric is success score defined as the fraction of goal conditions satisfied by the robot performing the task. Such as the number of dishes properly cleaned and placed in the drying rack divided by the total number of dishes for a “washing dishes” task. And their second metric is a set of efficiency metrics, like time to complete a task, sum total of object distance moved during the task, how well objects are arranged at task completion (is the toilet seat down…), etc.

Another feature of iGibson 2.0 is that it offers the ability to record a human (in VR) doing a task in its simulated environment. So if your robotic system is able to learn by example, then iGibson could be used to provide training data for an activity.

~~~~

A couple of additions to the BEHAVIOR benchmark/iGibson simulation environment that I would like to see:

There ought to be a way to construct a house/apartment where multiple rooms are arranged in a hierarchy, i.e., rooms associated with floors with connections using hallways, doors, stairs, etc. between them. This way one could conceivably have a define a set of homes/apartments (let’s say 5) that a robot would perform its tasks in.
They need a task list to drive robot activities. Assume that there’s some amount of time let’s say 8-12 hours that a robot is active and construct a series of tasks that need to be accomplished during that period.
Robots should be placed in the rooms/apartments/homes at random with random orientation and then they would have to navigate through rooms/passageways to the rooms to perform the tasks.
They need to add pet/human avatars in the rooms throughout a home. These would represent real time obstacles to task completion/navigation as well as add more tasks associated with caring for pets/humans.
They need the ability to add non-home rooms that could encompass factory floors, emergency response debris fields, grocery stores, etc. and their own unique set of tasks for each of these so that it could be used as a benchmark for more than just domestic robots.

Aside from the above additions to BEHAVIOR/iGibson 2.0, there’s the question of the organization that manages the benchmark and submissions. There needs to be a website/place to publish benchmark results for a robot AND a mechanism to audit results for accuracy to insure fair play.

Typically this would be associated with an organization responsible for publishing and auditing submissions as well as guide further development of BEHAVIOR/iGibson 2.0. BEHAVIOR 1.0 is not the end but it’s a great start at providing realistic tasks that any domestic robot would need to perform.

Benchmarks have always aided the development and assessment of new technologies. Having a in home robot benchmark like BEHAVIOR makes getting domestic robots that do what we want them to do a more likely possibility someday.

There’s a new benchmark in town and it signals the dawning of the domestic robot age.

Photo Credit(s):

The problem with smarter robots

Posted on October 13, 2021October 15, 2021 by Ray in Artificial Intelligence, Deep Learning, Distributed computing, Neural network, Reinforcement Learning, Robots

Read an article the other week about how Deepmind (at Google) is approaching the training of robotics using simulation, reinforcement learning, elastic weights, knowledge distillation and progressive learning.

It seems relatively easy to train a robot to handle some task like grabbing or walking. But doing so can take an awfully long time. If you want to try to train a robot to grab something and put it someplace. You can have it start out making some random movements of its arm, wrist and fingers (if they have such things) and then use reinforcement learning to help it improve its movements over time.

But if each grab attempt takes 10 seconds, using reinforcement learning may take 10,000 attempts before it starts to make any significant progress and perhaps another 20,000-50,000 more to get expert at it. Let’s see 60K *10 seconds is 10,000 minutes or ~170 hours. And that’s just one object pick and place. But then maybe you would like to grab different parts and maybe place them in different locations. All these combinations start adding up.

And of course doing 1000s of movements will wear out gears, motors, mechanisms etc. If only this could all be done in electronic simulations. Then assuming the simulations are accurate enough the whole thing could be done in a matter of hours without wearing anything out. Enter robot simulators such as NVIDIA Isaac Sim, OpenAI RoboSchool/PyBullet

But the problems with simulation are …

Simulations are getting more accurate but at some point their accuracy defeats its purpose because the real world is always noisy, windy and not as deterministic as any simulation. One researcher said you could conceivable have a two armed robot be trained to throw all of a cell phones components up into the air and they will all land in their proper places, proper orientations. But in the real world this could never actually happen, or if it did, it could only happen once.

Hurricane Ike - 2008/09/12 - 21:26 UTC by CoreBurn (cc) (from Flickr) — Hurricane Ike – 2008/09/12 – 21:26 UTC by CoreBurn (cc) (from Flickr)

Weather researchers have been dealing with this problem in spades for a long time. There appears to be a fundamental limit to how far in advance we can predict weather and it’s due to the accuracy with which sensors operate and the complexity of feedback loops between the atmosphere, oceans, landforms, etc. So at a fundamental level, simulations can never be completely accurate. But they can be better.

Today’s weather simulations we see on TV/radio use models that average a number of distinct simulations, where sensor information has been slightly and randomly modified. Something similar could be done for robotic simulation environments, to make them more realistic.

But there are other problems with training robots to do lots of tasks.

Forget me not…

AI deep learning and reinforcement learning algorithms are great when charged with learning a single task, but having it learn multiple tasks is much harder to do. Because each task requires its own deep neural network (DNN) and if you train a DNN on one task and then try to train in on a another task, it forgets all the learnings from the original task. Researchers call this catastrophic forgetting.

One way researchers have dealt with this problem is to effectively freeze certain DNN nodes from having their weights changed during subsequent training rounds and leave others flexible or changeable. One can see this when one trains an image recognition DNN to classify different objects by importing a well trained object classifier and freezing all of it’s layers except the top one or two and then training these layers to classify new objects.

This works well but you have effectively changed the DNN to forget the original object classification training and replaced it with a new one. One solution to this approach is to have multiple passes of training, after each one, certain nodes and connections (of importance to that particular task) are selectively frozen. This works well for a limited number of different tasks but over time all nodes become frozen which means that no more learning can take place. Researchers call this approach to the catastrophic forgetting problem elastic weights.

One way to get around the all nodes frozen issue in elastic weights is to have multiple NNs. One which is trained on a specific task and whose weights are frozen and then a DNN that exists alongside this one with it’s own initialized set of weights. But which uses the original DNN as part of the new DNN inputs. This effectively includes and incorporates all the previously learned knowledge into the new, combined DNN. This is called Progressive Neural Networks.

In this fashion one progressive DNN can be sequentially trained on any number of tasks each of which ends up providing input to all subsequent task training activity. Such a progressive network never forgets and can use previously learned knowledge on new tasks.

The problem with progressive DNNs is a proliferation of DNN column. one for each trained task. However there are a couple of approaches to shrinking an ensemble of DNN like progressive training creates into one that is simpler and just as effective. One way is to perturb weights in DNN nodes and see how model prediction accuracy is impacted on all its tasks. If accuracy isn’t impacted that much, then that node and all its connections could be deleted from the model with minimal impact on model accuracy.

Another approach is to use one DNN to train another. Sort of like a teacher-student. This is called Knowledge Distilation. Where one DNN is a large network (the teacher) and a smaller (student) network that is trained to mimic the teacher DNN to achieve similar accuracy. This is done by training the smaller student network to match the predictions/classifications of the larger one.

Google researchers have shown that knowledge distillation works best when the gap in the sizes of the two networks (teacher and student) aren’t that large. They have solved this problem by introducing an intermediate step (called teachers assistent). They train this TA first then use the TA to train the student.

In the above graphic, when using a teacher of size 110 and a student of size 8 the resulting accuracy suffers but if one uses an intermediate DNN, with a size 20 the resultant accuracy of the student is much closer to the teacher..

~~~~

So with realistic simulation we can train a robot to do any specific task, all using only compute resources. And using progressive DNN training, a robot could conceivably be trained to do any number of tasks. And with appropriate knowledge distillation one can reduce the DNN from progressive training into something much smaller (<10%) than the original DNN.

Want a personal robot that can clean up around your place, do the wash, cook your food and do anything else needed. You know what to do.

from Deepmind Progressive Neural Networks article
from Deepmind Progressive Neural Networks article
from Deepmind Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

An Open Source Powered Leg

Posted on October 15, 2020October 15, 2020 by Ray in Crowdsourcing, Executive leadership, Robots, Strategic Inflection Points, Visionary leadershp

I read an article a couple of weeks back about an Open Source Bionic Leg, which was reporting on research began as a NSF funded project at the University of Michigan (UoM), with collaboration from Northwestern, University of Texas at Dallas and CMU. UofM has a website that provides everything you need to build your own open source leg (OSL) leg at OpenSourceLeg.com.

The challenge in human prosthetics these days is that all research is done in silos. Much of it is proprietary and only available within corporations but even university research has been hampered by the lack of a standard platform that could be used to develop new components and ideas on.

The real difficulty is defining the control logic (code). The OSL project is intended to resolve this lack of a platform by providing everything a researcher (hobbyist, or amputee) needs to build their own, at home or in the lab.

The website includes a parts lists and STEP files as well as an estimated cost ($28.5K) to build your own powered prosthetic leg. They also have a Excel spread sheet with all the parts listed, including part numbers and links to where they can be ordered (McMaster-Carr, SolidWorks, & Dephy)

They also show how to build a leg with a short youtube video of how to assemble the whole leg as well as details for each subassembly with separate how-to videos for each.

The open source leg makes use of code from FlexSEA (Flexible Scaleable Electronics Architecture) and Dephy. FlexSea was originally developed by Jean-Francois (JF) Duval while he was at MIT for his doctoral thesis. He has since joined Dephy a robotics design firm. The open source leg project uses FlexSea/Dephy code for its servo control mechanisms.

There is a GitHub Python, MatLab and C control library repo with all the code. The open source leg website also includes instructions, scripts and an image file which can be used to build your own RaspberryPi (4) controller for the leg.

The two (ankle and knee) servos are USB connected to the RPi. There are also other sensors such as the joint (servo-motor) encoders and a six axis load sensor I2C connected to the RPi. Each servo has its own 950mAh battery.

On the OSL website’s control page one can see these servos in action (with short youtube segments). They also provide instructions on how to use the open source control library to take the servo mechanisms through their paces.

Although on the OSL website’s control page I didn’t see anything which put the whole leg together to make use of it in a real world application. They did show on the Data page a youtube video with the OSL attached to a person and being used to walk up and down stairs, inclines and walking across a floor.

~~~~

Seeing as how the OSL website included STEP and PDF files for all the (machined) parts which represent $15.6K of the $28.5K, if one really wanted to do this on the cheap, one could just 3D print these parts in plastic. It would obviously not suffice mechanically for real use, but it could provide a platform for testing and developing control logic. At some point one could upgrade some or all of the plastic 3D printed parts to something more durable for use in human trials.

Another option is to purchase multiple sets of parts. The OSL website also showed price estimates for purchasing two sets of ankle and knee parts. But I’d imagine if one was so inclined, a number of researchers (hobbyists or amputees) could get together and order multiple sets of parts for reduced prices.

It’s also possible, with a lot of work, that the open source leg could be redesigned to support an open source arm-hand mechanism. This is where having 3D printed plastic parts could be extremely useful in helping to redesign the leg into an arm-hand.

Photo Credit(s):

Designing living machines

Posted on January 21, 2020January 21, 2020 by Ray in Robots, Strategic Inflection Points, Visionary leadershp

Read an article the other day in PNAS (A scaleable pipeline for designing reconfigurable organisms) which described an approach to designing and constructing living organisms to perform real world actions. One could call these living machines or biological robots (biobots). There’s an appendix to the paper which provides supplementary information.

The intention of the pipeline is to expand the modern design space from construction materials, chemical process, electronics and mechanical devices to the domain of living things. Thereby create objects that perform functions for mankind, that are operate well with living things, are more resilient and have a benign impact on the environment.

The Biobot design pipeline stage 1

The design process begins using an evolutionary algorithm which takes as input an organism goal or action (i.e., moving so many body lengths for minutes) and the cell types to be used in constructing the organism and randomly generates potential organism designs.

In the current process there are two cell types (red and cyan) one is passive (scaffolding) and the other is active and provides movement power.

Designing and manufacturing reconfigurable organisms. A behavioral goal (e.g., maximize displacement), along with structural building blocks [here, contractile (red) and passive (cyan) voxels], are supplied to an evolutionary algorithm. The algorithm evolves an initially random population and returns the best design that was found.

Once a set of randomized designs using the two cell types have been determined, each undergoes a computerized simulation (in a physics engine that simulates gravity and liquid environment) to see how well the possible organism perform.

All designs are ranked in how well the achieve they performe and the best of these are used as seeds for another round of evolutionary design exploration. This uses these good designs and randomly changes some aspect of them to create another set of organism designs to test out.

Designing reconfigurable organisms. For a given goal, 100 independent evolutionary trials were conducted in silico (*A–C*). Each colored line represents the velocity of the fastest-moving design within its clade. Each genome (D) dictates anatomy and behavior by determining where and how voxels are combined, and whether they are passive (cyan) or contractile (red; E).

At some point, the evolutionary design exploration-simulation process stops when it has determined a set of workable organism designs which can achieve the goals set out for them.

The workable organism designs are then subjected to two rounds of filtering. The first filter is tests the designs for resilience to noise. This is done by putting the designs through another set of computer simulations that include noise. Some of the workable organism designs will still perform well in noisy environment and others will not.

The organism designs that perform well in noisy environments, are deemed resilient and are then fed into the next filtering stage of the pipeline.

The resilient workable organism designs are then filtered by whether they can be constructed with the current processes. Even though all the organisms are made up of the two cell types, not all of them can be realized given the current process.

After this point we have a set of designs that

a) Achieve the requested goal in simulation;

b) Perform well in noisy environment simulations; and

c) Can be constructed with the current processes and cell types.

The Biobot design pipeline stage 2

The next steps in the organism design pipeline all take place in the real world. The set of selected designs are constructed/manufactured and set into a petri dish to see how well they perform in real life .

The two cell types used in the current process are derived from the Xenobus (frog) embryos and consist of stem cells (passive) and heart muscle cells (active). The building of organism designs is done through layering of stem cells and then surgically or using cauterization to remove cells not part of the design. After the stem cells are placed then heart muscle cells can be layered on in a similar fashion.

There’s no control mechanism whatsoever other than the surfaces designed for the organism. Xenobus heart muscle cells automatically contract and when combined with other heart muscle cells, all the muscle cells contract in waves.

The design of the organism is such that the contractions propel the organism to move and explore the environment (the intended goal). The designed organisms are placed in a Petri dish and then observed over a period of time to see how well the perform the desired action.

Successful designs can then be seeded back into the start of the evolutionary exploration to generate even better designs. Simulations can also be adjusted with feedback from the real world behavior of the designed organisms. At some point the best designed organisms can be used in the real world.

Why biobots

Although the example had a goal to explore its environment. other goals could be readily used as well. Some of the ones mentioned in the paper are manipulating and gathering together some compounds/elements/particles in a volume. These could be used to clear a viscous solution of some impurities.

Another organism could be designed to have a pouch within which they can store and transport objects (or drugs).

Designed organisms could operate together in a solution with some organisms performing one function while others perform other functions. Organism designs could eventually be combined into one organism that performs more functions.

One nice aspect of biobots is that they can be squeezed, perturbed in many ways including being cut and they repair themselves and continue to operate.

One could design an organism to reproduce in a suitable environment or even designed to age and die after a specific time period.

The end goal seems to be to create living machines that can be used to operate in the environment or a body. Biobots could be designed to clear away plaque from a blood vessels or to dismantle malignant tumors. They could conceivably be constructed from a person’s own cells to operate for days-weeks-months in a body and then dissolve to be reused/disposed of just like any other biological material in a human.

So now we can design biobots.

~~~~

Just in case you wanted to try your hand at designing living organisms yourself. The researchers have all open sourced thei code for the evolutionary design exploration, computerized simulation, noise and build ability filtering which is available on github. The actual manufacturing/construction of the designed organism would need to be done in a lab.

Photo credit(s): All images are from the paper and its supplementary appendix.