Roboticists are developing robots that can learn new tasks solely by observing humans. Source: MIT

Researchers from MIT have created a system that teaches robots complicated tasks that were previously too difficult for them to learn, such as setting a dinner table.

The system, called Planning with Uncertain Specifications (PUnS), gives robots a human-like ability to plan around ambiguous requirements on the way to a goal. The robot's decisions are based on a belief over candidate specifications, and that belief is used to assign rewards and penalties.
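To make that idea concrete, here is a minimal sketch in Python of how a belief over competing specifications might be used to score a candidate behavior. The `Specification` class, the table-setting predicates, and the probabilities are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of scoring a behavior against a
# belief over competing specifications. All names and numbers are made up.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Specification:
    name: str
    satisfied: Callable[[List[str]], bool]  # does a trace meet this spec?
    probability: float                      # belief weight for this spec

def expected_reward(belief: List[Specification], trace: List[str],
                    reward: float = 1.0, penalty: float = -1.0) -> float:
    """Reward satisfied specs and penalize violated ones, weighted by how
    strongly the robot believes each spec is the one being demonstrated."""
    return sum(s.probability * (reward if s.satisfied(trace) else penalty)
               for s in belief)

# Two hypothetical, conflicting beliefs about where the fork belongs.
belief = [
    Specification("fork left of plate",
                  lambda t: t.index("fork") < t.index("plate"), 0.7),
    Specification("fork right of plate",
                  lambda t: t.index("fork") > t.index("plate"), 0.3),
]
print(expected_reward(belief, ["fork", "plate", "knife"]))  # 0.7 - 0.3 = 0.4
```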

To test the system, the team attempted to teach a robot how to set a table. They compiled a dataset with information about how eight dinner-table objects should be arranged. Before arranging those items itself, a robot arm observed randomly selected human demonstrations of table setting. Following those demonstrations, the arm was tasked with automatically setting a table in a specific configuration. Designers preset one of four criteria before training and testing; each criterion trades off flexibility against risk aversion, and the choice depends on the task.
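The article does not spell out the four criteria, but the tradeoff can be illustrated with two simplified stand-ins: one that averages satisfaction across the belief (flexible) and one that judges a behavior by the worst plausible specification (risk-averse). Both functions and the example data below are hypothetical, not the paper's actual criteria.

```python
# Hypothetical stand-ins for belief-evaluation criteria; the paper's four
# criteria differ, but the flexibility/risk-aversion tradeoff is the same idea.
def expected_score(belief, trace):
    """Flexible: satisfaction averaged over the belief distribution."""
    return sum(p * (1.0 if sat(trace) else -1.0) for sat, p in belief)

def worst_case_score(belief, trace):
    """Risk-averse: a trace is only as good as its least-satisfied spec."""
    return min(1.0 if sat(trace) else -1.0 for sat, _ in belief)

belief = [
    (lambda t: t.index("fork") < t.index("plate"), 0.7),
    (lambda t: t.index("fork") > t.index("plate"), 0.3),
]
trace = ["fork", "plate", "knife"]
print(expected_score(belief, trace))    # 0.4: acceptable on average
print(worst_case_score(belief, trace))  # -1.0: violates one plausible spec
```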

The robot had to weigh all the possible placements, even when items were purposely removed, stacked or hidden. The team's robot made no mistakes across several real-world experiments and only a handful across tens of thousands of simulated test runs.

The system is built on linear temporal logic (LTL), an expressive language that enables the robot to reason about current and future outcomes. The robot observed 30 human demonstrations of table setting, which yielded a probability distribution over 25 LTL formulas. Each formula encoded a slightly different preference, and the probability distribution over them became the robot's belief.
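As a rough illustration of what LTL-style reasoning looks like, the sketch below evaluates three standard temporal operators (eventually, always, until) over a finite trace of table states. The state encoding and predicates are simplifications introduced here for illustration; full LTL is defined over infinite traces and is richer than this.

```python
# Simplified LTL-style operators over a finite trace of world states.
# Illustrative sketch only; not the formulas learned in the paper.
from typing import Callable, List, Set

State = Set[str]                 # the facts true at one timestep
Prop = Callable[[State], bool]

def eventually(p: Prop, trace: List[State]) -> bool:
    """F p: p holds at some point in the trace."""
    return any(p(s) for s in trace)

def always(p: Prop, trace: List[State]) -> bool:
    """G p: p holds at every point in the trace."""
    return all(p(s) for s in trace)

def until(p: Prop, q: Prop, trace: List[State]) -> bool:
    """p U q: p holds at every step before q first becomes true."""
    for s in trace:
        if q(s):
            return True
        if not p(s):
            return False
    return False

# A table-setting trace: the facts observed after each placement.
trace = [{"plate"}, {"plate", "fork"}, {"plate", "fork", "knife"}]
has = lambda item: (lambda s: item in s)

print(eventually(has("knife"), trace))                        # True
print(always(has("plate"), trace))                            # True
print(until(lambda s: "knife" not in s, has("fork"), trace))  # True
```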

The algorithm converts the robot's belief into an equivalent reinforcement learning problem: the model rewards or penalizes each action the robot takes based on the specification it is following. In the simulated table-setting tests, the robot made six mistakes out of 20,000 tries and showed behavior similar to how a human would perform the same task.
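The article does not describe the learner itself, but the conversion can be sketched as an ordinary episodic reinforcement learning problem in which a specification-based score becomes the terminal reward. The toy Q-learning setup below, with three hypothetical items and a single stand-in specification, illustrates that framing rather than the team's actual method.

```python
# A toy sketch (not the authors' method): casting a specification score as
# the terminal reward of a small tabular Q-learning problem over the order
# in which hypothetical items are placed.
import random
from collections import defaultdict

ITEMS = ("plate", "fork", "knife")

def terminal_reward(trace):
    # Stand-in for the belief-weighted specification score: +1 if the
    # fork is placed before the knife, -1 otherwise.
    return 1.0 if trace.index("fork") < trace.index("knife") else -1.0

Q = defaultdict(float)            # Q[(placed_items, next_item)] -> value
alpha, gamma, eps = 0.5, 1.0, 0.2

for episode in range(2000):
    placed, trace = frozenset(), []
    while len(placed) < len(ITEMS):
        actions = [i for i in ITEMS if i not in placed]
        if random.random() < eps:                       # explore
            a = random.choice(actions)
        else:                                           # exploit
            a = max(actions, key=lambda i: Q[(placed, i)])
        nxt = placed | {a}
        trace.append(a)
        done = len(nxt) == len(ITEMS)
        r = terminal_reward(trace) if done else 0.0
        future = 0.0 if done else max(Q[(nxt, i)] for i in ITEMS if i not in nxt)
        Q[(placed, a)] += alpha * (r + gamma * future - Q[(placed, a)])
        placed = nxt

# Greedy rollout after training typically places the fork before the knife.
placed, order = frozenset(), []
while len(placed) < len(ITEMS):
    a = max((i for i in ITEMS if i not in placed), key=lambda i: Q[(placed, i)])
    order.append(a)
    placed = placed | {a}
print(order)
```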

Next, the team hopes to modify the system so that robots can change their behavior based on verbal communication, corrections, or a user's assessment of the robot's performance.

A paper on this research was published in Advances in Neural Information Processing Systems.