Robot motion control with Learning from Demonstration (LfD)

Learning from Demonstration

Every mobile robot requires motion control to perform complex tasks efficiently in a highly dynamic environment. Today, we use different ways to determine and control these robot behaviors, such as voice control, keypad control, and gesture control. All of these methods rely on robust motion control algorithms for the successful operation of mobile robots.

Motion control algorithms observe the world through sensors, determine an appropriate action, and execute it through actuation mechanisms. However, sensors can be noisy and misleading, making it difficult to accurately determine mobile robots’ trajectories and sophisticated motion behaviors with traditional control approaches.

Thus, developing motion control algorithms for mobile robots poses a significant challenge, even for simple motion behaviors. As behaviors become more complex, generating appropriate control algorithms only becomes more challenging. Furthermore, developing motion behaviors or motion policies through traditional means has proven tedious and demands a high level of expertise.

Here are some of the critical challenges of developing robot motion control with traditional methods.

The state-action mapping represented by a motion policy is typically difficult to develop. One reason for this difficulty is that the target observation-action mapping is unknown. What is known is the desired robot motion behavior, and this behavior must somehow be represented through that unknown observation-action mapping.

How accurately the policy derivation techniques then reproduce the mapping is a separate challenge. A second reason for this difficulty is the complication of executing a motion policy in real-world environments, primarily because:

  1. The world is observed through sensors, which are typically noisy and may provide conflicting or misleading information.
  2. Models of world dynamics approximate the actual dynamics and are often further simplified due to computational or memory constraints. These models thus may inaccurately predict motion effects.
  3. Actions are motions executed with real hardware, which depends on many physical considerations such as calibration accuracy and necessarily performs actions with some level of imprecision.

All of these challenges contribute to the inherent uncertainty of policy execution in the real world. The net result is a difference between the expected and actual policy execution.
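
To make this gap concrete, here is a minimal Python sketch. The scenario is entirely illustrative (a hypothetical 1-D robot driven toward a goal by a proportional controller, with made-up noise levels): the same control policy is rolled out once with perfect sensing and actuation and once with noise in both, and the two executions end up in different places.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(sensor_noise=0.0, actuation_noise=0.0, steps=50):
    """Drive a 1-D robot toward x = 1.0 with a proportional controller."""
    x = 0.0
    trajectory = [x]
    for _ in range(steps):
        observed = x + rng.normal(0.0, sensor_noise)          # noisy sensor reading
        action = 0.2 * (1.0 - observed)                       # commanded velocity
        executed = action + rng.normal(0.0, actuation_noise)  # imperfect actuation
        x += executed
        trajectory.append(x)
    return np.array(trajectory)

expected = rollout()                                       # idealized execution
actual = rollout(sensor_noise=0.05, actuation_noise=0.02)  # real-world-like execution
print("gap between expected and actual final state:", abs(expected[-1] - actual[-1]))
```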

Motion control through demonstration (LfD)

One approach that mitigates many of these challenges is to develop motion control algorithms with a policy development technique called Learning from Demonstration (LfD). In LfD, the control algorithm learns the desired robot behavior from examples, or demonstrations, provided by a teacher.

During the teacher’s demonstration, sequences of state-action pairs are recorded. The learning algorithm then uses this dataset of examples to derive a policy, a mapping from world states to robot actions that reproduces the demonstrated behavior. The learned policy constitutes a control algorithm for the behavior, and the robot uses this policy to select an action based on the observed world state.

Here, a behavior is represented as pairs of states and actions: more specifically, the states encountered and the actions executed by the teacher during the demonstration of the motion behavior.
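
As a concrete illustration, here is a minimal sketch of this record-then-derive loop in the spirit of behavioral cloning, one common supervised-learning formulation of LfD. The demonstration data is synthetic (a teacher driving a hypothetical 1-D robot toward a goal), and the k-nearest-neighbors regressor is just one of many plausible policy derivation techniques:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Demonstration: a teacher drives a 1-D robot toward a goal at x = 1.0,
# and the encountered states and executed actions are recorded as pairs.
demo_states = np.linspace(0.0, 1.0, 100).reshape(-1, 1)  # observed world states
demo_actions = (0.2 * (1.0 - demo_states)).ravel()       # teacher's velocity commands

# Policy derivation: fit a regression model mapping states to actions.
policy = KNeighborsRegressor(n_neighbors=3)
policy.fit(demo_states, demo_actions)

# Execution: the robot selects actions from the learned policy based on
# the currently observed state.
x = 0.1
for _ in range(30):
    x += policy.predict([[x]])[0]
print("state after executing the learned policy:", x)
```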

A policy learned from demonstration may still exhibit poor performance, especially when the robot encounters areas of the state space unseen during the demonstration. However, the teacher can use execution experience of this sort to correct and update the policy, improving its performance and robustness.
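
One common way to fold such corrections back in, sketched below with the same synthetic 1-D setup as above, is to have the teacher label the uncovered states with corrected actions, aggregate those labels into the demonstration dataset, and refit the policy. This is the idea behind dataset-aggregation methods such as DAgger, named here as one concrete instantiation rather than the only option:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Initial demonstrations only cover states in [0, 1].
demo_states = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
demo_actions = (0.2 * (1.0 - demo_states)).ravel()
policy = KNeighborsRegressor(n_neighbors=3).fit(demo_states, demo_actions)

# During execution the robot drifts into states the demonstrations never
# covered, so the teacher labels those states with corrected actions.
novel_states = np.array([[1.5], [2.0]])
corrected_actions = (0.2 * (1.0 - novel_states)).ravel()

# Aggregate the corrections with the original demonstrations and refit,
# extending the policy's coverage of the state space.
states = np.vstack([demo_states, novel_states])
actions = np.concatenate([demo_actions, corrected_actions])
policy.fit(states, actions)

print("action at a previously uncovered state:", policy.predict([[1.8]])[0])
```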

LfD has many attractive points for both learners and teachers. The application of LfD to motion control has the following advantages:

  • Implicit behavior to mapping translation: By demonstrating the desired motion behavior and recording the encountered states and actions, translating a behavior into a representative state-action mapping is immediate and implicit. This translation, therefore, does not need to be explicitly identified and defined by the policy developer.
  • Robustness under real-world uncertainty: The real world’s uncertainty means that multiple demonstrations of the same behavior will not execute identically. Therefore, generalization over examples produces a policy that does not depend on a strictly deterministic world and will perform more robustly under real-world uncertainty.
  • Focused policies: Demonstration has the practical feature of focusing the dataset of examples to areas of the state-action space actually encountered during behavior execution. This is particularly useful in continuous-valued action domains, with an infinite number of state-action combinations.
  • No need to recreate state: This is useful if the demonstration is required in places that are dangerous (e.g., those leading to a collision) or difficult to access (e.g., in the middle of a motion trajectory).
  • Not limited by the demonstrator: Corrections are not limited by the execution abilities of the demonstration teacher, who may be suboptimal.
  • Unconstrained by correspondence: Corrections are not constrained by physical differences between the teacher and learner.
  • No need for expert knowledge: LfD formulations typically do not require expert knowledge of the domain dynamics, removing performance brittleness resulting from model simplifications. The relaxation of the specialist-knowledge requirement also opens policy development to non-robotics-experts, satisfying a need that we expect will increase as robots become more common within general society.
  • An intuitive medium: Demonstration has the attractive feature of being an intuitive medium for communication from humans, who already use demonstration to teach other humans.

LfD has enabled successful policy development for a variety of robot platforms and applications. This approach is not without its limitations, however. Common sources of LfD limitations include:

  1. Suboptimal or ambiguous teacher demonstrations.
  2. Uncovered areas of the state space, absent from the demonstration dataset.
  3. Poor translation from teacher to learner due to differences in sensing or actuation.