
Task Agnostic and Ego-Centric Reinforcement Learning in Autonomous Driving

Introduction

  • Efficient and Effective Exploration in RL for Autonomous Driving

  • Most existing algorithms explore in the raw action space, which produces inconsistent actions and is often data-inefficient.

  • Human drivers, by contrast, tend to perform temporally extended actions that reflect a specific driving intention, such as overtaking or following.

  • Our motivation is to enable vehicles to behave like human drivers, efficiently exploring the entire skill space in a structured manner.

Related Work

Most existing methods rely on expert demonstrations to solve the problem, which are usually:

  • Expensive and labor-intensive

  • Unbalanced

  • Hard to transfer to new tasks

Motion primitive methods, which are labor-efficient and task-agnostic, can efficiently address temporally extended problems.

Based on their generation principles, motion primitive methods can be classified into several types:

  • Geometric-based

    • e.g. Dubins, Reeds-Shepp, fifth-order polynomial curves

    • Simplest and most efficient path

    • Limited by the representation power of predefined shapes

  • Optimization-based

    • e.g. Newton-type optimization

    • Various motion modes by adjusting the objective function

    • Susceptible to getting trapped in local minima

  • Parametric-based

    • e.g. Bézier, clothoid curves (see the sketch after this list)

    • Smooth in shape, suitable for kinematic modeling

    • Difficult to represent trajectories under complex conditions
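For concreteness, here is a minimal sketch of one primitive from the parametric category, a cubic Bézier path; the control points and sampling resolution are illustrative choices rather than values used in this work.

```python
import numpy as np

def cubic_bezier_path(p0, p1, p2, p3, n_points=50):
    """Sample a cubic Bezier curve defined by four 2-D control points.

    The curve starts at p0 and ends at p3; the intermediate control
    points p1 and p2 shape its heading and curvature.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    t = np.linspace(0.0, 1.0, n_points)[:, None]        # (n_points, 1)
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)                               # (n_points, 2)

# Illustrative use: a gentle lane-change-like path over 30 m with a 3.5 m offset.
path = cubic_bezier_path([0.0, 0.0], [10.0, 0.0], [20.0, 3.5], [30.0, 3.5])
```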

Based on the above analysis, we conclude:

  • Utilizing a variety of motion primitive methods can effectively represent a wider range of trajectory modes:

Solution: Task-Agnostic and Ego-Centric Library

  • There is no unified parameter representation or distribution pattern across the different methods:

Solution: Skill Distillation

Method-P1: TaEc Library

  • A general-purpose library that covers diverse motion skills and can be reused across tasks

  • Multiple modes, intentions, and strategies

  • Spatial-temporal Planning

TaEc motion skill library generation procedure:

  • Path Sampling - exhaustively explore the spatial dimensions

  • Speed Profile Sampling - exhaustively explore the temporal dimensions

  • Raw Trajectory Generation - spatial-temporal combination

  • Slicing - a time-based sliding-window mechanism that slices long-horizon trajectories into skills

  • Filtering - eliminate redundant skills, ensuring a balanced distribution of the trajectory library (a minimal sketch of the slicing and filtering steps follows this list)
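Below is a minimal sketch of the slicing and filtering steps, assuming each trajectory is stored as an array of (x, y, t) way-points; the window length, stride, and similarity threshold are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def slice_trajectory(traj, skill_len=20, stride=5):
    """Slice one long-horizon trajectory, an array of shape (T, 3) holding
    (x, y, t) way-points, into fixed-length skills with a sliding window."""
    skills = []
    for start in range(0, len(traj) - skill_len + 1, stride):
        window = traj[start:start + skill_len].copy()
        # Ego-centric re-expression: shift so each skill starts at the origin
        # at time zero (a rotation into the start heading could be added too).
        window -= window[0]
        skills.append(window)
    return skills

def filter_skills(skills, min_dist=0.5):
    """Greedily drop skills whose mean point-wise distance to an already kept
    skill is below `min_dist`, keeping the library balanced and non-redundant."""
    kept = []
    for skill in skills:
        if all(np.mean(np.linalg.norm(skill[:, :2] - k[:, :2], axis=1)) >= min_dist
               for k in kept):
            kept.append(skill)
    return kept
```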

Once the TaEc Library is constructed, these trajectories undergo skill distillation, which lays the groundwork for subsequent exploration.

Method-P2: Skill Distillation

The TaEc Library is embedded into a latent skill space by a motion encoder qm and a skill decoder qd.

  • The motion encoder qm takes a motion skill as input and outputs the parameters of the Gaussian distribution of the latent skill z.

  • One sample of the latent skill distribution represents one abstract behavior.

  • The skill decoder qd reconstructs the motion skill from a sample of the latent skill distribution.

  • To ensure the decoded trajectory is kinematically feasible for a vehicle, we propose using an Ackermann model to convert raw control sequences into trajectories.

With this reconstruction, the latent skill space can represent diverse and flexible task-agnostic, ego-centric motion skills. The skill decoder is then fixed and reused to generate future behaviors from samples of the latent skill space.
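The pipeline described above is essentially a VAE-style distillation, and the sketch below follows that reading in PyTorch; the network sizes, the kinematic bicycle approximation of the Ackermann model, and all names are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """q_m: flattened motion skill (T way-points) -> Gaussian parameters of z."""
    def __init__(self, skill_dim, z_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(skill_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))

    def forward(self, skill):
        mu, log_std = self.net(skill).chunk(2, dim=-1)
        return mu, log_std

class SkillDecoder(nn.Module):
    """q_d: latent skill z -> raw control sequence (accel, steer) over T steps."""
    def __init__(self, z_dim=8, horizon=20, hidden=128):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * horizon))

    def forward(self, z):
        return self.net(z).view(-1, self.horizon, 2)    # (B, T, [accel, steer])

def kinematic_rollout(controls, dt=0.1, wheelbase=2.7):
    """Integrate a kinematic bicycle model (a common stand-in for the Ackermann
    geometry) so decoded controls always yield a feasible trajectory."""
    B, T, _ = controls.shape
    x, y, yaw, v = (torch.zeros(B) for _ in range(4))
    points = []
    for t in range(T):
        accel, steer = controls[:, t, 0], controls[:, t, 1]
        x = x + v * torch.cos(yaw) * dt
        y = y + v * torch.sin(yaw) * dt
        yaw = yaw + v / wheelbase * torch.tan(steer) * dt
        v = v + accel * dt
        points.append(torch.stack([x, y], dim=-1))
    return torch.stack(points, dim=1)                   # (B, T, 2)

# Forward pass: encode a skill, sample z, decode controls, roll out a trajectory.
# Training would minimize reconstruction error plus a KL term, as in a standard VAE.
q_m, q_d = MotionEncoder(skill_dim=20 * 2), SkillDecoder()
skills = torch.randn(4, 40)                             # 4 dummy skills of 20 (x, y) points
mu, log_std = q_m(skills)
z = mu + log_std.exp() * torch.randn_like(mu)           # reparameterization trick
reconstruction = kinematic_rollout(q_d(z))              # (4, 20, 2) feasible trajectories
```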

Method-P3: RL with Exploration in Skill Space

In the RL phase, instead of directly learning a policy over raw actions, we learn a policy that outputs latent skill variables, which are then decoded into motion skills by the decoder.
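A minimal sketch of this loop is given below, reusing the frozen SkillDecoder and kinematic_rollout from the previous sketch and assuming a gym-style environment whose low-level controller can track way-points; these interfaces are illustrative, not the paper's API.

```python
import torch

def collect_episode(env, policy, skill_decoder, kinematic_rollout):
    """One episode in which the RL policy acts in the latent skill space:
    every policy output z is decoded into a short motion skill that the
    environment executes before the next decision point."""
    obs, done, episode_return = env.reset(), False, 0.0
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        z = policy(obs_t)                                # latent skill, shape (1, z_dim)
        with torch.no_grad():
            controls = skill_decoder(z)                  # (1, T, [accel, steer])
            waypoints = kinematic_rollout(controls)[0]   # (T, 2) ego-centric way-points
        # Temporally extended step: track the decoded skill way-point by way-point.
        for wp in waypoints:
            obs, reward, done, _ = env.step(wp.numpy())
            episode_return += reward
            if done:
                break
    return episode_return
```

Because each decision commits the agent to a temporally extended, kinematically feasible behavior, exploration covers the skill space in the structured manner described above rather than jittering in raw controls.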

Results and Demonstration

More details can be found in the slide link.
