
Task Agnostic and Ego-Centric Reinforcement Learning in Autonomous Driving









Introduction
- Efficient and Effective Exploration in RL for Autonomous Driving
- Most existing algorithms explore in the space of raw actions, which often yields inconsistent actions and is data inefficient.
- Human drivers tend to perform temporally extended actions that reflect a specific driving intention, such as overtaking or following.
- Our motivation is to enable vehicles to behave like human drivers, who efficiently explore the entire skill space in a structured manner.

Related Work
Most existing methods rely on expert demonstrations to solve the problem, which are usually:
- Expensive and labor-intensive
- Unbalanced
- Hard to transfer to new tasks
Motion primitive methods, which are labor-efficient and task-agnostic, can effectively address these temporally extended problems.

Based on their generation principles, motion primitive methods can be classified into several types (a minimal curve-generation sketch follows this list):
- Geometric-based
  - e.g. Dubins, Reeds-Shepp, 5th-order polynomial curves
  - Simplest and most efficient paths
  - Limited by the representation of predefined shapes
- Optimization-based
  - e.g. Newton-type methods
  - Various motion modes by adjusting the objective function
  - Susceptible to getting trapped in local minima
- Parametric-based
  - e.g. Bezier, Clothoid curves
  - Smooth in shape, suitable for kinematic modeling
  - Difficult to represent complex conditions
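To make the geometric family concrete, below is a minimal sketch of a 5th-order (quintic) polynomial primitive: the six coefficients are solved from position/velocity/acceleration boundary conditions at the start and end of the segment. The lane-change numbers in the example are purely illustrative, not values from the paper.

```python
import numpy as np

def quintic_coeffs(x0, v0, a0, xT, vT, aT, T):
    """Solve for the coefficients of x(t) = c0 + c1*t + ... + c5*t^5
    that match position, velocity, and acceleration at t = 0 and t = T."""
    A = np.array([
        [1, 0,    0,      0,       0,        0],
        [0, 1,    0,      0,       0,        0],
        [0, 0,    2,      0,       0,        0],
        [1, T,    T**2,   T**3,    T**4,     T**5],
        [0, 1,    2*T,    3*T**2,  4*T**3,   5*T**4],
        [0, 0,    2,      6*T,     12*T**2,  20*T**3],
    ])
    b = np.array([x0, v0, a0, xT, vT, aT])
    return np.linalg.solve(A, b)

# Illustrative example: a 3 s lateral lane-change primitive from y = 0 to
# y = 3.5 m, starting and ending with zero lateral velocity/acceleration.
c = quintic_coeffs(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, T=3.0)
t = np.linspace(0.0, 3.0, 31)
y = sum(c[i] * t**i for i in range(6))
```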
-
Taking the above analysis into consideration, we reach two conclusions:
- Utilizing a variety of motion primitive methods can effectively represent a wider range of trajectory modes.
  Solution: Task-Agnostic and Ego-Centric (TaEc) Library
- There is no unified parameter representation or distribution pattern shared among the different methods.
  Solution: Skill Distillation
Method-P1: TaEc Library
- A general-purpose library covering diverse motion skills that can be reused across tasks
- Multiple modes, intentions, and strategies
- Spatial-temporal planning

TaEc motion skill library generation procedure (a minimal sketch of the slicing step follows this list):
- Path Sampling - exhaustively explore the spatial dimensions
- Speed Profile Sampling - exhaustively explore the temporal dimensions
- Raw Trajectory Generation - spatial-temporal combination of paths and speed profiles
- Slicing - a time-based sliding-window mechanism that slices long-horizon trajectories into skills
- Filtering - eliminate redundant skills, ensuring a balanced distribution of the trajectory library
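The slicing step is the least standard of these, so here is a minimal sketch of a time-based sliding window over a fixed-step state array. The window/stride values and the simple translate-to-origin ego transform are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def slice_trajectory(trajectory, window_steps=20, stride_steps=5):
    """Slice one long-horizon trajectory (an array of [x, y, ...] states
    sampled at a fixed time step) into fixed-length skill segments with a
    sliding window.  Each segment is translated so it starts at the origin,
    making the skills ego-centric (a full ego transform would also rotate
    by the initial heading; omitted here for brevity)."""
    skills = []
    for start in range(0, len(trajectory) - window_steps + 1, stride_steps):
        segment = np.array(trajectory[start:start + window_steps], dtype=float)
        segment[:, :2] -= segment[0, :2]   # shift each skill to start at the origin
        skills.append(segment)
    return skills

# Example: a 20 s trajectory at 10 Hz sliced into 2 s skills every 0.5 s.
long_trajectory = np.random.rand(200, 4)
skills = slice_trajectory(long_trajectory, window_steps=20, stride_steps=5)
```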
After the TaEc Library is constructed, these trajectories undergo skill distillation, laying the groundwork for subsequent exploration.
Method-P2: Skill Distillation
The TaEc Library is embedded into a latent skill space by a motion encoder qm and a skill decoder qd.

- The motion encoder qm takes a motion skill as input and outputs the parameters of a Gaussian distribution over the latent skill z.
- One sample from the latent skill distribution represents one abstract behavior.
- The skill decoder qd reconstructs the motion skill from a sample of the latent skill distribution.
- To ensure the decoded trajectory is kinematically feasible for a vehicle, we use an Ackermann model to convert raw control sequences into trajectories.
Through this reconstruction, the latent skill space can represent diverse and flexible task-agnostic, ego-centric motion skills. The skill decoder is then fixed and reused to generate future behaviors from samples of the latent skill space.
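Below is a minimal sketch of how the encoder, decoder, and kinematic rollout could fit together. The network sizes, horizon, default speed, and the kinematic-bicycle approximation of the Ackermann model are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """q_m: encodes a motion skill (T x 2 ego-centric waypoints) into the
    mean and log-variance of a Gaussian over the latent skill z."""
    def __init__(self, horizon=20, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(horizon * 2, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * latent_dim))

    def forward(self, skill):                           # skill: (B, T, 2)
        mu, logvar = self.net(skill.flatten(1)).chunk(2, dim=-1)
        return mu, logvar

class SkillDecoder(nn.Module):
    """q_d: maps a latent sample z to a control sequence (acceleration,
    steering angle) and rolls it through a kinematic bicycle model, so the
    reconstructed trajectory is always kinematically feasible."""
    def __init__(self, horizon=20, latent_dim=8, dt=0.1, wheelbase=2.7):
        super().__init__()
        self.horizon, self.dt, self.L = horizon, dt, wheelbase
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, horizon * 2))

    def forward(self, z, v0=5.0):
        controls = self.net(z).view(-1, self.horizon, 2)
        accel = controls[..., 0]
        steer = 0.5 * torch.tanh(controls[..., 1])      # bound the steering angle
        x = torch.zeros(z.shape[0]); y = torch.zeros_like(x)
        yaw = torch.zeros_like(x); v = torch.full_like(x, v0)
        points = []
        for t in range(self.horizon):                   # kinematic bicycle rollout
            x = x + v * torch.cos(yaw) * self.dt
            y = y + v * torch.sin(yaw) * self.dt
            yaw = yaw + v / self.L * torch.tan(steer[:, t]) * self.dt
            v = v + accel[:, t] * self.dt
            points.append(torch.stack([x, y], dim=-1))
        return torch.stack(points, dim=1)               # (B, T, 2) reconstructed skill

def reparameterize(mu, logvar):
    """Draw z = mu + sigma * eps so gradients flow through the sampling step."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
```

Training such an encoder-decoder pair would typically minimize a trajectory reconstruction loss plus a KL term toward a standard-normal prior, in the usual VAE fashion; after training, only the frozen decoder is needed for the RL phase.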
Method-P3: RL with Exploration in Skill Space
In the RL phase, instead of directly learning a policy over raw actions, we learn a policy that outputs latent skill variables, which are then decoded into motion skills by the fixed skill decoder.
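A minimal sketch of what exploring in the skill space looks like at rollout time is given below; `env`, `policy`, and `tracker` are hypothetical placeholders, not the paper's interfaces.

```python
import torch

def rollout_episode(env, policy, decoder, tracker, max_steps=400):
    """One episode of skill-space exploration: the high-level policy samples a
    latent skill z, the frozen decoder expands it into an ego-centric waypoint
    sequence, and a low-level tracker converts each waypoint into raw controls."""
    obs = env.reset()
    total_reward, step = 0.0, 0
    while step < max_steps:
        with torch.no_grad():
            z = policy(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
            skill = decoder(z)[0]                  # (horizon, 2) waypoints
        for waypoint in skill:                     # execute the temporally extended skill
            action = tracker(obs, waypoint)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            step += 1
            if done or step >= max_steps:
                return total_reward
    return total_reward
```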

Results and Demonstration


More details can be found in the slide link.