
Task Agnostic and Ego-Centric Reinforcement Learning in Autonomous Driving









Introduction
- Efficient and Effective Exploration in RL for Autonomous Driving
- Most existing algorithms explore in the space of raw actions, which often yields inconsistent actions and is data inefficient.
- Human drivers tend to perform temporally extended actions that reflect a specific driving intention, such as overtaking or following.
- Our motivation is to enable vehicles to behave like human drivers, who efficiently explore the entire skill space in a structured manner.

Related Work
Most existing methods rely on expert demonstrations to solve the problem, which are usually:
- Expensive and labor-intensive
- Unbalanced
- Hard to transfer to new tasks
Motion primitive methods, which are labor-efficient and task-agnostic, can effectively address these temporally extended problems.

Based on their generation principles, motion primitive methods can be classified into several types (a minimal curve-generation sketch follows this list):
- Geometric-based
  - e.g. Dubins, Reeds-Shepp, 5th-order polynomial curves
  - Simplest and most efficient paths
  - Limited by the representation of predefined shapes
- Optimization-based
  - e.g. Newton-type methods
  - Various motion modes by adjusting the objective function
  - Susceptible to getting trapped in local minima
- Parametric-based
  - e.g. Bezier, Clothoid curves
  - Smooth in shape, suitable for kinematic modeling
  - Difficult to represent complex conditions
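To make the geometric family concrete, below is a minimal sketch of a 5th-order (quintic) polynomial primitive: the six coefficients are solved from position/velocity/acceleration boundary conditions at the start and end of the segment. The lane-change numbers in the example are purely illustrative, not values from the paper.

```python
import numpy as np

def quintic_coeffs(x0, v0, a0, xT, vT, aT, T):
    """Solve for the coefficients of x(t) = c0 + c1*t + ... + c5*t^5
    that match position, velocity, and acceleration at t = 0 and t = T."""
    A = np.array([
        [1, 0,    0,      0,       0,        0],
        [0, 1,    0,      0,       0,        0],
        [0, 0,    2,      0,       0,        0],
        [1, T,    T**2,   T**3,    T**4,     T**5],
        [0, 1,    2*T,    3*T**2,  4*T**3,   5*T**4],
        [0, 0,    2,      6*T,     12*T**2,  20*T**3],
    ])
    b = np.array([x0, v0, a0, xT, vT, aT])
    return np.linalg.solve(A, b)

# Illustrative example: a 3 s lateral lane-change primitive from y = 0 to
# y = 3.5 m, starting and ending with zero lateral velocity/acceleration.
c = quintic_coeffs(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, T=3.0)
t = np.linspace(0.0, 3.0, 31)
y = sum(c[i] * t**i for i in range(6))
```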
-
Taking the above analysis into consideration, we reach two conclusions:
- Utilizing a variety of motion primitive methods can effectively represent a wider range of trajectory modes.
  Solution: Task-Agnostic and Ego-Centric (TaEc) Library
- There is no unified parameter representation or distribution pattern shared among the different methods.
  Solution: Skill Distillation
Method-P1: TaEc Library
- A general-purpose library covering diverse motion skills that can be reused across tasks
- Multiple modes, intentions, and strategies
- Spatial-temporal planning

TaEc motion skill library generation procedure (a minimal sketch of the slicing step follows this list):
- Path Sampling - exhaustively explore the spatial dimensions
- Speed Profile Sampling - exhaustively explore the temporal dimensions
- Raw Trajectory Generation - spatial-temporal combination of paths and speed profiles
- Slicing - a time-based sliding-window mechanism that slices long-horizon trajectories into skills
- Filtering - eliminate redundant skills, ensuring a balanced distribution of the trajectory library
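The slicing step is the least standard of these, so here is a minimal sketch of a time-based sliding window over a fixed-step state array. The window/stride values and the simple translate-to-origin ego transform are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def slice_trajectory(trajectory, window_steps=20, stride_steps=5):
    """Slice one long-horizon trajectory (an array of [x, y, ...] states
    sampled at a fixed time step) into fixed-length skill segments with a
    sliding window.  Each segment is translated so it starts at the origin,
    making the skills ego-centric (a full ego transform would also rotate
    by the initial heading; omitted here for brevity)."""
    skills = []
    for start in range(0, len(trajectory) - window_steps + 1, stride_steps):
        segment = np.array(trajectory[start:start + window_steps], dtype=float)
        segment[:, :2] -= segment[0, :2]   # shift each skill to start at the origin
        skills.append(segment)
    return skills

# Example: a 20 s trajectory at 10 Hz sliced into 2 s skills every 0.5 s.
long_trajectory = np.random.rand(200, 4)
skills = slice_trajectory(long_trajectory, window_steps=20, stride_steps=5)
```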
After the TaEc Library is constructed, these trajectories undergo skill distillation, laying the groundwork for subsequent exploration.
Method-P2: Skill Distillation
The TaEc Library is embedded into a latent skill space by a motion encoder qm and a skill decoder qd.

- The motion encoder qm takes a motion skill as input and outputs the parameters of a Gaussian distribution over the latent skill z.
- One sample from the latent skill distribution represents one abstract behavior.
- The skill decoder qd reconstructs the motion skill from a sample of the latent skill distribution.
- To ensure the decoded trajectory is kinematically feasible for a vehicle, we use an Ackermann model to convert raw control sequences into trajectories.
Through this reconstruction, the latent skill space can represent diverse and flexible task-agnostic, ego-centric motion skills. The skill decoder is then fixed and reused to generate future behaviors from samples of the latent skill space.
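Below is a minimal sketch of how the encoder, decoder, and kinematic rollout could fit together. The network sizes, horizon, default speed, and the kinematic-bicycle approximation of the Ackermann model are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """q_m: encodes a motion skill (T x 2 ego-centric waypoints) into the
    mean and log-variance of a Gaussian over the latent skill z."""
    def __init__(self, horizon=20, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(horizon * 2, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * latent_dim))

    def forward(self, skill):                           # skill: (B, T, 2)
        mu, logvar = self.net(skill.flatten(1)).chunk(2, dim=-1)
        return mu, logvar

class SkillDecoder(nn.Module):
    """q_d: maps a latent sample z to a control sequence (acceleration,
    steering angle) and rolls it through a kinematic bicycle model, so the
    reconstructed trajectory is always kinematically feasible."""
    def __init__(self, horizon=20, latent_dim=8, dt=0.1, wheelbase=2.7):
        super().__init__()
        self.horizon, self.dt, self.L = horizon, dt, wheelbase
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, horizon * 2))

    def forward(self, z, v0=5.0):
        controls = self.net(z).view(-1, self.horizon, 2)
        accel = controls[..., 0]
        steer = 0.5 * torch.tanh(controls[..., 1])      # bound the steering angle
        x = torch.zeros(z.shape[0]); y = torch.zeros_like(x)
        yaw = torch.zeros_like(x); v = torch.full_like(x, v0)
        points = []
        for t in range(self.horizon):                   # kinematic bicycle rollout
            x = x + v * torch.cos(yaw) * self.dt
            y = y + v * torch.sin(yaw) * self.dt
            yaw = yaw + v / self.L * torch.tan(steer[:, t]) * self.dt
            v = v + accel[:, t] * self.dt
            points.append(torch.stack([x, y], dim=-1))
        return torch.stack(points, dim=1)               # (B, T, 2) reconstructed skill

def reparameterize(mu, logvar):
    """Draw z = mu + sigma * eps so gradients flow through the sampling step."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
```

Training such an encoder-decoder pair would typically minimize a trajectory reconstruction loss plus a KL term toward a standard-normal prior, in the usual VAE fashion; after training, only the frozen decoder is needed for the RL phase.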
Method-P3: RL with Exploration in Skill Space
In the RL phase, instead of directly learning a policy over raw actions, we learn a policy that outputs latent skill variables, which are then decoded into motion skills by the fixed skill decoder.
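A minimal sketch of what exploring in the skill space looks like at rollout time is given below; `env`, `policy`, and `tracker` are hypothetical placeholders, not the paper's interfaces.

```python
import torch

def rollout_episode(env, policy, decoder, tracker, max_steps=400):
    """One episode of skill-space exploration: the high-level policy samples a
    latent skill z, the frozen decoder expands it into an ego-centric waypoint
    sequence, and a low-level tracker converts each waypoint into raw controls."""
    obs = env.reset()
    total_reward, step = 0.0, 0
    while step < max_steps:
        with torch.no_grad():
            z = policy(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
            skill = decoder(z)[0]                  # (horizon, 2) waypoints
        for waypoint in skill:                     # execute the temporally extended skill
            action = tracker(obs, waypoint)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            step += 1
            if done or step >= max_steps:
                return total_reward
    return total_reward
```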

Results and Demonstration


More details can be found in the slide link.