Trajectory Planning with Adaptive Probabilistic Models · Trajectory Planning with Adaptive...

1
Trajectory Planning with Adaptive Probabilistic Models Marin Kobilarov ([email protected]) Johns Hopkins University I. PROBLEM FORMULATION Consider a control system with state x ∈X , inputs u ∈U and dynamics described by an ordinary differential equation subject to inequality constraints: ˙ x(t)= f (x(t),u(t)), F (x(t)) 0. (1) A complete trajectory of the system is written as π : [0,T ] X×U for some finite time T> 0 so that π(t)=(x(t),u(t)) encodes both the state and controls at time t. The goal is to compute the optimal control signal u * (t) driving the system from its initial state x 0 ∈X to a given goal region X g ⊂X while minimizing a given performance measure. Let P denote the space of all feasible trajectories satisfying the dynamics, con- straints, and boundary conditions. The objective is to compute π * = argmin π∈P J (π), where J (π)= Z τ (π) 0 C(π(t))dt, where τ (π) gives the trajectory time duration and C is a given cost typically encoding time and control effort, e.g. C(x(t),u(t)) = 1 + λku(t)k 2 with a chosen weight λ 0. II. DYNAMICAL SYSTEM REPRESENTATIONS x0 x1 x2 x3 xN Xf parametric trajectory z =(x1,x2,x3, ..., xN) ∈Z = X N local connections between xi and xi+1 stabilization closed-form basis funct. pre-optimized primitives primitives and 1) 2) 3a) 3b) (un)stable manifolds start goal set different types of numerical optimal control 4) discrete optimal control (DGC) Hex-rotor aerial vehicle Spacecraft 5-dof manipulator Target Multi-body system reduction shape-space polynomials (DGC) small obstacle maneuver trim primitive UAV maneuver automaton (DGC) strongly stable manifold stable manifold (GAIO) free motion maneuver a) b) xi xi+1 xi xi+1 xi+1 xi xi xi+1 thruster plume Fig. 1. a) various methods for local trajectory generation: starting with simple but suboptimal stabilization, including methods exploiting dynamical structures, to most general but least efficient numerical optimal control; b) examples of such techniques that we have constructed in the context of autonomous vehicle planning and control. Each trajectory is parametrized using N number of “way- points” and a mapping ϕ : Z→P reconstructing the continuous trajectory π, i.e. z =(x 1 , ..., x N ) ∈Z = X N π(t)= ϕ(z,t) The mapping implicitly encodes a local dynamically feasible connection method between states x i and x i+1 . Thus, a given parameter z corresponds to a unique trajectory composed of local connections between (x 0 ,x 1 ), (x 1 ,x 2 ), ..., and (x N , X g ). III. PROBABILISTIC TRAJECTORY OPTIMIZATION Our approach employs an importance density q(Z ) over the space of parametrized trajectories and adapts the density online until its mass becomes concentrated around the approximately optimal trajectory z * = arg min J (z). This is accomplished by computing the probabilities: P(J (Z ) γ ): cost of a trajectory is less than γ, P(F (Z ) 0) : trajectory is feasible, iteratively while automatically lowering the cost γ until conver- gence. Z p(Z; v1) low cost regions J(Z) 1 p(Z; v2) p(Z; v5) p(Z; v8) near optimum z * arg minz J(z) z * sampling density iteration: #1 #2 #5 #8 solution adaptation: AFI UAV Fig. 2. Randomized trajectory optimization using an adaptive distribution that automatically focuses in high-performance regions. The task is to compute a time-optimal obstacle-free trajectory for a helicopter modeled as a non-trivial underactuated systems in 3-D. A. Optimization through Density Estimation The first approach is to compute q(z) directly through the minimization min q KL(q * || q), where q * (z)= I {J (z)γF (z)0} p(z) P(J (Z ) γ ) · P(F (Z ) 0) , (2) where p(Z ) is some base measure on Z that for in- stance can incorporate prior knowledge about desirable tra- jectories. In computational convenience and efficiency we assume a parametric distribution q(z) = p(z; v) where v V is the parameter. Problem (2) is solved ap- proximately by finding the optimal parameter v * according to ˆ v * = argmax v∈V 1 N N i=1 I {J (Zi)γF (Zi)0} log p(Z i ,v), where Z 1 , ..., Z n are i.i.d. samples from a base measure p(·,v 0 ). B. Optimization through Function Approximation E[J(x)] Model of Trajectory Cost J(x) Model of Constraint Function F(x) Constraint Satisfaction Probability Importance Sampling Density E[F(x)] P(F(x) 0) q(x) x2 x1 x1 x1 x2 x2 x2 x1 x2 xstart xgoal obstacle computed near-optimal path x1 sampled paths iterate Fig. 3. A simple optimal planning problem solved using Gaussian Process models of the cost J (x) and constraints F (x). The plots show the evolved models after 20 iterations. Remarkably, the importance density q clearly indicates that the optimal region to select x are the states around the border of the obstacle that are reachable from both start and goal. The second approach is to construct probabilistic models of the functions J (z) and F (z) in order to predict the performance of unobserved trajectories. In this case the probability density will be artificially constructed according to q(z) P(J (z) ) · P(F (z) 0), (3) where here J and F are regarded as random functions for each fixed parameter z. We will assume that the processes J (z) and F (z) have normal marginal distributions, i.e. they will be mod- eled as Gaussian Processes (GP). This is particularly convenient for constructing q in (3) through the simple expressions P(J (z) γ )=Φ γ - E[J (z)] p V[J (z)] ! , P(F (z) 0)=Φ E[F (z)] p V[F (z)] ! , where Φ(·) is the standard unit-normal CDF, and E[·] and V[·] denote expectation and variance. A simple example of a preliminary study is shown on Figure 3.

Transcript of Trajectory Planning with Adaptive Probabilistic Models · Trajectory Planning with Adaptive...

Page 1: Trajectory Planning with Adaptive Probabilistic Models · Trajectory Planning with Adaptive Probabilistic Models Marin Kobilarov (marin@jhu.edu) Johns Hopkins University I. PROBLEM

Trajectory Planning with Adaptive Probabilistic ModelsMarin Kobilarov ([email protected])

Johns Hopkins University

I. PROBLEM FORMULATION

Consider a control system with state x ∈ X , inputs u ∈ U anddynamics described by an ordinary differential equation subjectto inequality constraints:

x(t) = f(x(t), u(t)), F (x(t)) ≥ 0. (1)

A complete trajectory of the system is written as π : [0, T ] →X × U for some finite time T > 0 so that π(t) = (x(t), u(t))encodes both the state and controls at time t. The goal is tocompute the optimal control signal u∗(t) driving the systemfrom its initial state x0 ∈ X to a given goal region Xg ⊂ Xwhile minimizing a given performance measure. Let P denotethe space of all feasible trajectories satisfying the dynamics, con-straints, and boundary conditions. The objective is to compute

π∗ = argminπ∈P

J(π), where J(π) =

∫ τ(π)

0

C(π(t))dt,

where τ(π) gives the trajectory time duration and C is agiven cost typically encoding time and control effort, e.g.C(x(t), u(t)) = 1 + λ‖u(t)‖2 with a chosen weight λ ≥ 0.

II. DYNAMICAL SYSTEM REPRESENTATIONS

x0

x1 x2x3

xN

Xf

parametric trajectory z = (x1, x2, x3, ..., xN ) ∈ Z = XN

local connectionsbetween xi and xi+1

stabilization

closed-form basis funct.

pre-optimized primitives

primitives and

1)

2)

3a)

3b)

(un)stable manifolds

start

goal set

different types of

numerical optimal control

4)

discrete optimal control (DGC)

Hex-rotoraerial vehicle Spacecraft

5-dof manipulator

Target

Multi-body system reductionshape-space polynomials (DGC)

small

obsta

cle maneuver

trim primitive

UAV

maneuver automaton (DGC)

strongly stable manifold

stable manifold (GAIO)

free moti

on

maneuver

a) b)

xi

xi+1

xi

xi+1xi+1

xi

xi

xi+1

thruster plume

Fig. 1. a) various methods for local trajectory generation: starting with simplebut suboptimal stabilization, including methods exploiting dynamical structures,to most general but least efficient numerical optimal control; b) examples ofsuch techniques that we have constructed in the context of autonomous vehicleplanning and control.

Each trajectory is parametrized using N number of “way-points” and a mapping ϕ : Z → P reconstructing the continuoustrajectory π, i.e.

z = (x1, ..., xN ) ∈ Z = XN ⇔ π(t) = ϕ(z, t)

The mapping implicitly encodes a local dynamically feasibleconnection method between states xi and xi+1. Thus, a givenparameter z corresponds to a unique trajectory composed of localconnections between (x0, x1), (x1, x2), ..., and (xN ,Xg).

III. PROBABILISTIC TRAJECTORY OPTIMIZATION

Our approach employs an importance density q(Z) over thespace of parametrized trajectories and adapts the density onlineuntil its mass becomes concentrated around the approximatelyoptimal trajectory z∗ = arg minJ(z). This is accomplished bycomputing the probabilities:

P(J(Z) ≤ γ) : cost of a trajectory is less than γ,P(F (Z) ≥ 0) : trajectory is feasible,

iteratively while automatically lowering the cost γ until conver-gence.

Z

p(Z; v1)

low cost regions J(Z) < γ1

p(Z; v2) p(Z; v5) p(Z; v8)

near optimum z∗ ≈ argminz J(z)

z∗

sampling density

iteration: #1 #2 #5 #8 solution

adaptation:

AFI UAV

Fig. 2. Randomized trajectory optimization using an adaptive distribution thatautomatically focuses in high-performance regions. The task is to compute atime-optimal obstacle-free trajectory for a helicopter modeled as a non-trivialunderactuated systems in 3-D.

A. Optimization through Density EstimationThe first approach is to compute q(z) directly through the

minimization minq KL(q∗ || q), where

q∗(z) =I{J(z)≤γ∧F (z)≥0}p(z)

P(J(Z) ≤ γ) · P(F (Z) ≥ 0), (2)

where p(Z) is some base measure on Z that for in-stance can incorporate prior knowledge about desirable tra-jectories. In computational convenience and efficiency weassume a parametric distribution q(z) = p(z; v) wherev ∈ V is the parameter. Problem (2) is solved ap-proximately by finding the optimal parameter v∗ accordingto v∗ = argmaxv∈V 1

N

∑Ni=1 I{J(Zi)≤γ∧F (Zi)≥0} log p(Zi, v),

where Z1, ..., Zn are i.i.d. samples from a base measure p(·, v0).

B. Optimization through Function Approximation

E[J(x)]

Model of Trajectory Cost J(x) Model of Constraint Function F (x)

Constraint Satisfaction Probability Importance Sampling Density

E[F (x)]

P (F (x) ≥ 0) q(x)

x2

x1 x1

x1

x2 x2

x2

x1

x2

xstart

xgoal

obstacle

computed near-optimal path

x1

sampled paths

iterate

Fig. 3. A simple optimal planning problem solved using Gaussian Processmodels of the cost J(x) and constraints F (x). The plots show the evolvedmodels after 20 iterations. Remarkably, the importance density q clearly indicatesthat the optimal region to select x are the states around the border of the obstaclethat are reachable from both start and goal.

The second approach is to construct probabilistic models ofthe functions J(z) and F (z) in order to predict the performanceof unobserved trajectories. In this case the probability densitywill be artificially constructed according to

q(z) ∝ P(J(z) < γ) · P(F (z) ≥ 0), (3)

where here J and F are regarded as random functions for eachfixed parameter z. We will assume that the processes J(z) andF (z) have normal marginal distributions, i.e. they will be mod-eled as Gaussian Processes (GP). This is particularly convenientfor constructing q in (3) through the simple expressions

P(J(z)≤γ)=Φ

(γ − E[J(z)]√

V[J(z)]

), P(F (z)≥0)=Φ

(E[F (z)]√V[F (z)]

),

where Φ(·) is the standard unit-normal CDF, and E[·] andV[·] denote expectation and variance. A simple example of apreliminary study is shown on Figure 3.