Chapter02b


Transcript of Chapter02b

Page 1: Chapter02b

Dana Nau and Vikas Shivashankar: Lecture slides for Automated Planning and Acting.  Updated 2/5/15

Backward  Search  ●  Forward search starts at the initial state

Ø  Chooses an action that’s applicable Ø  Computes state transition s′ = γ (s,a)

●  Backward search starts at the goal Ø  Chooses an action that’s relevant

•  A possible “last action” before the goal Ø  Computes inverse state transition g′ = γ –1(g,a)

•  g′ = properties a state s′ should satisfy in order for γ (s′,a) to satisfy g ●  Why would we want to do this? ●  One possibility: sometimes has a lower branching factor

Ø  Forward: 10 applicable actions •  for each robot, two move actions and three load actions  

Ø  Backward: g = {loc(r1)=d3} Ø  2 relevant actions: move(r1,d1,d3),  move(r1,d2,d3)  

•  Can eliminate move(r1,d2,d3); it requires a rigid condition that’s false  

[Figure: loading dock with locations d1, d2, d3; robot r1 and container c1; robot r2 and containers c2–c6]

Page 2: Chapter02b


Relevance  ●  Idea: a is relevant for g if a could be the last action of a plan that achieves g ●  Definition:

Ø  Let g = {g1, …, gk} be a goal. An action a is relevant for g if
     1. eff(a) makes at least one gi true, i.e., eff(a) ∩ g ≠ ∅
     2. eff(a) doesn't make any gi false
          ▸  ∀ x, c, c′, if eff(a) contains (x,c) and g contains x = c′ then c = c′
     3. pre(a) doesn't require any gi to be false unless eff(a) makes gi true
          ▸  ∀ x, c, c′, if (x,c) ∈ pre(a) and (x,c′) ∈ g – eff(a) then c = c′

●  What actions are relevant for loc(c1)=r2  ?

[Figure: loading dock with locations d1, d2, d3; robot r1 and container c1; robot r2 and containers c2–c6]
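A minimal Python sketch of this relevance test (my own, not from the slides), with the goal, preconditions, and effects all encoded as dicts mapping state variables to values; the move actions below are simplified stand-ins for page 1's example and omit the rigid adjacency conditions the slides mention:

def is_relevant(action, goal):
    """The slide's three conditions, for actions and goals given as dicts of state-variable assignments."""
    pre, eff = action["pre"], action["eff"]
    # 1. eff(a) makes at least one goal assignment true:  eff(a) ∩ g ≠ ∅
    if not any(eff.get(var) == val for var, val in goal.items()):
        return False
    # 2. eff(a) doesn't make any goal assignment false
    if any(var in eff and eff[var] != val for var, val in goal.items()):
        return False
    # 3. pre(a) doesn't conflict with the goal assignments that eff(a) leaves unachieved (g - eff(a))
    remaining = {var: val for var, val in goal.items() if eff.get(var) != val}
    return all(pre.get(var, val) == val for var, val in remaining.items())

goal = {"loc(r1)": "d3"}
move_r1_d1_d3 = {"pre": {"loc(r1)": "d1"}, "eff": {"loc(r1)": "d3"}}
move_r1_d3_d1 = {"pre": {"loc(r1)": "d3"}, "eff": {"loc(r1)": "d1"}}
print(is_relevant(move_r1_d1_d3, goal), is_relevant(move_r1_d3_d1, goal))   # True False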

Page 3: Chapter02b


Inverse State Transitions

●  If a is relevant for achieving g, then γ−1(g,a) = pre(a) ∪ (g – eff(a))

●  If a isn’t relevant for g, then γ–1(g,a) is undefined

●  Example: Ø  g = {loc(c1)=r2} Ø  What is γ –1(g,  load(r2,c1,d3))? Ø  What is γ –1(g,  load(r1,c1,d1))?

[Figure: loading dock with locations d1, d2, d3; robot r1 and container c1; robot r2 and containers c2–c6]
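The inverse transition can be computed directly from the formula above. This is again a sketch using the same dict encoding as the previous example; the preconditions and effects of the load action are a hypothetical encoding of load(r2,c1,d3), labeled as such:

def gamma_inverse(action, goal):
    """γ⁻¹(g,a) = pre(a) ∪ (g − eff(a)), assuming a is relevant for g (so pre and g − eff don't conflict)."""
    pre, eff = action["pre"], action["eff"]
    new_goal = {var: val for var, val in goal.items() if eff.get(var) != val}   # g − eff(a)
    new_goal.update(pre)                                                        # ∪ pre(a)
    return new_goal

# Hypothetical encoding of load(r2,c1,d3): r2 at d3 picks up c1, which is sitting at d3
load_r2_c1_d3 = {"pre": {"loc(r2)": "d3", "loc(c1)": "d3", "cargo(r2)": "nil"},
                 "eff": {"loc(c1)": "r2", "cargo(r2)": "c1"}}
print(gamma_inverse(load_r2_c1_d3, {"loc(c1)": "r2"}))
# -> {'loc(r2)': 'd3', 'loc(c1)': 'd3', 'cargo(r2)': 'nil'}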

Page 4: Chapter02b


Backward Search
For cycle checking:
●  After line 1, put
      Solved = {g}
●  After line 6, put
      if g′ ∈ Solved then return failure
      Solved ← Solved ∪ {g′}
●  More powerful: if ∃ g ∈ Solved s.t. g ⊆ g′ then return failure

●  Sound and complete Ø  If a planning problem is solvable

then at least one of Backward-search’s nondeterministic execution traces will find a solution

[Figure: backward search space; from the goal g, relevant actions a1, a2, a3 regress to subgoals g1, g2, g3, and further actions a4, a5 to g4, g5, working back toward s0]
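Putting the pieces together, here is a depth-first sketch of backward search with the subset-based cycle check (one nondeterministic trace explored at a time). It is my own rendering, not the book's pseudocode; relevant and gamma_inv are compact versions of the helpers sketched on the previous two pages, and the two move actions are simplified encodings for illustration:

def relevant(a, g):
    eff, pre = a["eff"], a["pre"]
    if not any(eff.get(v) == c for v, c in g.items()):
        return False                                               # eff(a) ∩ g = ∅
    if any(v in eff and eff[v] != c for v, c in g.items()):
        return False                                               # eff(a) would undo part of g
    rest = {v: c for v, c in g.items() if eff.get(v) != c}
    return all(pre.get(v, c) == c for v, c in rest.items())        # pre(a) consistent with g − eff(a)

def gamma_inv(a, g):
    new_g = {v: c for v, c in g.items() if a["eff"].get(v) != c}
    new_g.update(a["pre"])
    return new_g

def backward_search(s0, g, actions, solved=None):
    """One nondeterministic trace at a time, via depth-first search, with the subset cycle check."""
    if all(s0.get(v) == c for v, c in g.items()):
        return []                                                  # s0 already satisfies g
    solved = list(solved or [])
    if any(all(g.get(v) == c for v, c in old.items()) for old in solved):
        return None                                                # an earlier goal on this path is a subset of g
    solved.append(g)
    for a in actions:
        if relevant(a, g):
            plan = backward_search(s0, gamma_inv(a, g), actions, solved)
            if plan is not None:
                return plan + [a["name"]]                          # actions come out in execution order
    return None

actions = [{"name": "move(r1,d1,d3)", "pre": {"loc(r1)": "d1"}, "eff": {"loc(r1)": "d3"}},
           {"name": "move(r1,d2,d1)", "pre": {"loc(r1)": "d2"}, "eff": {"loc(r1)": "d1"}}]
print(backward_search({"loc(r1)": "d2"}, {"loc(r1)": "d3"}, actions))
# -> ['move(r1,d2,d1)', 'move(r1,d1,d3)']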

Page 5: Chapter02b


Branching Factor
●  Our motivation for Backward-search was to focus the search
Ø  But as written, it doesn't really accomplish that
●  Solve this by lifting
Ø  Leave y uninstantiated

[Figure: for g = {loc(r1)=d3}, grounded backward search branches over move(r1,d1,d3), move(r1,d2,d3), move(r1,d4,d3), …, move(r1,d20,d3); lifted backward search uses the single partially instantiated action move(r1,y,d3)]

[Figure: loading dock with locations d1, d2, d3; robot r1 and container c1; robot r2 and containers c2–c6]

Page 6: Chapter02b


Lifted Backward Search  ●  Like Backward-search but more complicated

Ø  Have to keep track of what values were substituted for which parameters Ø  But it has a much smaller branching factor

●  I won’t discuss the details

Page 7: Chapter02b


Plan-Space Planning  ●  Another approach

Ø  formulate planning as a constraint satisfaction problem Ø  use constraint-satisfaction techniques to produce solutions that are more

flexible than ordinary plans •  E.g., plans in which the actions are partially ordered •  Postpone ordering decisions until the plan is being executed

▸  the actor may have a better idea about which ordering is best ●  First step toward planning concurrent execution of actions (Chapter 4)

Outline: •  Basic idea •  Open goals •  Threats •  The PSP algorithm •  Long example •  Comments

Page 8: Chapter02b


Plan-Space Planning - Basic Idea  ●  Backward search from the goal ●  Each node of the search space is a partial plan, π

•  A set of partially-instantiated actions •  Constraints on the actions

Ø  Keep making refinements, until we have a solution

●  Types of constraints: Ø  precedence constraints

indicated by solid arcs Ø  binding constraints

•  inequality constraints, e.g., z ≠ x or w ≠ p1 Ø  causal links:

•  indicated by dashed arcs •  use effect e of action a to establish precondition p of action b

●  How to tell we have a solution: no more flaws in the plan Ø  Two kinds of flaws …

foo(x) Pre: … Eff: loc(x)=p1

bar(x) Pre: loc(x)=p1 Eff: …

baz(z) Pre: loc(z)=p2 Eff: …

z ≠ x  
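To make the node structure concrete, here is a minimal illustrative Python container for a partial plan π holding the ingredients listed above (class and field names are my own, not PSP's); open_goals shows how one kind of flaw can be read directly off the structure:

from dataclasses import dataclass, field

@dataclass
class PartialPlan:
    actions: dict = field(default_factory=dict)        # step name -> partially instantiated action
    orderings: set = field(default_factory=set)        # (a, b): a must precede b
    bindings: set = field(default_factory=set)         # binding constraints, e.g. ("z", "!=", "x")
    causal_links: set = field(default_factory=set)     # (producer, precondition, consumer)

    def open_goals(self, preconditions):
        """Preconditions not yet supported by a causal link (one of the two kinds of flaws)."""
        supported = {(consumer, p) for (_, p, consumer) in self.causal_links}
        return [(step, p) for step, pres in preconditions.items()
                for p in pres if (step, p) not in supported]

# The slide's example: foo(x) establishes loc(x)=p1 for bar(x); baz(z)'s precondition is still open
plan = PartialPlan(actions={"foo": "foo(x)", "bar": "bar(x)", "baz": "baz(z)"},
                   orderings={("foo", "bar")},
                   bindings={("z", "!=", "x")},
                   causal_links={("foo", "loc(x)=p1", "bar")})
print(plan.open_goals({"bar": ["loc(x)=p1"], "baz": ["loc(z)=p2"]}))   # -> [('baz', 'loc(z)=p2')]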

Page 9: Chapter02b


Flaws:    1.  Open  Goals  

●  A precondition p of an action b is an open goal if there is no causal link for p

●  Resolve the flaw by creating a causal link
Ø  Find an action a (either already in π, or can add it to π) that can establish p
     •  can precede b
     •  can have p as an effect
Ø  Do substitutions on variables to make a assert p
     •  e.g., replace x with y
Ø  Add an ordering constraint a ≺ b
Ø  Create a causal link from a to p

Pre: loc(y)=p1

Pre: loc(y)=p1

bar(y)

foo(y) bar(y)

substitute y for x

Eff: loc(y)=p1

foo(x)

Eff: loc(x)=p1

Page 10: Chapter02b


Flaws:  2.  Threats
●  Suppose we have a causal link from action a to precondition p of action b
●  Action c threatens the link if c may affect p and may come between a and b
Ø  c is a threat even if it makes p true rather than false
     •  Causal link means a, not c, is supposed to establish p for b
     •  The plan in which c establishes p will be generated on another path in the search space

●  Three possible ways to resolve the flaw:
Ø  Require c ≺ a
Ø  Require b ≺ c
Ø  Constrain variable(s) to prevent c from affecting p

[Figure, in state-variable notation: a causal link from foo(y)'s effect loc(y)=p1 to bar(y)'s precondition loc(y)=p1, threatened by clobber(z) with effect loc(z)=p2]
[The same figure in classical notation: the link is on loc(y,p1), and clobber(z) has effect ¬loc(z,p1)]
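A tiny standalone sketch (illustrative only, not PSP's code) that enumerates the three resolvers for a given threat; the tuple encodings of the constraints are my own:

def threat_resolvers(c, link, separation=None):
    """The slide's three ways to resolve 'c threatens the causal link (a, p, b)'.
    separation is an optional binding constraint (e.g. ("z", "!=", "y")) that would
    keep c's effect from affecting p; pass None when no such constraint exists."""
    a, p, b = link
    resolvers = [("order", c, "precedes", a),      # demotion: require c ≺ a
                 ("order", b, "precedes", c)]      # promotion: require b ≺ c
    if separation is not None:
        resolvers.append(("bind",) + separation)   # constrain variables so c cannot affect p
    return resolvers

# The slide's example: clobber(z) threatens the link foo(y) -loc(y)=p1-> bar(y); z ≠ y separates them
print(threat_resolvers("clobber(z)", ("foo(y)", "loc(y)=p1", "bar(y)"), ("z", "!=", "y")))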

Page 11: Chapter02b


PSP  Algorithm  

●  Initial plan is always {Start, Finish} with Start ≺ Finish Ø  Start has no preconditions; effects are the initial state s0 Ø  Finish has no effects; its precondition is the goal g

●  PSP is sound and complete Ø  It returns a partially ordered solution π such that any

total ordering of π will achieve g Ø  In some environments, could execute actions in parallel

Start  

Finish  

Eff: s0

Pre: g

Page 12: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

Finish  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]

Start  

pos(a)=d   pos(b)=c  


●  Finish has two open goals: pos(a)=d, pos(b)=c  

Example  

clear(p1)=T  clear(p2)=T  clear(p3)=F  clear(p4)=F    clear(a)=F    clear(b)=F        clear(c)=T    clear(d)=T        pos(a)=p3      pos(b)=p4      pos(c)=b      pos(d)=a  

         

Page 13: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,y1,d)  

Start  

Finish  

clear(a)=T   pos(a)=y1  

pos(a)=d   pos(b)=c  

clear(d)=T  

Example  ●  For each open goal, add a new action

Ø  Every new action a must have Start ≺ a, a ≺ Finish      

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]

clear(b)=T  pos(b)=y2   clear(c)=T  

move(b,y2,c)  

clear(p1)=T  clear(p2)=T  clear(p3)=F  clear(p4)=F    clear(a)=F    clear(b)=F        clear(c)=T    clear(d)=T        pos(a)=p3      pos(b)=p4      pos(c)=b      pos(d)=a  

         


Page 14: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

         

move(a,p3,d)  

Start  

Finish  

clear(a)=T   clear(b)=T  pos(b)=p4  pos(a)=p3  

pos(a)=d   pos(b)=c  

Example  ●  Resolve four more open goals: bind y1=p3, y2=p4

clear(c)=T  clear(d)=T  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]

move(b,p4,c)  

clear(p1)=T  clear(p2)=T  clear(p3)=F  clear(p4)=F    clear(a)=F    clear(b)=F        clear(c)=T    clear(d)=T        pos(a)=p3      pos(b)=p4      pos(c)=b      pos(d)=a  


Page 15: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

Example  ●  1st threat requires z3≠d ●  2nd threat has two resolvers:

Ø  move(b,p4,c) ≺ move(x3,a,z3) Ø  z3≠c  

pos(a)=p3   clear(c)=T  

clear(x3)=T  clear(z3)=T   pos(x3)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]

move(x3,a,z3)  

move(b,p4,c)  

clear(p1)=T  clear(p2)=T  clear(p3)=F  clear(p4)=F    clear(a)=F    clear(b)=F        clear(c)=T    clear(d)=T        pos(a)=p3      pos(b)=p4      pos(c)=b      pos(d)=a  


Page 16: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

Example  

●  Threats resolved  

pos(a)=p3   clear(c)=T  

clear(x3)=T  clear(z3)=T   pos(x3)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]

move(x3,a,z3)  

move(b,p4,c)  

z3≠c  z3≠d    

clear(p1)=T  clear(p2)=T  clear(p3)=F  clear(p4)=F    clear(a)=F    clear(b)=F        clear(c)=T    clear(d)=T        pos(a)=p3      pos(b)=p4      pos(c)=b      pos(d)=a  


Page 17: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example
●  1st threat has two resolvers:
Ø  An ordering constraint, and z4≠d
●  2nd threat has three resolvers:
Ø  Two ordering constraints, and z4≠a
●  3rd threat has one: z4≠c

pos(a)=p3   clear(c)=T  

move(x4,b,z4)  

clear(x3)=T   clear(x4)=T  clear(z3)=T   clear(z4)=T  pos(x4)=b  pos(x3)=a  

pos(a)=d   pos(b)=c  

Start  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]

move(x3,a,z3)  

z3≠c  z3≠d    


Page 18: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(x3,a,z3)  

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Resolve the three threats using the binding constraints

pos(a)=p3   clear(c)=T  

move(x4,b,z4)  

clear(x3)=T   clear(x4)=T  clear(z3)=T   clear(z4)=T  pos(x4)=b  pos(x3)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]
z4≠a  z4≠c  z4≠d

z3≠c  z3≠d    


Page 19: Chapter02b


p1≠c  p1≠d    

move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Resolve five open goals

Ø  Bind x3=d, x4=c, z3=p1

pos(a)=p3   clear(c)=T  

move(d,a,p1)   move(c,b,z4)  

clear(d)=T   clear(c)=T  clear(p1)=T   clear(z4)=T  pos(c)=b  pos(d)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]
z4≠a  z4≠c  z4≠d


Page 20: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Threatened causal link ●  Resolvers:

Ø  move(d,a,p1) ≺ move(c,b,z4) Ø  z4≠p1  

pos(a)=p3   clear(c)=T  

move(d,a,p1)   move(c,b,z4)  

clear(d)=T   clear(c)=T  clear(p1)=T   clear(z4)=T  pos(c)=b  pos(d)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]
z4≠a  z4≠c  z4≠d


Page 21: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Threat resolved

pos(a)=p3   clear(c)=T  

move(d,a,p1)   move(c,b,z4)  

clear(d)=T   clear(c)=T  clear(p1)=T   clear(z4)=T  pos(c)=b  pos(d)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]
z4≠a  z4≠c  z4≠d  z4≠p1


Page 22: Chapter02b


move(c, y, z) pre: pos(c)=y, clear(c)=T, clear(z)=T eff: pos(c)←z, clear(y)←T, clear(z)←F  Range(c) = Containers; Range(y) = Range(z) = Container ∪ pallets

move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Resolve open goal Ø  bind z4=p2

●  No more flaws, so we’re done!

pos(a)=p3   clear(c)=T  

move(d,a,p1)   move(c,b,p2)  

clear(d)=T   clear(c)=T  clear(p1)=T   clear(p2)=T  pos(c)=b  pos(d)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]
p2≠a  p2≠c  p2≠d  p2≠p1


Page 23: Chapter02b


move(a,p3,d)  

Start  

Finish  

move(b,p4,c)  

Example  ●  PSP returns this solution:

move(d,a,p1)   move(c,b,p2)

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]


Page 24: Chapter02b


move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Go back to the last threat ●  Resolvers:

Ø  move(d,a,p1) ≺ move(c,b,z4) Ø  z4≠p1  

pos(a)=p3   clear(c)=T  

move(d,a,p1)   move(c,b,z4)  

clear(d)=T   clear(c)=T  clear(p1)=T   clear(z4)=T  pos(c)=b  pos(d)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]
z4≠a  z4≠c  z4≠d


Page 25: Chapter02b


move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Threat resolved

pos(a)=p3   clear(c)=T  

move(d,a,p1)   move(c,b,z4)  

clear(d)=T   clear(c)=T  clear(p1)=T   clear(z4)=T  pos(c)=b  pos(d)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]
z4≠a  z4≠c  z4≠d


Page 26: Chapter02b


move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Resolve open goal Ø  bind z4=p2

●  No more flaws, so we’re done

pos(a)=p3   clear(c)=T  

move(d,a,p1)   move(c,b,p2)

clear(d)=T   clear(c)=T  clear(p1)=T   clear(p2)=T  pos(c)=b  pos(d)=a  

pos(a)=d   pos(b)=c  

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]


Page 27: Chapter02b


move(a,p3,d)  

Start  

Finish  

move(b,p4,c)  

Example  ●  Same solution as before,

but with another ordering constraint

move(d,a,p1)   move(c,b,p2)

   

[Figure: pallets p1–p4; initial stacks d on a at p3 and c on b at p4; goal stacks a on d and b on c]


Page 28: Chapter02b


Node-Selection Heuristics  ●  Analogy to constraint-satisfaction problems

Ø  Resolving a flaw in PSP ≈ assigning a value to a variable in a CSP ●  What flaw to work on next?

Ø  Fewest Alternatives First (FAF): the flaw with the fewest resolvers ≈ Minimum Remaining Values (MRV) heuristic for CSPs

●  To resolve the flaw, which resolver to try first? Ø  Least Constraining Resolver (LCR): the resolver that rules out the fewest

resolvers for the other flaws ≈ Least Constraining Value (LCV) heuristic for CSPs

●  In PSP, introducing a new action introduces new flaws to resolve Ø  The plan can get arbitrarily large; want it to be as small as possible

•  Not like CSPs, where the search tree always has a fixed depth ●  Avoid introducing new actions unless necessary ●  To choose between actions a and b, estimate distance from s0 to Pre(a) and Pre(b)

Ø  We’ll discuss some heuristics for that later
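As a concrete illustration (my own sketch, not code from any planner), Fewest Alternatives First is just a min over resolver counts; the resolvers_for function is assumed to be supplied by the planner:

def select_flaw(flaws, resolvers_for):
    """Fewest Alternatives First: pick the flaw with the fewest resolvers (like MRV in CSPs)."""
    return min(flaws, key=lambda flaw: len(resolvers_for(flaw)))

# The example on the next page: three threats with 2, 3, and 1 resolvers; FAF picks the third
resolver_counts = {"threat-1": 2, "threat-2": 3, "threat-3": 1}
print(select_flaw(resolver_counts, lambda f: range(resolver_counts[f])))    # -> threat-3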

Page 29: Chapter02b


move(a,p3,d)  

clear(d)=T   pos(b)=p4  

Start  

Finish  

clear(a)=T   clear(b)=T  

move(b,p4,c)  

Example  ●  Example of Fewest Alternatives First:

Ø  1st threat has two resolvers: an ordering constraint, and z4≠d Ø  2nd threat has three resolvers: 2 ordering constraints, and z4≠a Ø  3rd threat has one resolver: z4≠c  

●  So resolve the 3rd threat first

pos(a)=p3   clear(c)=T  

move(x4,b,z4)  

clear(x3)=T   clear(x4)=T  clear(z3)=T   clear(z4)=T  pos(x4)=b  pos(x3)=a  

pos(a)=d   pos(b)=c  

Start  

move(x3,a,z3)  

z3≠c  z3≠d    

Page 30: Chapter02b


Discussion  ●  Problem: how to prune infinitely long paths in the search space?

Ø  Loop detection is based on recognizing states or goals we’ve seen before

Ø  In a partially ordered plan, we don’t know the states

●  Can we prune a path if we see the same action more than once?

Ø  No. Sometimes we might need the same action several times in different states of the world

Ø  Example on next slide

[Figure: a path … s -act1-> s′ -act2-> … -act1-> …; the same action act1 is needed again in a different state]

Page 31: Chapter02b


Example  ●  3-digit binary counter d3  d2  d1

s0 = {d3=0,  d2=0,  d1=0}, i.e., 0 0 0

g = {d3=1,  d2=1,  d1=1}, i.e., 1 1 1

●  Actions to increment the counter

•  incr-xx0-to-xx1   Pre: d1=0    Eff: d1=1
•  incr-x01-to-x10   Pre: d2=0, d1=1    Eff: d2=1, d1=0
•  incr-011-to-100   Pre: d3=0, d2=1, d1=1    Eff: d3=1, d2=0, d1=0

●  Plan:                       d3  d2  d1
      s0:                       0    0    0
      incr-xx0-to-xx1  →  0    0    1
      incr-x01-to-x10  →  0    1    0
      incr-xx0-to-xx1  →  0    1    1
      incr-011-to-100  →  1    0    0
      incr-xx0-to-xx1  →  1    0    1
      incr-x01-to-x10  →  1    1    0
      incr-xx0-to-xx1  →  1    1    1
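A quick check (plain Python, using only the action definitions above) that simulates the three increment actions and confirms the 7-step plan reaches 1 1 1, with incr-xx0-to-xx1 applied four times in four different states:

actions = {
    "incr-xx0-to-xx1": ({"d1": 0},                   {"d1": 1}),
    "incr-x01-to-x10": ({"d2": 0, "d1": 1},          {"d2": 1, "d1": 0}),
    "incr-011-to-100": ({"d3": 0, "d2": 1, "d1": 1}, {"d3": 1, "d2": 0, "d1": 0}),
}

def apply_action(state, name):
    pre, eff = actions[name]
    assert all(state[v] == c for v, c in pre.items()), f"{name} not applicable"
    return {**state, **eff}

state = {"d3": 0, "d2": 0, "d1": 0}
plan = ["incr-xx0-to-xx1", "incr-x01-to-x10", "incr-xx0-to-xx1",
        "incr-011-to-100", "incr-xx0-to-xx1", "incr-x01-to-x10", "incr-xx0-to-xx1"]
for name in plan:
    state = apply_action(state, name)
    print(name, "->", state["d3"], state["d2"], state["d1"])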

Page 32: Chapter02b


A  Weak  Pruning  Technique  

●  Can prune all partial plans of n or more actions, where n = |{all possible states}| Ø  This doesn’t help very much

●  I’m not sure whether there’s a good pruning technique for plan-space planning

Page 33: Chapter02b


Planning with Control Rules
Motivation:
●  Given a state s and an action a
●  Sometimes domain-specific tests can tell us we don't want to use a, e.g.,
Ø  a doesn't lead to a solution
Ø  or a is dominated

•  there’s a better solution along some other path

Ø  or a doesn’t lead to a solution that’s acceptable according to domain-specific criteria

●  In such cases we can prune s (remove it from Act)

●  Approach: Ø  Write logical formulas giving conditions that states must satisfy Ø  Prune states that don’t satisfy the formulas

Page 34: Chapter02b


Quick  Review  of  First  Order  Logic  First Order Logic (FOL): ●  Syntax:

Ø  atomic formulas (or atoms) •  predicate symbol with arguments, e.g., clear(c) •  include ‘=’ as a binary predicate symbol, e.g., loc(r1)=d1

Ø  logical connectives (∨, ∧, ¬, ⇒, ⇔), quantifiers (∀, ∃), punctuation •  e.g., (loc(r1)=d1  ∧ ∀c clear(c)) ⇒ ¬∃c loc(c)=r1  

●  First Order Theory T: Ø  “Logical” axioms and inference rules – encode logical reasoning in general Ø  Additional “nonlogical” axioms – talk about a particular domain Ø  Theorems: produced by applying the axioms and rules of inference

●  Model: a set of objects, functions, relations that the symbols refer to Ø  For our purposes, a model is a state of the world s Ø  In order for s to be a model, all theorems of T must be true in s Ø  s ⊨ loc(r1)=d1 read “s satisfies loc(r1)=d1” or “s entails loc(r1)=d1”

•  means that r1 is at d1 in the state s

Page 35: Chapter02b


Linear  Temporal  Logic  ●  Modal logic: FOL plus modal operators

to express concepts that would be difficult to express within FOL ●  Linear Temporal Logic (LTL):

Ø  Purpose: to express a limited notion of time •  Infinite sequence 〈0, 1, 2, …〉 of time instants •  Infinite sequence M = 〈s0, s1, …〉 of states of the world

Ø  Modal operators to refer to states in M:
     X f   "next f" - f is true in the next state
     F f   "future f" - f either is true now or in some future state, e.g., F loc(a)=b
     G f   "globally f" - f is true now and in all future states
     f1 U f2   "f1 until f2" - f2 is true now or in a future state, and f1 is true until then
Ø  Propositional constant symbols True and False
     •  Instead of T and F, to avoid confusion with the F operator
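To make the operators concrete, here is a tiny evaluator (my own sketch, not from the slides) for X, F, G, and U over a finite sequence of states, treating the last state as repeating forever, which is the same trick the slides use for planning a few pages later. Formulas are nested tuples; atoms are Python predicates on a single state.

def holds(M, i, f):
    """Does formula f hold at position i of the sequence M (last state repeats forever)?"""
    i = min(i, len(M) - 1)                 # positions past the end refer to the repeated last state
    op = f[0]
    if op == "atom":
        return f[1](M[i])
    if op == "not":
        return not holds(M, i, f[1])
    if op == "imp":                        # f1 ⇒ f2
        return (not holds(M, i, f[1])) or holds(M, i, f[2])
    if op == "X":                          # next
        return holds(M, i + 1, f[1])
    if op == "F":                          # true now or in some future state
        return any(holds(M, j, f[1]) for j in range(i, len(M)))
    if op == "G":                          # true now and in all future states
        return all(holds(M, j, f[1]) for j in range(i, len(M)))
    if op == "U":                          # f1 holds until f2 becomes true
        return any(holds(M, j, f[2]) and all(holds(M, k, f[1]) for k in range(i, j))
                   for j in range(i, len(M)))
    raise ValueError(op)

# M ⊨ G (clear(b) ⇒ (clear(b) U loc(a)=b)), checked on a toy 3-state sequence
M = [{"clear(b)": True,  "loc(a)": "c"},
     {"clear(b)": True,  "loc(a)": "b"},
     {"clear(b)": False, "loc(a)": "b"}]
clear_b = ("atom", lambda s: s["clear(b)"])
a_on_b  = ("atom", lambda s: s["loc(a)"] == "b")
print(holds(M, 0, ("G", ("imp", clear_b, ("U", clear_b, a_on_b)))))   # True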

Page 36: Chapter02b


Linear Temporal Logic (continued)  ●  Quantifiers cause problems with computability

Ø  Suppose f(x) is true for infinitely many values of x Ø  Problem evaluating truth of ∀x f(x) and ∃x f(x)

●  Bounded quantifiers Ø  Let g(x) be such that {x | g(x) is true} is finite and easily computed

∀[x: g(x)] f(x) ▸  means ∀x (g(x) ⇒ f(x)) ▸  expands into f(x1) ∧ f(x2) ∧ … ∧ f(xn)

∃[x: g(x)] f(x) ▸  means ∃x (g(x) ∧ f(x)) ▸  expands into f(x1) ∨   f(x2) ∨ … ∨ f(xn)

Page 37: Chapter02b


Notation  ●  We can use state-variable assignments as logical propositions in LTL formulas

Ø  G (∀[x: clear(x)=T] final(x)=T ⇒ X(clear(x)=T ∨ ∃[y: loc(y)=x] final(y)=T))

●  For Boolean state variables, simpler to write them as logical propositions •  Instead of clear(x)=T, just write clear(x) •  Instead of clear(x)=F,  write ¬clear(x)

Ø  G (∀[x: clear(x)] final(x) ⇒ X(clear(x) ∨ ∃[y: loc(y)=x] final(y)))

Page 38: Chapter02b


pickup(x) pre: loc(x)=floor, clear(x), holding=nil eff: loc(x)=crane, ¬clear(x), holding=x


stack(x,y) pre: holding=x, clear(y)   eff: holding=nil, ¬clear(y), loc(x)=y, clear(x)  

●  The "container stacking" domain Ø  Based on a classical planning domain called the "blocks world"

unstack(x,y) pre: loc(x)=y, clear(x), holding=nil   eff: loc(x)=crane, ¬clear(x), holding=x, clear(y)  

putdown(x) pre: holding=x eff: holding=nil, loc(x)=floor, clear(x)  

Example   clear(e),  loc(e)=d,  loc(d)=floor,  clear(c),  loc(c)=a,  loc(a)=floor,  clear(b),  loc(b)=floor,  holding=nil  

[Figures: left, e on d, c on a, b on the floor; right, the same state after pickup(b), with b held by the crane]

   clear(e),  loc(e)=d,  loc(d)=floor,      clear(c),  loc(c)=a,  loc(a)=floor,  ¬clear(b),  loc(b)=crane,  holding=b  

Page 39: Chapter02b


Models  for  Planning  with  LTL  ●  A model is a pair M = (M, si)

Ø  M = 〈s0, s1, …〉 is a sequence of states Ø  si is the i’th state in M,

●  For planning, we also have a goal g = {g1, …, gn} Ø  To reason about it, add a modal operator called “Goal”

•  Not part of ordinary LTL, but I’ll call it LTL anyway

Ø  In an LTL formula, use “Goal(gi)” to refer to part of g •  ((M,si), g) ⊨ Goal(gi) iff g ⊨ gi

●  Planning problem: Ø  Initial state s0, a goal g, control formula f Ø  Find a plan π = 〈a1, …, an〉 that generates a sequence of states

M = 〈s0, s1, …sn〉 such that M ⊨ f and sn ⊨ g •  That’s not quite correct •  Do you know why?

Page 40: Chapter02b


Models  for  Planning  with  LTL  ●  M needs to be an infinite sequence ●  Kluge: assume that the final state repeats infinitely after the plan ends

●  Planning problem: Ø  Initial state s0, a goal g, control formula f Ø  Find a plan π = 〈a1, …, an〉 that generates a sequence of states

M = 〈s0, s1, …, sn, sn, sn, …〉 such that M ⊨ f and sn ⊨ g

Page 41: Chapter02b


Examples  ●  Suppose M = 〈s0, s1, …〉

●  (M,s0) ⊨ XX loc(a)=b

Ø  a is on b in state s2

●  Abbreviation: can omit the state, it defaults to s0 Ø  M ⊨ XX loc(a)=b          means (M,s0) ⊨ XX loc(a)=b

●  Since loc(a)=b has no modal operators Ø  (M,s2) ⊨ loc(a)=b is equivalent to s2 ⊨ loc(a)=b

●  M ⊨ G holding  ≠ c  Ø  in every state in M, we aren’t holding c

●  M ⊨ G (clear(b) ⇒ (clear(b) U loc(a)=b)) Ø  whenever we enter a state in which b is clear,

b remains clear until a is on b

Page 42: Chapter02b


TLPlan  ●  Nondeterministic forward search

Ø  s = current state, f = control formula, g = goal

●  If s satisfies g then we’re done ●  Otherwise, think about what kind of plan we need

Ø  It must generate a sequence of states M = 〈s, s+, s++, …〉 that satisfies f ●  Compute a formula f + such that

(M,s) ⊨ f iff (M,s+) ⊨ f + ●  Fail if f + = FALSE  

Ø  No matter what s+ is, (M,s+) can’t satisfy f +

●  Fail if no applicable actions

●  Otherwise, nondeterministically choose one, compute s+, and call TLPlan with s+ and f +

TLPlan(s, f, g)
    if s satisfies g then return ⟨⟩
    f + ← Progress(f, s)
    if f + = False then return failure
    A ← {actions applicable to s}
    if A is empty then return failure
    nondeterministically choose a ∈ A
    π+ ← TLPlan(γ(s,a), f +, g)
    if π+ ≠ failure then return a.π+
    return failure

Page 43: Chapter02b


Progression  ●  Procedure Progress(f,s)

◆  Case:
     1. f contains no temporal ops :  f + ← True if s ⊨ f, False otherwise
     2. f = f1 ∧ f2 :  f + ← Progress(f1, s) ∧ Progress(f2, s)
     3. f = f1 ∨ f2 :  f + ← Progress(f1, s) ∨ Progress(f2, s)
     4. f = ¬f1 :  f + ← ¬Progress(f1, s)
     5. f = X f1 :  f + ← f1
     6. f = F f1 :  f + ← Progress(f1, s) ∨ f
     7. f = G f1 :  f + ← Progress(f1, s) ∧ f
     8. f = f1 U f2 :  f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f)
     9. f = ∀[x:g(x)] h(x) :  f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)
     10. f = ∃[x:g(x)] h(x) :  f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)
◆  simplify f + and return it  (False ∧ h = False,  True ∧ h = h,  ¬False = True, etc.)

Compute the formula f + that M + must satisfy
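Putting the two slides together, here is a rough Python rendering of Progress and a depth-first stand-in for the nondeterministic TLPlan (an illustrative sketch, not the actual TLPlan implementation). Formulas are nested tuples such as ("G", f1) or ("atom", predicate); states and goals are dicts of state-variable assignments; γ(s,a) just applies an action's effects. All of these encodings are my own choices.

TRUE, FALSE = True, False

def progress(f, s):
    # M,s ⊨ f  iff  M,s+ ⊨ progress(f, s)   (the slide's cases, simplifying as we go)
    if f is TRUE or f is FALSE:
        return f
    op = f[0]
    if op == "atom":                                  # no temporal operators
        return TRUE if f[1](s) else FALSE
    if op == "not":
        g = progress(f[1], s)
        return FALSE if g is TRUE else TRUE if g is FALSE else ("not", g)
    if op == "and":
        return _and(progress(f[1], s), progress(f[2], s))
    if op == "or":
        return _or(progress(f[1], s), progress(f[2], s))
    if op == "X":
        return f[1]
    if op == "F":
        return _or(progress(f[1], s), f)
    if op == "G":
        return _and(progress(f[1], s), f)
    if op == "U":                                     # f = ("U", f1, f2)
        return _or(progress(f[2], s), _and(progress(f[1], s), f))
    if op in ("forall", "exists"):                    # bounded quantifier: f = (op, domain_fn, body_fn)
        parts = [progress(f[2](x), s) for x in f[1](s)]
        combine, out = (_and, TRUE) if op == "forall" else (_or, FALSE)
        for p in parts:
            out = combine(out, p)
        return out
    raise ValueError(op)

def _and(a, b):                                       # simplify: False ∧ h = False, True ∧ h = h, ...
    if a is FALSE or b is FALSE: return FALSE
    if a is TRUE: return b
    if b is TRUE: return a
    return ("and", a, b)

def _or(a, b):
    if a is TRUE or b is TRUE: return TRUE
    if a is FALSE: return b
    if b is FALSE: return a
    return ("or", a, b)

def tlplan(s, f, g, actions):
    """Depth-first version of the slide's TLPlan (no cycle checking, like the pseudocode)."""
    if all(s.get(v) == c for v, c in g.items()):
        return []
    f_plus = progress(f, s)
    if f_plus is FALSE:
        return None                                   # no successor state can satisfy the control formula
    for a in actions:
        if all(s.get(v) == c for v, c in a["pre"].items()):       # a is applicable in s
            plan = tlplan({**s, **a["eff"]}, f_plus, g, actions)  # γ(s,a) = s updated by eff(a)
            if plan is not None:
                return [a["name"]] + plan
    return None

In this encoding a control formula such as G holding≠c would be written ("G", ("atom", lambda s: s.get("holding") != "c")); progressing it in a state where c is not being held leaves the formula unchanged, and progressing it in a state where c is held yields False, which is exactly the point at which tlplan prunes the branch.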

Page 44: Chapter02b


Progressing  ordinary  formulas  ●  Procedure Progress(f,s)

◆  Case:  1. f contains no temporal ops : f + ← True if s ⊨ f, False  otherwise  2. f = f1 ∧ f2 : f + ← Progress(f1, s) ∧ Progress(f2, s) 3. f = f1 ∨ f2 : f + ← Progress(f1, s) ∨ Progress(f2, s) 4. f =¬ f1 : f + ← ¬Progress(f1, s) 5. f = X f1 : f + ← f1 6. f = F f1 : f + ← Progress(f1, s) ∨ f 7. f = G f1 : f + ← Progress(f1, s) ∧ f 8. f = f1 U f2 : f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f) 9. f = ∀[x:g(x)] h(x) : f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)

10. f = ∃ [x:g(x)] h(x) : f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)

◆  simplify f + and return it

●  f = loc(a)=b ◆  if a is currently on b, then True (every possible M + is OK) ◆  otherwise False (there is no M + that’s OK)

Example:

Page 45: Chapter02b


●  f = XX loc(a)=b
Ø  two states from now, a must be on b
Ø  f + = X loc(a)=b
●  f = X loc(a)=b
◆  in the next state, a must be on b
◆  f + = loc(a)=b

Progressing  X  ●  Procedure Progress(f,s)

◆  Case:  1. f contains no temporal ops : f + ← True if s ⊨ f, False  otherwise  2. f = f1 ∧ f2 : f + ← Progress(f1, s) ∧ Progress(f2, s) 3. f = f1 ∨ f2 : f + ← Progress(f1, s) ∨ Progress(f2, s) 4. f =¬ f1 : f + ← ¬Progress(f1, s) 5. f = X f1 : f + ← f1 6. f = F f1 : f + ← Progress(f1, s) ∨ f 7. f = G f1 : f + ← Progress(f1, s) ∧ f 8. f = f1 U f2 : f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f) 9. f = ∀[x:g(x)] h(x) : f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)

10. f = ∃ [x:g(x)] h(x) : f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)

◆  simplify f + and return it

Examples:

Page 46: Chapter02b


Progressing  ∧  

●  f = clear(c) ∧ X loc(a)=c Ø  c must be clear now, and a must be on c in the next state

●  f + = Progress(clear(c), s) ∧ Progress(X loc(a)=c, s) = True ∧ loc(a)=c = loc(a)=c

●  Procedure Progress(f,s) ◆  Case:  

1. f contains no temporal ops : f + ← True if s ⊨ f, False  otherwise  2. f = f1 ∧ f2 : f + ← Progress(f1, s) ∧ Progress(f2, s) 3. f = f1 ∨ f2 : f + ← Progress(f1, s) ∨ Progress(f2, s) 4. f =¬ f1 : f + ← ¬Progress(f1, s) 5. f = X f1 : f + ← f1 6. f = F f1 : f + ← Progress(f1, s) ∨ f 7. f = G f1 : f + ← Progress(f1, s) ∧ f 8. f = f1 U f2 : f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f) 9. f = ∀[x:g(x)] h(x) : f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)

10. f = ∃ [x:g(x)] h(x) : f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)

◆  simplify f + and return it

Example:

[Figure: a on the floor, c on b; a and c are clear]

Page 47: Chapter02b


Progressing G

●  f = G loc(a)=c Ø  a must be on c now and must stay there in the future

●  f + = Progress(loc(a)=c, s) ∧ f = False ∧ G loc(a)=c = False  

●  Procedure Progress(f,s) ◆  Case:  

1. f contains no temporal ops : f + ← True if s ⊨ f, False  otherwise  2. f = f1 ∧ f2 : f + ← Progress(f1, s) ∧ Progress(f2, s) 3. f = f1 ∨ f2 : f + ← Progress(f1, s) ∨ Progress(f2, s) 4. f =¬ f1 : f + ← ¬Progress(f1, s) 5. f = X f1 : f + ← f1 6. f = F f1 : f + ← Progress(f1, s) ∨ f 7. f = G f1 : f + ← Progress(f1, s) ∧ f 8. f = f1 U f2 : f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f) 9. f = ∀[x:g(x)] h(x) : f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)

10. f = ∃ [x:g(x)] h(x) : f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)

◆  simplify f + and return it

Example:

[Figure: a on the floor, c on b; a and c are clear]

Page 48: Chapter02b


Progressing U

●  f = loc(a)=b U clear(c) Ø  c must be clear, or a must be on b and stay there until c is clear

●  f + = Progress(clear(c), s) ∨ [Progress(loc(a)=b, s) ∧ f ] = True ∨ [ False ∧ (loc(a)=b U clear(c)) ] = True

●  Procedure Progress(f,s) ◆  Case:  

1. f contains no temporal ops : f + ← True if s ⊨ f, False  otherwise  2. f = f1 ∧ f2 : f + ← Progress(f1, s) ∧ Progress(f2, s) 3. f = f1 ∨ f2 : f + ← Progress(f1, s) ∨ Progress(f2, s) 4. f =¬ f1 : f + ← ¬Progress(f1, s) 5. f = X f1 : f + ← f1 6. f = F f1 : f + ← Progress(f1, s) ∨ f 7. f = G f1 : f + ← Progress(f1, s) ∧ f 8. f = f1 U f2 : f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f) 9. f = ∀[x:g(x)] h(x) : f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)

10. f = ∃ [x:g(x)] h(x) : f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)

◆  simplify f + and return it

[Figure: a on the floor, c on b; a and c are clear]

Page 49: Chapter02b


●  Procedure Progress(f,s) ◆  Case:  

1. f contains no temporal ops : f + ← True if s ⊨ f, False  otherwise  2. f = f1 ∧ f2 : f + ← Progress(f1, s) ∧ Progress(f2, s) 3. f = f1 ∨ f2 : f + ← Progress(f1, s) ∨ Progress(f2, s) 4. f =¬ f1 : f + ← ¬Progress(f1, s) 5. f = X f1 : f + ← f1 6. f = F f1 : f + ← Progress(f1, s) ∨ f 7. f = G f1 : f + ← Progress(f1, s) ∧ f 8. f = f1 U f2 : f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f) 9. f = ∀[x:g(x)] h(x) : f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)

10. f = ∃ [x:g(x)] h(x) : f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)

◆  simplify f + and return it

Progressing  ∀  

●  f = ∀[x: clear(x)] X loc(x)=floor Ø  {x | clear(x)} = {a,  c}

●  f + = Progress(X loc(a)=floor, s) ∧ Progress(X loc(c)=floor, s) = loc(a)=floor ∧ loc(c)=floor  

Example:

xi is the i’th element of {x | s ⊨ g(x)}

[Figure: a on the floor, c on b; a and c are clear]

Page 50: Chapter02b


●  Procedure Progress(f,s) ◆  Case:  

1. f contains no temporal ops : f + ← True if s ⊨ f, False  otherwise  2. f = f1 ∧ f2 : f + ← Progress(f1, s) ∧ Progress(f2, s) 3. f = f1 ∨ f2 : f + ← Progress(f1, s) ∨ Progress(f2, s) 4. f =¬ f1 : f + ← ¬Progress(f1, s) 5. f = X f1 : f + ← f1 6. f = F f1 : f + ← Progress(f1, s) ∨ f 7. f = G f1 : f + ← Progress(f1, s) ∧ f 8. f = f1 U f2 : f + ← Progress(f2, s) ∨ (Progress(f1, s) ∧ f) 9. f = ∀[x:g(x)] h(x) : f + ← Progress(h(x1), s) ∧ … ∧ Progress(h(xn), s)

10. f = ∃ [x:g(x)] h(x) : f + ← Progress(h(x1), s) ∨ … ∨ Progress(h(xn), s)

◆  simplify f + and return it

Progressing  ∃  

●  f = ∃[x: clear(x)] X loc(x)=floor Ø  {x | clear(x)} = {a,  c}

●  f + = Progress(X loc(a)=floor, s) ∨ Progress(X loc(c)=floor, s) = loc(a)=floor ∨ loc(c)=floor

Example:

xi is the i’th element of {x | s ⊨ g(x)}

[Figure: a on the floor, c on b; a and c are clear]

Page 51: Chapter02b


TLPlan  ●  Nondeterministic forward search

Ø  s = current state, f = control formula, g = goal

●  If s satisfies g then we’re done ●  Otherwise, think about what kind of plan we need

Ø  It must generate a sequence of states M = 〈s, s+, s++, …〉 that satisfies f ●  Compute a formula f + such that

(M,s) ⊨ f iff (M,s+) ⊨ f + ●  Fail if f + = FALSE  

Ø  No matter what s+ is, (M,s+) can’t satisfy f +

●  Fail if no applicable actions

●  Otherwise, nondeterministically choose one, compute s+, and call TLPlan with s+ and f +

TLPlan  (s, f, g) if s satisfies g then return ⟨ ⟩ f + ← Progress  (f, s) if f + = False then return failure   A ← {actions applicable to s} if A is empty then return failure   nondeterministically choose a ∈ A π+ ← TLPlan  (γ (s,a), f +, g) if π+ ≠ failure then return π.π+   return failure

Page 52: Chapter02b


Example  Planning  Problem  ●  s = {loc(a)=floor, loc(b)=floor, clear(a), clear(c), loc(c)=b} ●  g = {loc(b)=a}

●  f = G ∀[x: clear(x)] (loc(x)≠floor ∨ ∃[y: Goal(loc(x)=y)] ∨ X holding≠x)

Ø  never pick up a clear container from the floor unless it needs to be elsewhere

●  Run the TLPlan algorithm ●  Compute f +

Ø  Return failure  if f + = FALSE

●  Two applicable actions: pickup(a) and unstack(c,b) Ø  Which one to use?

●  Try using pickup(a)

Ø  Call TLPlan recursively with γ (s, pickup(a)) and f +

●  If TLPlan returns failure, then try unstack(c,b)

[Figure: s0 has a and b on the floor with c on b; the goal g has b on a]

TLPlan  (s, f, g) if s satisfies g then return ⟨ ⟩ f + ← Progress  (f, s) if f + = False then return failure   A ← {actions applicable to s} if A is empty then return failure   nondeterministically choose a ∈ A π+ ← TLPlan  (γ (s,a), f +, g) if π+ ≠ failure then return π.π+   return failure

Page 53: Chapter02b


Example  Planning  Problem  ●  s = {loc(a)=floor, loc(b)=floor, clear(a), clear(c), loc(c)=b} ●  g = {loc(b)=a}

●  f = G ∀[x: clear(x)] (loc(x)≠floor ∨ ∃[y: Goal(loc(x)=y)] ∨ X holding≠x)

●  f + = Progress(G f1, s) = Progress(f1, s) ∧ f
        = Progress(∀[x: clear(x)] h(x), s) ∧ f
        = Progress(h(a) ∧ h(c), s) ∧ f
        = Progress(h(a), s) ∧ Progress(h(c), s) ∧ f
     •  Progress(h(a), s) = Progress(loc(a)≠floor ∨ ∃[y: Goal(loc(a)=y)] ∨ X holding≠a, s) = False ∨ False ∨ holding≠a  =  holding≠a
     •  Progress(h(c), s) = Progress(loc(c)≠floor ∨ ∃[y: Goal(loc(c)=y)] ∨ X holding≠c, s) = False ∨ True ∨ holding≠c  =  True

●  f + = holding≠a   ∧ True ∧ f = holding≠a   ∧ f

●  Two applicable actions: pickup(a) and unstack(c,b) Ø  s1 = γ (s, pickup(a)): Progress(f +, s1) = False ⇒ backtrack Ø  s2 = γ (s, unstack(c,b)): Progress(f +, s2) = f ⇒ keep going

[Figure: s0 has a and b on the floor with c on b; the goal g has b on a]

h(x)

f1

Page 54: Chapter02b


Container-Stacking Problems  ●  Define an inferred state variable final(x) ∈ Booleans, where x is a container

•  Never directly changed by any planning operator •  Produced by logical inference from the other state variables

●  Want final(x) to mean x is at the top of a stack that we’re finished moving Ø  Neither x nor the containers below x will ever need to be moved

●  Axioms to support this:
Ø  final(x) ⇔ clear(x) ∧ ¬Goal(holding=x) ∧ finalbelow(x)
Ø  finalbelow(x) ⇔
        (loc(x)=floor ∧ ¬∃[y: Goal(loc(x)=y)])
        ∨ ∃[y: loc(x)=y] [ ¬Goal(loc(x)=floor) ∧ ¬Goal(holding=y) ∧ ¬Goal(clear(y))
              ∧ ∀[z: Goal(loc(x)=z)] (z=y) ∧ ∀[z: Goal(loc(z)=y)] (z=x) ∧ finalbelow(y) ]

Ø  nonfinal(x) ⇔ clear(x) ∧ ¬final(x)  
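A direct transcription of these axioms into Python (a sketch under my own encoding: states and goals are dicts, and containers currently held by the crane are treated as not finalbelow, which the axioms leave implicit):

def final(x, s, g):
    """x is clear, the goal never requires holding x, and everything from x down can stay put."""
    return x in s["clear"] and g.get("holding") != x and finalbelow(x, s, g)

def finalbelow(x, s, g):
    goal_loc = g.get("loc", {})
    here = s["loc"][x]
    if here == "crane":
        return False                                     # held containers are not final (assumption)
    if here == "floor":
        return x not in goal_loc                         # on the floor and the goal doesn't move it
    y = here                                             # x sits on container y
    return (goal_loc.get(x) != "floor"                   # ¬Goal(loc(x)=floor)
            and g.get("holding") != y                    # ¬Goal(holding=y)
            and y not in g.get("clear", set())           # ¬Goal(clear(y))
            and (x not in goal_loc or goal_loc[x] == y)  # ∀[z: Goal(loc(x)=z)] z = y
            and all(z == x for z, tgt in goal_loc.items() if tgt == y)   # ∀[z: Goal(loc(z)=y)] z = x
            and finalbelow(y, s, g))

def nonfinal(x, s, g):
    return x in s["clear"] and not final(x, s, g)

# Goal borrowed from the five-container example a few slides later: a on b on c, d on e
s = {"loc": {"e": "a", "a": "floor", "d": "c", "c": "b", "b": "floor"}, "clear": {"e", "d"}, "holding": None}
g = {"loc": {"a": "b", "b": "c", "d": "e"}, "clear": set()}
print(final("e", s, g), final("d", s, g))                # -> False False (both stacks still need rearranging)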

Page 55: Chapter02b


Control  Rules  Try TLPlan with three different control formulas:

(1) If x is final, only put a container y onto x if it will make y final:

Ø  G ∀[x: clear(x)] (final(x) ⇒ X [clear(x) ∨ ∃[y: loc(y)=x] final(y)])

(2) Like (1), but also says never to put anything onto a container that isn’t final: Ø  G ∀[x: clear(x)] [

(final(x) ⇒ X [clear(x) ∨ ∃[y: loc(y)=x] final(y)]) ∧ (nonfinal(x) ⇒ X ¬∃[y: loc(y)=x])]

(3) Like (2), but also says never to pick up a nonfinal container from the floor

unless you can put it where it will be final: Ø  G ∀[x: clear(x)] [

(final(x) ⇒ X [clear(x) ∨ ∃[y: loc(y)=x] final(y)]) ∧ (nonfinal(x) ⇒ X ¬∃[y: loc(y)=x]) ∧ (onfloor(x) ∧ ∃[y: Goal(loc(x)=y)] ¬final(y) ⇒ X¬holding(x))]

Page 56: Chapter02b


Container  Stacking  

Page 57: Chapter02b


Container  Stacking  

Page 58: Chapter02b


Domain-Specific Planning Algorithms

●  Sometimes we can write highly efficient planning algorithms for a specific class of problems Ø  Use special properties of that class

●  For container-stacking problems with n containers, we can easily get a solution of length O(n) Ø  Move all containers to the floor, then build up stacks from the bottom

●  With additional domain-specific knowledge, can do even better …

[Figure: s0 has stacks e on a and d on c on b; g has stacks a on b on c and d on e]

Page 59: Chapter02b


Container-Stacking Algorithm

●  The algorithm generates the following sequence of actions:
Ø  ⟨move(e,a,floor),  move(d,c,e),  move(c,b,floor),  move(b,floor,c),  move(a,floor,b)⟩

[Figure: the goal g, with a on b on c and d on e]

●  c needs moving if
◆  s contains loc(c)=d and g contains loc(c)=e, where e≠d
◆  s contains loc(c)=d and g contains loc(b)=d, where b≠c
◆  s contains loc(c)=d and d needs moving

loop
    if ∃ a clear container c that needs moving
         & we can move c to a position d where c won't need moving
    then move c to d
    else if ∃ a clear container c that needs moving
    then move c to any clear pallet
    else if the goal is satisfied then return success
    else return failure
repeat

[Figure: the initial state s0, with e on a and d on c on b]
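A runnable sketch of the loop above (my own rendering; containers, pallets, and the goal are encoded as dicts, and it makes no attempt at the O(n³) guarantees discussed on the next slide). On the s0 and g shown here it reproduces the five-move plan listed at the top of this slide:

def needs_moving(c, loc, goal_loc):
    """The slide's three conditions; loc and goal_loc map each container to its position."""
    here = loc[c]
    if goal_loc.get(c, here) != here:                                     # the goal puts c somewhere else
        return True
    if any(other != c and goal_loc.get(other) == here for other in loc):  # the goal puts another container here
        return True
    return here in loc and needs_moving(here, loc, goal_loc)              # whatever c sits on needs moving

def stack_containers(loc, goal_loc, pallets):
    """Rough sketch of the loop above; returns a list of (container, destination) moves or None."""
    loc, plan = dict(loc), []
    clear = lambda pos: all(loc[c] != pos for c in loc)                   # nothing is sitting on pos

    def good_destination(c):
        d = goal_loc.get(c)
        ok = d is not None and clear(d) and (d in pallets or not needs_moving(d, loc, goal_loc))
        return d if ok else None

    while True:
        movable = [c for c in loc if clear(c) and needs_moving(c, loc, goal_loc)]
        placeable = [(c, good_destination(c)) for c in movable if good_destination(c)]
        if placeable:
            c, d = placeable[0]                                           # c won't need moving again
        elif movable:
            buried = [c for c in movable if loc[c] in loc]                # prefer unstacking a container
            c, d = (buried or movable)[0], next(p for p in pallets if clear(p))
        elif all(loc[c] == goal_loc.get(c, loc[c]) for c in loc):
            return plan                                                   # goal satisfied
        else:
            return None                                                   # failure
        loc[c] = d
        plan.append((c, d))

# The five-container example: initial stacks e-on-a and d-on-c-on-b; goal a-on-b-on-c and d-on-e
loc0 = {"e": "a", "a": "p1", "d": "c", "c": "b", "b": "p2"}
goal = {"a": "b", "b": "c", "d": "e"}
print(stack_containers(loc0, goal, ["p1", "p2", "p3", "p4", "p5"]))
# -> [('e', 'p3'), ('d', 'e'), ('c', 'p4'), ('b', 'c'), ('a', 'b')]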

Page 60: Chapter02b


Properties of the Algorithm

●  Sound, complete, guaranteed to terminate on all container-stacking problems

●  Runs in time O(n3) Ø  Can be modified (Slaney & Thiébaux) to run in time O(n)

●  Often finds optimal (shortest) solutions ●  But sometimes only near-optimal

Ø  For container-stacking problems, PLAN-LENGTH is NP-complete

●  I think what TLPlan does (with its 3rd control rule) is roughly similar to this algorithm

Page 61: Chapter02b


Using Deterministic Domain Models
●  For planning with deterministic domain models, we made some assumptions that aren't necessarily true:
     Assumption                      Problem
Ø  Static world                     The world may change dynamically
Ø  Perfect information           We almost never have all of the information
Ø  Instantaneous actions       Actions take time; there may be time constraints
Ø  Correct predictions           Action models usually are just approximations
Ø  Determinism                    Action model may just be the "nominal case"
Ø  Flat search space             There may be further lower-level refinements

●  If enough of the assumptions are approximately true, the plans may still be useful

Ø  But can't just take a plan π and start executing it
Ø  Need to monitor π's execution, detect problems as they occur, and recover from them

Page 62: Chapter02b


Acting and Planning

●  Interaction is roughly as follows
Ø  loop
     •  from the planner, get the latest plan or partial plan
     •  perform one or more actions, monitoring the current state
          ▸  if problems occur, replan while performing some preplanned recovery actions
●  Performance could involve lower-level refinement rather than direct execution
Ø  The next chapter contains lots of details

[Figure: Planning, Acting, and Performance]

Page 63: Chapter02b


[Figure: planning stage and acting stage; during search the planner predicts s → s′]

Acting and Planning
●  What kind of information should the planner provide?
Ø  Depends on the planning domain and the actor
●  Some possibilities
Ø  Complete plan, as in the algorithms we've discussed
     •  But usually for a subproblem
          ▸  example on next slide
Ø  Partial plan
     •  e.g., receding horizon
Ø  Several partial plans, with relative evaluations of each
     •  e.g., game-tree search

[Figure: the overall problem decomposed into subproblems sub1, sub2, sub3]

Page 64: Chapter02b


Example  ●  Killzone 2

Ø  “First-person shooter” game ●  Special-purpose AI planner

Ø  Plans enemy actions at the squad level
•  Subproblems; solution plans are maybe 4–6 actions long

Ø  Different planning algorithm than what we’ve discussed so far, but it uses a deterministic domain model

Ø  Quickly generates a plan that would work if nothing interferes Ø  Replans several times per second as the world changes

●  Why it worked: Ø  Don’t want to get the best possible plan Ø  Need actions that appear believable and consistent to human users Ø  Need them very quickly