Cost Based Satisficing Search Considered Harmful William Cushing J. Benton Subbarao Kambhampati.


Cost Based Satisficing Search Considered Harmful

William Cushing, J. Benton, Subbarao Kambhampati

Performance Bug: ε-Cost `Trap’

High cost variance: ε = $0.01 / $100.00
  Board / Fly
  Load / Drive
  Labor / Precious Material
  Mode Switch / Machine Operation

Search depth:
  0^{-1}(heuristic-error) = ∞
  ε^{-1}(heuristic-error) = huge

Optimal: cost=$1000.00, size=100,000
Runner-up: cost=$1000.10, size=20
Trillions of nodes expanded: When does depth 20 get exhausted?
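
The slide gives only plan totals. A minimal sketch of one action mix that reproduces them, assuming (this split is not stated on the slide) that the optimal plan uses 100,000 ε-cost actions and the runner-up uses 10 expensive plus 10 ε-cost actions. The names `EPS_CENTS`, `BIG_CENTS`, and `plan_cost_cents` are illustrative.

```python
# Costs are kept in cents so the arithmetic is exact integer arithmetic.
EPS_CENTS = 1        # an epsilon-cost action: $0.01
BIG_CENTS = 10_000   # an expensive action: $100.00

def plan_cost_cents(n_cheap, n_expensive):
    """Total cost of a plan with the given number of cheap and expensive actions."""
    return n_cheap * EPS_CENTS + n_expensive * BIG_CENTS

# Assumed action mixes (hypothetical; chosen to match the slide's totals).
optimal = plan_cost_cents(n_cheap=100_000, n_expensive=0)    # size 100,000
runner_up = plan_cost_cents(n_cheap=10, n_expensive=10)      # size 20

print(f"optimal:   ${optimal / 100:.2f}")
print(f"runner-up: ${runner_up / 100:.2f}")
print(f"gap:       {runner_up - optimal} cents")
```

Under this reading the two plans differ by only 10 cents, while their sizes differ by a factor of 5,000.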

Outline

Inevitability of ε-cost Traps
  Cycle Trap
  Branching Trap
  Travel Domain

If Cost is Bad, then what?
  Surrogate Search
  Simple First: Size
  Then: Cost-Sensitive Size-Based Search

Cycle Trap

Effective search graph
  g’ = f = g + h
  Edge weights = changes in f
    0 = ideal
    - = over-estimated earlier, or under-estimating now
    + = under-estimated earlier, or over-estimating now

Simple subgraph
  Heuristic plateau
  1 choice: Which way?

[Figure: simple subgraph; edge labels 1, 1, 1, 1]

Cycle Trap

[Figure: cycle subgraph; edge labels 2, 2, 0]

Even providing a heuristic perfect for all but 1 edge…
  Cost-based search fails

Reversible operators are one way in which heuristic penalty can end up being bounded from above
  “Unbounded f along unbounded paths”, to have completeness, also forces a heuristic upper bound

Fantastically over-estimating (weighting) could help, but:
  Suppose the right edge actually costs 1 – ε
  Then both directions would have identical heuristic value
  Weighting would be fruitless
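
A minimal sketch of the effect, under simplifying assumptions that are not from the slides: a directed ring of n states, a zero heuristic (so the cost-based search is uniform-cost search and the size-based search is breadth-first), and integer costs in cents. The function name and parameters are hypothetical.

```python
import heapq

def expansions_before_goal(n, eps, big, by_size):
    """Count nodes expanded before the goal is popped on a ring of n states.

    State 0 is the start and state n - 1 is the goal. The start has one
    expensive edge (cost `big`) straight to the goal; every state also has a
    cheap edge (cost `eps`) to the next state clockwise. Costs are integers.
    """
    goal = n - 1
    # Entries: (priority, insertion order, state, g_cost, g_size)
    frontier = [(0, 0, 0, 0, 0)]
    closed = set()
    pushed = 0
    expanded = 0
    while frontier:
        _, _, state, g_cost, g_size = heapq.heappop(frontier)
        if state == goal:
            return expanded
        if state in closed:
            continue
        closed.add(state)
        expanded += 1
        edges = [(state + 1, eps)]
        if state == 0:
            edges.insert(0, (goal, big))
        for nxt, cost in edges:
            pushed += 1
            new_cost, new_size = g_cost + cost, g_size + 1
            # The only difference between the two searches is the priority.
            priority = new_size if by_size else new_cost
            heapq.heappush(frontier, (priority, pushed, nxt, new_cost, new_size))
    return expanded

cost_based = expansions_before_goal(n=1000, eps=1, big=100, by_size=False)
size_based = expansions_before_goal(n=1000, eps=1, big=100, by_size=True)
print(cost_based, size_based)
```

In this toy the cost-based search walks about big/eps states around the cheap side of the ring before it accepts the expensive edge, while the size-based search takes the one-step plan immediately.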

Branching Trap

x = # of 1 cost children
y = # of ε cost children

x + y^{1/ε} = ways to spend 1
(x + y^{1/ε})^C = ways to spend C

d/2 + dε/2 = C
d = 2C/(1+ε)
(x+y)^d = # of paths at same depth

(x+y)^{2C/(1+ε)} << (x + y^{1/ε})^C
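
To get a feel for the size of the gap, the two counts can be evaluated numerically. This is a sketch with illustrative values (x = 2, y = 2, ε = 0.1, C = 10 are assumptions, not figures from the slides); it works in log10 to keep the numbers readable.

```python
import math

def log10_ways_to_spend(x, y, eps, budget):
    """log10 of (x + y**(1/eps))**budget: the slide's count of ways to spend `budget`."""
    return budget * math.log10(x + y ** (1 / eps))

def log10_paths_at_depth(x, y, eps, budget):
    """log10 of (x + y)**d with d = 2*budget/(1 + eps): paths at the same depth."""
    depth = 2 * budget / (1 + eps)
    return depth * math.log10(x + y)

# Illustrative values only (not taken from the slides).
x, y, eps, budget = 2, 2, 0.1, 10
size_side = log10_paths_at_depth(x, y, eps, budget)
cost_side = log10_ways_to_spend(x, y, eps, budget)
print(f"paths at depth d:    about 10^{size_side:.1f}")
print(f"ways to spend C:     about 10^{cost_side:.1f}")
```

With these values the cost-bounded count exceeds the depth-bounded count by roughly nineteen orders of magnitude, and the gap widens as ε shrinks.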

Travel

[Figure: map of cities A, B, 1, 2, R]

Straight Fly = 10,000 cents
Diag. Fly = 7,000 cents
Board/Debark = 1 cent

Various Solutions:
  Cheapest Plan
  Fastest Plan
  Smallest Plan

Travel – Cheapest Plan

[Figure: map of cities A, B, 1, 2, R]


Travel – Decent Start

[Figure: map of cities A, B, 1, 2, R]

g = 1 fly + 4 board + 1 debark

h = 2 fly + 4 debark + 1 board

f ~ 3 fly

Travel – Begin Backtracking

[Figure: map of cities A, B, 1, 2, R]

g = 2 fly + 4 board + 1 debark

h = 2 fly + 4 debark + 1 board

f ~ 4 fly
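
A small sketch of why f barely moves while passengers are shuffled. It plugs the slide's action counts into the Travel prices, assuming for simplicity that every flight is a straight flight (10,000 cents); the third state is the first backtracking state shown on the following slides. The names `cents` and `states` are illustrative.

```python
# Prices in cents; assumes every flight is a straight flight (an assumption).
FLY, BOARD, DEBARK = 10_000, 1, 1

def cents(fly, board, debark):
    """Cost in cents of the given numbers of fly, board, and debark actions."""
    return fly * FLY + board * BOARD + debark * DEBARK

# (g, h) pairs built from the action counts written on the slides.
states = {
    "decent start":       (cents(1, 4, 1), cents(2, 1, 4)),
    "second flight":      (cents(2, 4, 1), cents(2, 1, 4)),
    "shuffle passengers": (cents(1, 4, 2), cents(2, 2, 4)),
}
f_values = {name: g + h for name, (g, h) in states.items()}
for name, f in f_values.items():
    print(f"{name:20s} f = {f} cents")
```

Committing to a second flight raises f by a full flight, whereas another board or debark raises it by a cent or two, so a cost-ordered queue prefers every passenger shuffle over progress.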

Travel – Backtracking

[Figure: map of cities A, B, 1, 2, R]

g = 1 fly + 4 board + 2 debark

h = 2 fly + 4 debark + 2 board


Travel – Backtracking

[Figure: map of cities A, B, 1, 2, R]

g = 1 fly + 4 board + 3 debark

h = 2 fly + 3 debark + 2 board

Fly 1-2-B
Then teleport passengers

Travel – Backtracking

[Figure: map of cities A, B, 1, 2, R]

g = 1 fly + 6 board + 3 debark

h = 2 fly + 4 debark + 1 board

8 people: (1+2)^4 (1+1)^4 = 1296
(1+0)^8, (1+1)^8, (1+2)^8, (1+4)^8 = 1, 256, 6561, 390625


Travel Calculations

4 planes located in 5 cities
  5^4 = 625 plane assignments

4k passengers, located in 9 places
  9^{4k} passenger assignments globally

Cheap subspace
  Product over each city of (1 + city-local planes)^(city-local passengers)
  e.g., (1+2)^4 (1+1)^4 = 1296

Stop exploring
  Large evaluation
  Exhaustion of possibilities

Cost-based search exhausts cheap subspaces
  Eventually
  Assuming an upper bound on the heuristic
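
The counts on this slide can be checked directly. A short sketch follows; the function names are illustrative, and each city is represented as a hypothetical (local planes, local passengers) pair.

```python
from math import prod

def plane_assignments(planes, cities):
    """Each plane is independently in one of the cities."""
    return cities ** planes

def passenger_assignments(passengers, places):
    """Each passenger is independently in one of the places (cities or planes)."""
    return places ** passengers

def cheap_subspace_size(cities):
    """Product over cities of (1 + local planes) ** (local passengers).

    `cities` is a list of (local_planes, local_passengers) pairs: each
    passenger either stays in the city or sits in one of the local planes.
    """
    return prod((1 + planes) ** passengers for planes, passengers in cities)

print(plane_assignments(planes=4, cities=5))
print(passenger_assignments(passengers=4, places=9))   # the k = 1 case
print(cheap_subspace_size([(2, 4), (1, 4)]))
print([cheap_subspace_size([(p, 8)]) for p in (0, 1, 2, 4)])
```

Every state in such a subspace is reachable using only board and debark actions, which is why a cost-based search has to exhaust it before it will pay for a flight.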

Outline

Inevitability of ε-cost Traps
  Cycle Trap
  Branching Trap
  Travel Domain

If Cost is Bad, then what?
  Surrogate Search
  Simple First: Size
  Then: Cost-Sensitive Size-Based Search

Surrogate Search

Replace ill-behaved Objective with a well-behaved Evaluation
  Tradeoff: Trap Defense versus Quality Focus

Evaluation Function: “Go no further”
  Force ε ~ 1
  Make g and f grow fast enough: in o(size)
  Normalize costs for hybrid methods

Heuristic: “Go this way”
  Calculate h in the same units as g

Retain true Objective
  branch-and-bound
  duplicates elimination + re-expansion
  Re-expansion of duplicates should be done carefully
    Can wait till future iterations, cache heuristics, use path-max, …

Size-based Search

Replace ill-behaved Objective with a well-behaved Evaluation
  Pure Size

Evaluation Function: “Go no further”
  Force ε = 1

Heuristic: “Go this way”
  Replace cost metric with size metric in relaxed problem

Retain true Objective, for pruning
  Re-solve heuristic with real objective
  branch-and-bound: gcost+hcost >= best-known-cost
  duplicates: new.gcost >= old.gcost
  Re-expand better cost paths discovered
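
A minimal sketch of how these pieces might fit together on an explicit graph. It is not the authors' implementation: the function name, the graph encoding, and the default zero heuristics are all assumptions made for illustration. The queue is ordered by size only, while cost is kept for the branch-and-bound and duplicate tests listed above.

```python
import heapq
import itertools

def size_based_search(graph, start, goal, h_size=lambda s: 0, h_cost=lambda s: 0):
    """Best-first search ordered by size (g_size + h_size), pruned by cost.

    graph maps each state to a list of (next_state, edge_cost) pairs.
    h_cost must never over-estimate for the branch-and-bound test to be safe.
    Returns (best_cost, best_path); best_path is None if no plan exists.
    """
    best_cost, best_path = float("inf"), None
    cheapest_g = {start: 0}            # cheapest known cost to each state
    order = itertools.count()          # tie-breaker: first in, first out
    frontier = [(h_size(start), next(order), start, 0, 0, (start,))]
    while frontier:
        _, _, state, g_size, g_cost, path = heapq.heappop(frontier)
        if g_cost > cheapest_g[state]:
            continue                   # a cheaper path to this state was queued later
        if g_cost + h_cost(state) >= best_cost:
            continue                   # branch-and-bound on the true objective
        if state == goal:
            best_cost, best_path = g_cost, path
            continue                   # keep searching for cheaper plans
        for nxt, edge_cost in graph[state]:
            new_cost = g_cost + edge_cost
            if new_cost >= cheapest_g.get(nxt, float("inf")):
                continue               # duplicate: new.gcost >= old.gcost
            cheapest_g[nxt] = new_cost  # better cost path found: queue it again
            entry = (g_size + 1 + h_size(nxt), next(order), nxt,
                     g_size + 1, new_cost, path + (nxt,))
            heapq.heappush(frontier, entry)
    return best_cost, best_path

# Toy graph (hypothetical): a 1-step plan costing 100 and a 3-step plan costing 3.
toy = {"A": [("G", 100), ("B", 1)], "B": [("C", 1)], "C": [("G", 1)], "G": []}
best_cost, best_path = size_based_search(toy, "A", "G")
print(best_cost, best_path)
```

On the toy graph the search first finds the short, expensive plan, uses its cost as the bound, and then improves to the cheaper three-step plan because the goal is re-queued when a better cost path to it is discovered.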

Cost-sensitive Size-Based Heuristic

Replace ill-behaved Objective with a well-behaved Evaluation

Evaluation Function: “Go no further”

Heuristic: “Go this way”
  Estimate cheapest/best, but, calculate size
    sum/max/… propagation of real objective for heuristic
    make minimization choices with respect to real objective
    Last minute change: Recalculate value of minimization choices by surrogate

Retain true Objective, for pruning
  Calculate relaxed solution’s cost, also
    Faster than totally resolving heuristic
  branch-and-bound: gcost+hcost >= best-known-cost
    If heuristic is inadmissible, force it to be admissible eventually
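
One way to picture "minimize by cost, report size" is shown below. The slides apply the idea inside a planner's relaxed-problem heuristic; this sketch substitutes a plain shortest-path computation on an explicit graph, which is an assumption made purely for illustration, as are the function name and the toy graph.

```python
import heapq

def cheapest_path_cost_and_size(graph, source, goal):
    """Dijkstra on cost that also reports the size of the path it chose.

    graph maps each node to a list of (next_node, edge_cost) pairs with
    non-negative costs. Minimization choices are made on cost (ties broken
    by size); the number of edges on the chosen path is the size estimate.
    Returns (cost, size), or None if the goal is unreachable.
    """
    frontier = [(0, 0, source)]        # (cost so far, size so far, node)
    settled = set()
    while frontier:
        cost, size, node = heapq.heappop(frontier)
        if node in settled:
            continue
        settled.add(node)
        if node == goal:
            return cost, size
        for nxt, edge_cost in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(frontier, (cost + edge_cost, size + 1, nxt))
    return None

# Toy graph (hypothetical): a 1-step path costing 100 and a 3-step path costing 3.
toy = {"A": [("G", 100), ("B", 1)], "B": [("C", 1)], "C": [("G", 1)], "G": []}
h_cost, h_size = cheapest_path_cost_and_size(toy, "A", "G")
print(h_cost, h_size)
```

Here a pure size estimate would be 1 (the direct edge), while the cost-sensitive size estimate is 3: the length of the plan that is actually cheap. The cost value comes along for free and can feed the branch-and-bound test.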

Results – LAMA

LAMA
  Greedy best-first: bad plans
  (iterative) WA*: no plan, time out

LAMA-size
  Greedy best-first: same bad plans
  (iterative) WA*: direct plans, time out
    Better cost! … but no rendezvous

Expected Result:
  Only one kind of object
  Costs not widely varying

Portfolio approach possible

Results – SapaReplan

WA*-cost
  Weight 5: one bad plan, time out
  Weight 2: no plan, memory out

WA*-size
  Weight 1-2: better plans, memory out

Quality-sensitive evaluation function: cost+size

Conclusion

ε-cost traps are inevitable
  Typical:
    Large variation in cost
    Large cheap subspaces
    Upper-bounded heuristics
    Large plateaus in objective

Cost-based systematic approaches are susceptible
  Even with all kinds of search enhancements: LAMA
  Because search depth is “unbounded” by cost-based evaluation function
    ε^{-1}(h-error) ~ 0^{-1}(h-error)
    That is, search depth is bounded only by duplicate checking

Force good behavior: Evaluation ≠ Objective
  Force ε~1
  Quality Focus versus Trap Defense

Simplest surrogate: Size-based Search
  Force ε=1
  Performs surprisingly well
    Despite total lack of Quality Focus

Easy variation: Cost-sensitive Size-based Heuristic
  Still force ε=1
  Recalculate heuristic by surrogate
  Performs yet better

Conclusion (Polemic)

Lessons best learnt and then forgotten:
  goto is how computers work efficiently
  A* is how search works efficiently

Both are indispensable

Both are best-possible
  In just the right context

Both are fragile
  If the context changes

Go enthusiasts: joseki

If size doesn’t work…

Speed Everything Up
Reduce All Memory Consumption
Improve anytime approach: Iterated, Portfolio, Multi-Queue
Guess (search over) upper bounds
Decrease weights
Delay duplicate detection
Delay re-expansion
Delay heuristic computation
Exploit external memory
Use symbolic methods
Learn better heuristics: from search, from inference
Precompute/Memoize anything slow: the heuristic
Impose hierarchy (state/task abstraction)
Accept knowledge (LTL)
Use more hardware: (multi-)core/processor/computer, GPU

Related Work: The Best Approach?

The Best Surrogate? The Best Approach Over All?

Improve Exploitation
  (Dynamic) Heuristic Weighting (Pohl, Thayer+Ruml)
  Real-time A* (Korf)
  Beam search (Zhou)
  Quality-sensitive probing/lookahead (Benton et al, PROBE)

Improve Exploration
  Path-max, A** (Dechter+Pearl)
  Multi-queue approaches (Thayer+Ruml, Richter+Westphal, Helmert)
  Iterated search (Richter+Westphal)
  Portfolio methods (Rintanen, Streeter)
  Breadth-first search [as a serious contender] (Edelkamp)

Directly Address Heuristic Error
  h_cea, h_ff, h_lama, h_vhpop, h_lpg, h_crikey, h_sapa, …
  Pattern Databases (Culberson+Schaeffer, Edelkamp)
  Limited Discrepancy Search (Ginsberg)
  Negative Result: “How Good is Almost Perfect?” (Helmert+Röger)

`See’ the Structure (remove the traps)
  Factored Planning (Brafman+Domshlak)
  Direct Symmetry Reductions (Korf, Long+Fox)
  Symbolic Methods, Indirect Symmetry Reduction (Edelkamp)

Related Fields

Reinforcement Learning: Exploration/Exploitation

Markov Decision Processes: Off-policy/On-policy
  Reward Shaping, Potential Field Methods (Path-search)
  Prioritized Value Iteration

Decision Theory: Heuristic Errors
  “Decision-Theoretic Search” (?)
  k-armed Bandit Problems (UCB)

Game-tree Search: Traps, Huge Spaces
  Without traps, game-tree pathology (Pearl)
  Upper Confidence Bounds on Trees (UCT)
  Quiescent Search
  Proof-number search (Allis?)

Machine Learning: Really Huge Spaces
  Surrogate Loss Functions
  Continuous/Differentiable relaxations of 0/1

Probabilistic Reasoning: Extreme Values are Dangerous
  that 0/1 is bad is well known
  but also ε is numerically unstable

What isn’t closely related?

Typical Puzzles: Rubik’s Cube, Sliding Tiles, …

Prove Optimality/Small Problems

Tightly Bounded Memory: IDDFS, IDA*, SMA*

Unbounded Memory, but:
  Delayed/Relaxed Duplicate Detection (Zhou, Korf)
  External Memory (Edelkamp, Korf)

More than one problem:
  D*, D*-Lite, Lifelong Planning A* (Koenig)
  Case-based planning
  Learned heuristics

State-space isn’t a blackbox:
  Bidirectional/Perimeter Search
  Randomly expanding trees for continuous path planning in low dimensions
  Waypoint/abstraction methods
  Any-angle path planning (Koenig)

State-space is far from a blackbox:
  Explanation Based Learning
  Theorem Proving (Clause/Constraint Learning)
  Forward Checking (Unit Propagation)

Planning isn’t (only) State-space search (Kambhampati)

Engineering:
  Subroutine speedup via Precomputation/Memoization
  Python vs C
  Priority Queue implementation (bucket heaps!)

Quotes

“… if in some problem instance we were to allow B to skip even one node that is expanded by A, one could immediately present an infinite set of instances when B grossly outperforms A. (This is normally done by appending to the node skipped a variety of trees with negligible costs and very low h.)”
  Rina Dechter, Judea Pearl

“I strongly advise that you do not make road movement free (zero-cost). This confuses pathfinding algorithms such as A*, …”
  Amit Patel

“Then we could choose an h somewhat larger than the one defined by (3). The algorithm would no longer be admissible, but it might be more desirable, from a heuristic point of view, than any admissible algorithm.”
  Peter Hart, Nils Nilsson, Bertram Raphael

Roughly: `… inordinate amount of time selecting among equally meritorious options’
  Ira Pohl