Monte-Carlo Search Algorithms

33
Monte-Carlo Search Algorithms Daniel Bjorge and John Schaeffer

description

Monte-Carlo Search Algorithms. Daniel Bjorge and John Schaeffer. Problem. Where it needs to be recognized. Important stuff. Solution. Best early prediction?. Which is faster?. Search Algorithms. Best β parameter?. Comparing Search Algorithms. Option 1: Computer Go. - PowerPoint PPT Presentation

Transcript of Monte-Carlo Search Algorithms

Page 1: Monte-Carlo Search Algorithms

Monte-Carlo Search AlgorithmsDaniel Bjorge and John Schaeffer

Page 2: Monte-Carlo Search Algorithms

Problem

Important stuff

Where it needs to

be recognize

d

Page 3: Monte-Carlo Search Algorithms

Solution

Search Algorithms

Which isfaster?

Best early prediction?

Best β parameter?

Page 4: Monte-Carlo Search Algorithms

ComparingSearch Algorithms

Page 5: Monte-Carlo Search Algorithms

Option 1: Computer Go

State expansion

SlowTermination detection

Slower

Heuristics

Lassú

Page 6: Monte-Carlo Search Algorithms

Option 2: Artificial Trees

State expansion

FastTermination detection

Faster

Heuristics

Fast

010110

010110

010110

010110

010110

010110

010110

010110

010110

Page 7: Monte-Carlo Search Algorithms

Existing FrameworksP-Game

Kocsis2006Artificial

Fast

Simple

FuegoEnzenberger and

Müller

2009

Generic

FastScalabl

e

Page 8: Monte-Carlo Search Algorithms

Existing FrameworksP-Game

Kocsis2006

Small Scale

FuegoEnzenberger and

Müller

2009

Complex

Only Go Implement

ed

Page 9: Monte-Carlo Search Algorithms

Gomba Search Framework(Go Multi-Armed Bandit Analysis)

Fast

Scalable

Multiple Metrics

Tweakable Trees

Swappable Search Strategies

Finom a Paprikásban!

Page 10: Monte-Carlo Search Algorithms

Eager Game Tree Generation

Tree Generat

orMemory

Search Algorith

m

(Small trees)

Page 11: Monte-Carlo Search Algorithms

Eager Game Tree Generation

Tree Generato

r Memory

(Big trees)

Page 12: Monte-Carlo Search Algorithms

Lazy Game Tree Generation

Tree Generat

or

Memory

Search Algorith

m

(Any trees)

Page 13: Monte-Carlo Search Algorithms

Minimax Values

7 0 -4 9 6 3 4 -91

Maximizing

Minimizing

Page 14: Monte-Carlo Search Algorithms

Minimax Values

4

7 9 4

7 0 1 -4 9 6 3 4 -9

Maximizing

Minimizing

Page 15: Monte-Carlo Search Algorithms

4

7 9 4

7 0 1 -4 9 6 3 4 -9

Minimax Values

O (bd /2 )Maximizing

Minimizing

Go:

Page 16: Monte-Carlo Search Algorithms

Downward Minimax Evaluation

4

7 9 4

7 0 1 4 9 6 3 4 0

Page 17: Monte-Carlo Search Algorithms

15

Example

5 0

0Maximizing

Minimizing

Page 18: Monte-Carlo Search Algorithms

Difficulty

Minimax

Optimal

White to play

Easier

Page 19: Monte-Carlo Search Algorithms

15

Quantified Difficulty

5 0

0Maximizing

Minimizing

Page 20: Monte-Carlo Search Algorithms

Application to Search Algorithms

Page 21: Monte-Carlo Search Algorithms

Many-Armed Bandit Solutions

10837...

0100...

1419-27-13...

1222...

-12-4-2-8…

Page 22: Monte-Carlo Search Algorithms

UCTUpper Confidence Bound for Trees

Opti

mal arm

choic

e r

ate

(Kocsis and Szepesvári, 2006)

Page 23: Monte-Carlo Search Algorithms

UCT-FPUUCT with First Play Urgency

Opti

mal arm

choic

e r

ate

(Hartland et al., 2006)

Page 24: Monte-Carlo Search Algorithms

Infinitely-Armed Bandit Solutions

10837...

-12-4-2-8…

Page 25: Monte-Carlo Search Algorithms

Infinite-Armed Bandit SolutionsAdapting to Trees

…for treesK-

Failure

UCT-V ()UCB-V ()

UCT-V () AIRUCB-V () AIR

(Wang, Audibert, and Munos, 2006)

Page 26: Monte-Carlo Search Algorithms

K-Failure

Random

UCT-FPU

1-Failure

3-Failure

10-Failure

Opti

mal arm

choic

es

(out

of

2000)

Page 27: Monte-Carlo Search Algorithms

UCT-V ()

… … … … … … … … …

Page 28: Monte-Carlo Search Algorithms

UCT-V () with AIR

… … … … … … … … …

Page 29: Monte-Carlo Search Algorithms

UCT-V () with AIR: Gomba

Opti

mal arm

choic

es

(out

of

1000)

Page 30: Monte-Carlo Search Algorithms

In Fuego

UCT UCT-V UCT-V (∞), β=.1

UCT-V (∞), β=.2

62

64

66

68

70

72

74

Fuego vs GnuGo, 40s moves

Upper 95% Con-fidence BoundLower 95% Con-fidence Bound

Search Algorithm

Win

% v

s G

nu

Go

Page 31: Monte-Carlo Search Algorithms

Wrapping Up

Gomba• Fast• Scalable• Reasonably accurate• Simple• InterchangeableInfinite Bandit-Based Solutions• Improvement

Page 32: Monte-Carlo Search Algorithms

Questions?

Page 33: Monte-Carlo Search Algorithms

Köszönjük szépen!

Advisors• Kocsis Levente• Sárközy Gábor• Stanley Selkow

SZTAKI Info Lab• Benczúr András• Recski Gábor• Zséder Attila• Kornai András• Erdélyi Miklós

Fuego Development Team