Monte-Carlo Search Algorithms

Monte-Carlo Search AlgorithmsDaniel Bjorge and John Schaeffer

Problem

Important stuff

Where it needs to

be recognize

d

Solution

Search Algorithms

Which isfaster?

Best early prediction?

Best β parameter?

ComparingSearch Algorithms

Option 1: Computer Go

State expansion

SlowTermination detection

Slower

Heuristics

Lassú

Option 2: Artificial Trees

State expansion

FastTermination detection

Faster

Heuristics

Fast

010110

010110

010110

010110

010110

010110

010110

010110

010110

Existing FrameworksP-Game

Kocsis2006Artificial

Fast

Simple

FuegoEnzenberger and

Müller

2009

Generic

FastScalabl

e

Existing FrameworksP-Game

Kocsis2006

Small Scale

FuegoEnzenberger and

Müller

2009

Complex

Only Go Implement

ed

Gomba Search Framework(Go Multi-Armed Bandit Analysis)

Fast

Scalable

Multiple Metrics

Tweakable Trees

Swappable Search Strategies

Finom a Paprikásban!

Eager Game Tree Generation

Tree Generat

orMemory

Search Algorith

m

(Small trees)

Eager Game Tree Generation

Tree Generato

r Memory

(Big trees)

Lazy Game Tree Generation

Tree Generat

or

Memory

Search Algorith

m

(Any trees)

Minimax Values

7 0 -4 9 6 3 4 -91

Maximizing

Minimizing

Minimax Values

4

7 9 4

7 0 1 -4 9 6 3 4 -9

Maximizing

Minimizing

4

7 9 4

7 0 1 -4 9 6 3 4 -9

Minimax Values

O (bd /2 )Maximizing

Minimizing

Go:

Downward Minimax Evaluation

4

7 9 4

7 0 1 4 9 6 3 4 0

15

Example

5 0

0Maximizing

Minimizing

Difficulty

Minimax

Optimal

White to play

Easier

15

Quantified Difficulty

5 0

0Maximizing

Minimizing

Application to Search Algorithms

Many-Armed Bandit Solutions

10837...

0100...

1419-27-13...

1222...

-12-4-2-8…

UCTUpper Confidence Bound for Trees

Opti

mal arm

choic

e r

ate

(Kocsis and Szepesvári, 2006)

UCT-FPUUCT with First Play Urgency

Opti

mal arm

choic

e r

ate

(Hartland et al., 2006)

Infinitely-Armed Bandit Solutions

10837...

-12-4-2-8…

…

∞

Infinite-Armed Bandit SolutionsAdapting to Trees

…for treesK-

Failure

UCT-V ()UCB-V ()

UCT-V () AIRUCB-V () AIR

(Wang, Audibert, and Munos, 2006)

K-Failure

Random

UCT-FPU

1-Failure

3-Failure

10-Failure

Opti

mal arm

choic

es

(out

of

2000)

UCT-V ()

… … … … … … … … …

UCT-V () with AIR

… … … … … … … … …

UCT-V () with AIR: Gomba

Opti

mal arm

choic

es

(out

of

1000)

In Fuego

UCT UCT-V UCT-V (∞), β=.1

UCT-V (∞), β=.2

62

64

66

68

70

72

74

Fuego vs GnuGo, 40s moves

Upper 95% Con-fidence BoundLower 95% Con-fidence Bound

Search Algorithm

Win

% v

s G

nu

Go

Wrapping Up

Gomba• Fast• Scalable• Reasonably accurate• Simple• InterchangeableInfinite Bandit-Based Solutions• Improvement

Questions?

Köszönjük szépen!

Advisors• Kocsis Levente• Sárközy Gábor• Stanley Selkow

SZTAKI Info Lab• Benczúr András• Recski Gábor• Zséder Attila• Kornai András• Erdélyi Miklós

Fuego Development Team

Monte-Carlo Search Algorithms

Documents

Transcript of Monte-Carlo Search Algorithms