Monte-Carlo Search Algorithms
-
Author
dennis-leach -
Category
Documents
-
view
27 -
download
0
Embed Size (px)
description
Transcript of Monte-Carlo Search Algorithms

Monte-Carlo Search AlgorithmsDaniel Bjorge and John Schaeffer

Problem
Important stuff
Where it needs to
be recognize
d

Solution
Search Algorithms
Which isfaster?
Best early prediction?
Best β parameter?

ComparingSearch Algorithms

Option 1: Computer Go
State expansion
SlowTermination detection
Slower
Heuristics
Lassú

Option 2: Artificial Trees
State expansion
FastTermination detection
Faster
Heuristics
Fast
010110
010110
010110
010110
010110
010110
010110
010110
010110

Existing FrameworksP-Game
Kocsis2006Artificial
Fast
Simple
FuegoEnzenberger and
Müller
2009
Generic
FastScalabl
e

Existing FrameworksP-Game
Kocsis2006
Small Scale
FuegoEnzenberger and
Müller
2009
Complex
Only Go Implement
ed

Gomba Search Framework(Go Multi-Armed Bandit Analysis)
Fast
Scalable
Multiple Metrics
Tweakable Trees
Swappable Search Strategies
Finom a Paprikásban!

Eager Game Tree Generation
Tree Generat
orMemory
Search Algorith
m
(Small trees)

Eager Game Tree Generation
Tree Generato
r Memory
(Big trees)

Lazy Game Tree Generation
Tree Generat
or
Memory
Search Algorith
m
(Any trees)

Minimax Values
7 0 -4 9 6 3 4 -91
Maximizing
Minimizing

Minimax Values
4
7 9 4
7 0 1 -4 9 6 3 4 -9
Maximizing
Minimizing

4
7 9 4
7 0 1 -4 9 6 3 4 -9
Minimax Values
O (bd /2 )Maximizing
Minimizing
Go:

Downward Minimax Evaluation
4
7 9 4
7 0 1 4 9 6 3 4 0

15
Example
5 0
0Maximizing
Minimizing

Difficulty
Minimax
Optimal
White to play
Easier

15
Quantified Difficulty
5 0
0Maximizing
Minimizing

Application to Search Algorithms

Many-Armed Bandit Solutions
10837...
0100...
1419-27-13...
1222...
-12-4-2-8…

UCTUpper Confidence Bound for Trees
Opti
mal arm
choic
e r
ate
(Kocsis and Szepesvári, 2006)

UCT-FPUUCT with First Play Urgency
Opti
mal arm
choic
e r
ate
(Hartland et al., 2006)

Infinitely-Armed Bandit Solutions
10837...
-12-4-2-8…
…
∞

Infinite-Armed Bandit SolutionsAdapting to Trees
…for treesK-
Failure
UCT-V ()UCB-V ()
UCT-V () AIRUCB-V () AIR
(Wang, Audibert, and Munos, 2006)

K-Failure
Random
UCT-FPU
1-Failure
3-Failure
10-Failure
Opti
mal arm
choic
es
(out
of
2000)

UCT-V ()
… … … … … … … … …

UCT-V () with AIR
… … … … … … … … …

UCT-V () with AIR: Gomba
Opti
mal arm
choic
es
(out
of
1000)

In Fuego
UCT UCT-V UCT-V (∞), β=.1
UCT-V (∞), β=.2
62
64
66
68
70
72
74
Fuego vs GnuGo, 40s moves
Upper 95% Con-fidence BoundLower 95% Con-fidence Bound
Search Algorithm
Win
% v
s G
nu
Go

Wrapping Up
Gomba• Fast• Scalable• Reasonably accurate• Simple• InterchangeableInfinite Bandit-Based Solutions• Improvement

Questions?

Köszönjük szépen!
Advisors• Kocsis Levente• Sárközy Gábor• Stanley Selkow
SZTAKI Info Lab• Benczúr András• Recski Gábor• Zséder Attila• Kornai András• Erdélyi Miklós
Fuego Development Team