Monte-Carlo Search AlgorithmsDaniel Bjorge and John Schaeffer
Problem
Important stuff
Where it needs to
be recognize
d
Solution
Search Algorithms
Which isfaster?
Best early prediction?
Best β parameter?
ComparingSearch Algorithms
Option 1: Computer Go
State expansion
SlowTermination detection
Slower
Heuristics
Lassú
Option 2: Artificial Trees
State expansion
FastTermination detection
Faster
Heuristics
Fast
010110
010110
010110
010110
010110
010110
010110
010110
010110
Existing FrameworksP-Game
Kocsis2006Artificial
Fast
Simple
FuegoEnzenberger and
Müller
2009
Generic
FastScalabl
e
Existing FrameworksP-Game
Kocsis2006
Small Scale
FuegoEnzenberger and
Müller
2009
Complex
Only Go Implement
ed
Gomba Search Framework(Go Multi-Armed Bandit Analysis)
Fast
Scalable
Multiple Metrics
Tweakable Trees
Swappable Search Strategies
Finom a Paprikásban!
Eager Game Tree Generation
Tree Generat
orMemory
Search Algorith
m
(Small trees)
Eager Game Tree Generation
Tree Generato
r Memory
(Big trees)
Lazy Game Tree Generation
Tree Generat
or
Memory
Search Algorith
m
(Any trees)
Minimax Values
7 0 -4 9 6 3 4 -91
Maximizing
Minimizing
Minimax Values
4
7 9 4
7 0 1 -4 9 6 3 4 -9
Maximizing
Minimizing
4
7 9 4
7 0 1 -4 9 6 3 4 -9
Minimax Values
O (bd /2 )Maximizing
Minimizing
Go:
Downward Minimax Evaluation
4
7 9 4
7 0 1 4 9 6 3 4 0
15
Example
5 0
0Maximizing
Minimizing
Difficulty
Minimax
Optimal
White to play
Easier
15
Quantified Difficulty
5 0
0Maximizing
Minimizing
Application to Search Algorithms
Many-Armed Bandit Solutions
10837...
0100...
1419-27-13...
1222...
-12-4-2-8…
UCTUpper Confidence Bound for Trees
Opti
mal arm
choic
e r
ate
(Kocsis and Szepesvári, 2006)
UCT-FPUUCT with First Play Urgency
Opti
mal arm
choic
e r
ate
(Hartland et al., 2006)
Infinitely-Armed Bandit Solutions
10837...
-12-4-2-8…
…
∞
Infinite-Armed Bandit SolutionsAdapting to Trees
…for treesK-
Failure
UCT-V ()UCB-V ()
UCT-V () AIRUCB-V () AIR
(Wang, Audibert, and Munos, 2006)
K-Failure
Random
UCT-FPU
1-Failure
3-Failure
10-Failure
Opti
mal arm
choic
es
(out
of
2000)
UCT-V ()
… … … … … … … … …
UCT-V () with AIR
… … … … … … … … …
UCT-V () with AIR: Gomba
Opti
mal arm
choic
es
(out
of
1000)
In Fuego
UCT UCT-V UCT-V (∞), β=.1
UCT-V (∞), β=.2
62
64
66
68
70
72
74
Fuego vs GnuGo, 40s moves
Upper 95% Con-fidence BoundLower 95% Con-fidence Bound
Search Algorithm
Win
% v
s G
nu
Go
Wrapping Up
Gomba• Fast• Scalable• Reasonably accurate• Simple• InterchangeableInfinite Bandit-Based Solutions• Improvement
Questions?
Köszönjük szépen!
Advisors• Kocsis Levente• Sárközy Gábor• Stanley Selkow
SZTAKI Info Lab• Benczúr András• Recski Gábor• Zséder Attila• Kornai András• Erdélyi Miklós
Fuego Development Team
Top Related