Post on 30-Dec-2015
description
Monte-Carlo Search AlgorithmsDaniel Bjorge and John Schaeffer
Problem
Important stuff
Where it needs to
be recognize
d
Solution
Search Algorithms
Which isfaster?
Best early prediction?
Best β parameter?
ComparingSearch Algorithms
Option 1: Computer Go
State expansion
SlowTermination detection
Slower
Heuristics
Lassú
Option 2: Artificial Trees
State expansion
FastTermination detection
Faster
Heuristics
Fast
010110
010110
010110
010110
010110
010110
010110
010110
010110
Existing FrameworksP-Game
Kocsis2006Artificial
Fast
Simple
FuegoEnzenberger and
Müller
2009
Generic
FastScalabl
e
Existing FrameworksP-Game
Kocsis2006
Small Scale
FuegoEnzenberger and
Müller
2009
Complex
Only Go Implement
ed
Gomba Search Framework(Go Multi-Armed Bandit Analysis)
Fast
Scalable
Multiple Metrics
Tweakable Trees
Swappable Search Strategies
Finom a Paprikásban!
Eager Game Tree Generation
Tree Generat
orMemory
Search Algorith
m
(Small trees)
Eager Game Tree Generation
Tree Generato
r Memory
(Big trees)
Lazy Game Tree Generation
Tree Generat
or
Memory
Search Algorith
m
(Any trees)
Minimax Values
7 0 -4 9 6 3 4 -91
Maximizing
Minimizing
Minimax Values
4
7 9 4
7 0 1 -4 9 6 3 4 -9
Maximizing
Minimizing
4
7 9 4
7 0 1 -4 9 6 3 4 -9
Minimax Values
O (bd /2 )Maximizing
Minimizing
Go:
Downward Minimax Evaluation
4
7 9 4
7 0 1 4 9 6 3 4 0
15
Example
5 0
0Maximizing
Minimizing
Difficulty
Minimax
Optimal
White to play
Easier
15
Quantified Difficulty
5 0
0Maximizing
Minimizing
Application to Search Algorithms
Many-Armed Bandit Solutions
10837...
0100...
1419-27-13...
1222...
-12-4-2-8…
UCTUpper Confidence Bound for Trees
Opti
mal arm
choic
e r
ate
(Kocsis and Szepesvári, 2006)
UCT-FPUUCT with First Play Urgency
Opti
mal arm
choic
e r
ate
(Hartland et al., 2006)
Infinitely-Armed Bandit Solutions
10837...
-12-4-2-8…
…
∞
Infinite-Armed Bandit SolutionsAdapting to Trees
…for treesK-
Failure
UCT-V ()UCB-V ()
UCT-V () AIRUCB-V () AIR
(Wang, Audibert, and Munos, 2006)
K-Failure
Random
UCT-FPU
1-Failure
3-Failure
10-Failure
Opti
mal arm
choic
es
(out
of
2000)
UCT-V ()
… … … … … … … … …
UCT-V () with AIR
… … … … … … … … …
UCT-V () with AIR: Gomba
Opti
mal arm
choic
es
(out
of
1000)
In Fuego
UCT UCT-V UCT-V (∞), β=.1
UCT-V (∞), β=.2
62
64
66
68
70
72
74
Fuego vs GnuGo, 40s moves
Upper 95% Con-fidence BoundLower 95% Con-fidence Bound
Search Algorithm
Win
% v
s G
nu
Go
Wrapping Up
Gomba• Fast• Scalable• Reasonably accurate• Simple• InterchangeableInfinite Bandit-Based Solutions• Improvement
Questions?
Köszönjük szépen!
Advisors• Kocsis Levente• Sárközy Gábor• Stanley Selkow
SZTAKI Info Lab• Benczúr András• Recski Gábor• Zséder Attila• Kornai András• Erdélyi Miklós
Fuego Development Team