Last update: March 9, 2010 - University of Maryland, nau/cmsc421/chapter06.pdf
Finite perfect-information zero-sum games
Finite: finitely many agents, actions, states
Perfect information: every agent knows the current state, all of the actions, and what they do
No simultaneous actions – players move one at a time

Constant-sum: regardless of how the game ends, Σ{agents' utilities} = k. For every such game, there's an equivalent game in which k = 0. Thus constant-sum games usually are called zero-sum games
Examples:
Deterministic: chess, checkers, go, othello (reversi), connect-four,
qubic, mancala (awari, kalah), 9 men’s morris (merelles, morels, mill)
Stochastic: backgammon, monopoly, yahtzee, parcheesi, roulette, craps
We’ll start with deterministic games
CMSC 421, Chapter 6 2
Outline
♦ A brief history of work on this topic
♦ The minimax theorem
♦ Game trees
♦ The minimax algorithm
♦ α-β pruning
♦ Resource limits and approximate evaluation
A brief history
1846 (Babbage): machine to play tic-tac-toe
1928 (von Neumann): minimax theorem
1944 (von Neumann & Morgenstern): backward-induction algorithm (produces perfect play)

1950 (Shannon): minimax algorithm (finite horizon, approximate evaluation)
1951 (Turing): program (on paper) for playing chess
1952–7 (Samuel): checkers program, capable of beating its creator
1956 (McCarthy): pruning to allow deeper search
1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer, could examine about 350 positions/minute
A brief history, continued
1967 (Greenblatt): first program to compete in human chess tournaments: 3 wins, 3 draws, 12 losses

1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament

1994 (Schaeffer): Chinook became world checkers champion; Tinsley (human champion) withdrew for health reasons

1997 (Hsu et al): Deep Blue won 6-game chess match against world chess champion Garry Kasparov

2007 (Schaeffer et al): Checkers solved: with perfect play, it's a draw. This took 10^14 calculations over 18 years
Basics
♦ A strategy specifies what an agent will do in every possible situation
♦ Strategies may be pure (deterministic) or mixed (probabilistic)

Suppose agents A and B use strategies s and t to play a two-person zero-sum game G. Then

• A's expected utility is U_A(s, t). From now on, we'll just call this U(s, t)
• Since G is zero-sum, U_B(s, t) = −U(s, t)
Instead of A and B, we’ll call the agents Max and Min
Max wants to maximize U and Min wants to minimize it
The Minimax Theorem (von Neumann, 1928)
Minimax theorem: Let G be a two-person finite zero-sum game with players Max and Min. Then there are strategies s∗ and t∗, and a number V_G called G's minimax value, such that

• If Min uses t∗, Max's expected utility is ≤ V_G, i.e., max_s U(s, t∗) = V_G
• If Max uses s∗, Max's expected utility is ≥ V_G, i.e., min_t U(s∗, t) = V_G

Corollary 1: U(s∗, t∗) = V_G.
Corollary 2: If G is a perfect-information game,then there are pure strategies s∗ and t∗ that satisfy the theorem.
Game trees
[Figure: partial game tree for tic-tac-toe, alternating MAX (X) and MIN (O) levels from the empty board down to terminal states with utilities −1, 0, +1]

The name game tree comes from AI. Mathematical game theorists call this the extensive form of a game.

Root node ⇔ the initial state

Children of a node ⇔ the states a player can move to
Strategies on game trees
[Figure: the same tic-tac-toe game tree, with one branch chosen at each MAX node and every branch kept at each MIN node]

To construct a pure strategy for Max:

• At each node where it's Max's move, choose one branch
• At each node where it's Min's move, include all branches

Let b = the branching factor (max. number of children of any node)
h = the tree's height (max. depth of any node)

The number of pure strategies for Max ≤ b^⌈h/2⌉, with equality if every node of height < h has b children
Strategies on game trees
[Figure: the same game tree, with one branch chosen at each MIN node and every branch kept at each MAX node]

To construct a pure strategy for Min:

• At each node where it's Min's move, choose one branch
• At each node where it's Max's move, include all branches

The number of pure strategies for Min ≤ b^⌈h/2⌉, with equality if every node of height < h has b children
Finding the best strategy
Brute-force way to find Max’s and Min’s best strategies:
Construct the sets S and T of all of Max's and Min's pure strategies, then choose

s∗ = arg max_{s∈S} min_{t∈T} U_Max(s, t)

t∗ = arg min_{t∈T} max_{s∈S} U_Max(s, t)

Complexity analysis:

• Need to construct and store O(b^(h/2) + b^(h/2)) = O(b^(h/2)) strategies
• Each strategy is a tree that has O(b^(h/2)) nodes
• Thus space complexity is O(b^(h/2) · b^(h/2)) = O(b^h)
• Time complexity is slightly worse
But there’s an easier way to find the strategies
Minimax Algorithm
Compute a game’s minimax value recursively from the minimax values of its subgames:
[Figure: two-ply game tree. MAX's moves A1, A2, A3 lead to MIN nodes whose leaves are (3, 12, 8), (2, 4, 6), (14, 5, 2); the MIN nodes' values are 3, 2, 2, so the root's value is 3]
function Minimax(s) returns a utility value
  if s is a terminal state then return Max’s payoff at s
  else if it is Max’s move in s then
    return max{Minimax(result(a, s)) : a is applicable to s}
  else return min{Minimax(result(a, s)) : a is applicable to s}
To get the next action, return argmax and argmin instead of max and min
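For concreteness, the pseudocode above can be sketched in a few lines of Python. The nested-list tree encoding (an int is a terminal payoff for Max; a list holds the successor states) is an assumption for illustration, not something from the slides.

```python
# Minimax over a tree encoded as nested lists (assumed encoding):
# an int is a terminal state's payoff for Max; a list is an internal
# node whose elements are the states the player to move can reach.
def minimax(s, max_to_move=True):
    if isinstance(s, int):               # terminal state: Max's payoff
        return s
    values = [minimax(c, not max_to_move) for c in s]
    return max(values) if max_to_move else min(values)

# The two-ply example from the slide: Min's nodes are worth 3, 2, 2,
# so Max's root is worth 3.
print(minimax([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # prints 3
```

Returning argmax/argmin at the root, as the slide says, is a one-line change.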
Properties of the minimax algorithm
Is it sound? I.e., when it returns answers, are they correct? Yes (can prove this by induction)

Is it complete? I.e., does it always return an answer when one exists? Yes on finite trees (e.g., chess has specific rules for this).
Space complexity? O(bh), where b and h are as defined earlier
Time complexity? O(bh)
For chess, b ≈ 35, h ≈ 100 for “reasonable” games, so there are about 35^100 ≈ 10^135 nodes

This is about 10^48 times the number of particles in the universe (about 10^87) ⇒ no way to examine every node!
But do we really need to examine every node?
Pruning example 1
[Figure: MAX node with three MIN children. The first MIN node's leaves are 3, 12, 8, so its value is 3. The second MIN node's first leaf is 2, so its value is ≤ 2; its remaining leaves are pruned (X)]
Max will never move to this node, because Max can do better by moving tothe first one
Thus we don’t need to figure out this node’s minimax value
Pruning example 1
[Figure: same tree. The third MIN node's first leaf is 14, so its value is ≤ 14]
This node might be better than the first one
Pruning example 1
[Figure: same tree. The third MIN node's next leaf is 5, so its value is ≤ 5]
It still might be better than the first one
Pruning example 2
[Figure: a deeper tree with nodes a, b, c, d, e, f, i, j. Node b has value ≥ 7; node d has value ≤ 5, then = 5; searching stops below node e]
Same idea works farther down in the tree
Max won’t move to e, because Max can do better by going to b
Don’t need e’s exact value, because it won’t change minimax(a)
So stop searching below e
Alpha-beta pruning
Start a minimax search at node c
Let α = biggest lower bound on any ancestor of f
α = max(−2, 4, 0) = 4 in the example
If the game reaches f , Max will get utility ≤ 3
To reach f , the game must go through d
But if the game reaches d, Max can get utility ≥ 4 by moving off of the path to f
So the game will never reach f
We can stop trying to compute u∗(f), because it can’t affect u∗(c)
This is called an alpha cutoff
[Figure: search path from c down through d to f. The Max ancestors of f have lower bounds −2, 4, 0, so α = 4; d has value ≥ 4, and f has value ≤ 3]
Alpha-beta pruning
Start a minimax search at node a
Let β = smallest upper bound on any ancestor of d
β = min(5,−2, 3) = −2 in the example
If the game reaches d, Max will get utility ≥ 0
To reach d, the game must go through b
But if the game reaches b, Min can make Max’s utility ≤ −2 by moving off of the path to d
So the game will never reach d
We can stop trying to compute u∗(d), because it can’t affect u∗(a)
This is called a beta cutoff
[Figure: search path from a down through b to d. The Min ancestors of d have upper bounds 5, −2, 3, so β = −2; b has value ≤ −2, and d has value ≥ 0]
The alpha-beta algorithm
function Alpha-Beta(s,α,β) returns a utility value
inputs: s, current state in game
α, the value of the best alternative for max along the path to s
β, the value of the best alternative for min along the path to s
if s is a terminal state then return Max’s payoff at s
else if it is Max’s move at s then
    v ← −∞
    for every action a applicable to s do
      v ← max(v, Alpha-Beta(result(a, s), α, β))
      if v ≥ β then return v
      α ← max(α, v)
  else
    v ← ∞
    for every action a applicable to s do
      v ← min(v, Alpha-Beta(result(a, s), α, β))
      if v ≤ α then return v
      β ← min(β, v)
  return v
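As a sanity check, the pseudocode translates almost line-for-line into Python. The nested-list tree encoding (ints are terminal payoffs for Max) is the same illustrative assumption as before.

```python
import math

# Alpha-Beta over a tree encoded as nested lists (assumed encoding):
# ints are terminal payoffs for Max, lists are internal nodes.
def alpha_beta(s, alpha=-math.inf, beta=math.inf, max_to_move=True):
    if isinstance(s, int):
        return s
    if max_to_move:
        v = -math.inf
        for child in s:
            v = max(v, alpha_beta(child, alpha, beta, False))
            if v >= beta:            # beta cutoff
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in s:
            v = min(v, alpha_beta(child, alpha, beta, True))
            if v <= alpha:           # alpha cutoff
                return v
            beta = min(beta, v)
    return v

# Same tree as the minimax example; cutoffs don't change the answer.
print(alpha_beta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # prints 3
```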
α-β pruning example

[Figure sequence: step-by-step α-β search of a tree with nodes a through m, starting with α = −∞, β = ∞ at the root. The left subtree returns 7, so α = 7 along the path being searched. Later, a node worth 5 ≤ α triggers an alpha cutoff (its remaining children are pruned), and after a bound β = 8 is established, a child worth 9 ≥ β triggers a beta cutoff.]
Properties of α-β
α-β is a simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

♦ if α ≤ minimax(s) ≤ β, then alpha-beta returns minimax(s)
♦ if minimax(s) ≤ α, then alpha-beta returns a value ≤ α
♦ if minimax(s) ≥ β, then alpha-beta returns a value ≥ β

If we start with α = −∞ and β = ∞, then alpha-beta will always return minimax(s)

Good move ordering can enable us to prune more nodes. Best case is if
♦ at nodes where it’s Max’s move, children are largest-value first
♦ at nodes where it’s Min’s move, children are smallest-value first

In this case time complexity = O(b^(h/2)) ⇒ doubles the solvable depth

Worst case is the reverse. In this case, α-β will search every node
Resource limits
Even with alpha-beta, it can still be infeasible to search the entire game tree (e.g., recall chess has about 10^135 nodes) ⇒ need to limit the depth of the search

Basic approach: let d be a positive integer. Whenever we reach a node of depth ≥ d:
• If we’re at a terminal state, then return Max’s payoff
• Otherwise return an estimate of the node’s utility value, computed by a static evaluation function
α-β with a bound d on the search depth
function Alpha-Beta(s,α,β,d) returns a utility value
inputs: s,α,β, same as before
d, an upper bound on the search depth
if s is a terminal state then return Max’s payoff at s
else if d = 0 then return Eval(s)
else if it is Max’s move at s then
    v ← −∞
    for every action a applicable to s do
      v ← max(v, Alpha-Beta(result(a, s), α, β, d − 1))
      if v ≥ β then return v
      α ← max(α, v)
  else
    v ← ∞
    for every action a applicable to s do
      v ← min(v, Alpha-Beta(result(a, s), α, β, d − 1))
      if v ≤ α then return v
      β ← min(β, v)
  return v
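The depth-bounded version can be sketched in Python as well; note that the β update must be min(β, v). Since the slides leave Eval abstract, nodes are encoded here as (estimate, children) pairs so the static evaluator has something to return; both the encoding and eval_estimate are assumptions for illustration.

```python
import math

def eval_estimate(s):
    return s[0]                          # node's stored heuristic value (assumed encoding)

# Depth-bounded Alpha-Beta: ints are terminal payoffs; an internal
# node is a pair (static estimate, tuple of children).
def alpha_beta_d(s, alpha=-math.inf, beta=math.inf, d=2, max_to_move=True):
    if isinstance(s, int):               # terminal state: exact payoff
        return s
    if d == 0:                           # depth bound reached: estimate
        return eval_estimate(s)
    v = -math.inf if max_to_move else math.inf
    for child in s[1]:
        w = alpha_beta_d(child, alpha, beta, d - 1, not max_to_move)
        if max_to_move:
            v = max(v, w)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                return v
            beta = min(beta, v)          # min(beta, v), not min(alpha, v)
    return v

tree = (0, ((1, (3, 12, 8)), (2, (2, 4, 6))))
print(alpha_beta_d(tree, d=2))  # prints 3 (deep enough for a full search)
print(alpha_beta_d(tree, d=1))  # prints 2 (Max trusts the estimates 1 and 2)
```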
Evaluation functions
Eval(s) is supposed to return an approximation of s’s minimax value
Eval is often a weighted sum of features

Eval(s) = w_1 f_1(s) + w_2 f_2(s) + . . . + w_n f_n(s)
[Figure: two chess positions, captioned “White slightly better” and “Black winning”]

E.g.,

1 · (number of white pawns − number of black pawns) + 3 · (number of white knights − number of black knights) + . . .
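Such a weighted-feature Eval can be sketched directly, using the material weights from the example above. The board encoding (a dict of piece counts) and the weights beyond pawn = 1 and knight = 3 are assumptions for illustration.

```python
# Material weights: pawn = 1 and knight = 3 come from the example
# above; the others are the usual textbook values (an assumption).
WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def evaluate(board):
    # board maps (color, piece) -> count, e.g. ('white', 'pawn'): 8
    return sum(w * (board.get(('white', p), 0) - board.get(('black', p), 0))
               for p, w in WEIGHTS.items())

board = {('white', 'pawn'): 8, ('black', 'pawn'): 7,
         ('white', 'knight'): 2, ('black', 'knight'): 2}
print(evaluate(board))  # prints 1: White is up one pawn
```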
Exact values for Eval don’t matter
[Figure: the same two-ply MAX-over-MIN tree evaluated twice: once with leaf values 1, 2, 3, 4 and once with the order-preserving relabeling 1, 20, 20, 400; the chosen move is unchanged]
Behavior is preserved under any monotonic transformation of Eval
Only the order matters: in deterministic games, payoff acts as an ordinal utility function
Discussion
Deeper lookahead (i.e., larger depth bound d) usually gives better decisions
Exceptions do exist
Main result in my PhD dissertation (30 years ago!): “pathological” games in which deeper lookahead gives worse decisions
But such games hardly ever occur in practice
Suppose we have 100 seconds and explore 10^4 nodes/second
⇒ 10^6 ≈ 35^(8/2) nodes per move
⇒ α-β reaches depth 8 ⇒ pretty good chess program

Some modifications that can improve the accuracy or computation time:
node ordering (see next slide)
quiescence search
biasing
transposition tables
thinking on the opponent’s time
. . .
Node ordering
Recall that I said:
Best case is if
♦ at nodes where it’s Max’s move, children are largest first
♦ at nodes where it’s Min’s move, children are smallest first
In this case time complexity = O(b^(h/2)) ⇒ doubles the solvable depth
Worst case is the reverse
How to get closer to the best case:
♦ Every time you expand a state s, apply Eval to its children
♦ When it’s Max’s move, sort the children in order of largest Eval first
♦ When it’s Min’s move, sort the children in order of smallest Eval first
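The three bullets above amount to one sort per expanded node. A minimal sketch (order_children and the identity Eval are illustrative names, not from the slides):

```python
# Sort a node's children by a cheap static Eval before recursing:
# largest first on Max's move, smallest first on Min's move.
def order_children(children, evaluate, max_to_move):
    return sorted(children, key=evaluate, reverse=max_to_move)

leaf_values = [3, 14, 5, 2]
print(order_children(leaf_values, lambda s: s, max_to_move=True))   # [14, 5, 3, 2]
print(order_children(leaf_values, lambda s: s, max_to_move=False))  # [2, 3, 5, 14]
```

An alpha-beta search would call this on s's children just before its for loop, so that cutoffs tend to happen on the first few children.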
Quiescence search and biasing
In a game like checkers or chess:
The evaluation is based greatly on material pieces
It’s likely to be inaccurate if there are pending captures
e.g., if someone is about to take your queen

♦ Search deeper to reach a position where there aren’t pending captures
Evaluations will be more accurate here
But it creates another problem
♦ You’re searching some paths to an even depth, others to an odd depth
♦ Paths that end just after your opponent’s movewill generally look worse than paths that end just after your move
♦ Add or subtract a number called the “biasing factor” to try to fix this
Transposition tables
Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree)

Idea:
♦ when you compute s’s minimax value, store it in a hash table
♦ if you visit s again ⇒ retrieve its value rather than computing it again
The hash table is called a transposition table
Problem: far too many states to store all of them
Store some of the states, rather than all of them
Try to store the ones that you’re most likely to need
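A sketch of the idea on the toy nested-tuple trees used earlier (an assumed encoding; real game programs key the table on a hash of the position, e.g. Zobrist keys, and evict entries instead of keeping everything):

```python
# Minimax with a transposition table: before searching a state, check
# whether its value is already stored; afterwards, store it.
table = {}

def minimax_tt(s, max_to_move=True):
    if isinstance(s, int):
        return s
    key = (s, max_to_move)               # state must be hashable (tuples here)
    if key in table:                     # transposition: value already known
        return table[key]
    values = [minimax_tt(c, not max_to_move) for c in s]
    table[key] = max(values) if max_to_move else min(values)
    return table[key]

print(minimax_tt(((3, 12, 8), (2, 4, 6), (14, 5, 2))))  # prints 3
```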
Thinking on the opponent’s time
Current state s, children s1, . . . , sn
Compute their minimax values, and move to the one that looks best, say s_i

You computed s_i’s minimax value as the minimum of the values of its children, s_i1, . . . , s_im

Let s_ij be the one that has the smallest minimax value
That’s where the opponent is most likely to move to

Do a minimax search below s_ij while waiting for the opponent to move

If your opponent moves to s_ij then you’ve already done a lot of the work of figuring out your next move
Game-tree search in practice
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994.

Checkers was solved in April 2007: from the standard starting position, both players can guarantee a draw with perfect play. This took 10^14 calculations over 18 years. Checkers has a search space of size 5 × 10^20.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.
Othello: human champions refuse to compete against computers, who aretoo good.
Go: until recently, human champions didn’t compete against computersbecause the computers were too bad. But that has changed . . .
Game-tree search in the game of go
[Figure: complete game trees with branching factors b = 2, 3, 4]
A game tree’s size grows exponentially with both its depth and its branchingfactor
Go is much too big for a normal game-tree search:
branching factor ≈ 200
game length ≈ 250 to 300 moves
number of paths in the game tree = 10^525 to 10^620

For comparison:
Number of atoms in universe ≈ 10^80
Number of particles in universe ≈ 10^87
Game-tree search in the game of go
During the past couple years, go programs have gotten much better
Main reason: Monte Carlo roll-outs
Basic idea: do a minimax search of a randomly selected subtree
At each node that the algorithm visits,
♦ It randomly selects some of the children
(there are some heuristics for deciding how many)
♦ Calls itself recursively on these, ignores the others
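A rough sketch of that sampling idea on the toy trees from earlier; the fixed sample size k stands in for the selection heuristics the slide mentions, and the whole encoding is an illustrative assumption (modern go programs use considerably more sophisticated Monte Carlo tree search).

```python
import random

# Minimax over a randomly selected subtree: at each internal node,
# recurse on at most k randomly chosen children and ignore the rest.
def rollout_minimax(s, max_to_move=True, k=2):
    if isinstance(s, int):
        return s
    sample = random.sample(s, min(k, len(s)))
    values = [rollout_minimax(c, not max_to_move, k) for c in sample]
    return max(values) if max_to_move else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
# With k at least the branching factor, nothing is ignored and the
# result is the true minimax value.
print(rollout_minimax(tree, k=3))  # prints 3
```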
Forward pruning in chess
Back in the 1970s, some similar ideas were tried in chess
The approach was called forward pruning
Main difference: select the children heuristically rather than randomly
It didn’t work as well as brute-force alpha-beta, so people abandoned it
Why does a similar idea work so much better in go?
Perfect-information nondeterministic games
Backgammon: chance is introduced by dice
[Figure: backgammon board, points numbered 1–24, plus positions 0 and 25]
Expectiminimax
[Figure: expectiminimax tree. MAX chooses between two CHANCE nodes whose branches have probability 0.5 each; below them, MIN nodes over the leaves 2, 4, 7, 4, 6, 0, 5, −2 have values 2, 4, 0, −2, so the chance nodes are worth 3 and −1]
function ExpectiMinimax(s) returns an expected utility
  if s is a terminal state then return Max’s payoff at s
  if s is a “chance” node then
    return Σ_s′ P(s′ | s) · ExpectiMinimax(s′)
  else if it is Max’s move at s then
    return max{ExpectiMinimax(result(a, s)) : a is applicable to s}
  else return min{ExpectiMinimax(result(a, s)) : a is applicable to s}
This gives optimal play (i.e., highest expected utility)
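A Python sketch of the pseudocode above. The tagged-tuple encoding — ints for terminal payoffs, ('chance', [(probability, child), ...]) for chance nodes, ('max', [...]) and ('min', [...]) for player nodes — is an assumption for illustration.

```python
# Expectiminimax: average over chance nodes, max/min over player nodes.
def expectiminimax(node):
    if isinstance(node, (int, float)):       # terminal: Max's payoff
        return node
    kind, children = node
    if kind == 'chance':
        return sum(p * expectiminimax(c) for p, c in children)
    values = [expectiminimax(c) for c in children]
    return max(values) if kind == 'max' else min(values)

# A chance node in the style of the next slide: 0.9 * 2 + 0.1 * 3
chance = ('chance', [(0.9, ('min', [2, 2])), (0.1, ('min', [3, 3]))])
print(expectiminimax(chance))  # close to 2.1
```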
With nondeterminism, exact values do matter
[Figure: the same chance tree evaluated twice. With leaf values 2, 3, 1, 4 and probabilities .9/.1, the chance nodes are worth 2.1 and 1.3; after the order-preserving relabeling 20, 30, 1, 400, they are worth 21 and 40.9, and the best move changes]
At “chance” nodes, we need to compute weighted averages
Behavior is preserved only by positive linear transformations of Eval
Hence Eval should be proportional to the expected payoff
In practice
Dice rolls increase b: 21 possible rolls with 2 dice
Given the dice roll, ≈ 20 legal moves on average (for some dice rolls, can be much higher)

depth 4 ⇒ 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes
As depth increases, probability of reaching a given node shrinks⇒ value of lookahead is diminished
α-β pruning is much less effective
TD-Gammon uses depth-2 search + very good Eval ≈ world-champion level
Summary
We looked at games that have the following characteristics:
two players
zero sum
perfect information
deterministic
finite

In these games, we can do a game-tree search:
minimax values, alpha-beta pruning
In sufficiently complicated games, perfection is unattainable⇒ must approximate: limited search depth, static evaluation function
In games that are even more complicated, further approximation is needed⇒ Monte Carlo roll-outs
If we add an element of chance (e.g., dice rolls), expectiminimax