
Last update: March 9, 2010

Game playing

CMSC 421, Chapter 6

CMSC 421, Chapter 6 1

Finite perfect-information zero-sum games

Finite: finitely many agents, actions, states

Perfect information: every agent knows the current state, all of the actions, and what they do
No simultaneous actions – players move one-at-a-time

Constant-sum: regardless of how the game ends, Σ{agents’ utilities} = k. For every such game, there’s an equivalent game in which k = 0. Thus constant-sum games usually are called zero-sum games

Examples:

Deterministic: chess, checkers, go, othello (reversi), connect-four, qubic, mancala (awari, kalah), 9 men’s morris (merelles, morels, mill)

Stochastic: backgammon, monopoly, yahtzee, parcheesi, roulette, craps

We’ll start with deterministic games


Outline

♦ A brief history of work on this topic

♦ The minimax theorem

♦ Game trees

♦ The minimax algorithm

♦ α-β pruning

♦ Resource limits and approximate evaluation


A brief history

1846 (Babbage): machine to play tic-tac-toe

1928 (von Neumann): minimax theorem

1944 (von Neumann & Morgenstern): backward-induction algorithm (produces perfect play)

1950 (Shannon): minimax algorithm (finite horizon, approximate evaluation)

1951 (Turing): program (on paper) for playing chess

1952–7 (Samuel): checkers program, capable of beating its creator

1956 (McCarthy): pruning to allow deeper search

1957 (Bernstein): first complete chess program, on an IBM 704 vacuum-tube computer, could examine about 350 positions/minute


A brief history, continued

1967 (Greenblatt): first program to compete in human chess tournaments: 3 wins, 3 draws, 12 losses

1992 (Schaeffer): Chinook won the 1992 US Open checkers tournament

1994 (Schaeffer): Chinook became world checkers champion; Tinsley (human champion) withdrew for health reasons

1997 (Hsu et al): Deep Blue won 6-game chess match against world chess champion Garry Kasparov

2007 (Schaeffer et al): Checkers solved: with perfect play, it’s a draw. This took 10^14 calculations over 18 years


Basics

♦ A strategy specifies what an agent will do in every possible situation
♦ Strategies may be pure (deterministic) or mixed (probabilistic)

Suppose agents A and B use strategies s and t to play a two-person zero-sum game G. Then

• A’s expected utility is U_A(s, t). From now on, we’ll just call this U(s, t)

• Since G is zero-sum, U_B(s, t) = −U(s, t)

Instead of A and B, we’ll call the agents Max and Min

Max wants to maximize U and Min wants to minimize it


The Minimax Theorem (von Neumann, 1928)

Minimax theorem: Let G be a two-person finite zero-sum game with players Max and Min. Then there are strategies s∗ and t∗, and a number V_G called G’s minimax value, such that

• If Min uses t∗, Max’s expected utility is ≤ V_G; i.e., max_s U(s, t∗) = V_G

• If Max uses s∗, Max’s expected utility is ≥ V_G; i.e., min_t U(s∗, t) = V_G

Corollary 1: U(s∗, t∗) = V_G.

Corollary 2: If G is a perfect-information game, then there are pure strategies s∗ and t∗ that satisfy the theorem.


Game trees

[Figure: partial game tree for tic-tac-toe. MAX (X) moves at even depths and MIN (O) at odd depths; each TERMINAL position is labeled with its utility for Max: −1, 0, or +1.]

The name game tree comes from AI. Mathematical game theorists call this the extensive form of a game.

Root node ⇔ the initial state

Children of a node ⇔ the states a player can move to


Strategies on game trees

[Figure: the tic-tac-toe game tree from the previous slide.]

To construct a pure strategy for Max:

• At each node where it’s Max’s move, choose one branch

• At each node where it’s Min’s move, include all branches

Let b = the branching factor (max number of children of any node)
h = the tree’s height (max depth of any node)

The number of pure strategies for Max is ≤ b^⌈h/2⌉, with equality if every node of depth < h has b children


Strategies on game trees

[Figure: the same tic-tac-toe game tree.]

To construct a pure strategy for Min:

• At each node where it’s Min’s move, choose one branch

• At each node where it’s Max’s move, include all branches

The number of pure strategies for Min is ≤ b^⌈h/2⌉, with equality if every node of depth < h has b children


Finding the best strategy

Brute-force way to find Max’s and Min’s best strategies:

Construct the sets S and T of all of Max’s and Min’s pure strategies, then choose

s∗ = arg max_{s∈S} min_{t∈T} U_Max(s, t)

t∗ = arg min_{t∈T} max_{s∈S} U_Max(s, t)

Complexity analysis:

• Need to construct and store O(b^{h/2} + b^{h/2}) = O(b^{h/2}) strategies
• Each strategy is a tree that has O(b^{h/2}) nodes
• Thus space complexity is O(b^{h/2} · b^{h/2}) = O(b^h)

• Time complexity is slightly worse

But there’s an easier way to find the strategies


Minimax Algorithm

Compute a game’s minimax value recursively from the minimax values of its subgames:

[Figure: a two-ply game tree. Max moves at the root (actions A1, A2, A3); the three Min nodes below have leaf values {3, 12, 8}, {2, 4, 6}, and {14, 5, 2}, giving them minimax values 3, 2, and 2, so the root’s value is 3.]

function Minimax(s) returns a utility value
  if s is a terminal state then return Max’s payoff at s
  else if it is Max’s move in s then
    return max{Minimax(result(a, s)) : a is applicable to s}
  else return min{Minimax(result(a, s)) : a is applicable to s}

To get the next action, return argmax and argmin instead of max and min
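As a concrete sketch (not the course’s official code), the pseudocode above can be rendered in Python using a nested-tuple game tree, where a leaf is a number giving Max’s payoff:

```python
def minimax(node, max_to_move=True):
    """Minimax value of a game tree encoded as nested tuples.

    A leaf is a number (Max's payoff); an internal node is a tuple of
    child subtrees. Levels alternate between Max's and Min's moves.
    """
    if not isinstance(node, tuple):            # terminal state
        return node
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

# The two-ply tree from the figure: the Min-node values are 3, 2, 2,
# so Max's value at the root is 3.
tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
assert minimax(tree) == 3
```

Returning the move rather than the value is a one-line change: take the index of the best value among the root’s children.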


Properties of the minimax algorithm

Is it sound? I.e., when it returns answers, are they correct?


Properties of the minimax algorithm

Is it sound? I.e., when it returns answers, are they correct?
Yes (can prove this by induction)

Is it complete? I.e., does it always return an answer when one exists?


Properties of the minimax algorithm

Is it sound? I.e., when it returns answers, are they correct?
Yes (can prove this by induction)

Is it complete? I.e., does it always return an answer when one exists?
Yes on finite trees (e.g., chess has specific rules for this).

Space complexity?


Properties of the minimax algorithm

Is it sound? I.e., when it returns answers, are they correct?
Yes (can prove this by induction)

Is it complete? I.e., does it always return an answer when one exists?
Yes on finite trees (e.g., chess has specific rules for this).

Space complexity? O(bh), where b and h are as defined earlier

Time complexity?


Properties of the minimax algorithm

Is it sound? I.e., when it returns answers, are they correct?
Yes (can prove this by induction)

Is it complete? I.e., does it always return an answer when one exists?
Yes on finite trees (e.g., chess has specific rules for this).

Space complexity? O(bh), where b and h are as defined earlier

Time complexity? O(b^h)

For chess, b ≈ 35, h ≈ 100 for “reasonable” games: 35^100 ≈ 10^154 nodes

This is about 10^67 times the number of particles in the universe (about 10^87)
⇒ no way to examine every node!

But do we really need to examine every node?


Pruning example 1

[Figure: the two-ply minimax tree again. The left Min node’s value is 3; at the middle Min node, the first leaf examined is 2, so its value is ≤ 2 (remaining leaves marked X).]

Pruning example 1

[Figure: same tree, with the middle Min node’s value shown as ≤ 2.]

Max will never move to this node, because Max can do better by moving to the first one

Thus we don’t need to figure out this node’s minimax value


Pruning example 1

[Figure: same tree; at the right Min node, the first leaf examined is 14, so its value is ≤ 14.]

This node might be better than the first one


Pruning example 1

[Figure: same tree; the right Min node’s next leaf is 5, so its value is ≤ 5.]

It still might be better than the first one


Pruning example 1

[Figure: same tree; the right Min node’s last leaf is 2, so its value is 2, and the root’s minimax value is 3.]

No, it isn’t


Pruning example 2

[Figure: a deeper game tree with nodes a, b, c, d, e, f, i, j: b’s value is 7, d’s value is 5, and the search below e is cut off.]

Same idea works farther down in the tree

Max won’t move to e, because Max can do better by going to b
Don’t need e’s exact value, because it won’t change minimax(a)
So stop searching below e


Alpha-beta pruning

Start a minimax search at node c

Let α = biggest lower bound on any ancestor of f

α = max(−2, 4, 0) = 4 in the example

If the game reaches f , Max will get utility ≤ 3

To reach f , the game must go through d

But if the game reaches d, Max can get utility ≥ 4 by moving off of the path to f

So the game will never reach f

We can stop trying to compute u∗(f), because it can’t affect u∗(c)

This is called an alpha cutoff

[Figure: the path from c down through d to f. Max’s ancestors of f have lower bounds −2, 4, and 0, so α = max(−2, 4, 0) = 4; node d guarantees Max ≥ 4, while f is worth ≤ 3.]

Alpha-beta pruning

Start a minimax search at node a

Let β = smallest upper bound on any ancestor of d

β = min(5,−2, 3) = −2 in the example

If the game reaches d, Max will get utility ≥ 0

To reach d, the game must go through b

But if the game reaches b, Min can make Max’s utility ≤ −2 by moving off of the path to d

So the game will never reach d

We can stop trying to compute u∗(d), because it can’t affect u∗(a)

This is called a beta cutoff

[Figure: the path from a down through b to d. Min’s ancestors of d have upper bounds 5, −2, and 3, so β = min(5, −2, 3) = −2; node b lets Min hold Max to ≤ −2, while d is worth ≥ 0.]

The alpha-beta algorithm

function Alpha-Beta(s, α, β) returns a utility value
  inputs: s, current state in game
    α, the value of the best alternative for Max along the path to s
    β, the value of the best alternative for Min along the path to s
  if s is a terminal state then return Max’s payoff at s
  else if it is Max’s move at s then
    v ← −∞
    for every action a applicable to s do
      v ← max(v, Alpha-Beta(result(a, s), α, β))
      if v ≥ β then return v
      α ← max(α, v)
  else
    v ← ∞
    for every action a applicable to s do
      v ← min(v, Alpha-Beta(result(a, s), α, β))
      if v ≤ α then return v
      β ← min(β, v)
  return v
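In runnable form the routine might look like this; the nested-tuple tree encoding (leaves are Max’s payoffs) is an illustrative assumption, not part of the slide:

```python
import math

def alpha_beta(node, alpha=-math.inf, beta=math.inf, max_to_move=True):
    """Alpha-beta value of a nested-tuple game tree (leaves = Max's payoffs)."""
    if not isinstance(node, tuple):            # terminal state
        return node
    if max_to_move:
        v = -math.inf
        for child in node:
            v = max(v, alpha_beta(child, alpha, beta, False))
            if v >= beta:                      # beta cutoff: Min won't allow this
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in node:
            v = min(v, alpha_beta(child, alpha, beta, True))
            if v <= alpha:                     # alpha cutoff: Max won't allow this
                return v
            beta = min(beta, v)
    return v

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
assert alpha_beta(tree) == 3                   # same value Minimax returns
```

On this tree the second Min node is abandoned after its first leaf (2 ≤ α = 3), exactly the cutoff from pruning example 1.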


α-β pruning example

[Figure: a sequence of ten slides animating alpha-beta search on an example tree with nodes a through m, showing the α and β values passed to each node. The left subtree establishes α = 7; an alpha cutoff prunes a subtree once its Min node’s value (5) can’t exceed α = 7, and a beta cutoff prunes below a node once a child’s value (9) reaches β = 8.]

Properties of α-β

α-β is a simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

♦ if α ≤ minimax(s) ≤ β, then alpha-beta returns minimax(s)
♦ if minimax(s) ≤ α, then alpha-beta returns a value ≤ α
♦ if minimax(s) ≥ β, then alpha-beta returns a value ≥ β

If we start with α = −∞ and β = ∞, then alpha-beta will always return minimax(s)

Good move ordering can enable us to prune more nodes. Best case is if
♦ at nodes where it’s Max’s move, children are largest-value first
♦ at nodes where it’s Min’s move, children are smallest-value first

In this case time complexity = O(b^{h/2}) ⇒ doubles the solvable depth

Worst case is the reverse. In this case, α-β will search every node


Resource limits

Even with alpha-beta, it can still be infeasible to search the entire game tree (e.g., recall chess has about 10^154 nodes)
⇒ need to limit the depth of the search

Basic approach: let d be a positive integer. Whenever we reach a node of depth > d:
• If we’re at a terminal state, then return Max’s payoff
• Otherwise return an estimate of the node’s utility value, computed by a static evaluation function


α-β with a bound d on the search depth

function Alpha-Beta(s, α, β, d) returns a utility value
  inputs: s, α, β, same as before
    d, an upper bound on the search depth
  if s is a terminal state then return Max’s payoff at s
  else if d = 0 then return Eval(s)
  else if it is Max’s move at s then
    v ← −∞
    for every action a applicable to s do
      v ← max(v, Alpha-Beta(result(a, s), α, β, d − 1))
      if v ≥ β then return v
      α ← max(α, v)
  else
    v ← ∞
    for every action a applicable to s do
      v ← min(v, Alpha-Beta(result(a, s), α, β, d − 1))
      if v ≤ α then return v
      β ← min(β, v)
  return v
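A runnable sketch of the depth-bounded routine; the nested-tuple tree encoding and the leaf-averaging Eval are illustrative assumptions, chosen only to keep the example self-contained:

```python
import math

def alpha_beta_d(s, alpha, beta, d, max_to_move, eval_fn):
    """Depth-bounded alpha-beta: at the depth limit, fall back on a
    static evaluation instead of searching to the leaves."""
    if not isinstance(s, tuple):               # terminal state: exact payoff
        return s
    if d == 0:
        return eval_fn(s)                      # heuristic estimate
    if max_to_move:
        v = -math.inf
        for child in s:
            v = max(v, alpha_beta_d(child, alpha, beta, d - 1, False, eval_fn))
            if v >= beta:
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in s:
            v = min(v, alpha_beta_d(child, alpha, beta, d - 1, True, eval_fn))
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

# A crude illustrative Eval: the average of the leaves below s.
def leaves(s):
    return [s] if not isinstance(s, tuple) else [x for c in s for x in leaves(c)]

def avg_eval(s):
    ls = leaves(s)
    return sum(ls) / len(ls)

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
full = alpha_beta_d(tree, -math.inf, math.inf, 2, True, avg_eval)     # exact: 3
shallow = alpha_beta_d(tree, -math.inf, math.inf, 1, True, avg_eval)  # estimate
```

With d large enough to reach the leaves the result is the exact minimax value; with d = 1 the answer is only as good as Eval.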


Evaluation functions

Eval(s) is supposed to return an approximation of s’s minimax value.
Eval is often a weighted sum of features

Eval(s) = w1f1(s) + w2f2(s) + . . . + wnfn(s)

[Figure: two chess positions: one where Black is to move and White is slightly better, and one where White is to move and Black is winning.]

E.g.,

1·(number of white pawns − number of black pawns)
+ 3·(number of white knights − number of black knights)
+ . . .
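A minimal sketch of such a weighted-sum Eval; the position encoding and feature names (`white_pawns`, etc.) are invented for illustration, with the pawn/knight weights 1 and 3 taken from the example above:

```python
def eval_material(position, weights):
    """Weighted sum of features: Eval(s) = w1*f1(s) + ... + wn*fn(s)."""
    return sum(w * f(position) for w, f in weights)

# Hypothetical feature counts for a chess-like position.
position = {"white_pawns": 6, "black_pawns": 5,
            "white_knights": 2, "black_knights": 1}

weights = [
    (1, lambda p: p["white_pawns"] - p["black_pawns"]),      # pawn difference
    (3, lambda p: p["white_knights"] - p["black_knights"]),  # knight difference
]

assert eval_material(position, weights) == 1 * 1 + 3 * 1     # == 4
```

In practice the weights are tuned by hand or learned from game records.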


Exact values for Eval don’t matter

[Figure: two game trees whose leaf values are related by a monotonic transformation (e.g., 1, 2, 4 become 1, 20, 400); the minimax decisions are identical.]

Behavior is preserved under any monotonic transformation of Eval

Only the order matters: in deterministic games, payoff acts as an ordinal utility function
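The claim can be checked directly in code: applying a monotonic (order-preserving) transformation to the leaf values leaves Max’s chosen move unchanged. A small sketch, reusing the nested-tuple tree encoding from earlier:

```python
def minimax(node, max_to_move=True):
    if not isinstance(node, tuple):
        return node
    vals = [minimax(c, not max_to_move) for c in node]
    return max(vals) if max_to_move else min(vals)

def best_move(tree):
    """Index of the child Max should move to at the root."""
    vals = [minimax(c, max_to_move=False) for c in tree]
    return vals.index(max(vals))

def transform(node, f):
    """Apply f to every leaf of the tree."""
    if not isinstance(node, tuple):
        return f(node)
    return tuple(transform(c, f) for c in node)

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
cube = lambda x: x ** 3          # monotonic: preserves the order of values
assert best_move(tree) == best_move(transform(tree, cube))
```

Any strictly increasing f works here; the next section shows this is no longer true once chance nodes are involved.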


Discussion

Deeper lookahead (i.e., larger depth bound d) usually gives better decisions

Exceptions do exist. Main result in my PhD dissertation (30 years ago!):
“pathological” games in which deeper lookahead gives worse decisions

But such games hardly ever occur in practice

Suppose we have 100 seconds and can explore 10^4 nodes/second
⇒ 10^6 ≈ 35^{8/2} nodes per move
⇒ α-β reaches depth 8 ⇒ pretty good chess program

Some modifications that can improve the accuracy or computation time:
node ordering (see next slide)
quiescence search
biasing
transposition tables
thinking on the opponent’s time
. . .


Node ordering

Recall that I said:

Best case is if
♦ at nodes where it’s Max’s move, children are largest first
♦ at nodes where it’s Min’s move, children are smallest first

In this case time complexity = O(b^{h/2}) ⇒ doubles the solvable depth

Worst case is the reverse

How to get closer to the best case:
♦ Every time you expand a state s, apply Eval to its children
♦ When it’s Max’s move, sort the children in order of largest Eval first
♦ When it’s Min’s move, sort the children in order of smallest Eval first
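One way to see the effect is to count leaves examined with and without ordering. In this sketch the exact minimax value stands in for the Eval used to sort the children (an idealization that produces the best case described above):

```python
import math

def minimax(node, max_to_move=True):
    if not isinstance(node, tuple):
        return node
    vals = [minimax(c, not max_to_move) for c in node]
    return max(vals) if max_to_move else min(vals)

def alpha_beta_count(node, alpha=-math.inf, beta=math.inf,
                     max_to_move=True, order=False):
    """Alpha-beta value plus a count of leaves examined. With order=True,
    children are visited best-first (largest first for Max, smallest
    first for Min)."""
    if not isinstance(node, tuple):
        return node, 1
    kids = sorted(node, key=lambda c: minimax(c, not max_to_move),
                  reverse=max_to_move) if order else list(node)
    v = -math.inf if max_to_move else math.inf
    count = 0
    for child in kids:
        cv, c = alpha_beta_count(child, alpha, beta, not max_to_move, order)
        count += c
        if max_to_move:
            v = max(v, cv)
            if v >= beta:
                return v, count
            alpha = max(alpha, v)
        else:
            v = min(v, cv)
            if v <= alpha:
                return v, count
            beta = min(beta, v)
    return v, count

tree = ((2, 4, 6), (3, 12, 8), (14, 5, 2))   # best move listed last
v_plain, n_plain = alpha_beta_count(tree)
v_best, n_best = alpha_beta_count(tree, order=True)
assert v_plain == v_best == 3 and n_best < n_plain
```

Both searches return the same value; the ordered one simply examines fewer leaves.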


Quiescence search and biasing

In a game like checkers or chess:
The evaluation is based largely on material
It’s likely to be inaccurate if there are pending captures

e.g., if someone is about to take your queen

♦ Search deeper to reach a position where there aren’t pending captures
Evaluations will be more accurate there

But it creates another problem

♦ You’re searching some paths to an even depth, others to an odd depth

♦ Paths that end just after your opponent’s move will generally look worse than paths that end just after your move

♦ Add or subtract a number called the “biasing factor” to try to fix this


Transposition tables

Often there are multiple paths to the same state (i.e., the state space is really a graph rather than a tree)

Idea:
♦ when you compute s’s minimax value, store it in a hash table
♦ visit s again ⇒ retrieve its value rather than computing it again

The hash table is called a transposition table

Problem: far too many states to store all of them

Store some of the states, rather than all of them

Try to store the ones that you’re most likely to need
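A sketch of the idea with plain minimax and a Python dict as the table; the subtraction game used to exercise it is invented for illustration (its state space is a small graph, so repeated states actually occur):

```python
def minimax_tt(state, max_to_move, successors, payoff, table):
    """Minimax with a transposition table: a state already evaluated is
    looked up in `table` instead of being searched again."""
    key = (state, max_to_move)
    if key in table:
        return table[key]
    kids = successors(state)
    if not kids:                                   # terminal state
        v = payoff(state, max_to_move)
    else:
        v = (max if max_to_move else min)(
            minimax_tt(k, not max_to_move, successors, payoff, table)
            for k in kids)
    table[key] = v
    return v

# Illustrative game: from n, subtract 1 or 2; a player who cannot move
# loses. Many move orders reach the same n, so the table saves work.
succ = lambda n: [n - m for m in (1, 2) if m <= n]
pay = lambda n, max_to_move: -1 if max_to_move else +1   # mover at 0 loses

assert minimax_tt(10, True, succ, pay, {}) == 1    # first player wins
assert minimax_tt(9, True, succ, pay, {}) == -1    # multiples of 3 lose
```

A real chess program also needs a replacement policy for deciding which entries to keep when the table fills up.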


Thinking on the opponent’s time

Current state s, children s1, . . . , sn

Compute their minimax values, move to the one that looks best, say si

You computed si’s minimax value as the minimum of the values of its children, si1, . . . , sim

Let sij be the one that has the smallest minimax value. That’s where the opponent is most likely to move to

Do a minimax search below sij while waiting for the opponent to move

If your opponent moves to sij then you’ve already done a lot of the work of figuring out your next move


Game-tree search in practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994.

Checkers was solved in April 2007: from the standard starting position, both players can guarantee a draw with perfect play. This took 10^14 calculations over 18 years. Checkers has a search space of size 5 × 10^20.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

Othello: human champions refuse to compete against computers, which are too good.

Go: until recently, human champions didn’t compete against computers because the computers were too bad. But that has changed . . .


Game-tree search in the game of go

[Figure: trees with branching factors b = 2, 3, and 4, illustrating exponential growth.]

A game tree’s size grows exponentially with both its depth and its branching factor

Go is much too big for a normal game-tree search:
branching factor ≈ 200
game length ≈ 250 to 300 moves
number of paths in the game tree = 10^525 to 10^620

For comparison:
Number of atoms in universe ≈ 10^80
Number of particles in universe ≈ 10^87


Game-tree search in the game of go

During the past couple years, go programs have gotten much better

Main reason: Monte Carlo roll-outs

Basic idea: do a minimax search of a randomly selected subtree

At each node that the algorithm visits,

♦ It randomly selects some of the children (there are some heuristics for deciding how many)

♦ Calls itself recursively on these, ignores the others
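A toy sketch of the basic idea on the small nested-tuple trees used earlier; real go programs use far more sophisticated selection heuristics and playout policies than the uniform random choice assumed here:

```python
import random

def mc_minimax(node, max_to_move=True, k=2, rng=random):
    """Minimax over a randomly selected subtree: recurse on at most k
    randomly chosen children of each node and ignore the rest."""
    if not isinstance(node, tuple):
        return node
    sample = rng.sample(list(node), min(k, len(node)))
    vals = [mc_minimax(c, not max_to_move, k, rng) for c in sample]
    return max(vals) if max_to_move else min(vals)

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
est = mc_minimax(tree, k=2, rng=random.Random(0))
# `est` only estimates the true minimax value (3): sampling may miss the
# critical lines, which is the price paid for the much smaller tree.
```

When k covers every child the search degenerates to exact minimax; smaller k trades accuracy for a tree small enough to search in go.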


Forward pruning in chess

Back in the 1970s, some similar ideas were tried in chess

The approach was called forward pruning
Main difference: select the children heuristically rather than randomly
It didn’t work as well as brute-force alpha-beta, so people abandoned it

Why does a similar idea work so much better in go?


Perfect-information nondeterministic games

Backgammon: chance is introduced by dice

[Figure: a backgammon board, with points numbered 1–24 plus the bar positions 0 and 25.]


Expectiminimax

[Figure: a tree with MAX, CHANCE, and MIN levels. Each chance node averages its children’s values with probability 0.5 each: the Min-node values 2, 4, 0, −2 give expected values 3 and −1, so Max’s value is 3.]

function ExpectiMinimax(s) returns an expected utility
  if s is a terminal state then return Max’s payoff at s
  else if s is a “chance” node then
    return Σ_{s′} P(s′ | s) · ExpectiMinimax(s′)
  else if it is Max’s move at s then
    return max{ExpectiMinimax(result(a, s)) : a is applicable to s}
  else return min{ExpectiMinimax(result(a, s)) : a is applicable to s}

This gives optimal play (i.e., highest expected utility)
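As a sketch, the routine in Python over a tagged tree; the encoding, with explicit "max"/"min"/"chance" tags and (probability, subtree) pairs, is an assumption made for illustration, and the example reproduces the figure’s values:

```python
def expectiminimax(node):
    """Expectiminimax on a tagged tree: a leaf is a number; an internal
    node is (kind, children) with kind in {"max", "min", "chance"}.
    A chance node's children are (probability, subtree) pairs."""
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    vals = [expectiminimax(c) for c in children]
    return max(vals) if kind == "max" else min(vals)

# The figure's tree: Max over two chance nodes with 0.5/0.5 branches
# leading to Min nodes with values 2, 4, 0, -2.
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])
assert expectiminimax(tree) == 3     # max(3, -1)
```

The only change from minimax is the chance case, which returns a probability-weighted average instead of a max or min.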


With nondeterminism, exact values do matter

[Figure: two dice trees with chance probabilities .9/.1. With Min-node values 2, 3, 1, 4 the left move is best (expected values 2.1 vs. 1.3); after the order-preserving relabeling 20, 30, 1, 400 the right move is best (21 vs. 40.9).]

At “chance” nodes, we need to compute weighted averages

Behavior is preserved only by positive linear transformations of Eval.
Hence Eval should be proportional to the expected payoff
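The figure’s calculation is easy to reproduce; for brevity this sketch treats the Min-node values as given values under each chance node:

```python
def expectimax_choice(chance_nodes):
    """Index of the chance node with the highest expected value.
    Each chance node is a list of (probability, value) pairs."""
    exp = [sum(p * v for p, v in node) for node in chance_nodes]
    return exp.index(max(exp))

# The figure's example: values 2, 3, 1, 4 versus the order-preserving
# relabeling 20, 30, 1, 400.
left  = [[(0.9, 2), (0.1, 3)], [(0.9, 1), (0.1, 4)]]      # 2.1 vs 1.3
right = [[(0.9, 20), (0.1, 30)], [(0.9, 1), (0.1, 400)]]  # 21 vs 40.9
assert expectimax_choice(left) == 0
assert expectimax_choice(right) == 1    # same ordering, different choice
```

The relabeling preserves the order of the values but not their ratios, so the expected values (and hence the decision) change.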


In practice

Dice rolls increase b: 21 possible rolls with 2 dice
Given the dice roll, ≈ 20 legal moves on average (for some dice rolls, it can be much higher)

depth 4 ⇒ 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes

As depth increases, the probability of reaching a given node shrinks
⇒ the value of lookahead is diminished

α-β pruning is much less effective

TD-Gammon uses depth-2 search + a very good Eval
≈ world-champion level


Summary

We looked at games that have the following characteristics:
two players
zero sum
perfect information
deterministic
finite

In these games, we can do a game-tree search:
minimax values, alpha-beta pruning

In sufficiently complicated games, perfection is unattainable
⇒ must approximate: limited search depth, static evaluation function

In games that are even more complicated, further approximation is needed
⇒ Monte Carlo roll-outs

If we add an element of chance (e.g., dice rolls), expectiminimax
