
Game playing

Chapter 5


Outline

♦ Games

♦ Perfect play
  – minimax decisions
  – α–β pruning

♦ Resource limits and approximate evaluation

♦ Games of chance

♦ Games of imperfect information


Games

Reminder: a multi-agent environment is an environment in which each agent needs to consider the actions of other agents and how they affect its own welfare.

In AI, the most common games are of a rather specialized kind: deterministic, turn-taking, two-player, zero-sum games of perfect information. For example, if one player wins a game of chess, the other player necessarily loses.

Why games? The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules.

With the exception of robot soccer, physical games have not attracted much interest in the AI community.


Games formulation

We first consider games with two players, whom we call MAX and MIN. MAX moves first, and then they take turns moving until the game is over.

A game can be formally defined as a kind of search problem with the following elements:

♦ S0: the initial state, which specifies how the game is set up at the start.

♦ PLAYER(s): defines which player has the move in a state.

♦ ACTIONS(s): returns the set of legal moves in a state.

♦ RESULT(s, a): the transition model, which defines the result of a move.


Games formulation

♦ TERMINAL-TEST(s): a terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.

♦ UTILITY(s, p): a utility function (also called an objective function), which defines the final numeric value for a game that ends in terminal state s for a player p.
→ In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2. Zero-sum game?? Constant-sum would have been a better term, but zero-sum is traditional.
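Taken together, these six elements map directly onto an abstract interface. The sketch below is illustrative rather than from the slides; the class and method names are our own rendering of the slide's functions.

```python
# A minimal sketch of the slides' game formulation as a Python interface.
# Method names mirror the slides' functions; everything else is assumed.
from abc import ABC, abstractmethod

class Game(ABC):
    @abstractmethod
    def initial_state(self):        # S0: how the game is set up at the start
        ...
    @abstractmethod
    def player(self, s):            # PLAYER(s): who has the move in s
        ...
    @abstractmethod
    def actions(self, s):           # ACTIONS(s): legal moves in s
        ...
    @abstractmethod
    def result(self, s, a):         # RESULT(s, a): the transition model
        ...
    @abstractmethod
    def terminal_test(self, s):     # TERMINAL-TEST(s): is the game over?
        ...
    @abstractmethod
    def utility(self, s, p):        # UTILITY(s, p): value of terminal s for p
        ...
```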


Game tree

A tree where the nodes are game states and the edges are moves.

[Figure: the (partial) game tree for tic-tac-toe. MAX (X) moves first; levels alternate between MAX (X) and MIN (O) down to TERMINAL states, whose utilities are −1, 0, or +1 from MAX's point of view.]


Types of games

                        deterministic                    chance
perfect information     chess, checkers, go, othello     backgammon, monopoly
imperfect information   battleships, blind tic-tac-toe   bridge, poker


Minimax

Perfect play for deterministic, perfect-information games

Idea: choose move to position with highest minimax value = best achievable payoff against best play

E.g., 2-ply game: [Figure: MAX's moves a1, a2, a3 lead to MIN nodes with backed-up values 3, 2, 2 over leaf utilities 3 12 8, 2 4 6, 14 5 2; the minimax value of the root is 3.]


Minimax algorithm

function Minimax-Decision(state) returns an action
    inputs: state, current state in game
    return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← −∞
    for a, s in Successors(state) do v ← Max(v, Min-Value(s))
    return v

function Min-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← ∞
    for a, s in Successors(state) do v ← Min(v, Max-Value(s))
    return v
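A direct Python transcription of this pseudocode, as a sketch assuming the Game interface sketched earlier (the slide's Successors(state) is expressed via Actions and Result):

```python
import math

def minimax_decision(state, game):
    """Choose the action for the player to move that maximizes
    the minimax value of the resulting state."""
    player = game.player(state)
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game, player))

def max_value(state, game, player):
    """Value of a state where MAX (i.e., `player`) is to move."""
    if game.terminal_test(state):
        return game.utility(state, player)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game, player))
    return v

def min_value(state, game, player):
    """Value of a state where the opponent (MIN) is to move."""
    if game.terminal_test(state):
        return game.utility(state, player)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game, player))
    return v
```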


Properties of minimax

Complete?? Yes, if tree is finite (chess has specific rules for this)

Optimal?? Yes, against an optimal opponent. Otherwise??

Time complexity?? O(b^m)

Space complexity?? O(bm) (depth-first exploration)

For chess, b ≈ 35, m ≈ 100 for “reasonable” games ⇒ exact solution completely infeasible

But do we need to explore every path?


Optimal decisions in multiplayer games

Many popular games allow more than two players. How do we extend the concepts of the minimax algorithm to those?

The single value for each node is replaced with a vector of values. For example, in a three-player game with players A, B, and C, a vector ⟨vA, vB, vC⟩ is associated with each node.

For terminal states: design a utility function that returns a vector of values.

For non-terminal states: how do we compute the value of each parent node from the values of its children?


Optimal decisions in multiplayer games

[Figure: a three-player game tree with turn order A, B, C, A. Leaves carry vectors such as (1, 2, 6), (4, 2, 3), (6, 1, 2), (7, 4, 1), (5, 1, 1), (1, 5, 2), (7, 7, 1), (5, 4, 5); at each level the player to move picks the child whose vector maximizes its own component, giving (1, 2, 6) at the root.]
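As a hedged sketch, this vector backup is a one-case recursion: the player to move picks the child whose vector is largest in that player's own component. The utility_vector helper and the integer player index are assumptions, not part of the slides.

```python
def multiplayer_value(state, game):
    """Back up vectors of utilities: the player to move picks the
    child vector that maximizes its own component."""
    if game.terminal_test(state):
        # Assumed helper: a tuple of utilities, one entry per player.
        return game.utility_vector(state)
    p = game.player(state)  # assumed to be an index into the vector
    return max((multiplayer_value(game.result(state, a), game)
                for a in game.actions(state)),
               key=lambda vec: vec[p])
```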


α–β pruning

The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree.

The α–β pruning technique can effectively cut the exponent in half; it cannot eliminate the exponent entirely.


α–β pruning example

[Figure sequence: the 2-ply tree above, explored left to right.
(1) The first MIN node's leaves 3, 12, 8 are evaluated; its value is 3, so the MAX root is ≥ 3.
(2) The second MIN node's first leaf is 2, so its value is ≤ 2 < 3; its remaining leaves are pruned (marked X X).
(3) The third MIN node's first leaf is 14, so its value is ≤ 14.
(4) Its next leaf is 5, tightening the bound to ≤ 5.
(5) Its last leaf is 2, fixing its value at 2; the root's minimax value is 3.]


The α–β algorithm
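As a stand-in for the algorithm figure, here is a standard Python sketch of α–β search under the same assumed Game interface as before (α and β are the bounds described on the next slide):

```python
import math

def alpha_beta_search(state, game):
    """Return the best action for the player to move, pruning branches
    that cannot influence the final decision."""
    player = game.player(state)

    def max_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s, player)
        v = -math.inf
        for a in game.actions(s):
            v = max(v, min_value(game.result(s, a), alpha, beta))
            if v >= beta:       # MIN above would never allow this node
                return v        # prune remaining successors
            alpha = max(alpha, v)
        return v

    def min_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s, player)
        v = math.inf
        for a in game.actions(s):
            v = min(v, max_value(game.result(s, a), alpha, beta))
            if v <= alpha:      # MAX above already has something better
                return v
            beta = min(beta, v)
        return v

    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a),
                                       -math.inf, math.inf))
```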


α–β pruning

[Figure: a path down the tree alternating MAX and MIN levels, ending at a node V deep below a MIN node.]

α is the best value (to max) found so far off the current path

If V is worse than α, max will avoid it ⇒ prune that branch

Define β similarly for min


Properties of α–β

The effectiveness of α–β pruning is highly dependent on the order in which the states are examined.

[Figure: the completed α–β example from the previous slides, with MIN values 3, ≤ 2, 2 and the pruned leaves marked X X.]

This suggests that it might be worthwhile to try to examine first the successors that are likely to be best (obviously, this ordering cannot be done perfectly); see the sketch below.

With “perfect ordering,” time complexity = O(b^(m/2)) ⇒ doubles solvable depth
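One cheap, common approximation of good ordering, sketched under the same assumptions (eval_fn is a hypothetical heuristic, not the true utility): sort successors by their static evaluation before entering the α–β loop.

```python
def ordered_actions(state, game, eval_fn, maximizing):
    """Try the most promising moves first so alpha-beta cuts off
    earlier. eval_fn is an assumed cheap heuristic on states."""
    return sorted(game.actions(state),
                  key=lambda a: eval_fn(game.result(state, a)),
                  reverse=maximizing)  # best-looking moves first for MAX
```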


Resource limits

Standard approach:

• Use Cutoff-Test instead of Terminal-Test

e.g., depth limit

• Use Eval instead of Utility

i.e., evaluation function that estimates desirability of position

Suppose we have 100 seconds and explore 10^4 nodes/second ⇒ 10^6 nodes per move ≈ 35^(8/2)

⇒ α–β reaches depth 8 ⇒ pretty good chess program
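In code, the standard approach is a mechanical change to the α–β sketch above: add a depth parameter, test a cutoff instead of Terminal-Test, and call Eval instead of Utility at the frontier. A sketch under the same assumed interface (eval_fn(state, player) is hypothetical):

```python
import math

def h_alphabeta_search(state, game, eval_fn, depth_limit):
    """Depth-limited alpha-beta: Cutoff-Test replaces Terminal-Test,
    and eval_fn replaces Utility at the depth limit."""
    player = game.player(state)

    def cutoff_test(s, depth):
        return depth >= depth_limit or game.terminal_test(s)

    def max_value(s, alpha, beta, depth):
        if cutoff_test(s, depth):
            return eval_fn(s, player)   # an estimate, not true utility
        v = -math.inf
        for a in game.actions(s):
            v = max(v, min_value(game.result(s, a), alpha, beta, depth + 1))
            if v >= beta:
                return v
            alpha = max(alpha, v)
        return v

    def min_value(s, alpha, beta, depth):
        if cutoff_test(s, depth):
            return eval_fn(s, player)
        v = math.inf
        for a in game.actions(s):
            v = min(v, max_value(game.result(s, a), alpha, beta, depth + 1))
            if v <= alpha:
                return v
            beta = min(beta, v)
        return v

    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a),
                                       -math.inf, math.inf, 1))
```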


Evaluation functions

An evaluation function returns an estimate of the expected utility of the game from a given position.

The performance of a game-playing program depends strongly on the quality of its evaluation function.

What are the properties of a good evaluation function?
(1) It should order the terminal states in the same way as the true utility function.
(2) The computation must not take too long!
(3) For non-terminal states, it should be strongly correlated with the actual chances of winning.


Evaluation functions

[Figure: two chess positions. (a) Black to move: White is slightly better. (b) White to move: Black is winning.]

For chess, typically linear weighted sum of features

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc.
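A sketch of this weighted sum for material only; the weights and the board representation (a dict of piece counts) are illustrative assumptions:

```python
# Classic material values, used here purely for illustration.
PIECE_WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(counts):
    """Eval(s) = sum_i w_i * f_i(s), where each feature f_i is
    (number of white pieces) - (number of black pieces) of one type.
    counts maps e.g. "queen" -> (num_white, num_black)."""
    return sum(w * (counts[piece][0] - counts[piece][1])
               for piece, w in PIECE_WEIGHTS.items())

# Example: White is up a queen, Black is up a rook -> 9 - 5 = 4.
print(material_eval({"pawn": (8, 8), "knight": (2, 2), "bishop": (2, 2),
                     "rook": (1, 2), "queen": (1, 0)}))
```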


Cutting off search

[Figure: two similar chess positions, (a) and (b), both with White to move.]

The evaluation function should be applied only to positions that are quiescent, that is, unlikely to exhibit wild swings in value in the near future.

Non-quiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search.
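A hedged sketch of a quiescence extension, written in negamax style rather than the slides' minimax style: at the normal cutoff, only "noisy" moves (here, captures) are searched further. The capture_moves helper and the stand-pat convention are assumptions.

```python
def quiescence(state, game, eval_fn, alpha, beta):
    """Quiescence search sketch. eval_fn(state, p) is assumed to score
    the state for player p; here we always score for the side to move
    (negamax convention), so values flip sign between levels."""
    stand_pat = eval_fn(state, game.player(state))  # value of stopping here
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for a in game.capture_moves(state):   # assumed: only the noisy actions
        v = -quiescence(game.result(state, a), game, eval_fn, -beta, -alpha)
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return alpha
```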


Some other techniques to improve performance

♦ Using a transposition table: it is worthwhile to store the evaluation of the resulting position in a hash table the first time it is encountered, so that we don't have to recompute it on subsequent occurrences (a memoization sketch follows this list).

♦ Forward pruning: On each turn, consider only a beam of the n bestmoves (according to the evaluation function) rather than considering allpossible moves.

♦ Table lookup: used specifically for the opening and ending of games. Use table lookup at first, then switch to search to continue. Near the end of the game there are again fewer possible positions, and thus more chance to use lookup. In 2016, Bourzutschky solved all pawnless six-piece endgames; there is a KQNKRBN endgame that with best play requires 517 moves until a capture, which then leads to a mate.
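A minimal sketch of the transposition-table idea, applied to plain minimax for clarity (hashable states are assumed):

```python
def minimax_with_tt(state, game, player, table=None):
    """Minimax with a transposition table: cache the value of each
    position the first time it is computed, so positions reached by
    different move orders (transpositions) are evaluated only once."""
    if table is None:
        table = {}
    if state in table:
        return table[state]                 # already seen this position
    if game.terminal_test(state):
        v = game.utility(state, player)
    else:
        values = [minimax_with_tt(game.result(state, a), game, player, table)
                  for a in game.actions(state)]
        v = max(values) if game.player(state) == player else min(values)
    table[state] = v
    return v
```

With α–β, a cached value may be only an upper or lower bound, because the node may have been searched with a narrowed window; practical tables therefore store a bound flag alongside the value.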


Digression: Exact values don’t matter

[Figure: a MAX/MIN tree with leaf values 2, 1, 4, 2 (MIN values 1 and 2, MAX value 2), next to the same tree with the leaves monotonically transformed to 20, 1, 400, 20 (MIN values 1 and 20, MAX value 20); MAX's choice is unchanged.]

Behaviour is preserved under any monotonic transformation of Eval

Only the order matters...


Nondeterministic games: backgammon

[Figure: a backgammon board, with points numbered 1–12 and 13–24, plus bar positions 0 and 25.]


Nondeterministic games in general

In nondeterministic games, chance is introduced by dice, card-shuffling, etc.

Simplified example with coin-flipping:

[Figure: a coin-flipping game tree. MAX sits above two CHANCE nodes whose branches each have probability 0.5; below them, MIN nodes with values 2, 4, 0, −2 minimize over the leaves. The chance nodes evaluate to 3 and −1, so MAX's value is 3.]


Algorithm for nondeterministic games

Expectiminimax gives perfect play

Just like Minimax, except we must also handle chance nodes:

. . .
if state is a Max node then
    return the highest ExpectiMinimax-Value of Successors(state)
if state is a Min node then
    return the lowest ExpectiMinimax-Value of Successors(state)
if state is a chance node then
    return the average of ExpectiMinimax-Value of Successors(state)
. . .
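The same skeleton in Python, as a sketch; is_chance and outcomes (returning (probability, successor) pairs) are assumed helpers for the chance layer:

```python
def expectiminimax(state, game, player):
    """EXPECTIMINIMAX: max at MAX nodes, min at MIN nodes, and the
    probability-weighted average at chance nodes."""
    if game.terminal_test(state):
        return game.utility(state, player)
    if game.is_chance(state):            # assumed helper for chance nodes
        return sum(p * expectiminimax(s, game, player)
                   for p, s in game.outcomes(state))
    values = [expectiminimax(game.result(state, a), game, player)
              for a in game.actions(state)]
    return max(values) if game.player(state) == player else min(values)
```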


Nondeterministic games in practice

Time complexity: O(b^m n^m), where n is the number of distinct rolls.

Dice rolls increase b: 21 possible rolls with 2 dice. Backgammon has ≈ 20 legal moves (can be 6,000 with a 1-1 roll).

depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9

As depth increases, the probability of reaching a given node shrinks ⇒ value of lookahead is diminished

α–β pruning is much less effective

TD-Gammon uses depth-2 search + very good Eval

≈ world-champion level


Digression: Exact values DO matter

[Figure: two dice-node trees with the same leaf ordering but different magnitudes. Left: MIN values 2, 3, 1, 4 with branch probabilities .9/.1 give chance values .9·2 + .1·3 = 2.1 and .9·1 + .1·4 = 1.3, so MAX prefers the left move. Right: leaves transformed to 20, 30, 1, 400 give 21 and 40.9, so MAX prefers the right move.]

Behaviour is preserved only by positive linear transformation of Eval


Pruning in nondeterministic game trees

A version of α–β pruning is possible:

[Figure sequence: a MAX root over two chance nodes, each averaging two MIN nodes with branch probabilities 0.5/0.5; interval bounds on each node shrink as leaves are examined.
(1) All node bounds start at [−∞, +∞].
(2) The first leaf is 2, so the first MIN node is bounded by [−∞, 2].
(3) Its second leaf is also 2, fixing its value at [2, 2].
(4) The second MIN node's first leaf is 2, so it is bounded by [−∞, 2], and the left chance node by [−∞, 2].
(5) The second MIN node's last leaf is 1, so its value is 1 and the left chance node's value is 0.5·2 + 0.5·1 = 1.5.
(6) On the right, the third MIN node's first leaf is 0, bounding it by [−∞, 0].
(7) Its second leaf is 1, so its value is [0, 0].
(8) The fourth MIN node's first leaf is 1, so it is bounded by [−∞, 1] and the right chance node by 0.5·0 + 0.5·1 = [−∞, 0.5] < 1.5; the remaining leaf is pruned, and MAX chooses the left move.]

Pruning contd.

More pruning occurs if we can bound the leaf values:

[Figure sequence: the same tree, with every leaf known to lie in [−2, 2].
(1) All node bounds start at [−2, 2].
(2) The first MIN node's leaves are both 2, so its value is [2, 2] and the left chance node is bounded by 0.5·2 + 0.5·[−2, 2] = [0, 2].
(3) The second MIN node's leaves are 2 and 1, so its value is 1 and the left chance node's value is 1.5.
(4) The third MIN node's first leaf is 0, bounding it by [−2, 0] and the right chance node by 0.5·[−2, 0] + 0.5·[−2, 2] = [−2, 1] < 1.5; the rest of the right subtree is pruned without examining its remaining leaves.]


Partially observable games

Kriegspiel: a partially observable variant of chess in which pieces can move but are completely invisible to the opponent.

White and Black each see a board containing only their own pieces.

A referee, who can see all the pieces, adjudicates the game and periodically makes announcements that are heard by both players (legal/illegal moves, captures, check in a given direction, checkmate).


Partially observable games

Recall belief state: the set of all logically possible board states given the complete history of percepts to date.

A winning strategy, or guaranteed checkmate, is one that, for each possible percept sequence, leads to an actual checkmate for every possible board state in the current belief state, regardless of how the opponent moves.

If a guaranteed checkmate is found, the opponent will lose even if they can see all the pieces.


Example of guaranteed checkmate


Probabilistic checkmate

Such checkmates are still required to work in every board state in the belief state; they are probabilistic with respect to randomization of the winning player's moves.

To get the basic idea, consider the problem of finding a lone black king using just the white king. Simply by moving randomly, the white king will eventually bump into the black king even if the latter tries to avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology of probability theory, detection occurs with probability 1.

Example: the KBNK endgame

What about the KBBK endgame?


Summary

Games are fun to work on! (and dangerous)

They illustrate several important points about AI

♦ perfection is unattainable ⇒ must approximate

♦ good idea to think about what to think about

♦ uncertainty constrains the assignment of values to states

♦ optimal decisions depend on information state, not real state

Games are to AI as grand prix racing is to automobile design
