1 Chapter 6, Sec. 1 - 4 Adversarial Search. 2 Outline Optimal decisions α-β pruning Imperfect,...


Page 1: Adversarial Search

Chapter 6, Sec. 1 - 4

Page 2: Outline

– Optimal decisions
– α-β pruning
– Imperfect, real-time decisions

Page 3: Games vs. search problems

A competitive multiagent environment, in which the agents' goals are in conflict, gives rise to an adversarial search problem, often known as a game.

Games in AI are usually deterministic, turn-taking, 2-player, zero-sum games of perfect information, i.e. deterministic, fully observable environments in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always equal and opposite.

"Unpredictable" opponent => the solution is a strategy specifying a move for every possible opponent reply.

Time limits => unlikely to find the goal, so we must approximate. How do we make the best possible use of time?
– The optimal move and an algorithm for finding it
– Techniques for choosing a good move within the time limit
– Pruning allows us to ignore portions of the search tree that make no difference to the final choice
– Heuristic evaluation functions allow us to approximate the true utility of a state without doing a complete search

Page 4: Types of games

                         deterministic                    chance
Perfect information      chess, checkers, Go, Othello     backgammon, Monopoly
Imperfect information                                     bridge, poker, Scrabble, nuclear war

Page 5: Optimal decisions in games

A game can be formally defined as a kind of search problem with the following components:
– Initial state: includes the board position and identifies the player to move.
– Successor function: returns a list of (move, state) pairs, each indicating a legal move and the resulting state.
– Terminal test: determines when the game is over. States where the game has ended are called terminal states.
– Utility function (also called an objective function or payoff function): gives a numeric value for the terminal states, e.g. +x for a win, -x for a loss, 0 for a draw.

A game tree for tic-tac-toe:
– Play alternates between MAX placing an X and MIN placing an O until we reach leaf nodes corresponding to terminal states.
– The number on each leaf node is the utility value of the terminal state from the point of view of MAX.
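The four components can be made concrete for tic-tac-toe. The following is a minimal sketch, not the slides' own code: the state encoding (a 9-tuple board plus the player to move), the function names, and the choice of +1 / -1 / 0 as utilities are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed encoding: a state is (board, player). The board is a 9-tuple of
# 'X', 'O' or None (row by row), and player is whose turn it is.
State = Tuple[Tuple[Optional[str], ...], str]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def initial_state() -> State:
    """Initial state: empty board, MAX (X) to move."""
    return (None,) * 9, 'X'


def winner(board) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: (move, state) pairs, one per empty square."""
    board, player = state
    other = 'O' if player == 'X' else 'X'
    result = []
    for i, cell in enumerate(board):
        if cell is None:
            new_board = board[:i] + (player,) + board[i + 1:]
            result.append((i, (new_board, other)))
    return result


def terminal_test(state: State) -> bool:
    """Terminal test: someone has won or the board is full."""
    board, _ = state
    return winner(board) is not None or all(c is not None for c in board)


def utility(state: State) -> int:
    """Utility of a terminal state from MAX's point of view: +1, -1 or 0."""
    w = winner(state[0])
    return 1 if w == 'X' else -1 if w == 'O' else 0
```

From the empty board, `successors(initial_state())` yields nine (move, state) pairs, one for each square where MAX could place an X.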

Page 6: Optimal decisions (continued)

Compare: in a normal search problem, the optimal solution would be a sequence of moves leading to a goal state at minimum path cost.

In a game, a player must find a contingent strategy that specifies a move for every possible response by the opponent.

An optimal strategy leads to outcomes at least as good as any other strategy when one is playing an infallible opponent.

In a game tree for tic-tac-toe, the optimal strategy can be determined by examining the minimax value of each node, which is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game:

\[
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal state} \\
\max_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
\]

Given a choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value.
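As a worked illustration of the definition, consider a hypothetical two-ply tree (the node names and leaf utilities are assumed for this example, not taken from the slides): the root A is a MAX node with three MIN successors B, C and D, whose terminal successors have utilities (3, 12, 8), (2, 4, 6) and (14, 5, 2) respectively. Applying the MIN case at B, C, D and then the MAX case at A gives:

\[
\begin{aligned}
\text{MINIMAX-VALUE}(B) &= \min(3, 12, 8) = 3 \\
\text{MINIMAX-VALUE}(C) &= \min(2, 4, 6) = 2 \\
\text{MINIMAX-VALUE}(D) &= \min(14, 5, 2) = 2 \\
\text{MINIMAX-VALUE}(A) &= \max(3, 2, 2) = 3
\end{aligned}
\]

So MAX should move to B, even though the largest single leaf (14) lies under D: an optimal MIN would never let play reach it.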

Page 7: Game tree (2-player, deterministic, turns)

The top node is the initial state. MAX (X) and MIN (O) move alternately until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.

Page 8: Minimax algorithm

Compute the minimax decision from the current state.

Perfect play for deterministic, perfect-information games.

Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play.

E.g., 2-ply game:

Page 9: Minimax algorithm: the minimax decision from the current state
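The minimax decision can be sketched as a pair of mutually recursive functions. This is an illustrative sketch only: the explicit-tree representation (nested dicts keyed by move name, with numbers as terminal utilities), the move names, and the leaf values are assumptions made for the example.

```python
from typing import Dict, Union

# Assumed representation: a node is either a number (utility of a terminal
# state, from MAX's point of view) or a dict mapping move names to successors.
Node = Union[float, Dict[str, "Node"]]


def max_value(node: Node) -> float:
    """Minimax value of a node where MAX is to move."""
    if not isinstance(node, dict):
        return node                      # terminal state: return its utility
    return max(min_value(child) for child in node.values())


def min_value(node: Node) -> float:
    """Minimax value of a node where MIN is to move."""
    if not isinstance(node, dict):
        return node
    return min(max_value(child) for child in node.values())


def minimax_decision(root: Dict[str, Node]) -> str:
    """Return MAX's move leading to the successor with the highest minimax value."""
    return max(root, key=lambda move: min_value(root[move]))


# Hypothetical 2-ply game: MAX chooses "left" or "right", then MIN replies.
tree = {
    "left": {"l1": 3, "l2": 5},
    "right": {"r1": 2, "r2": 9},
}
print(minimax_decision(tree))  # left
```

The "right" branch contains the largest payoff (9), but MIN would answer with r1, so its minimax value is only 2. The "left" branch guarantees at least 3.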

Page 10: Properties of minimax

Here b is the branching factor (legal moves per position) and m is the maximum depth of the tree.

Complete? Yes (if the tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^m)
Space complexity? O(bm) (depth-first exploration)

The number of game states it has to examine is exponential in the number of moves.

For chess, b ≈ 35 and m ≈ 100 for "reasonable" games => exact solution completely infeasible.
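To get a feel for why the exact solution is infeasible, the size of b^m for the chess figures quoted above can be computed directly. This is a small arithmetic sketch, and the variable names are chosen here for illustration.

```python
import math

b, m = 35, 100            # branching factor and depth quoted for chess
nodes = b ** m            # exact value: Python integers have arbitrary precision
digits = len(str(nodes))  # number of decimal digits in b^m
exponent = m * math.log10(b)
print(f"b^m has {digits} digits, i.e. roughly 10^{exponent:.0f} nodes")
```

The result is on the order of 10^154 nodes, far beyond anything that can be enumerated.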

Page 11: α-β pruning

Compute the correct minimax decision without looking at every node in the game tree => reduce the number of states examined.

Prunes away branches of the game tree that cannot possibly influence the final decision.

General principle:

Consider a node n somewhere in the tree, such that Player has a choice of moving to that node. If Player has a better choice m either at the parent node of n or at any choice point further up, then n will never be reached in actual play.

Page 12: α-β pruning example

Page 13: α-β pruning example (continued)

Page 14: α-β pruning example (continued)

Page 15: α-β pruning example (continued)

Page 16: α-β pruning example (continued)

Page 17: Properties of α-β pruning

Pruning does not affect the final result.

Good move ordering improves the effectiveness of pruning: it is highly dependent on the order in which successors are examined.

It might be worthwhile to try to examine first the successors that are likely to be best.

With "perfect ordering" (the best successor examined first), time complexity = O(b^(m/2)).
– The effective branching factor becomes √b instead of b.
– This doubles the depth of search within the same amount of time.
– Can easily reach depth 8 and play good chess.

With successors examined in random order, the total number of nodes examined will be roughly O(b^(3m/4)).

A simple example of the value of reasoning about which computations are relevant (a form of metareasoning).
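The effect of move ordering can be read off as an effective branching factor per ply: b^(1/2) for perfect ordering and b^(3/4) for random ordering. A quick arithmetic sketch for b = 35, the chess figure used in these slides (variable names are illustrative):

```python
import math

b = 35  # approximate branching factor for chess, as quoted in these slides

perfect = math.sqrt(b)     # b^(1/2): best successor always examined first
random_order = b ** 0.75   # b^(3/4): successors examined in random order
print(f"no pruning: {b}, random order: {random_order:.1f}, "
      f"perfect order: {perfect:.1f}")
```

Under these assumptions, perfect ordering behaves like a game with about 6 moves per position instead of 35, which is why the reachable depth roughly doubles.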

Page 18: Why is it called α-β?

α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX.

β is the value of the best (i.e., lowest-value) choice found so far at any choice point along the path for MIN.

If V is worse than α, MAX will avoid it => prune that branch.

β plays the same role for MIN: if V is worse (i.e., higher) than β, MIN will avoid it => prune that branch.

Page 19: The α-β algorithm
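A minimal sketch of α-β search over an explicit tree is given below. The representation (nested lists with numeric leaves), the `visited` list used to count evaluated leaves, and the example leaf values are assumptions for illustration. A real game program would generate successors from a game state instead.

```python
import math
from typing import List, Union

Node = Union[float, List["Node"]]  # a leaf utility, or a list of child nodes

visited: List[float] = []  # leaves actually evaluated (for illustration only)


def max_value(node: Node, alpha: float, beta: float) -> float:
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:           # MIN, higher up, already has a better option
            return v            # prune the remaining children
        alpha = max(alpha, v)
    return v


def min_value(node: Node, alpha: float, beta: float) -> float:
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:          # MAX, higher up, already has a better option
            return v            # prune the remaining children
        beta = min(beta, v)
    return v


def alpha_beta_search(root: Node) -> float:
    """Minimax value of the root, with MAX to move at the root."""
    return max_value(root, -math.inf, math.inf)


# Hypothetical 2-ply tree: three MIN nodes with three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = alpha_beta_search(tree)
print(value, len(visited))  # 3 7
```

On this tree the search returns the same value as plain minimax (3) but evaluates only 7 of the 9 leaves: once the second MIN node is known to be worth at most 2, its remaining leaves (4 and 6) cannot affect MAX's choice.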

Page 20: Resource limits

Suppose we have 100 seconds and can explore 10^4 nodes/sec => 10^6 nodes per move.

Alpha-beta pruning still has to search all the way to terminal states for at least a portion of the search space.

Standard approach:
– Cutoff test: cut off the search earlier and decide when to apply the evaluation function, turning nonterminal nodes into terminal leaves, e.g., a depth limit (perhaps with quiescence search added).
– Evaluation function: an estimate of the desirability (utility) of a position.

Page 21: Evaluation functions

Returns an estimate of the expected utility of the game from a given position.

Requirements:
– The evaluation function should order the terminal states in the same way as the true utility function. Otherwise, an agent using it might select suboptimal moves even if it can see ahead all the way to the end of the game.
– The computation must not take too long.
– For nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning. => Make a guess about the final outcome.

The features calculated by the evaluation function define categories of states: the states in each category have the same values for all the features. Each category can be summarized by a single value that reflects the proportion of states with each outcome: the weighted average, or expected value.

Determining the expected value for each category yields an evaluation function that works for any state, but there are usually too many categories to estimate all the probabilities of winning.

Instead, compute separate numerical contributions from each feature and then combine them to find the total value: a weighted linear function

\[
\text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)
\]

Page 22: Evaluation functions (continued)

For chess, typically a linear weighted sum of features:

\[
\text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)
\]

e.g., w_1 = 9 with f_1(s) = (number of white queens) - (number of black queens), etc.

The choice of features and weights comes from human chess-playing experience.

The weights of the evaluation function can also be estimated by machine learning techniques.
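A weighted linear evaluation of material can be sketched as follows. Only the queen weight (9) comes from the slides. The remaining weights are the commonly used material values and are an assumption here, as are the function name and the piece-count dictionaries used to describe a position.

```python
from typing import Dict

# Weight w_i per piece type. Only "Q": 9 is taken from the slides. The other
# values are conventional material values, assumed for this sketch.
WEIGHTS: Dict[str, float] = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def evaluate(white: Dict[str, int], black: Dict[str, int]) -> float:
    """Eval(s) = sum of w_i * f_i(s), where f_i = white count - black count."""
    return sum(w * (white.get(piece, 0) - black.get(piece, 0))
               for piece, w in WEIGHTS.items())


# Hypothetical position: White is a queen up, Black has an extra knight and pawn.
white = {"Q": 1, "R": 2, "P": 6}
black = {"Q": 0, "R": 2, "N": 1, "P": 7}
print(evaluate(white, black))  # 9 - 3 - 1 = 5
```

A positive value favours White. Practical evaluation functions add further features (for example king safety or pawn structure), each with its own weight.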

Page 23: Cutting off search

Modify α-β-SEARCH so that it calls the evaluation function when it is appropriate to cut off the search.

MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval

E.g., apply a depth limit => the amount of time used will not exceed what the rules of the game allow.
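The two substitutions can be sketched as a depth-limited minimax in which the cutoff test and the evaluation function are passed in as parameters. The function signature, the toy game, and the helper names below are hypothetical, chosen only to make the example self-contained.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def minimax_cutoff(state: S,
                   depth: int,
                   maximizing: bool,
                   successors: Callable[[S], List[Tuple[object, S]]],
                   cutoff_test: Callable[[S, int], bool],
                   evaluate: Callable[[S], float]) -> float:
    """Minimax value with Cutoff? in place of Terminal? and Eval in place of Utility.

    cutoff_test must also return True for terminal states, since those
    have no successors to recurse on.
    """
    if cutoff_test(state, depth):
        return evaluate(state)
    values = [minimax_cutoff(s, depth + 1, not maximizing,
                             successors, cutoff_test, evaluate)
              for _, s in successors(state)]
    return max(values) if maximizing else min(values)


# Toy game (hypothetical): the state is an integer, each move adds 1 or 2,
# and the evaluation of a state is simply its value.
def toy_successors(n: int) -> List[Tuple[object, int]]:
    return [("+1", n + 1), ("+2", n + 2)]


def depth_limit(limit: int) -> Callable[[int, int], bool]:
    """Cutoff test that stops the search at a fixed depth."""
    return lambda state, depth: depth >= limit


result = minimax_cutoff(0, 0, True, toy_successors, depth_limit(2), lambda n: n)
print(result)  # 3
```

With a depth limit of 2, MAX adds 2 and MIN then adds 1, giving an estimated value of 3. The same structure carries over to α-β search by threading α and β through the recursion.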

Does it work in practice? b^m = 10^6, b = 35 => m ≈ 4
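The depth estimate can be checked by solving b^m = 10^6 for m, using the node budget from the "Resource limits" page (100 seconds at 10^4 nodes/sec). The sketch below also solves b^(m/2) = 10^6, the perfect-ordering α-β case. Variable names are illustrative.

```python
import math

seconds, nodes_per_sec = 100, 10**4   # figures from the "Resource limits" page
budget = seconds * nodes_per_sec      # 10^6 nodes per move
b = 35                                # approximate chess branching factor

plain_depth = math.log(budget) / math.log(b)                # solves b^m = budget
ordered_depth = math.log(budget) / math.log(math.sqrt(b))   # solves b^(m/2) = budget
print(f"minimax: {plain_depth:.1f} ply, "
      f"alpha-beta with perfect ordering: {ordered_depth:.1f} ply")
```

Plain minimax reaches just under 4 ply on this budget, while perfectly ordered α-β reaches just under 8, consistent with the depth-8 figure quoted for α-β pruning.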

Page 24: Deterministic games in practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and uses undisclosed methods for extending some lines of search up to 40 ply.

Othello: human champions refuse to compete against computers, which are too good.

Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Page 25: Summary

Games are fun to work on!

They illustrate several important points about AI:
– perfection is unattainable => must approximate
– good idea to think about what to think about