Chapter 6 Instructor : Miss Mahreen Nasir Butt. Outline Games Optimal decisions Minimax algorithm...

Chapter 6

Instructor : Miss Mahreen Nasir Butt

OutlineGamesOptimal decisionsMinimax algorithmα-β pruningImperfect, real-time decisions

2

GamesMulti agent environments : any given agent will need to

consider the actions of other agents and how they affect

its own welfare

The unpredictability of these other agents can introduce

many possible contingencies

There could be competitive or cooperative environments

Competitive environments, in which the agent’s goals are in

conflict require adversarial search – these problems are

called as games

3

Games

4

In game theory (economics), any multiagent environment

(either cooperative or competitive) is a game provided

that the impact of each agent on the other is significant

AI games are a specialized kind - deterministic, turn

taking, two-player, zero sum games of perfect information

In our terminology – deterministic, fully observable

environments with two agents whose actions alternate

and the utility values at the end of the game are always

equal and opposite (+1 and –1)

Optimal Decisions in Games

5

Consider games with two players (MAX, MIN)

Initial State

Board position and identifies the player to move

Successor Function

Returns a list of (move, state) pairs; each a legal move and resulting

stateo;

Terminal Test

Determines if the game is over (at terminal states)

Utility Function

Objective function, payoff function, a numeric value for the terminal

states (+1, -1) or (+192, -192)

We are not looking for a path, only the next move to make(that hopefully

leads to a wining state)

Our best move depends on what the other player does

Game Tree

6

Root node represents the configuration of the board at which a decision must be made

Root is labeled a "MAX" node indicating it is my turn; otherwise it is labeled a "MIN" (your turn)

Each level of the tree has nodes that are all MAX or all MIN

7

The root of the tree is the initial stateNext level is all of MAX’s movesNext level is all of MIN’s moves…

Example: Tic-Tac-ToeRoot has 9 blank squares (MAX)Level 1 has 8 blank squares (MIN)Level 2 has 7 blank squares (MAX)…

Utility function: win for X is +1win for O is -1

Game Trees

Tic-tac-toe: Game tree (2-player, deterministic, turns)

8

Optimal Strategies

9

In a normal search problem, the optimal solution would be

a sequence of moves leading to a goal state - a terminal

state that is a win

In a game, MIN has something to say about it and

therefore MAX must find a contingent strategy, which

specifies :

MAX’s move in the initial state,

then MAX’s moves in the states resulting from every

possible response by MIN

Optimal strategies

10

Then MAX’s moves in the states resulting

from every possible response by MIN to

those moves

An optimal strategy leads to outcomes at

least as good as any other

Strategy when one is playing an infallible

opponent

11

Basic Idea:

Choose the move with the highest minimax value

best achievable payoff against best play

Choose moves that will lead to a win, even though min is trying to block

Max’s goal: get to 1

Min’s goal: get to -1

Minimax value of a node (backed up value):

If N is terminal, use the utility value

If N is a Max move, take max of successors

If N is a Min move, take min of successors

Minimax Strategy

Minimax

12

Perfect play for deterministic gamesIdea: choose move to position with highest

minimax value = best achievable payoff against best play

E.g., 2-ply game:

A

B C D

Minimax value

13

Given a game tree, the optimal strategy can be

determined by examining the minimax value of

each node (MINIMAX-VALUE(n))

The minimax value of a node is the utility of being

in the corresponding state, assuming that both

players play optimally from there to the end of the

game

Given a choice, MAX prefer to move to a state of

maximum value, whereas MIN prefers a state of

minimum value

Minimax algorithm

14

Minimax

15

MINIMAX-VALUE(root) = max(min(3,12,8), min(2,4,6), min(14,5,2))

= max(3,2,2)

= 3

The algorithm first recurses down to the tree bottom-left nodes and uses the Utility function on them to discover that their values are 3, 12 and 8.

A

B C D

Minimax

16

Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B.

Similar process for the other nodes.

A

CB D

Properties of minimax

17

Complete? Yes (if tree is finite)Optimal? Yes (against an optimal opponent)Time complexity? O(bm)Space complexity? O(bm) (depth-first

exploration)For chess, b ≈ 35, m ≈100 for "reasonable"

games Exact solution completely infeasible(إذا الشجرة هو محدود) استكمال؟ نعم

األمثل؟ نعم (ضد الخصم األمثل)O (BM) تعقيد الوقت؟(عمق االستكشاف والعشرين) O (BM) الفضاء التعقيد؟

ل≈ "معقولة" ألعاب100، م 35لعبة الشطرنج، ب ≈ بالضبط تماما الحل غير قابل للتطبيق

The minimax algorithm: problems

18

Problem with minimax search: The number of game states it has to examine is

exponential in the number of moves.

Unfortunately, the exponent can’t be eliminated, but it can be cut in half.

:مشكلة مع مينيماكس البحثعدد الدول اللعبة لديها لدراسة هو األسي في عدد من التحركات.

لألسف، ال يمكن أن يتم القضاء على األس، ولكن يمكن قطع .عليه في النصف

α-β pruning

19

It is possible to compute the correct minimax decision without looking at every node in the game tree.

Alpha-beta pruning allows to eliminate large parts of the tree from consideration, without influencing the final decision.

فمن الممكن لحساب القرار الصحيح مينيماكس دون النظر إلى كل عقدة فيشجرة لعبة.

ألفا بيتا تقليم يسمح للقضاء على أجزاء كبيرة من شجرة من النظر، دون .التأثير على القرار النهائي

α-β pruning

20

MINIMAX-VALUE(root) = max(min(3,12,8), min(2,x,y),min(14,5,2))

= max(3,min(2,x,y),2)

= max(3,z,2) where z <=2

= 3

α-β pruning example

21

It can be inferred that the value at the root is at least 3, because MAX has a choice worth 3.

، ألن لديه 3ويمكن استنتاج أن القيمة في جذور ما ال يقل عن 3بقيمة MAX خيار .


22

Therefore, there is no point in looking at the other successors of C.

لذلك، ال يوجد أي نقطة في النظر إلى غيرها من خلفاء C.


23

This is still higher than MAX’s best alternative (i.e., 3), so D’s other successors are explored.

) MAX هذا ال يزال أعلى من بديل أفضل 3أي، لذلك يتم ،(.سلع أخرى'D استكشاف خلفاء


24

The second successor of D is worth 5, so the exploration continues.

D 5 خليفة الثاني من الجدير ، .بحيث يواصل االستكشاف


25

MAX’s decision at the root is to move to B, giving a value of 3

Why is it called α-β?

26

α = the value of the best

(i.e., highest-value) choice

found so far at any choice

point along the path for max

β = the value of the best

(i.e., lowest value) choice

found so far along the path for

MIN

If v is worse than α, max will

avoid it

Prune that branch

α = العثور على قيمة أفضل خيار (أي أعلى قيمة) حتىاآلن في أي لحظة االختيار على طول مسار ماكس

β = العثور على قيمة أفضل (أي أقل قيمة) االختيار حتىMINاآلن على طول الطريق ل

والحد األقصى تجنب ذلك، αهو أسوأ من Vإذا تقليم ذلك الفرع

27

Properties of α-β

28

Pruning does not affect final result

Good move ordering improves effectiveness of

pruning

With "perfect ordering," time complexity =O(bm/2)

Doubles depth of search

A simple example of the value of reasoning about

which computations are relevant (a form of

metareasoning)

التقليم ال يؤثر النتيجة النهائيةخطوة جيدة يأمر يحسن فعالية التقليم

O (BM / 2) مع "الكمال طلب،" التعقيد الوقت = يضاعف عمق البحث

وهناك مثال بسيط من قيمة المنطق الحسابية حول أي (metareasoningوثيقة الصلة (شكل من أشكال

29

MinMax – AlphaBeta Pruning

30

MinMax searches entire tree, even if in some cases the rest can be

ignored

In general, stop evaluating move when find worse than previously

examined move

Does not benefit the player to play that move, it need not be

evaluated any further.

Save processing time without affecting final result

MinMax يبحث شجرة بأكملها، حتى لو في بعض الحاالت يمكن تجاهل بقية

بشكل عام، عندما وقف تقييم الخطوة تجد أسوأ من الخطوة سبق النظر فيها

ال يستفيد الالعب للعب هذا التحرك، ليس من الضروري أن يتم تقييم أكثر

.من ذلك

توفيرا للوقت دون التأثير على معالجة النتيجة النهائية

Chapter 6 Instructor : Miss Mahreen Nasir Butt. Outline Games Optimal decisions Minimax algorithm...

Documents

Transcript of Chapter 6 Instructor : Miss Mahreen Nasir Butt. Outline Games Optimal decisions Minimax algorithm...