Chapter 6
Instructor : Miss Mahreen Nasir Butt
Outline
Games
Optimal decisions
Minimax algorithm
α-β pruning
Imperfect, real-time decisions
Games
Multi-agent environments: any given agent must consider the actions of other agents and how they affect its own welfare
The unpredictability of these other agents can introduce many possible contingencies
Environments can be competitive or cooperative
Competitive environments, in which the agents’ goals are in conflict, require adversarial search; these problems are called games
Games
In game theory (economics), any multiagent environment
(either cooperative or competitive) is a game, provided
that the impact of each agent on the others is significant
AI games are a specialized kind: deterministic, turn-taking,
two-player, zero-sum games of perfect information
In our terminology: deterministic, fully observable
environments with two agents whose actions alternate
and in which the utility values at the end of the game are
always equal and opposite (+1 and -1)
Optimal Decisions in Games
Consider games with two players (MAX, MIN)
Initial State
Includes the board position and identifies the player to move
Successor Function
Returns a list of (move, state) pairs, each giving a legal move and the
resulting state
Terminal Test
Determines if the game is over (at terminal states)
Utility Function
Also called an objective function or payoff function; assigns a numeric
value to the terminal states, e.g. (+1, -1) or (+192, -192)
We are not looking for a path, only the next move to make (one that
hopefully leads to a winning state)
Our best move depends on what the other player does
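The four components above can be sketched for tic-tac-toe as follows. This is a minimal illustration, not a definitive formulation: the state encoding and the function names are assumptions made here, and the utility of 0 for a draw is also an assumption, since the slides list only +1 and -1.

```python
from typing import List, Optional, Tuple

# Assumed encoding: a state is (board, player_to_move); the board is a tuple
# of 9 cells holding "X", "O" or " ". X plays the role of MAX.
State = Tuple[Tuple[str, ...], str]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def initial_state() -> State:
    """Empty board, X (MAX) to move."""
    return (tuple(" " * 9), "X")


def winner(board: Tuple[str, ...]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def terminal_test(state: State) -> bool:
    """The game is over when someone has won or the board is full."""
    board, _ = state
    return winner(board) is not None or " " not in board


def successors(state: State) -> List[Tuple[int, State]]:
    """Return (move, resulting state) pairs for every legal move."""
    if terminal_test(state):
        return []
    board, player = state
    other = "O" if player == "X" else "X"
    result = []
    for i, cell in enumerate(board):
        if cell == " ":
            new_board = board[:i] + (player,) + board[i + 1:]
            result.append((i, (new_board, other)))
    return result


def utility(state: State) -> int:
    """+1 if X won, -1 if O won, 0 otherwise (draw value is an assumption)."""
    return {"X": 1, "O": -1}.get(winner(state[0]), 0)


print(len(successors(initial_state())))  # number of legal opening moves
```

A move is represented here simply by the index of the square being filled; any other labeling of moves would work equally well.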
Game Tree
Root node represents the configuration of the board at which a decision must be made
Root is labeled a "MAX" node indicating it is my turn; otherwise it is labeled a "MIN" (your turn)
Each level of the tree has nodes that are all MAX or all MIN
The root of the tree is the initial state
Next level is all of MAX’s moves
Next level is all of MIN’s moves
…
Example: Tic-Tac-Toe
Root has 9 blank squares (MAX)
Level 1 has 8 blank squares (MIN)
Level 2 has 7 blank squares (MAX)
…
Utility function:
win for X is +1
win for O is -1
Game Trees
Tic-tac-toe: Game tree (2-player, deterministic, turns)
Optimal Strategies
In a normal search problem, the optimal solution would be
a sequence of moves leading to a goal state - a terminal
state that is a win
In a game, MIN has something to say about it and
therefore MAX must find a contingent strategy, which
specifies :
MAX’s move in the initial state,
then MAX’s moves in the states resulting from every
possible response by MIN
Optimal strategies
Then MAX’s moves in the states resulting
from every possible response by MIN to
those moves
An optimal strategy leads to outcomes at
least as good as any other strategy when
one is playing an infallible opponent
Minimax Strategy
Basic idea:
Choose the move with the highest minimax value, i.e. the best achievable payoff against best play
Choose moves that will lead to a win, even though MIN is trying to block
MAX’s goal: get to +1
MIN’s goal: get to -1
Minimax value of a node (backed-up value):
If N is terminal, use the utility value
If N is a MAX node, take the max of its successors
If N is a MIN node, take the min of its successors
Minimax
Perfect play for deterministic games
Idea: choose the move to the position with the highest
minimax value = best achievable payoff against best play
E.g., 2-ply game:
[Figure: 2-ply game tree with root A and successors B, C, D]
Minimax value
Given a game tree, the optimal strategy can be
determined by examining the minimax value of
each node (MINIMAX-VALUE(n))
The minimax value of a node is the utility of being
in the corresponding state, assuming that both
players play optimally from there to the end of the
game
Given a choice, MAX prefers to move to a state of
maximum value, whereas MIN prefers a state of
minimum value
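The definition above can be written compactly as a recursive formula. The notation is an assumption made here for illustration: UTILITY(n) stands for the utility function and Successors(n) for the set of states returned by the successor function.

```latex
\[
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal state} \\[4pt]
\max\limits_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\[4pt]
\min\limits_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
\]
```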
Minimax algorithm
Minimax
MINIMAX-VALUE(root) = max(min(3,12,8), min(2,4,6), min(14,5,2))
= max(3,2,2)
= 3
The algorithm first recurses down to the bottom-left leaf nodes of the tree and applies the utility function to them, finding that their values are 3, 12 and 8.
Minimax
Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B.
The same process gives backed-up values of 2 for C and 2 for D.
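The recursion traced in this example can be sketched in a few lines of Python. This is a minimal sketch, not the slides' own pseudocode: the dictionary encoding of the tree, the leaf labels (b1, c1, ...) and the function names are all assumptions made for illustration; only the node names A, B, C, D and the leaf values come from the example.

```python
from typing import Dict, Union

# A tree is either a terminal utility (int) or a dict mapping a move label
# to the subtree it leads to. Leaf labels such as "b1" are hypothetical.
Tree = Union[int, Dict[str, "Tree"]]

# Root A is a MAX node; B, C and D are MIN nodes with three leaves each.
GAME_TREE: Tree = {
    "B": {"b1": 3, "b2": 12, "b3": 8},
    "C": {"c1": 2, "c2": 4, "c3": 6},
    "D": {"d1": 14, "d2": 5, "d3": 2},
}


def minimax_value(node: Tree, is_max: bool) -> int:
    """Backed-up value of a node, assuming both players play optimally."""
    if isinstance(node, int):          # terminal: use the utility value
        return node
    values = [minimax_value(child, not is_max) for child in node.values()]
    return max(values) if is_max else min(values)


def minimax_decision(root: Dict[str, Tree]) -> str:
    """Move at a MAX root that leads to the highest minimax value."""
    return max(root, key=lambda move: minimax_value(root[move], False))


print(minimax_value(GAME_TREE, True))   # value at the root
print(minimax_decision(GAME_TREE))      # MAX's chosen move
```

The sketch explores the tree depth-first, as the slides describe, and visits every leaf; it makes no attempt to prune.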
Properties of minimax
Complete? Yes (if the tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^m)
Space complexity? O(bm) (depth-first exploration)
For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible
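To get a feel for why the exact solution is infeasible, the chess figures above can be plugged into the two bounds. This is only an order-of-magnitude illustration: big-O bounds hide constant factors, and the variable names are chosen here for the example.

```python
import math

b, m = 35, 100  # branching factor and game length quoted above for chess

time_nodes = b ** m    # growth of the time bound O(b^m)
space_nodes = b * m    # growth of the space bound O(bm)

digits = len(str(time_nodes))       # number of decimal digits in b^m
log10_nodes = m * math.log10(b)     # b^m written as a power of ten

print(f"b^m has {digits} decimal digits (about 10^{log10_nodes:.1f})")
print(f"b*m = {space_nodes}")
```

The time bound is a number with more than 150 digits, while the depth-first space bound stays in the thousands; time, not memory, is what rules out exhaustive minimax here.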
The minimax algorithm: problems
Problem with minimax search: The number of game states it has to examine is
exponential in the number of moves.
Unfortunately, the exponent can’t be eliminated, but it can be cut in half.
α-β pruning
It is possible to compute the correct minimax decision without looking at every node in the game tree.
Alpha-beta pruning allows us to eliminate large parts of the tree from consideration without influencing the final decision.
α-β pruning
MINIMAX-VALUE(root) = max(min(3,12,8), min(2,x,y), min(14,5,2))
= max(3, min(2,x,y), 2)
= max(3, z, 2) where z ≤ 2
= 3
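The claim that the root value does not depend on x and y can be checked by brute force. The sketch below simply evaluates the expression above for a grid of candidate values; the range -50..50 is an arbitrary assumption chosen for illustration, and the function name is hypothetical.

```python
def root_value(x: int, y: int) -> int:
    """MINIMAX-VALUE(root) with the two unexamined leaves of C set to x, y."""
    return max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))


# Collect every distinct root value obtained over the grid of (x, y) pairs.
values = {root_value(x, y) for x in range(-50, 51) for y in range(-50, 51)}
print(values)
```

Because min(2, x, y) can never exceed 2, the set contains a single value, which is why those two leaves never need to be examined.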
α-β pruning example
It can be inferred that the value at the root is at least 3, because MAX has a choice worth 3.
α-β pruning example
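The first leaf below C is worth 2, so C, which is a MIN node, is worth at most 2. MAX already has a choice worth 3 at B; therefore, there is no point in looking at the other successors of C.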
α-β pruning example
The first leaf below D is worth 14, so D is worth at most 14. This is still higher than MAX’s best alternative (i.e., 3), so D’s other successors are explored.
α-β pruning example
The second successor of D is worth 5, so the exploration continues.
α-β pruning example
The third successor of D is worth 2, so D is worth exactly 2. MAX’s decision at the root is to move to B, giving a value of 3.
Why is it called α-β?
α = the value of the best
(i.e., highest-value) choice
found so far at any choice
point along the path for MAX
β = the value of the best
(i.e., lowest-value) choice
found so far at any choice
point along the path for MIN
If v is worse than α, MAX will
avoid it, so prune that branch
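The roles of α and β can be made concrete with a short sketch of alpha-beta search. This is one common way to write it, offered under stated assumptions rather than as the slides' own algorithm: the nested-list tree encoding, the function name and the `visited` list (used only to record which leaves get examined) are all hypothetical. The leaf values are those of the 2-ply example in this chapter.

```python
import math
from typing import List, Optional, Union

# A tree is either a terminal utility (int) or a list of subtrees.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, is_max: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[int]] = None) -> float:
    """Minimax value of node, skipping branches that cannot matter.

    alpha: best (highest) value found so far for MAX along the path.
    beta:  best (lowest) value found so far for MIN along the path.
    visited: optional list that collects every leaf actually examined.
    """
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:      # MIN above already has something better
                break              # prune the remaining children
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:         # MAX above already has something better
            break                  # prune the remaining children
        beta = min(beta, value)
    return value


tree: Tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # successors of B, C, D
seen: List[int] = []
print(alphabeta(tree, True, visited=seen))  # root value
print(seen)                                 # leaves examined, in order
```

On this tree the search examines seven of the nine leaves: the second and third successors of C are never looked at, matching the example.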
Properties of α-β
Pruning does not affect final result
Good move ordering improves effectiveness of
pruning
With "perfect ordering," time complexity = O(b^(m/2))
This doubles the depth of search reachable in the same time
A simple example of the value of reasoning about
which computations are relevant (a form of
metareasoning)
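The "doubles depth" claim follows from a one-line calculation, shown here as a worked equation using the same b and m as the complexity bounds above. The chess figure b ≈ 35 is the one quoted earlier in these slides; the rounded square root is the only number added here.

```latex
\[
O\!\left(b^{m/2}\right) = O\!\left(\left(\sqrt{b}\right)^{m}\right),
\qquad \text{so for chess } \sqrt{35} \approx 5.9 .
\]
\[
\text{Searching to depth } 2m \text{ with perfect ordering costs }
O\!\left(b^{(2m)/2}\right) = O\!\left(b^{m}\right),
\text{ the same as plain minimax to depth } m .
\]
```

In other words, with perfect ordering the effective branching factor drops from b to about the square root of b.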
MinMax – AlphaBeta Pruning
MinMax searches the entire tree, even when part of it can be
ignored
In general, stop evaluating a move once it is found to be worse than a
previously examined move
Since playing that move does not benefit the player, it need not be
evaluated any further
This saves processing time without affecting the final result