1 6 Chaptercsatol/mestint/pdfs/russell06_jatek.pdf · 2006. 9. 3. · T yp es of games...

Gameplaying

Chapter6

Chapter61

Outline

♦Games

♦Perfectplay–minimaxdecisions–α–βpruning

♦Resourcelimitsandapproximateevaluation

♦Gamesofchance

♦Gamesofimperfectinformation

Chapter62

Gamesvs.searchproblems

“Unpredictable”opponent⇒solutionisastrategyspecifyingamoveforeverypossibleopponentreply

Timelimits⇒unlikelytofindgoal,mustapproximate

Planofattack:

•Computerconsiderspossiblelinesofplay(Babbage,1846)

•Algorithmforperfectplay(Zermelo,1912;VonNeumann,1944)

•Finitehorizon,approximateevaluation(Zuse,1945;Wiener,1948;Shannon,1950)

•Firstchessprogram(Turing,1951)

•Machinelearningtoimproveevaluationaccuracy(Samuel,1952–57)

•Pruningtoallowdeepersearch(McCarthy,1956)

Chapter63

Typesofgames

deterministicchance

perfect information

imperfect information

chess, checkers,go, othello

backgammonmonopoly

bridge, poker, scrabblenuclear war

battleships,blind tictactoe

Chapter64

Gametree(2-player,deterministic,turns)

X XX X

XX

X

X X

MAX (X)

MIN (O)

XX

O

OO XO

OOO

OO O

MAX (X)

XO XO XOXXX

XX

XX

MIN (O)

XOXXOXXOX

. . .. . .. . .. . .

. . .

. . .

. . .

TERMINALX X

−1 0+1 Utility

Chapter65

Minimax

Perfectplayfordeterministic,perfect-informationgames

Idea:choosemovetopositionwithhighestminimaxvalue=bestachievablepayoffagainstbestplay

E.g.,2-plygame:

MAX

31286 4 21452

MIN

3

A1A3 A2

A13 A12 A11A21A23 A22A33 A32 A31

322

Chapter66

Minimaxalgorithm

functionMinimax-Decision(state)returnsanaction

inputs:state,currentstateingame

returntheainActions(state)maximizingMin-Value(Result(a,state))

functionMax-Value(state)returnsautilityvalue

ifTerminal-Test(state)thenreturnUtility(state)

v←−∞

fora,sinSuccessors(state)dov←Max(v,Min-Value(s))

returnv

functionMin-Value(state)returnsautilityvalue


v←∞

fora,sinSuccessors(state)dov←Min(v,Max-Value(s))

returnv

Chapter67

Propertiesofminimax

Complete??

Chapter68

Propertiesofminimax

Complete??Onlyiftreeisfinite(chesshasspecificrulesforthis).NBafinitestrategycanexisteveninaninfinitetree!

Optimal??

Chapter69

Propertiesofminimax

Complete??Yes,iftreeisfinite(chesshasspecificrulesforthis)

Optimal??Yes,againstanoptimalopponent.Otherwise??

Timecomplexity??

Chapter610

Propertiesofminimax



Timecomplexity??O(bm

)

Spacecomplexity??

Chapter611

Propertiesofminimax



Timecomplexity??O(bm

)

Spacecomplexity??O(bm)(depth-firstexploration)

Forchess,b≈35,m≈100for“reasonable”games⇒exactsolutioncompletelyinfeasible

Butdoweneedtoexploreeverypath?

Chapter612

α–βpruningexample

MAX

3128

MIN3

3

Chapter613


MAX

3128

MIN3

2

2

XX

3

Chapter614


MAX

3128

MIN3

2

2

XX14

14

3

Chapter615


MAX

3128

MIN3

2

2

XX14

14

5

5

3

Chapter616


MAX

3128

MIN

3

3

2

2

XX14

14

5

5

2

2

3

Chapter617

Whyisitcalledα–β?

..

..

..

MAX

MIN

MAX

MINV

αisthebestvalue(tomax)foundsofaroffthecurrentpath

IfVisworsethanα,maxwillavoidit⇒prunethatbranch

Defineβsimilarlyformin

Chapter618

Theα–βalgorithm

functionAlpha-Beta-Decision(state)returnsanaction

returntheainActions(state)maximizingMin-Value(Result(a,state))

functionMax-Value(state,α,β)returnsautilityvalue

inputs:state,currentstateingame

α,thevalueofthebestalternativeformaxalongthepathtostate

β,thevalueofthebestalternativeforminalongthepathtostate


v←−∞

fora,sinSuccessors(state)do

v←Max(v,Min-Value(s,α,β))

ifv≥βthenreturnv

α←Max(α,v)

returnv

functionMin-Value(state,α,β)returnsautilityvalue

sameasMax-Valuebutwithrolesofα,βreversed

Chapter619

Propertiesofα–β

Pruningdoesnotaffectfinalresult

Goodmoveorderingimproveseffectivenessofpruning

With“perfectordering,”timecomplexity=O(bm/2

)⇒doublessolvabledepth

Asimpleexampleofthevalueofreasoningaboutwhichcomputationsarerelevant(aformofmetareasoning)

Unfortunately,3550

isstillimpossible!

Chapter620

Resourcelimits

Standardapproach:

•UseCutoff-TestinsteadofTerminal-Test

e.g.,depthlimit(perhapsaddquiescencesearch)

•UseEvalinsteadofUtility

i.e.,evaluationfunctionthatestimatesdesirabilityofposition

Supposewehave100seconds,explore104

nodes/second⇒10

6nodespermove≈35

8/2

⇒α–βreachesdepth8⇒prettygoodchessprogram

Chapter621

Evaluationfunctions

Black to move

White slightly better

White to move

Black winning

Forchess,typicallylinearweightedsumoffeatures

Eval(s)=w1f1(s)+w2f2(s)+...+wnfn(s)

e.g.,w1=9withf1(s)=(numberofwhitequeens)–(numberofblackqueens),etc.

Chapter622

Digression:Exactvaluesdon’tmatter

MIN

MAX

2 1

1

4 2

2

20

1

1400 20

20

BehaviourispreservedunderanymonotonictransformationofEval

Onlytheordermatters:payoffindeterministicgamesactsasanordinalutilityfunction

Chapter623

Deterministicgamesinpractice

Checkers:Chinookended40-year-reignofhumanworldchampionMarionTinsleyin1994.Usedanendgamedatabasedefiningperfectplayforallpositionsinvolving8orfewerpiecesontheboard,atotalof443,748,401,247positions.

Chess:DeepBluedefeatedhumanworldchampionGaryKasparovinasix-gamematchin1997.DeepBluesearches200millionpositionspersecond,usesverysophisticatedevaluation,andundisclosedmethodsforextendingsomelinesofsearchupto40ply.

Othello:humanchampionsrefusetocompeteagainstcomputers,whoaretoogood.

Go:humanchampionsrefusetocompeteagainstcomputers,whoaretoobad.Ingo,b>300,somostprogramsusepatternknowledgebasestosuggestplausiblemoves.

Chapter624

Nondeterministicgames:backgammon

123456789101112

242322212019181716151413

0

25

Chapter625

Nondeterministicgamesingeneral

Innondeterministicgames,chanceintroducedbydice,card-shuffling

Simplifiedexamplewithcoin-flipping:

MIN

MAX

2

CHANCE

474605−2

240−2

0.50.50.50.5

3−1

Chapter626

Algorithmfornondeterministicgames

Expectiminimaxgivesperfectplay

JustlikeMinimax,exceptwemustalsohandlechancenodes:

...ifstateisaMaxnodethen

returnthehighestExpectiMinimax-ValueofSuccessors(state)ifstateisaMinnodethen

returnthelowestExpectiMinimax-ValueofSuccessors(state)ifstateisachancenodethen

returnaverageofExpectiMinimax-ValueofSuccessors(state)...

Chapter627

Nondeterministicgamesinpractice

Dicerollsincreaseb:21possiblerollswith2diceBackgammon≈20legalmoves(canbe6,000with1-1roll)

depth4=20×(21×20)3≈1.2×10

9

Asdepthincreases,probabilityofreachingagivennodeshrinks⇒valueoflookaheadisdiminished

α–βpruningismuchlesseffective

TDGammonusesdepth-2search+verygoodEval

≈world-championlevel

Chapter628

Digression:ExactvaluesDOmatter

DICE

MIN

MAX

22331144

2314

.9.1.9.1

2.11.3

2020303011400400

20301400

.9.1.9.1

2140.9

BehaviourispreservedonlybypositivelineartransformationofEval

HenceEvalshouldbeproportionaltotheexpectedpayoff

Chapter629

Gamesofimperfectinformation

E.g.,cardgames,whereopponent’sinitialcardsareunknown

Typicallywecancalculateaprobabilityforeachpossibledeal

Seemsjustlikehavingonebigdicerollatthebeginningofthegame∗

Idea:computetheminimaxvalueofeachactionineachdeal,thenchoosetheactionwithhighestexpectedvalueoveralldeals

∗

Specialcase:ifanactionisoptimalforalldeals,it’soptimal.∗

GIB,currentbestbridgeprogram,approximatesthisideaby1)generating100dealsconsistentwithbiddinginformation2)pickingtheactionthatwinsmosttricksonaverage

Chapter630

Example

Four-cardbridge/whist/heartshand,Maxtoplayfirst

8

92

6 668766766766767

4293429342343430

Chapter631

Example


6

4

8

92

6 668766766766767

4293429342343430

8

92

66876676676677

29329323330 4 4 4 4

6 MAX

MIN

MAX

MIN

Chapter632

Example


8

92

6 668766766766767

4293429342343430

6

4

8

92

66876676676677

29329323330 4 4 4 4

6

6

4

8

92

6687667667

29329323

7

3

6

4667

3 4 4 46

6

7

3 4

−0.5

−0.5

MAX

MIN

MAX

MIN

MAX

MIN

Chapter633

Commonsenseexample

RoadAleadstoasmallheapofgoldpiecesRoadBleadstoafork:

taketheleftforkandyou’llfindamoundofjewels;taketherightforkandyou’llberunoverbyabus.

Chapter634

Commonsenseexample




taketheleftforkandyou’llberunoverbyabus;taketherightforkandyou’llfindamoundofjewels.

Chapter635

Commonsenseexample




taketheleftforkandyou’llberunoverbyabus;taketherightforkandyou’llfindamoundofjewels.


guesscorrectlyandyou’llfindamoundofjewels;guessincorrectlyandyou’llberunoverbyabus.

Chapter636

Properanalysis

*IntuitionthatthevalueofanactionistheaverageofitsvaluesinallactualstatesisWRONG

Withpartialobservability,valueofanactiondependsontheinformationstateorbeliefstatetheagentisin

Cangenerateandsearchatreeofinformationstates

Leadstorationalbehaviorssuchas♦Actingtoobtaininformation♦Signallingtoone’spartner♦Actingrandomlytominimizeinformationdisclosure

Chapter637

Summary

Gamesarefuntoworkon!(anddangerous)

TheyillustrateseveralimportantpointsaboutAI

♦perfectionisunattainable⇒mustapproximate

♦goodideatothinkaboutwhattothinkabout

♦uncertaintyconstrainstheassignmentofvaluestostates

♦optimaldecisionsdependoninformationstate,notrealstate

GamesaretoAIasgrandprixracingistoautomobiledesign

Chapter638

1 6 Chaptercsatol/mestint/pdfs/russell06_jatek.pdf · 2006. 9. 3. · T yp es of games...

Documents

Transcript of 1 6 Chaptercsatol/mestint/pdfs/russell06_jatek.pdf · 2006. 9. 3. · T yp es of games...