Prediction, Learning and Games - Chapter 7: Prediction and Playing Games

Walid Krichene

November 4, 2013

Transcript of slides (walid.krichene.net/notes/reading-plg-chap7.pdf)

Page 2

The setting

K players; player k has N_k actions.

Joint action vector i ∈ [N_1] × · · · × [N_K].

Loss vectors ℓ^(k)(i) ∈ [0, 1].

Players randomize: mixed strategy profile π ∈ Δ_[N_1] × · · · × Δ_[N_K], with

  π(i) = p^(1)(i_1) · · · p^(K)(i_K)

Joint random action I ∈ [N_1] × · · · × [N_K], I ∼ π; expected loss

  E[ℓ^(k)(I)] = Σ_i π(i) ℓ^(k)(i)

written ℓ^(k)(π) by abuse of notation (think of it as the linear extension of ℓ^(k) from [N_1] × · · · × [N_K] to Δ_[N_1] × · · · × Δ_[N_K]).
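In code, the expected loss under a product distribution is just a sum over joint actions. A minimal Python sketch (the two-player loss tables below are made up for illustration, and `expected_loss` is a hypothetical helper, not from the slides):

```python
import itertools

import numpy as np

# Hypothetical 2-player example: player 1 has 2 actions, player 2 has 3.
# loss[k][i1, i2] is player k's loss, in [0, 1], at joint action (i1, i2).
rng = np.random.default_rng(0)
loss = [rng.uniform(0, 1, size=(2, 3)) for _ in range(2)]

def expected_loss(k, mixed):
    """E[l^(k)] = sum_i pi(i) l^(k)(i), with pi(i) = p1(i1) * p2(i2)."""
    p1, p2 = mixed
    total = 0.0
    for i1, i2 in itertools.product(range(len(p1)), range(len(p2))):
        total += p1[i1] * p2[i2] * loss[k][i1, i2]
    return total

p1 = np.array([0.5, 0.5])
p2 = np.array([0.2, 0.3, 0.5])
# Linearity check: the sum over joint actions equals the bilinear form p1^T M p2.
assert np.isclose(expected_loss(0, (p1, p2)), p1 @ loss[0] @ p2)
```

This also makes the "linear extension" remark concrete: for two players, ℓ^(k)(π) is exactly the bilinear form p^(1)ᵀ M^(k) p^(2).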


Page 5

Nash equilibrium

p is a Nash eq. if for all k and all p′ ∈ Δ_[N_k],

  ℓ^(k)(p^(k), p^(−k)) ≤ ℓ^(k)(p′, p^(−k))

i.e. no player has an incentive to unilaterally deviate. Set of Nash equilibria: N.


Page 6

Correlated equilibria

Correlated equilibrium (Aumann): a probability distribution P over [N_1] × · · · × [N_K], which is not necessarily a product distribution.

A coordinator draws advice I ∼ P. If a player unilaterally deviates from the advice, he is worse off in expectation.

Equivalent to "switching from j to j′ is a bad idea": for all k and all j, j′,

  Σ_{i : i_k = j} P(i) [ℓ^(k)(j, i^(−k)) − ℓ^(k)(j′, i^(−k))] ≤ 0

Link to internal regret.

Structure of correlated equilibria: think of P as a vector in Δ_{[N_1]×···×[N_K]}.
These are linear inequalities in P ⇒ the set of correlated equilibria is a closed convex polyhedron.
A convex combination of Nash equilibria is a correlated eq. (but not conversely).
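The switching inequalities are easy to check numerically. A small Python sketch (the coordination-game losses and the helper `is_correlated_eq` are made-up illustrations, not from the slides):

```python
import itertools

import numpy as np

def is_correlated_eq(P, losses, tol=1e-9):
    """Check Aumann's inequalities: for every player k and every ordered pair
    of actions (j, j'), switching the advice j to j' must not help:
      sum_{i : i_k = j} P(i) [l^(k)(j, i_-k) - l^(k)(j', i_-k)] <= 0.
    P is an array over joint actions; losses[k] has the same shape."""
    K = P.ndim
    for k in range(K):
        for j, jp in itertools.permutations(range(P.shape[k]), 2):
            gain = 0.0
            for i in itertools.product(*(range(n) for n in P.shape)):
                if i[k] != j:
                    continue
                i_dev = list(i); i_dev[k] = jp   # deviate j -> j'
                gain += P[i] * (losses[k][i] - losses[k][tuple(i_dev)])
            if gain > tol:
                return False
    return True

# Hypothetical coordination game: loss 0 on matching actions, 1 otherwise.
L = np.array([[0.0, 1.0], [1.0, 0.0]])
losses = [L, L]
half_half = np.array([[0.5, 0.0], [0.0, 0.5]])  # mixture of the two pure Nash eq.
mismatch  = np.array([[0.0, 0.5], [0.5, 0.0]])  # mass on mis-coordination
assert is_correlated_eq(half_half, losses)
assert not is_correlated_eq(mismatch, losses)
```

Since each check is a linear inequality in P, the set of distributions passing all of them is the closed convex polyhedron described above.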


Page 10

Repeated games

Player k maintains p^(k)_t and draws action I^(k)_t ∼ p^(k)_t, then observes all players' actions I_t = (I^(1)_t, . . . , I^(K)_t).

Uncoupled play: each player only knows his own loss function (in fact, only his own regret).

Regret-minimizing procedures: does the repeated play converge to N (or to some other equilibrium concept) in some sense?

Summary of results
Minimizing (external) regret:
- the product of average marginal distributions p̄^(1)_t × · · · × p̄^(K)_t converges to N
- the average joint distribution converges to the Hannan set
Minimizing internal regret:
- the average joint distribution converges to the set of correlated equilibria


Page 14

Regret-minimization vs. fictitious play

Fictitious play: on iteration t, play a best response to the cumulative losses.

ℓ^(k)_t: the loss vector ℓ^(k)_t(i) = ℓ^(k)(i, I^(−k)_t), with cumulative loss

  L^(k)_t = Σ_{τ=1}^{t} ℓ^(k)_τ

Fictitious play:

  i^(k)_{t+1} ∈ arg min_{i ∈ [N_k]} L^(k)_t(i)

equivalent to

  p^(k)_{t+1} ∈ arg min_{p ∈ Δ_[N_k]} ⟨p, L^(k)_t⟩

Only converges in very special cases (e.g. 2 players with 2 actions each).
Not Hannan-consistent (does not minimize average expected regret).
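A quick simulation illustrates the special case where fictitious play does work: matching pennies (2 players, 2 actions each, zero-sum). This is a minimal sketch, not a procedure from the slides; by Robinson's theorem for zero-sum games, the empirical frequencies converge to the unique Nash equilibrium (1/2, 1/2):

```python
import numpy as np

# Fictitious play on matching pennies.  Row loss M[i, j]; column loss -M[i, j].
M = np.array([[0.0, 1.0], [1.0, 0.0]])

T = 20000
L_row = np.zeros(2)        # cumulative loss of each row action
L_col = np.zeros(2)        # cumulative loss of each column action
counts_row = np.zeros(2)
counts_col = np.zeros(2)
i, j = 0, 0                # arbitrary initial plays
for t in range(T):
    counts_row[i] += 1
    counts_col[j] += 1
    L_row += M[:, j]               # row's loss vector given column's play
    L_col += -M[i, :]              # column's loss vector given row's play
    i = int(np.argmin(L_row))      # best response to cumulative losses
    j = int(np.argmin(L_col))

# Empirical frequencies; Robinson's theorem says they converge to (1/2, 1/2).
print(counts_row / T, counts_col / T)
```

Note that the pure plays themselves keep cycling in ever-longer runs; only the empirical frequencies settle, which is exactly the averages-vs-iterates distinction made later in the deck.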


Page 17

Regret-minimization vs. fictitious play

Fictitious play:

  p^(k)_{t+1} ∈ arg min_{p ∈ Δ_[N_k]} ⟨p, L^(k)_t⟩

Regret minimization: some algorithms can be written as the regularized version

  p^(k)_{t+1} ∈ arg min_{p ∈ Δ_[N_k]} ⟨p, L^(k)_t⟩ + (1/γ) R(p)
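For example, taking R(p) = Σ_i p_i ln p_i (negative entropy) gives the exponential-weights (Hedge) distribution in closed form, p_i ∝ exp(−γ L_i); this is a standard fact, sketched below with an arbitrary loss vector and γ, and checked against a brute-force search over the simplex:

```python
import numpy as np

def hedge_weights(L, gamma):
    """Closed-form minimizer of <p, L> + (1/gamma) * sum_i p_i log p_i
    over the simplex: the exponential-weights distribution."""
    w = np.exp(-gamma * (L - L.min()))   # shift by min for numerical stability
    return w / w.sum()

L = np.array([3.0, 1.0])
gamma = 0.7
p = hedge_weights(L, gamma)

# Brute-force check over the 2-action simplex, parameterized by p_0.
grid = np.linspace(1e-6, 1 - 1e-6, 100001)
objective = (grid * L[0] + (1 - grid) * L[1]
             + (grid * np.log(grid) + (1 - grid) * np.log(1 - grid)) / gamma)
best = grid[np.argmin(objective)]
assert abs(p[0] - best) < 1e-3
```

The regularizer keeps the update a full-support distribution, in contrast with fictitious play's pure best response (the γ → ∞ limit).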


Page 18

A Minimax theorem

Minimax theorem via regret minimization: consider f(x, y) with
X convex compact and f(·, y) convex continuous,
Y convex compact and f(x, ·) concave continuous.

Then

  inf_x sup_y f(x, y) = sup_y inf_x f(x, y)

≥: immediate. When x plays first, it is worse off than playing second.


Page 21

A Minimax theorem

Proof of ≤. Idea: sequential play (x_t), (y_t), then

  inf_x sup_y f(x, y) ≤ (1/t) Σ_{τ≤t} f(x_τ, y_τ) ≤ sup_y inf_x f(x, y) + O(1/√t)


Page 22

A Minimax theorem

Let ε > 0.Find a finite cover of X by balls B(a(i), ε). Use a(i) as the actions.

x plays firstApplies regret-minimizing procedureaction set {a(i)}ilosses f (a(i), yt)

y plays secondplays best responseyt ∈ argmaxy∈Y f (xt , y)

xt =1t

t∑τ=1

xτ yt =1t

t∑τ=1


Page 25

A Minimax theorem

  inf_x sup_y f(x, y) ≤ sup_y f(x̄_t, y)
    ≤ sup_y (1/t) Σ_{τ≤t} f(x_τ, y)                        by convexity of f(·, y)
    ≤ (1/t) Σ_{τ≤t} f(x_τ, y_τ)                            y plays optimally
    ≤ min_i (1/t) Σ_{τ≤t} f(a^(i), y_τ) + √(ln N / (2t))   by the regret bound
    ≤ min_i f(a^(i), ȳ_t) + √(ln N / (2t))                 by concavity of f(x, ·)
    ≤ sup_y min_i f(a^(i), y) + √(ln N / (2t))

Then let t → ∞ and ε → 0.


Page 26

Zero-sum two-player game

  ℓ^(row)(i, j) = −ℓ^(column)(i, j)

Matrix of losses: M with M_{i,j} = ℓ^(row)(i, j). Joint strategy: (p, q). Loss of the row player:

  Σ_{i,j} p_i M_{i,j} q_j = ⟨p, Mq⟩

Nash equilibrium: (p, q) is a Nash eq. iff for all p′, q′,

  ⟨p, Mq′⟩ ≤ ⟨p, Mq⟩ ≤ ⟨p′, Mq⟩

In fact, by von Neumann's minimax theorem,

  min_{p′} max_{q′} ⟨p′, Mq′⟩ = max_{q′} min_{p′} ⟨p′, Mq′⟩

Call this common value V, the value of the game.

Value of the game: (p, q) is a Nash eq. ⇔ p and q are both optimal (saddle-point) strategies, in which case ⟨p, Mq⟩ = V.
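Since ⟨p, Mq⟩ is linear in q, the inner max in min_p max_q ⟨p, Mq⟩ is attained at a pure column strategy, so for a 2×2 game V can be found by a one-dimensional search over the row player's mixed strategy. A sketch with a made-up loss matrix (for this particular M, solving the two-line crossing by hand gives V = 9/17, attained at p = (7/17, 10/17)):

```python
import numpy as np

# Value of a 2x2 zero-sum game: V = min_p max_j (p^T M)_j, because the inner
# max over mixed q is attained at a pure column.
M = np.array([[0.0, 1.0], [0.9, 0.2]])   # made-up loss matrix for the row player

grid = np.linspace(0, 1, 200001)         # p_0, the weight on row action 0
worst_case = np.maximum(grid * M[0, 0] + (1 - grid) * M[1, 0],   # column plays 0
                        grid * M[0, 1] + (1 - grid) * M[1, 1])   # column plays 1
V = worst_case.min()
p0 = grid[worst_case.argmin()]
print(V, p0)
```

For larger games this one-dimensional trick no longer applies and V is computed by linear programming instead.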


Page 29

Repeated zero-sum two-player game

Row player plays first. Given a sequence of column plays j_1, j_2, . . ., score the performance of a row sequence i_1, i_2, . . . by evaluating the regret

  max_i [ Σ_{t=1}^{T} ℓ^(row)(i_t, j_t) − Σ_{t=1}^{T} ℓ^(row)(i, j_t) ]

Can be extended to time-varying losses ℓ^(row)_t.

Note: implicit assumption of an oblivious opponent. This is not the loss Row would have suffered had he played a constant strategy (the column plays j_t could have been different in that case).


Page 32

Repeated zero-sum two-player game

Convergence of average losses: if the row strategies are Hannan-consistent, then

  lim sup_T (1/T) Σ_{t=1}^{T} ℓ^(row)(i_t, j_t) ≤ V

Note: this is only an upper bound; the row player can do much better if the column player is not adversarial.

Proof: it suffices to show

  lim sup_T min_i (1/T) Σ_{t=1}^{T} ℓ^(row)(i, j_t) ≤ V

(then use the regret bound).


Page 35

Repeated zero-sum two-player game

Proof:

  min_i (1/T) Σ_{t=1}^{T} ℓ^(row)(i, j_t) = min_p (1/T) Σ_{t=1}^{T} ⟨p, M e_{j_t}⟩
    = min_p ⟨p, M (1/T) Σ_{t=1}^{T} e_{j_t}⟩
    ≤ min_p max_q ⟨p, Mq⟩
    = V
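A self-play sketch makes these guarantees concrete: both players run Hedge (exponential weights, a standard Hannan-consistent procedure); the 2×2 matrix, horizon, and step size below are illustrative choices, not from the slides. This particular game has value V = 9/17 with interior equilibrium p* = (7/17, 10/17), q* = (8/17, 9/17), so the average loss approaches V and the averaged strategies approach the equilibrium:

```python
import numpy as np

M = np.array([[0.0, 1.0], [0.9, 0.2]])   # row minimizes <p, Mq>, column maximizes
T = 20000
eta = np.sqrt(8.0 * np.log(2) / T)       # standard Hedge tuning for losses in [0, 1]

Lr, Lc = np.zeros(2), np.zeros(2)        # cumulative expected loss vectors
avg_loss, p_bar, q_bar = 0.0, np.zeros(2), np.zeros(2)
for t in range(T):
    p = np.exp(-eta * (Lr - Lr.min())); p /= p.sum()   # Hedge for the row player
    q = np.exp(-eta * (Lc - Lc.min())); q /= q.sum()   # Hedge for the column player
    avg_loss += (p @ M @ q) / T
    p_bar += p / T
    q_bar += q / T
    Lr += M @ q          # row's expected loss of each action against q
    Lc += -(p @ M)       # column's loss is the negative of the row's

assert abs(avg_loss - 9 / 17) < 0.03     # average loss approaches V
assert abs(p_bar[0] - 7 / 17) < 0.05     # average strategies approach the Nash eq.
assert abs(q_bar[0] - 8 / 17) < 0.05
```

This is exactly the statement of the next slide: the averages converge, even though the iterates (p_t, q_t) themselves may keep moving.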


Page 36

Repeated zero-sum two-player game

If both players minimize regret, the average distributions converge to N:

  p̄_t = (1/t) Σ_{τ≤t} p_τ    q̄_t = (1/t) Σ_{τ≤t} q_τ    p̄_t × q̄_t → N

This does not prove convergence of the actual probability sequences (p_t, q_t). Typical example: the sequences (p_t, q_t) oscillate, but the averages (p̄_t, q̄_t) converge to N.

It also does not prove convergence of the average joint distribution

  (p × q)‾_t(i, j) = (1/t) Σ_{τ≤t} p_τ(i) q_τ(j)


Page 39

Note: average marginals vs. average joint

They are not the same:

  (p̄_T × q̄_T)(i, j) = (1/T²) Σ_{t=1}^{T} Σ_{t′=1}^{T} p_t(i) q_{t′}(j)

  (p × q)‾_T(i, j) = (1/T) Σ_{t=1}^{T} p_t(i) q_t(j)
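A two-round example where the play is perfectly correlated makes the difference stark:

```python
import numpy as np

# Two rounds, two actions each: p_t = q_t = e_t (perfectly correlated play).
p = np.array([[1.0, 0.0], [0.0, 1.0]])   # p_1, p_2
q = np.array([[1.0, 0.0], [0.0, 1.0]])   # q_1, q_2

T = len(p)
avg_of_products = sum(np.outer(p[t], q[t]) for t in range(T)) / T  # average joint
product_of_avgs = np.outer(p.mean(axis=0), q.mean(axis=0))         # marginals' product

# The average joint puts all its mass on the diagonal, while the product of
# the average marginals is uniform over the four cells.
assert np.allclose(avg_of_products, np.array([[0.5, 0.0], [0.0, 0.5]]))
assert np.allclose(product_of_avgs, 0.25 * np.ones((2, 2)))
```

The average joint remembers the within-round correlation between the players; the product of averages destroys it, which is why the two converge to different equilibrium sets.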


Page 40

Correlated equilibria and internal regret

Back to the general K-player game. Let P̄_t be the average joint distribution

  P̄_t(i) = (1/t) Σ_{τ≤t} p^(1)_τ(i_1) · · · p^(K)_τ(i_K)

If every player minimizes (external) regret, P̄_t converges to the Hannan set

  H = { P : ∀k, ∀j, Σ_i P(i) ℓ^(k)(i_k, i^(−k)) ≤ Σ_i P(i) ℓ^(k)(j, i^(−k)) }

H contains the correlated equilibria, but is a proper superset in most cases:

  N ⊂ C ⊂ H


Page 43

Correlated equilibria and internal regret

Convergence of the average joint distribution to C: if the players minimize internal regret, then P̄_t → C.

Recall the definition of internal regret:

  R^(k)_{(j,j′),T} = Σ_{t=1}^{T} p^(k)_{j,t} (ℓ^(k)(j, I^(−k)_t) − ℓ^(k)(j′, I^(−k)_t))

  R^(k)_T = max_{j,j′} R^(k)_{(j,j′),T}

Also recall that any regret-minimizing procedure over N actions can be turned into an internal-regret-minimizing procedure by running it over the set of action pairs (of size N²).
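Computing internal regret from a play history is a direct translation of the formula above. A minimal sketch (the helper name and the random history are made up for illustration):

```python
import numpy as np

def internal_regret(P, losses):
    """Internal regret of one player from the history of its mixed strategies.
    P: (T, N) array of mixed strategies p_t.
    losses: (T, N) array with losses[t, j] = l(j, I_t^(-k)), the loss action j
    would have incurred against the other players' plays at time t.
    R_{(j,j')} = sum_t p_{j,t} * (losses[t, j] - losses[t, j'])."""
    T, N = P.shape
    R = np.zeros((N, N))
    for j in range(N):
        for jp in range(N):
            R[j, jp] = np.sum(P[:, j] * (losses[:, j] - losses[:, jp]))
    return R.max()

rng = np.random.default_rng(2)
T, N = 100, 3
P = rng.dirichlet(np.ones(N), size=T)       # a made-up history of mixed strategies
L = rng.uniform(0, 1, size=(T, N))          # made-up action losses
r = internal_regret(P, L)
# Internal regret is never negative, since R_{(j,j)} = 0 for every j.
assert r >= 0
```

Driving this quantity to o(T) for every player is exactly what forces the average joint distribution P̄_t into the correlated-equilibrium polyhedron C.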


Page 45

Summary of results

  N ⊂ C ⊂ H

Minimizing (external) regret:
- the product of average marginals p̄^(1)_t × · · · × p̄^(K)_t → N
- p^(1)_t × · · · × p^(K)_t → N ? (the actual strategies may oscillate)
- P̄_t → H
Minimizing internal regret:
- P̄_t → C, with updates of size O(N²)

Next:
- (Unknown game: exploration)
- Blackwell approachability theorem
- Potential-based approachability


Page 47

Next week

Finish Chapter 7 (approachability). Application to the routing game.
