
• Prediction, Learning and Games - Chapter 7: Prediction and Playing Games

Walid Krichene

November 4, 2013

• The setting

K players; player k has N_k actions.

Joint action vector i ∈ [N_1] × ··· × [N_K]; loss vector ℓ(i) ∈ [0, 1].

Players randomize: mixed strategy profile π ∈ ∆_{[N_1]} × ··· × ∆_{[N_K]}, with

π(i) = p^(1)(i_1) ··· p^(K)(i_K)

Joint random action I ∈ [N_1] × ··· × [N_K], I ∼ π.

Expected loss of player k: E[ℓ^(k)] = ∑_i π(i) ℓ^(k)(i), written ℓ^(k)(π) by abuse of notation (a linear extension of ℓ^(k) from [N_1] × ··· × [N_K] to ∆_{[N_1]} × ··· × ∆_{[N_K]}).

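As a concrete illustration (not from the slides), the expected loss ℓ^(k)(π) under a product distribution can be computed by brute-force enumeration for small games. The `loss(k, i)` callable encoding of the game is a hypothetical convention chosen for this sketch:

```python
import itertools
import math

def expected_loss(loss, profile, k):
    """Expected loss of player k under a product (mixed-strategy) profile:
    E[l^(k)] = sum_i pi(i) * l^(k)(i), with pi(i) = p^(1)(i_1) ... p^(K)(i_K).

    loss: callable (k, i) -> loss of player k at joint pure action i
    profile: list of probability vectors, one per player
    """
    total = 0.0
    # enumerate every joint pure action i
    for i in itertools.product(*(range(len(p)) for p in profile)):
        pi_i = math.prod(p[a] for p, a in zip(profile, i))
        total += pi_i * loss(k, i)
    return total
```

Enumeration is exponential in the number of players, so this is only meant for toy games such as 2×2 examples.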

• Nash equilibrium

A mixed strategy profile p is a Nash equilibrium if for every player k and every mixed strategy p′ ∈ ∆_{[N_k]}:

ℓ^(k)(p^(k), p^(−k)) ≤ ℓ^(k)(p′, p^(−k))

i.e. no player has an incentive to unilaterally deviate. The set of Nash equilibria is denoted N.
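A brute-force Nash check for small finite games can be sketched as follows (my own illustration, not from the slides). Since ℓ^(k) is linear in p^(k), it suffices to check deviations to pure strategies; the `loss(k, i)` encoding is a hypothetical convention:

```python
import itertools
import math

def expected_loss(loss, profile, k):
    # E[l^(k)] under the product distribution induced by profile
    total = 0.0
    for i in itertools.product(*(range(len(p)) for p in profile)):
        total += math.prod(p[a] for p, a in zip(profile, i)) * loss(k, i)
    return total

def is_nash(loss, profile, tol=1e-9):
    """profile is a Nash equilibrium iff no player can lower their expected
    loss by a unilateral deviation; by linearity, pure deviations suffice."""
    for k, p_k in enumerate(profile):
        base = expected_loss(loss, profile, k)
        for a in range(len(p_k)):
            deviated = list(profile)
            deviated[k] = [1.0 if b == a else 0.0 for b in range(len(p_k))]
            if expected_loss(loss, deviated, k) < base - tol:
                return False
    return True
```

For matching pennies (player 1 loses on a match, player 2 on a mismatch), the uniform profile passes this check and any pure profile fails it.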

• Correlated equilibria

Correlated equilibrium (Aumann): a probability distribution P over [N_1] × ··· × [N_K], not necessarily a product distribution.

A coordinator draws advice I ∼ P; if a player unilaterally deviates from the advice, they are worse off in expectation.

Equivalently, "switching j to j′ is a bad idea": for all k and all j, j′,

∑_{i : i_k = j} P(i) [ℓ^(k)(j, i^(−k)) − ℓ^(k)(j′, i^(−k))] ≤ 0

This is the link to internal regret.

Structure of correlated equilibria:

Think of P as a vector in ∆_{[N_1] × ··· × [N_K]}.

The constraints are linear inequalities in P ⇒ the set of correlated equilibria is a closed convex polyhedron. Every convex combination of Nash equilibria is a correlated equilibrium (but not ⇐).

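Because the swap conditions are linear in P, membership can be checked directly for small games. Here is a sketch of such a check (my own, not from the slides); representing P as a dict from joint-action tuples to probabilities is a hypothetical convention:

```python
import itertools

def is_correlated_eq(loss, P, n_actions, tol=1e-9):
    """Check Aumann's swap conditions: for every player k and every pair
    (j, j'), the expected gain from switching advice j to j' is <= 0.

    P: dict mapping joint-action tuples to probabilities
    loss: callable (k, i) -> loss of player k at joint action i
    n_actions: list of action-set sizes, one per player
    """
    for k, n_k in enumerate(n_actions):
        for j in range(n_k):
            for jp in range(n_k):
                gain = 0.0
                for i in itertools.product(*(range(n) for n in n_actions)):
                    if i[k] != j:
                        continue  # condition only sums over i with i_k = j
                    i_swapped = i[:k] + (jp,) + i[k + 1:]
                    gain += P.get(i, 0.0) * (loss(k, i) - loss(k, i_swapped))
                if gain > tol:
                    return False
    return True
```

In a 2×2 coordination game (both players lose 1 on a mismatch, 0 on a match), the distribution putting mass 1/2 on each of (0, 0) and (1, 1) is a convex combination of two pure Nash equilibria and passes the check, while mass 1/2 on each of (0, 1) and (1, 0) fails it.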

• Repeated games

At each round t, player k maintains a mixed strategy p_t^(k), draws an action I_t^(k) ∼ p_t^(k), and then observes all players' actions I_t = (I_t^(1), …, I_t^(K)).

Uncoupled play: each player only knows their own loss function (in fact, only their own regret).

If every player runs a regret-minimizing procedure, does the repeated play converge to N (or to another equilibrium concept) in some sense?

Summary of results:

Minimizing (external) regret:
- the product of average marginal distributions p̄^(1) × ··· × p̄^(K) converges to N
- the average joint distribution of plays converges to the Hannan set

Minimizing internal regret:
- the average joint distribution of plays converges to the set of correlated equilibria

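To make the regret-minimization side concrete, below is a sketch (my own, not from the slides) of two players each running the exponentially weighted average forecaster (Hedge) against the other with full-information feedback. In a zero-sum game such as matching pennies, the time-averaged marginals should end up near the uniform Nash equilibrium; the fixed learning rate `eta` is an assumption of this sketch:

```python
import numpy as np

def hedge_selfplay(loss_matrix, T=5000, eta=0.05, seed=0):
    """Two players run Hedge against each other.

    loss_matrix: [M0, M1], each of shape (n0, n1); M_k[i0, i1] is
    player k's loss when the joint pure action is (i0, i1).
    Returns the time-averaged mixed strategies of both players.
    """
    rng = np.random.default_rng(seed)
    n0, n1 = loss_matrix[0].shape
    w = [np.ones(n0), np.ones(n1)]          # exponential weights
    avg = [np.zeros(n0), np.zeros(n1)]      # running sums of strategies
    for _ in range(T):
        p = [wk / wk.sum() for wk in w]
        for k in range(2):
            avg[k] += p[k]
        a = [rng.choice(len(pk), p=pk) for pk in p]
        # full-information feedback: each player sees the loss of every
        # one of their actions against the opponent's realized action
        w[0] *= np.exp(-eta * loss_matrix[0][:, a[1]])
        w[1] *= np.exp(-eta * loss_matrix[1][a[0], :])
    return [avg[k] / T for k in range(2)]
```

For matching pennies, the average marginals approach (1/2, 1/2) for both players, consistent with the first bullet of the summary.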

• Regret minimization vs. fictitious play

Fictitious play: on iteration t + 1, play a best response to the cumulative losses.

ℓ_t^(k): loss vector at round t, with ℓ_t^(k)(i^(k)) = ℓ^(k)(i^(k), I_t^(−k))

L_t^(k) = ∑_{τ=1}^{t} ℓ_τ^(k)

Fictitious play chooses

i_{t+1}^(k) ∈ argmin_{i ∈ [N_k]} L_t^(k)(i)

equivalently, min_{p ∈ ∆_{[N_k]}} ⟨p, L_t^(k)⟩.

Fictitious play only converges in very special cases (e.g. 2 players with 2 actions each), and it is not Hannan-consistent (it does not minimize average expected regret).

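A minimal fictitious-play simulation can be sketched as follows (my own illustration, not from the slides; the loss-matrix encoding and tie-breaking toward the first action are assumptions of this sketch). Consistent with the slide, in a 2×2 zero-sum game like matching pennies the empirical action frequencies do converge, even though the pure actions cycle:

```python
import numpy as np

def fictitious_play(loss_matrix, T=2000):
    """Each player best-responds to the cumulative losses induced by the
    opponent's past pure actions; returns empirical action frequencies.

    loss_matrix: [M0, M1], each of shape (n0, n1); M_k[i0, i1] is
    player k's loss at joint pure action (i0, i1).
    """
    n0, n1 = loss_matrix[0].shape
    L = [np.zeros(n0), np.zeros(n1)]        # cumulative loss vectors L_t^(k)
    counts = [np.zeros(n0), np.zeros(n1)]   # empirical action counts
    a = [0, 0]                              # arbitrary initial actions
    for _ in range(T):
        counts[0][a[0]] += 1
        counts[1][a[1]] += 1
        L[0] += loss_matrix[0][:, a[1]]     # loss of each action vs. I_t^(-k)
        L[1] += loss_matrix[1][a[0], :]
        # best response to cumulative losses (argmin breaks ties low)
        a = [int(np.argmin(L[0])), int(np.argmin(L[1]))]
    return [c / T for c in counts]
```

Running this on matching pennies shows both players' empirical frequencies drifting toward the uniform distribution, while the actual play moves in ever-longer runs around the cycle of pure joint actions.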