
  • Prediction, Learning and Games - Chapter 7: Prediction and Playing Games

    Walid Krichene

    November 4, 2013


  • The setting

    K players; player k has Nk actions

    joint action vector i ∈ [N1] × · · · × [NK]; loss vector ℓ(i), with ℓ(k)(i) ∈ [0, 1] for each player k

    players randomize: mixed strategy profile π ∈ ∆[N1] × · · · × ∆[NK], with

    π(i) = p(1)(i1) · · · p(K)(iK)

    joint random action I ∈ [N1] × · · · × [NK], I ∼ π

    expected loss E[ℓ(k)(I)] = ∑_i π(i) ℓ(k)(i), written ℓ(k)(π) by abuse of notation
    (think of it as the linear extension of ℓ(k) from [N1] × · · · × [NK] to ∆[N1] × · · · × ∆[NK])


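As a concrete illustration of ℓ(k)(π), here is a minimal sketch (not from the slides; NumPy and the array layout are my own assumptions): the expected loss of player k under a product distribution π = p(1) ⊗ · · · ⊗ p(K) is just the π-weighted sum of the loss table.

```python
import numpy as np
from functools import reduce

def expected_loss(loss_k, mixed_strategies):
    """Expected loss E[l^(k)] = sum_i pi(i) * l^(k)(i) under a product
    distribution pi = p^(1) x ... x p^(K).

    loss_k: array of shape (N_1, ..., N_K), the loss l^(k)(i) of player k
    mixed_strategies: list of K probability vectors [p^(1), ..., p^(K)]
    """
    # pi(i) = p^(1)(i_1) * ... * p^(K)(i_K), built as an outer product
    pi = reduce(np.multiply.outer, mixed_strategies)
    return float(np.sum(pi * loss_k))

# Tiny example: 2 players, 2 actions each (a hypothetical loss table)
loss_1 = np.array([[0.0, 1.0],
                   [1.0, 0.0]])          # player 1's loss l^(1)(i_1, i_2)
p1, p2 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
print(expected_loss(loss_1, [p1, p2]))   # -> 0.5
```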

  • Nash equilibrium

    p is a Nash equilibrium if for all k and all p′ ∈ ∆[Nk],

    ℓ(k)(p(k), p(−k)) ≤ ℓ(k)(p′, p(−k))

    i.e. no player has an incentive to unilaterally deviate. The set of Nash equilibria is denoted N.

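To make the unilateral-deviation condition concrete, here is a small sketch for the two-player case (my own illustration; the function name and loss-matrix encoding are assumptions). Since ℓ(k) is linear in p′, it suffices to compare against the best pure deviation.

```python
import numpy as np

def is_nash(loss_1, loss_2, p1, p2, tol=1e-9):
    """Check the Nash condition for a 2-player game with loss matrices
    loss_1[i, j], loss_2[i, j] and mixed strategies p1, p2.

    p is a Nash equilibrium iff no player can lower their expected loss by
    a unilateral deviation; checking pure deviations is enough.
    """
    loss1_now = p1 @ loss_1 @ p2          # l^(1)(p^(1), p^(2))
    loss2_now = p1 @ loss_2 @ p2          # l^(2)(p^(1), p^(2))
    best1 = np.min(loss_1 @ p2)           # best pure deviation for player 1
    best2 = np.min(p1 @ loss_2)           # best pure deviation for player 2
    return loss1_now <= best1 + tol and loss2_now <= best2 + tol

# Matching pennies (zero-sum): uniform play is the unique Nash equilibrium
loss_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
loss_2 = 1.0 - loss_1
print(is_nash(loss_1, loss_2, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # True
```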

  • Correlated equilibria

    Correlated equilibria (Aumann): a probability distribution P over [N1] × · · · × [NK], not necessarily a product distribution.

    A coordinator gives advice I ∼ P; if a player unilaterally deviates from the advice, he is worse off in expectation.

    Equivalent to "switching from j to j′ is a bad idea": for every player k and all j, j′,

    ∑_{i : ik = j} P(i) [ℓ(k)(j, i(−k)) − ℓ(k)(j′, i(−k))] ≤ 0

    Link to internal regret.

    Structure of correlated equilibria: think of P as a vector in ∆[N1]×···×[NK].
    These are linear inequalities in P, so the set of correlated equilibria is a closed convex polyhedron.
    A convex combination of Nash equilibria is a correlated equilibrium (but not conversely).


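The defining inequalities are easy to check numerically. Below is a sketch for the two-player case (my own illustration; names and array shapes are assumptions): for each player k and each pair of actions (j, j′), it verifies that switching from the recommendation j to j′ does not reduce the expected loss.

```python
import numpy as np

def is_correlated_eq(P, losses, tol=1e-9):
    """Check the correlated-equilibrium inequalities for a 2-player game.

    P: joint distribution over [N1] x [N2], shape (N1, N2)
    losses: (loss_1, loss_2), each of shape (N1, N2)
    For every player k and every pair (j, j'):
        sum_{i : i_k = j} P(i) [l^(k)(j, i_-k) - l^(k)(j', i_-k)] <= 0
    """
    loss_1, loss_2 = losses
    N1, N2 = P.shape
    for j in range(N1):                      # player 1's recommendations
        for jp in range(N1):
            if np.sum(P[j, :] * (loss_1[j, :] - loss_1[jp, :])) > tol:
                return False
    for j in range(N2):                      # player 2's recommendations
        for jp in range(N2):
            if np.sum(P[:, j] * (loss_2[:, j] - loss_2[:, jp])) > tol:
                return False
    return True

# Coordination game: uniform weight on the diagonal is a correlated equilibrium
# (a convex combination of the two pure Nash equilibria, not a product distribution)
loss = np.array([[0.0, 1.0], [1.0, 0.0]])
P = np.diag([0.5, 0.5])
print(is_correlated_eq(P, (loss, loss)))     # True
```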

  • Repeated games

    Player k maintains p(k)_t and draws action I(k)_t ∼ p(k)_t,

    then observes all players' actions It = (I(1)_t, . . . , I(K)_t).

    Uncoupled play: each player only knows his own loss function (in fact, only his own regret).

    Regret-minimizing procedures: does the repeated play converge to N (or to some other equilibrium concept) in some sense?

    Summary of results. Minimizing (external) regret:
    - the average marginal distributions p̄(1) × · · · × p̄(K) converge to N
    - the average joint distribution (the time average of p(1)_t × · · · × p(K)_t) converges to the Hannan set
    Minimizing internal regret:
    - the average joint distribution converges to the set of correlated equilibria


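A minimal sketch of the repeated-play quantities above, for two players (my own illustration; the interface and the use of an exponentially weighted update are assumptions, and any regret-minimizing update could be plugged in). It tracks the average marginal distributions p̄(k) and the time average of the joint product distributions.

```python
import numpy as np

def exp_weights(eta):
    """Exponentially weighted average forecaster: p_t(i) proportional to exp(-eta * L_{t-1}(i))."""
    return lambda L: np.exp(-eta * L) / np.exp(-eta * L).sum()

def play_repeated_game(strategies, losses, T, rng=None):
    """Run T rounds of uncoupled play for 2 players and return the average
    marginal distributions and the time average of p^(1)_t x p^(2)_t.

    strategies: list of callables mapping a cumulative loss vector L^(k)_{t-1}
        to a mixed strategy p^(k)_t
    losses: (loss_1, loss_2), loss matrices of shape (N1, N2) with entries in [0, 1]
    """
    rng = rng or np.random.default_rng(0)
    loss_1, loss_2 = losses
    N1, N2 = loss_1.shape
    cum = [np.zeros(N1), np.zeros(N2)]        # cumulative loss vectors L^(k)_t
    avg_marg = [np.zeros(N1), np.zeros(N2)]   # average marginals
    avg_joint = np.zeros((N1, N2))            # average joint (product) distribution
    for _ in range(T):
        p1, p2 = strategies[0](cum[0]), strategies[1](cum[1])
        i1, i2 = rng.choice(N1, p=p1), rng.choice(N2, p=p2)
        cum[0] += loss_1[:, i2]               # player 1's loss vector this round
        cum[1] += loss_2[i1, :]               # player 2's loss vector this round
        avg_marg[0] += p1 / T
        avg_marg[1] += p2 / T
        avg_joint += np.outer(p1, p2) / T
    return avg_marg, avg_joint

# Matching pennies with exponential weights for both players (eta chosen arbitrarily)
loss_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
marg, joint = play_repeated_game([exp_weights(0.1), exp_weights(0.1)],
                                 (loss_1, 1.0 - loss_1), T=1000)
```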

  • Regret-minimization vs. fictitious play

    On iteration t, play a best response to the cumulative losses.

    ℓ(k)_t: vector of losses, ℓ(k)_t(i(k)) = ℓ(k)(i(k), I(−k)_t)

    L(k)_t = ∑_{τ=1,...,t} ℓ(k)_τ

    Fictitious play:

    i(k)_{t+1} ∈ arg min_{i ∈ [Nk]} L(k)_t(i),  equivalent to  min_{p ∈ ∆[Nk]} ⟨p, L(k)_t⟩

    Only converges in very special cases (e.g. 2 players with 2 actions each).
    Not Hannan-consistent (does not minimize average expected regret).

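For comparison, a sketch of fictitious play in this notation for two players (my own illustration; the loss-matrix encoding and names are assumptions): each player best-responds to its cumulative loss vector L(k)_t.

```python
import numpy as np

def fictitious_play(loss_1, loss_2, T):
    """Fictitious play for a 2-player game given by loss matrices.

    i^(k)_{t+1} in argmin_i L^(k)_t(i), where L^(k)_t is the cumulative
    loss vector of player k against the opponent's past actions.
    Returns the empirical marginal frequencies of play.
    """
    N1, N2 = loss_1.shape
    L1, L2 = np.zeros(N1), np.zeros(N2)     # cumulative loss vectors L^(k)_t
    freq1, freq2 = np.zeros(N1), np.zeros(N2)
    i1, i2 = 0, 0                           # arbitrary initial actions
    for _ in range(T):
        freq1[i1] += 1.0 / T
        freq2[i2] += 1.0 / T
        L1 += loss_1[:, i2]                 # l^(1)_t(i) = l^(1)(i, I^(2)_t)
        L2 += loss_2[i1, :]                 # l^(2)_t(i) = l^(2)(I^(1)_t, i)
        i1, i2 = int(np.argmin(L1)), int(np.argmin(L2))   # best responses
    return freq1, freq2

# Matching pennies: the empirical marginal frequencies approach the uniform
# Nash equilibrium, but fictitious play is not Hannan-consistent in general.
loss_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
print(fictitious_play(loss_1, 1.0 - loss_1, 1000))
```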
