
Transcript
Page 1:

The Design of Online Learning Algorithms

Wouter M. Koolen

Online Learning Workshop, Paris, Friday 20th October 2017

Page 2: Conclusion

A simple factor $(1 + \eta r_t)$ stretches surprisingly far.

Page 3: Outline

1. Coin Betting
2. Defensive Forecasting
3. Squint
4. MetaGrad

Page 4: Coin Betting

$K_0 = 1$. For $t = 1, 2, \dots$:

Skeptic picks $M_t \in \mathbb{R}$; Reality picks $r_t \in [-1, 1]$

$$K_t = K_{t-1} + M_t r_t$$

Pick an event $E$. Skeptic wins if

1. $K_t \ge 0$ for all $t$, and
2. $r_1 r_2 \cdots \in E$ or $K_t \to \infty$.

Say: Skeptic can force $E$.
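The protocol itself is a few lines of code. A minimal sketch (function and variable names are mine; the constant-fraction strategy passed in anticipates the next slide):

```python
import math
import random

def coin_betting(strategy, rewards):
    """Run the coin-betting protocol.

    strategy: maps current capital K to Skeptic's stake M_t.
    rewards: Reality's outcomes r_t in [-1, 1].
    Returns the capital trajectory K_0, K_1, ..., K_T.
    """
    K = 1.0
    trajectory = [K]
    for r in rewards:
        K = K + strategy(K) * r  # K_t = K_{t-1} + M_t r_t
        trajectory.append(K)
    return trajectory

# Betting the fixed fraction eta of current capital gives
# K_T = prod_t (1 + eta * r_t).
eta = 0.25
random.seed(0)
rs = [random.uniform(-1, 1) for _ in range(100)]
traj = coin_betting(lambda K: eta * K, rs)
product = math.prod(1 + eta * r for r in rs)
assert abs(traj[-1] - product) < 1e-9
```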

Page 6: Forcing The Law of Large Numbers

Fix $\eta \in [0, 1/2]$. Suppose Skeptic plays $M_t = K_{t-1}\eta$. Then

$$K_T = K_{T-1} + K_{T-1}\eta r_T = K_{T-1}(1 + \eta r_T) = \prod_{t=1}^T (1 + \eta r_t)$$

Now say $C \ge K_T$. Then, using $\ln(1+x) \ge x - x^2$,

$$\ln C \ge \sum_{t=1}^T \ln(1 + \eta r_t) \ge \sum_{t=1}^T \eta r_t - T\eta^2$$

Hence

$$\frac{\ln C}{T\eta} \ge \frac{1}{T}\sum_{t=1}^T r_t - \eta \quad\text{and so}\quad \eta \ge \limsup_{T\to\infty}\, \frac{1}{T}\sum_{t=1}^T r_t$$
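The chain of inequalities on this slide can be verified numerically. A quick sanity check (the specific $\eta$ and outcome sequence are arbitrary choices of mine):

```python
import math
import random

random.seed(1)
eta = 0.4  # any eta in [0, 1/2]
rs = [random.uniform(-1, 1) for _ in range(1000)]
T = len(rs)

# Capital of the constant-fraction bettor: ln K_T = sum_t ln(1 + eta r_t).
log_K = sum(math.log(1 + eta * r) for r in rs)

# Lower bound from ln(1+x) >= x - x^2, valid here since
# eta * r >= -1/2 when eta <= 1/2 and |r| <= 1.
lower = eta * sum(rs) - T * eta**2
assert log_K >= lower
```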

Page 9: Forcing The Law of Large Numbers

Finally, let Skeptic allocate a fraction $\gamma_i$ of his initial $K_0 = 1$ to $\eta_i$. Then

$$K_T = \sum_{i=1}^\infty \gamma_i \prod_{t=1}^T (1 + \eta_i r_t)$$

Now suppose $C \ge K_T$. Then for each $i$:

$$\ln C \ge \ln \gamma_i + \sum_{t=1}^T \ln(1 + \eta_i r_t)$$

So for each $i$

$$\limsup_{T\to\infty}\, \frac{1}{T}\sum_{t=1}^T r_t \le \eta_i$$

and hence, taking $\eta_i \downarrow 0$,

$$\limsup_{T\to\infty}\, \frac{1}{T}\sum_{t=1}^T r_t \le 0$$
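The mixture bettor is just a weighted sum of the constant-fraction bettors. A sketch with a hypothetical geometric grid $\eta_i = 2^{-i}$ and uniform $\gamma_i$ (the slides leave the grid unspecified), checking the per-$i$ capital inequality:

```python
import math
import random

# Allocate gamma_i of the initial capital to rate eta_i.
etas = [2.0 ** -i for i in range(1, 11)]     # assumed grid
gammas = [1.0 / len(etas)] * len(etas)       # assumed uniform prior

random.seed(2)
rs = [random.uniform(-1, 1) for _ in range(500)]

# Mixture capital: K_T = sum_i gamma_i prod_t (1 + eta_i r_t)
K = sum(g * math.prod(1 + eta * r for r in rs)
        for g, eta in zip(gammas, etas))

# If C >= K_T then, for each i,
#   ln C >= ln gamma_i + sum_t ln(1 + eta_i r_t),
# since every (positive) mixture component is at most the sum.
C = max(K, 1.0)
for g, eta in zip(gammas, etas):
    assert math.log(C) >= math.log(g) + sum(math.log(1 + eta * r) for r in rs)
```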

Page 12: What else

Skeptic can force many laws of probability. For example the law of the iterated logarithm (LIL):

$$\limsup_{T\to\infty}\, \frac{\sum_{t=1}^T r_t}{\sqrt{2T \ln\ln T}} \le 1$$

Small deviations?

Page 14: Outline

1. Coin Betting
2. Defensive Forecasting
3. Squint
4. MetaGrad

Page 15: Experts

Let's play the experts game.

Learner picks $w_t \in \triangle_K$ (the probability simplex)

Reality picks $\ell_t \in [0, 1]^K$

Learner incurs $\langle w_t, \ell_t\rangle$

Goal: make sure the regret compared to any expert $k$ is sublinear:

$$\limsup_{T\to\infty}\, \frac{1}{T}\sum_{t=1}^T r_t^k \le 0 \quad\text{where}\quad r_t^k = \langle w_t, \ell_t\rangle - \ell_t^k$$

Key idea: Defensive Forecasting

1. Fix a strategy for Skeptic that forces this goal.
2. Play $w_t$ so that Skeptic does not get rich.

Page 18: The Design of Online Learning Algorithms fileThe Design of Online Learning Algorithms Wouter M. Koolen Online Learning Workshop Paris, Friday 20th October, 2017

StategyStragegy for Skeptic:Split capital K0 = 1 over experts k with weights πk and ηi with γi .

KT =∑k,i

πkγi

T∏t=1

(1 + ηi rkt )

How to play? Make sure KT does not grow big.

KT+1 −KT =∑k,i

πkγi

T∏t=1

(1 + ηi rkt )ηi r

kT+1

=∑k,i

πkγi

T∏t=1

(1 + ηi rkt )ηi

(〈wT+1, `T+1〉 − `kT+1

)= 0

when we pick

wkT+1 =

∑i πkγi

∏Tt=1(1 + ηi r

kt )ηi∑

j ,i πjγi∏T

t=1(1 + ηi rjt )ηi

(iProd)

Page 19: The Design of Online Learning Algorithms fileThe Design of Online Learning Algorithms Wouter M. Koolen Online Learning Workshop Paris, Friday 20th October, 2017

StategyStragegy for Skeptic:Split capital K0 = 1 over experts k with weights πk and ηi with γi .

KT =∑k,i

πkγi

T∏t=1

(1 + ηi rkt )

How to play? Make sure KT does not grow big.

KT+1 −KT =∑k,i

πkγi

T∏t=1

(1 + ηi rkt )ηi r

kT+1

=∑k,i

πkγi

T∏t=1

(1 + ηi rkt )ηi

(〈wT+1, `T+1〉 − `kT+1

)= 0

when we pick

wkT+1 =

∑i πkγi

∏Tt=1(1 + ηi r

kt )ηi∑

j ,i πjγi∏T

t=1(1 + ηi rjt )ηi

(iProd)

Page 20: The Design of Online Learning Algorithms fileThe Design of Online Learning Algorithms Wouter M. Koolen Online Learning Workshop Paris, Friday 20th October, 2017

StategyStragegy for Skeptic:Split capital K0 = 1 over experts k with weights πk and ηi with γi .

KT =∑k,i

πkγi

T∏t=1

(1 + ηi rkt )

How to play? Make sure KT does not grow big.

KT+1 −KT =∑k,i

πkγi

T∏t=1

(1 + ηi rkt )ηi r

kT+1

=∑k,i

πkγi

T∏t=1

(1 + ηi rkt )ηi

(〈wT+1, `T+1〉 − `kT+1

)= 0

when we pick

wkT+1 =

∑i πkγi

∏Tt=1(1 + ηi r

kt )ηi∑

j ,i πjγi∏T

t=1(1 + ηi rjt )ηi

(iProd)
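The iProd weights are computable directly from the regret history. A minimal NumPy sketch (function name and the grid of rates are mine; losses in $[0,1]$ and $\eta_i \le 1/2$ keep every factor $1 + \eta_i r_t^k$ positive):

```python
import numpy as np

def iprod_weights(r, pi, etas, gammas):
    """iProd weights w_{T+1} for the experts game.

    r: (T, K) array of instantaneous regrets r_t^k,
    pi: prior over the K experts, etas/gammas: rate grid and its prior.
    """
    # products[i, k] = prod_{t<=T} (1 + eta_i * r_t^k)
    products = np.prod(1.0 + etas[:, None, None] * r[None, :, :], axis=1)
    # numerator_k = sum_i pi_k gamma_i products[i, k] eta_i
    num = pi * ((gammas * etas) @ products)
    return num / num.sum()

# Play the experts game with these weights on random losses.
rng = np.random.default_rng(0)
T, K = 50, 3
losses = rng.uniform(0, 1, size=(T, K))
pi = np.full(K, 1.0 / K)
etas = 2.0 ** -np.arange(1, 6)               # assumed grid in (0, 1/2]
gammas = np.full(len(etas), 1.0 / len(etas))

r = np.zeros((0, K))
for t in range(T):
    w = iprod_weights(r, pi, etas, gammas)
    r = np.vstack([r, w @ losses[t] - losses[t]])  # r_t^k = <w,l> - l^k

w = iprod_weights(r, pi, etas, gammas)
assert np.all(w >= 0) and abs(w.sum() - 1.0) < 1e-9
```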

Page 21: Outline

1. Coin Betting
2. Defensive Forecasting
3. Squint
4. MetaGrad

Page 22: Squint

$$w_{T+1}^k = \frac{\sum_i \pi_k \gamma_i \prod_{t=1}^T (1 + \eta_i r_t^k)\, \eta_i}{\sum_{j,i} \pi_j \gamma_i \prod_{t=1}^T (1 + \eta_i r_t^j)\, \eta_i} \qquad \text{(iProd)}$$

Needs work:

Rates (how sublinear are the regrets?)

Computation

Page 23: Rates

By design,

$$1 = K_T = \sum_{k,i} \pi_k \gamma_i \prod_{t=1}^T (1 + \eta_i r_t^k)$$

So for each $i$ and $k$,

$$-\ln \pi_k - \ln \gamma_i \ge \sum_{t=1}^T \ln(1 + \eta_i r_t^k) \ge \eta_i \sum_{t=1}^T r_t^k - \eta_i^2 \sum_{t=1}^T (r_t^k)^2$$

That is, abbreviating $v_t^k = (r_t^k)^2$,

$$\sum_{t=1}^T r_t^k \le \min_i \left(\eta_i \sum_{t=1}^T v_t^k + \frac{-\ln \pi_k - \ln \gamma_i}{\eta_i}\right)$$

Theorem (Koolen and van Erven [2015])

$$R_T^k \le O\left(\sqrt{V_T^k \left(-\ln \pi_k + \ln\ln V_T^k\right)}\right)$$

Page 27: Computation

$$K_T = \sum_{k,i} \pi_k \gamma_i \prod_{t=1}^T (1 + \eta_i r_t^k)$$

We are using $\ln(1+r) \ge r - r^2$ in the analysis. Put it in the algorithm?

Indeed, we can start from the supermartingale

$$K_T \ge \Phi_T = \sum_{k,i} \pi_k \gamma_i \prod_{t=1}^T e^{\eta_i r_t^k - \eta_i^2 v_t^k} = \sum_{k,i} \pi_k \gamma_i\, e^{\eta_i R_T^k - \eta_i^2 V_T^k}$$

One choice of weights to keep this small:

$$w_{T+1}^k = \frac{\sum_i \pi_k \gamma_i\, e^{\eta_i R_T^k - \eta_i^2 V_T^k}\, \eta_i}{\sum_{j,i} \pi_j \gamma_i\, e^{\eta_i R_T^j - \eta_i^2 V_T^j}\, \eta_i} \qquad \text{(Squint)}$$
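Because the exponent depends only on the cumulative statistics $R_T^k$ and $V_T^k$, the Squint weights can be maintained with constant work per expert per round. A sketch with a discrete rate grid (names and grid are mine):

```python
import numpy as np

def squint_weights(R, V, pi, etas, gammas):
    """Squint weights w_{T+1} from cumulative regrets R (shape (K,))
    and cumulative squared regrets V (shape (K,))."""
    # evidence[i, k] = gamma_i * exp(eta_i R^k - eta_i^2 V^k) * eta_i
    exponent = etas[:, None] * R[None, :] - (etas ** 2)[:, None] * V[None, :]
    evidence = (gammas * etas)[:, None] * np.exp(exponent)
    num = pi * evidence.sum(axis=0)
    return num / num.sum()

# Maintain (R, V) incrementally while playing on random losses.
rng = np.random.default_rng(1)
K = 4
pi = np.full(K, 1.0 / K)
etas = 2.0 ** -np.arange(1, 6)               # assumed grid in (0, 1/2]
gammas = np.full(len(etas), 1.0 / len(etas))
R = np.zeros(K)
V = np.zeros(K)
for t in range(100):
    w = squint_weights(R, V, pi, etas, gammas)
    loss = rng.uniform(0, 1, K)
    r = w @ loss - loss                       # instantaneous regrets
    R += r
    V += r ** 2

w = squint_weights(R, V, pi, etas, gammas)
assert np.all(w >= 0) and abs(w.sum() - 1.0) < 1e-9
```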

Page 30: Computation, ctd

$$w_{T+1}^k = \frac{\sum_i \pi_k \gamma_i\, e^{\eta_i R_T^k - \eta_i^2 V_T^k}\, \eta_i}{\sum_{j,i} \pi_j \gamma_i\, e^{\eta_i R_T^j - \eta_i^2 V_T^j}\, \eta_i} \qquad \text{(Squint)}$$

Maybe a continuous prior on $\eta$ could help? How to make

$$\int_0^{1/2} \gamma(\eta)\, e^{\eta R_T^k - \eta^2 V_T^k}\, \eta \,\mathrm{d}\eta$$

fast?

Conjugate $\gamma(\eta) \propto e^{-a\eta - b\eta^2}$ ⇒ truncated Gaussian mean.

Improper $\gamma(\eta) \propto \frac{1}{\eta}$ ⇒ Gaussian CDF.

Theorem (Koolen and van Erven [2015])

$$R_T^k \le O\left(\sqrt{V_T^k \left(-\ln \pi_k + \ln\ln T\right)}\right)$$

Page 34: Squint Conclusion

Computation:

$O(1)$ time per round (like Hedge, Adapt-ML-Prod, ...)

Library: https://bitbucket.org/wmkoolen/squint

Regret:

Adaptive $\sqrt{V_T^k (\ln K + \ln\ln T)}$ bound.

Implies $L^*$ bound, $\sqrt{T}$ bound.

Constant regret in the stochastic gap case.

Page 35: Outline

1. Coin Betting
2. Defensive Forecasting
3. Squint
4. MetaGrad

Page 36: Online Convex Optimisation

Let's play the OCO game. For $t = 1, 2, \dots$:

Learner plays $w_t \in \mathcal{U}$ (convex, bounded).

Reality picks $f_t : \mathcal{U} \to \mathbb{R}$ (convex, bounded gradient).

Learner incurs $f_t(w_t)$ and observes $\nabla f_t(w_t)$.

Goal: small regret w.r.t. all $u \in \mathcal{U}$:

$$R_T^u = \sum_{t=1}^T \left(f_t(w_t) - f_t(u)\right)$$

Step 1: let's play a harder game with the linearised loss

$$f_t(w_t) - f_t(u) \le \langle w_t - u, \nabla f_t(w_t)\rangle =: r_t^u$$
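 
The linearisation step is just first-order convexity: $f(u) \ge f(w) + \langle u - w, \nabla f(w)\rangle$, rearranged. A quick numerical check on an arbitrary convex quadratic of my choosing:

```python
import numpy as np

# Convexity gives the bound used in Step 1:
#   f(w) - f(u) <= <w - u, grad f(w)>.
# Check it for f(w) = ||A w - b||^2 at random point pairs.
rng = np.random.default_rng(4)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)

f = lambda w: np.sum((A @ w - b) ** 2)
grad = lambda w: 2 * A.T @ (A @ w - b)

for _ in range(100):
    w, u = rng.normal(size=3), rng.normal(size=3)
    assert f(w) - f(u) <= (w - u) @ grad(w) + 1e-9
```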

Page 39: iProd/Squint for OCO

Goal: keep regret small for all $u \in \mathcal{U}$ ⇒ prior $\pi$ on $u$.

$$K_T = \int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u) \prod_{t=1}^T (1 + \eta r_t^u)\, \mathrm{d}u\, \mathrm{d}\eta$$

Can we keep this small?

$$K_{T+1} - K_T = \int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u) \prod_{t=1}^T (1 + \eta r_t^u)\, \eta\, \langle w_{T+1} - u, \nabla f_{T+1}(w_{T+1})\rangle\, \mathrm{d}u\, \mathrm{d}\eta$$

Need

$$\int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u) \prod_{t=1}^T (1 + \eta r_t^u)\, \eta\, (w_{T+1} - u)\, \mathrm{d}u\, \mathrm{d}\eta = 0$$

which mandates

$$w_{T+1} = \frac{\int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u) \prod_{t=1}^T (1 + \eta r_t^u)\, \eta\, u\, \mathrm{d}u\, \mathrm{d}\eta}{\int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u) \prod_{t=1}^T (1 + \eta r_t^u)\, \eta\, \mathrm{d}u\, \mathrm{d}\eta}$$

Page 43: OCO iProd

$$w_{T+1} = \frac{\int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u) \prod_{t=1}^T (1 + \eta r_t^u)\, \eta\, u\, \mathrm{d}u\, \mathrm{d}\eta}{\int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u) \prod_{t=1}^T (1 + \eta r_t^u)\, \eta\, \mathrm{d}u\, \mathrm{d}\eta}$$

Work needed:

Picking priors

Computation

Rates

Page 44: Prod bound to the rescue

We might also use

$$w_{T+1} = \frac{\int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u)\, e^{\eta R_T^u - \eta^2 V_T^u}\, \eta\, u\, \mathrm{d}u\, \mathrm{d}\eta}{\int_0^{1/2} \gamma(\eta) \int_{\mathcal{U}} \pi(u)\, e^{\eta R_T^u - \eta^2 V_T^u}\, \eta\, \mathrm{d}u\, \mathrm{d}\eta}$$

where

$$R_T^u = \sum_{t=1}^T \langle w_t - u, \nabla f_t(w_t)\rangle \qquad V_T^u = \sum_{t=1}^T \langle w_t - u, \nabla f_t(w_t)\rangle^2$$

are linear/quadratic in $u$ ⇒ suggests a multivariate normal $\pi(u)$.

But then $w_t$ may end up outside $\mathcal{U}$. And $r_t^u$ is not bounded.

Page 47: MetaGrad

Let's do it anyway. Turns out it works.

Tilted Exponential Weights (master):

$$w = \frac{\sum_i \gamma_i \eta_i w_i}{\sum_i \gamma_i \eta_i} \qquad \gamma_i \leftarrow \gamma_i\, e^{-\eta_i r_i - \eta_i^2 r_i^2} \quad\text{where}\quad r_i = (w_i - w)^\top \nabla f_t$$

≈ Online Newton Step (per slave $\eta_i$):

$$\Sigma_i \leftarrow \left(\Sigma_i^{-1} + 2\eta_i^2\, \nabla f_t \nabla f_t^\top\right)^{-1} \qquad w_i \leftarrow \Pi_{\mathcal{U}}\left(w_i - \eta_i \Sigma_i \nabla f_t \left(1 + 2\eta_i r_i\right)\right)$$

Theorem

The regret of MetaGrad is bounded by

$$R_T = O\left(\min\left\{\sqrt{T},\ \sqrt{V_T^{u^*} d \ln T}\right\}\right)$$
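The two update blocks on this slide translate almost line by line into code. A minimal, illustrative sketch (not the reference implementation; class and function names, the rate grid, and the choice of $\mathcal{U}$ as the unit L2 ball are all mine):

```python
import numpy as np

def project_l2_ball(w, radius=1.0):
    """Euclidean projection onto the L2 ball, standing in for Pi_U."""
    n = np.linalg.norm(w)
    return w if n <= radius else w * (radius / n)

class MetaGradSketch:
    """Slaves run ONS-style updates with their own eta_i; the master
    combines them with tilted exponential weights, as on the slide."""

    def __init__(self, d, etas):
        self.etas = np.asarray(etas)
        self.gammas = np.full(len(etas), 1.0 / len(etas))
        self.ws = np.zeros((len(etas), d))                # slave iterates
        self.Sigmas = np.stack([np.eye(d)] * len(etas))   # slave covariances

    def predict(self):
        # Master: w = sum_i gamma_i eta_i w_i / sum_i gamma_i eta_i
        coef = self.gammas * self.etas
        return coef @ self.ws / coef.sum()

    def update(self, w, g):
        """w: the prediction just played; g: gradient of f_t at w."""
        r = (self.ws - w) @ g                 # r_i = (w_i - w)^T grad f_t
        # Tilted exponential weights on the master
        self.gammas *= np.exp(-self.etas * r - self.etas**2 * r**2)
        self.gammas /= self.gammas.sum()
        for i, eta in enumerate(self.etas):
            # Sigma_i <- (Sigma_i^{-1} + 2 eta^2 g g^T)^{-1}
            self.Sigmas[i] = np.linalg.inv(
                np.linalg.inv(self.Sigmas[i]) + 2 * eta**2 * np.outer(g, g))
            # w_i <- Pi_U(w_i - eta Sigma_i g (1 + 2 eta r_i))
            self.ws[i] = project_l2_ball(
                self.ws[i] - eta * self.Sigmas[i] @ g * (1 + 2 * eta * r[i]))

# Run it on the fixed loss f(w) = ||w - target||^2 over the unit ball.
target = np.array([0.3, -0.4])
mg = MetaGradSketch(d=2, etas=2.0 ** -np.arange(1, 6))
loss0 = None
for t in range(200):
    w = mg.predict()
    if loss0 is None:
        loss0 = np.sum((w - target) ** 2)
    mg.update(w, 2 * (w - target))            # gradient of the loss
assert np.sum((mg.predict() - target) ** 2) < loss0   # made progress
```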

Page 49: Consequences

What's new with $\sqrt{V_T d \ln T}$?

Corollary

For $\alpha$-exp-concave or $\alpha$-strongly convex losses, MetaGrad ensures

$$R_T = O(d \ln T)$$

without knowing $\alpha$.

Corollary (Koolen, Grünwald, and van Erven [2016])

For any $\beta$-Bernstein $\mathbb{P}$, MetaGrad keeps the expected regret below

$$\mathbb{E}\, R_T^* \le O\left((d \ln T)^{\frac{1}{2-\beta}}\, T^{\frac{1-\beta}{2-\beta}}\right)$$

without knowing $\beta$.

Page 52: Conclusion

A simple factor $(1 + \eta r_t)$ stretches surprisingly far.