
Bandits and Hyper-parameter Optimization
Xuedong Shang¹, Emilie Kaufmann², Michal Valko³
¹ Inria Lille, SequeL team   ² CNRS, CRIStAL   ³ Google DeepMind
Huawei Paris, Oct. 2019

What — Hyper-parameter optimization
Problem
We tackle hyper-parameter tuning for supervised learning tasks:
- global optimisation task: min { f(λ) : λ ∈ Ω };
- f(λ) ≜ E[ℓ(Y, g_λ^(n)(X))] measures the generalization power, where g_λ^(n) is the model trained with hyper-parameters λ on n samples and ℓ is the loss.
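To make the objective concrete, here is a minimal sketch (not from the slides) that estimates f(λ) for one configuration by cross-validated loss; the SVC learner and the wine dataset are illustrative assumptions.

```python
# Hypothetical illustration: estimate f(lambda) for one configuration by
# cross-validated misclassification loss. The SVC learner and the wine
# dataset are assumptions made for the sake of the example.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

def f_hat(C, gamma):
    """Empirical proxy for f(lambda) = E[loss(Y, g_lambda(X))]."""
    accuracy = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
    return 1.0 - accuracy  # 0/1 loss estimate

print(f_hat(C=1.0, gamma=1e-3))  # generalization-loss estimate for one lambda
```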




How — Best-arm identification
We see the problem as best-arm identification in a stochastic infinitely-armed bandit: arms' means are drawn from some reservoir distribution ν0.

In each round:
1. (optional) query a new arm from ν0;
2. sample an arm that was previously queried.

Goal: output an arm with mean close to µ★.

D-TTTS (Dynamic Top-Two Thompson Sampling): a dynamic algorithm built on TTTS. A minimal sketch of the interaction protocol follows below.
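The sketch below only illustrates the protocol (query the reservoir, then sample a queried arm); the selection rule is a placeholder, not the slides' algorithm, and all names are illustrative.

```python
# Sketch of the infinite-armed BAI protocol: query arms from the reservoir
# nu_0 (here Uniform[0,1]) and sample previously queried arms. The selection
# rule below is a placeholder, not the algorithm from the slides.
import random

class Arm:
    def __init__(self, mean):
        self.mean = mean              # unknown to the learner
        self.successes, self.pulls = 0, 0

    def sample(self):                 # Bernoulli reward with this arm's mean
        x = 1 if random.random() < self.mean else 0
        self.successes += x
        self.pulls += 1
        return x

def query_reservoir():                # nu_0: uniform reservoir distribution
    return Arm(random.random())

arms = [query_reservoir()]
for t in range(200):
    if random.random() < 0.5:         # step 1 (optional): query a new arm
        arms.append(query_reservoir())
    random.choice(arms).sample()      # step 2: sample a previously queried arm

# recommend the arm with the best empirical mean among sampled arms
best = max((a for a in arms if a.pulls > 0), key=lambda a: a.successes / a.pulls)
```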

How — D-TTTS
In this talk...
- Beta-Bernoulli Bayesian bandit model
- a uniform prior over the mean of new arms
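As a quick illustration of this model: with a uniform Beta(1,1) prior, after S_i successes in N_i pulls, arm i's posterior is Beta(S_i + 1, N_i − S_i + 1), and Thompson sampling draws one sample from each such posterior (the counts below are made-up).

```python
# Posterior sampling for one arm under the Beta-Bernoulli model with a
# uniform Beta(1,1) prior; S_i and N_i are made-up example counts.
import numpy as np

rng = np.random.default_rng(0)
S_i, N_i = 7, 10                               # 7 successes in 10 pulls
theta_i = rng.beta(S_i + 1, N_i - S_i + 1)     # one Thompson sample
print(theta_i)                                 # posterior mean is (S_i+1)/(N_i+2) = 0.667
```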


How — D-TTTS
 1: Initialization: µ1 ∼ ν0; A ← {µ1}; m ← 1; S1, N1 ← 0
 2: while budget still available do
 3:     µm+1 ∼ ν0; A ← A ∪ {µm+1}
 4:     Sm+1, Nm+1 ← 0; m ← m + 1
 5:     ∀i ∈ A, θi ∼ Beta(Si + 1, Ni − Si + 1)
 6:     I(1) ← arg max_{i=0,...,m} θi                    ▷ Thompson sampling
 7:     if U ∼ U([0, 1]) > β then
 8:         repeat
 9:             ∀i ∈ A, θ′i ∼ Beta(Si + 1, Ni − Si + 1)
10:             I(2) ← arg max_{i=0,...,m} θ′i
11:         until I(2) ≠ I(1)
12:         I(1) ← I(2)
13:     end if                                           ▷ TTTS
14:     Y ← evaluate arm I(1); X ∼ Ber(Y)
15:     SI(1) ← SI(1) + X; NI(1) ← NI(1) + 1
16: end while
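Below is a minimal Python sketch of the loop above, assuming a uniform reservoir ν0 on [0, 1], β = 0.5, and a black-box `evaluate` returning a value in [0, 1] (here simply the arm's latent mean); the final recommendation rule is an illustrative choice, not specified on the slide.

```python
# Minimal sketch of D-TTTS (Beta-Bernoulli model, uniform reservoir nu_0).
# `evaluate`, beta = 0.5, and the recommendation rule are assumptions
# made for illustration, not prescribed by the slides.
import numpy as np

rng = np.random.default_rng(42)
beta = 0.5

def evaluate(latent_mean):              # black-box "evaluate arm I(1)"
    return latent_mean

means = [rng.uniform()]                 # mu_1 ~ nu_0
S, N = [0], [0]                         # success and pull counts

for _ in range(200):                    # while budget still available
    means.append(rng.uniform())         # query a new arm from nu_0
    S.append(0); N.append(0)

    a, b = np.array(S) + 1, np.array(N) - np.array(S) + 1
    leader = int(np.argmax(rng.beta(a, b)))          # I(1): Thompson sample
    if rng.uniform() > beta:                         # with prob 1 - beta ...
        challenger = leader
        while challenger == leader:                  # ... re-sample until it differs
            challenger = int(np.argmax(rng.beta(a, b)))
        leader = challenger                          # I(1) <- I(2)

    y = evaluate(means[leader])          # Y <- evaluate arm I(1)
    x = rng.binomial(1, y)               # X ~ Ber(Y)
    S[leader] += x; N[leader] += 1

# recommend the arm with the highest posterior mean (illustrative choice)
best = int(np.argmax([(s + 1) / (n + 2) for s, n in zip(S, N)]))
```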

How — D-TTTS cont'd

D-TTTS in summary...
In each round, query a new arm endowed with a Beta(1,1) prior, without sampling it, and run TTTS on the new set of arms.

How — D-TTTS cont'd

Order statistic trick
With Lt−1 the list of arms that have been effectively sampled at time t, we run TTTS on the set Lt−1 ∪ {µ0}, where µ0 is a pseudo-arm with posterior Beta(t − |Lt−1|, 1).
Figure: Posterior distributions of 4 arms and the pseudo-arm
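The name of the trick suggests the standard order-statistic fact that the maximum of k i.i.d. Uniform(0,1) draws is Beta(k,1)-distributed, so one pseudo-arm can stand in for the k arms queried from a uniform reservoir but never sampled. A quick numerical check (k = 5 is an arbitrary choice):

```python
# Numerical check of the order-statistic fact: the maximum of k i.i.d.
# Uniform(0,1) draws is Beta(k,1)-distributed (k = 5 chosen arbitrarily).
import numpy as np

rng = np.random.default_rng(0)
k, n = 5, 200_000
max_of_uniforms = rng.uniform(size=(n, k)).max(axis=1)
beta_k_1 = rng.beta(k, 1, size=n)

print(max_of_uniforms.mean(), beta_k_1.mean())   # both approx k/(k+1) = 0.833
```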

Why
- TTTS is anytime for finitely-armed bandits;
- the flexibility of this Bayesian algorithm allows us to propose a dynamic version for infinite-armed BAI;
- unlike previous approaches, D-TTTS does not need to fix the number of arms queried in advance, and naturally adapts to the difficulty of the task.

HPO as a BAI problem
BAI                   HPO
query ν0              pick a new configuration λ
sample an arm         train the classifier gλ
reward                cross-validation loss
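A sketch of this mapping in code, with scikit-learn as an illustrative learner: querying ν0 draws a fresh configuration, and sampling an arm trains g_λ and returns its cross-validated score (one minus the cross-validation loss in the table) as the reward. The log-uniform draw and the dataset are assumptions, not prescribed by the slide.

```python
# Sketch of the BAI <-> HPO mapping in the table above; the log-uniform
# configuration draw and the SVC learner are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)

def query_reservoir():                  # "query nu_0": pick a new configuration lambda
    return {"C": 10 ** rng.uniform(-5, 5), "gamma": 10 ** rng.uniform(-5, 5)}

def sample_arm(lam):                    # "sample an arm": train the classifier g_lambda
    score = cross_val_score(SVC(**lam), X, y, cv=3).mean()
    return score                        # reward = 1 - cross-validation loss

lam = query_reservoir()
print(lam, sample_arm(lam))
```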

Experiments — Setting
Classifier   Hyper-parameter      Type      Bounds
SVM          C                    R+        [10^-5, 10^5]
             γ                    R+        [10^-5, 10^5]

Table: hyper-parameters tuned for the UCI experiments.

Classifier   Hyper-parameter      Type      Bounds
MLP          hidden layer size    Integer   [5, 50]
             alpha                R+        [0, 0.9]
             learning rate init   R+        [10^-5, 10^-1]

Table: hyper-parameters tuned for the MNIST experiments.
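The two search spaces above, written out as plain sampling specifications; treating C, γ, and the initial learning rate as log-uniform over their bounds is an assumption suggested by the bounds, not stated on the slide.

```python
# The search spaces from the two tables, as plain data. The "log-uniform"
# choice for scale-type parameters is an assumption, not stated on the slide.
search_spaces = {
    "SVM (UCI)": {
        "C":     ("log-uniform", 1e-5, 1e5),
        "gamma": ("log-uniform", 1e-5, 1e5),
    },
    "MLP (MNIST)": {
        "hidden_layer_size":  ("integer",     5, 50),
        "alpha":              ("uniform",     0.0, 0.9),
        "learning_rate_init": ("log-uniform", 1e-5, 1e-1),
    },
}
```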

Experiments — Some results
Figure: Validation error vs. number of iterations for Hyperband, TPE, Random Search, H-TTTS, and D-TTTS on four datasets: (a) wine, (b) breast cancer, (c) adult, (d) MNIST.

Next?
- Extend to the non-stochastic setting;
- Theoretical guarantees?
Theorem (Shang, de Heide, et al. 2019)
The TTTS sampling rule coupled with the Chernoff stopping rule forms a δ-correct BAI strategy. Moreover, if all the arm means are distinct, it satisfies

    lim sup_{δ→0} E[τδ] / log(1/δ) ≤ 1 / Γ★β.
More details
Check out [Shang, Kaufmann, et al. 2019; Shang, de Heide, et al. 2019].

References
Thank you!
Xuedong Shang, Rianne de Heide, Pierre Ménard, Emilie Kaufmann, and Michal Valko. "Fixed-confidence guarantees for Bayesian best-arm identification". 2019.
Xuedong Shang, Emilie Kaufmann, and Michal Valko. "A simple dynamic bandit algorithm for hyper-parameter optimization". In: 6th ICML Workshop on AutoML. 2019.