
  • Paris, Oct. 2019

    Bandits and Hyper-parameter Optimization

    Xuedong Shang¹, Emilie Kaufmann², Michal Valko³
    ¹ Inria Lille, SequeL team · ² CNRS, CRIStAL · ³ Google DeepMind

    Huawei Paris

  • What — Hyper-parameter optimization

    Problem: we tackle hyper-parameter tuning for supervised learning tasks as a global optimisation task:

        min{ f(λ) : λ ∈ Ω },

    where f(λ) ≜ E[ℓ(Y, ĝ_λ^(n)(X))] measures the generalization power of the predictor ĝ_λ^(n) trained with hyper-parameter λ on n samples.
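    In practice f(λ) is estimated by an empirical average of the loss on held-out data. A minimal sketch, with all names (`estimate_f`, `g_lambda`, `loss`) ours, not from the slides:

```python
import numpy as np

def estimate_f(g_lambda, loss, X_val, y_val):
    """Monte-Carlo estimate of f(lambda) = E[loss(Y, g_lambda(X))]:
    average loss of the trained predictor g_lambda on held-out pairs."""
    return np.mean([loss(y, g_lambda(x)) for x, y in zip(X_val, y_val)])
```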



  • How — Best-arm identification

    We view the problem as best-arm identification (BAI) in a stochastic infinitely-armed bandit: the arms' means are drawn from some reservoir distribution ν0.

    In each round:
    1. (optional) query a new arm from ν0;
    2. sample an arm that was previously queried.

    Goal: output an arm with mean close to µ*.

    D-TTTS (Dynamic Top-Two Thompson Sampling) is a dynamic algorithm built on TTTS; a toy sketch of the protocol follows below.
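    To make the protocol concrete, here is a minimal toy sketch in Python, assuming a uniform reservoir ν0 = U([0, 1]) and Bernoulli arms (the function names are ours); the D-TTTS sketch further below builds on it:

```python
import random

def query_new_arm():
    """Query the reservoir nu_0 = U([0, 1]): returns a fresh arm's hidden mean."""
    return random.random()

def sample_arm(mean):
    """Pull a previously queried arm: a Bernoulli reward with that mean."""
    return 1 if random.random() < mean else 0
```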

  • How — D-TTTS

    In this talk:
    - Beta-Bernoulli Bayesian bandit model;
    - a uniform prior over the mean of new arms.

    The Thompson sampling core of the algorithm:
    1: Initialization: µ1 ∼ ν0; A = {µ1}; m = 1; S1, N1 = 0
    2: while budget still available do
    3:     ∀i ∈ A, θi ∼ Beta(Si + 1, Ni − Si + 1)
    4:     I(1) = argmax_{i=0,...,m} θi    {Thompson sampling}
    5: end while


  • How — D-TTTS

    The full dynamic algorithm, which also queries a new arm and evaluates the selected one in every round:
    1:  Initialization: µ1 ∼ ν0; A = {µ1}; m = 1; S1, N1 = 0
    2:  while budget still available do
    3:      µ_{m+1} ∼ ν0; A ← A ∪ {µ_{m+1}}
    4:      S_{m+1}, N_{m+1} ← 0; m ← m + 1
    5:      ∀i ∈ A, θi ∼ Beta(Si + 1, Ni − Si + 1)
    6:      I(1) = argmax_{i=0,...,m} θi    {Thompson sampling}
    7:      if U ∼ U([0, 1]) > β then
    8:          while I(2) ≠ I(1) do
    9:              ∀i ∈ A, θ′i ∼ Beta(Si + 1, Ni − Si + 1)
    10:             I(2) ← argmax_{i=0,...,m} θ′i
    11:         end while
    12:         I(1) ← I(2)
    13:     end if    {TTTS}
    14:     Y ← evaluate arm I(1); X ∼ Ber(Y)
    15:     S_{I(1)} ← S_{I(1)} + X; N_{I(1)} ← N_{I(1)} + 1
    16: end while
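    The pseudocode translates nearly line for line into Python. Below is a minimal sketch (our code, not the authors' implementation), reusing `query_new_arm` and `sample_arm` from the toy Bernoulli model sketched earlier; in that model the evaluation Y is itself a Bernoulli draw, so line 14's binarization X ∼ Ber(Y) is implicit:

```python
import random

def d_ttts(budget, beta=0.5):
    """Minimal sketch of D-TTTS on the toy Bernoulli reservoir;
    beta is the TTTS parameter (probability of keeping the TS leader)."""
    arms = [query_new_arm()]   # hidden means of queried arms (mu_1 ~ nu_0)
    S, N = [0], [0]            # success counts and pull counts per arm
    for _ in range(budget):
        # lines 3-4: query a new arm from the reservoir, without sampling it
        arms.append(query_new_arm())
        S.append(0)
        N.append(0)
        # lines 5-6: Thompson sampling, one draw per Beta posterior
        theta = [random.betavariate(S[i] + 1, N[i] - S[i] + 1)
                 for i in range(len(arms))]
        leader = max(range(len(arms)), key=theta.__getitem__)
        # lines 7-13: with probability 1 - beta, re-sample until the
        # posterior argmax differs, and play that challenger instead
        if random.random() > beta:
            challenger = leader
            while challenger == leader:
                theta2 = [random.betavariate(S[i] + 1, N[i] - S[i] + 1)
                          for i in range(len(arms))]
                challenger = max(range(len(arms)), key=theta2.__getitem__)
            leader = challenger
        # lines 14-15: evaluate the chosen arm and update posterior counts
        x = sample_arm(arms[leader])
        S[leader] += x
        N[leader] += 1
    # recommend the arm with the highest posterior mean (our choice;
    # the slides do not specify a recommendation rule)
    best = max(range(len(arms)), key=lambda i: (S[i] + 1) / (N[i] + 2))
    return arms[best]
```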

  • How — D-TTTS cont'd

    D-TTTS in summary: in each round, query a new arm endowed with a Beta(1, 1) prior, without sampling it, and run TTTS on the new set of arms.

  • How — D-TTTS cont'd

    Order-statistic trick: with L_{t−1} the list of arms that have been effectively sampled at time t, we run TTTS on the set L_{t−1} ∪ {µ0}, where µ0 is a pseudo-arm with posterior Beta(t − |L_{t−1}|, 1).

    Figure: posterior distributions of 4 arms and the pseudo-arm.
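    The Beta(t − |L_{t−1}|, 1) posterior is exactly the law of the maximum of t − |L_{t−1}| i.i.d. U([0, 1]) draws (CDF x^k is the Beta(k, 1) CDF), i.e. the best mean among the queried-but-unsampled arms under the uniform reservoir. A one-function sketch, with our naming:

```python
import random

def pseudo_arm_sample(t, num_sampled):
    """Posterior sample for the pseudo-arm mu_0 standing in for the
    t - num_sampled queried-but-unsampled arms: a Beta(t - num_sampled, 1)
    draw, i.e. the max of that many i.i.d. U([0, 1]) means."""
    return random.betavariate(t - num_sampled, 1)
```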

  • Why

    - TTTS is anytime for finitely-armed bandits;
    - the flexibility of this Bayesian algorithm allows us to propose a dynamic version for infinite BAI;
    - unlike previous approaches, D-TTTS does not need to fix the number of arms queried in advance, and naturally adapts to the difficulty of the task.

  • HPO as a BAI problem

    BAI                    HPO
    query ν0               pick a new configuration λ
    sample an arm          train the classifier g_λ
    reward                 cross-validation loss
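    As an illustration of this dictionary, a hypothetical sketch for the SVM setting of the next slide; `cross_val_score` and `SVC` are real scikit-learn APIs, while the function names and the log-uniform sampling (ranges taken from the UCI table below) are our own scaffolding:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def query_new_configuration(rng):
    """'Query nu_0': draw a new configuration lambda, log-uniform in [1e-5, 1e5]."""
    return {"C": 10 ** rng.uniform(-5, 5), "gamma": 10 ** rng.uniform(-5, 5)}

def sample_configuration(config, X, y):
    """'Sample an arm': train g_lambda and return a reward in [0, 1]
    (mean cross-validation accuracy, i.e. 1 - cross-validation loss)."""
    return cross_val_score(SVC(**config), X, y, cv=3).mean()

# usage: query_new_configuration(np.random.default_rng()) gives one arm,
# whose evaluation can feed the Ber(Y) binarization of D-TTTS.
```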

  • Experiments — Setting

    Table: hyper-parameters to be tuned for UCI experiments.

    Classifier   Hyper-parameter      Type      Bounds
    SVM          C                    R+        [10⁻⁵, 10⁵]
                 γ                    R+        [10⁻⁵, 10⁵]

    Table: hyper-parameters to be tuned for MNIST experiments.

    Classifier   Hyper-parameter      Type      Bounds
    MLP          hidden layer size    Integer   [5, 50]
                 alpha                R+        [0, 0.9]
                 learning rate init   R+        [10⁻⁵, 10⁻¹]

  • Experiments — Some results

    Figure: validation error vs. number of iterations for Hyperband, TPE, Random Search, H-TTTS, and D-TTTS on four tasks: (a) wine, (b) breast cancer, (c) adult, (d) MNIST.

  • Next?

    - Extend to the non-stochastic setting;
    - theoretical guarantees?

    Theorem (Shang, Heide, et al. 2019). The TTTS sampling rule coupled with the Chernoff stopping rule forms a δ-correct BAI strategy. Moreover, if all the arms' means are distinct, it satisfies

        lim sup_{δ→0} E[τδ] / log(1/δ) ≤ 1 / Γ*_β.

    More details: check out [Shang, Kaufmann, et al. 2019; Shang, Heide, et al. 2019].

  • References

    Xuedong Shang, Rianne de Heide, Pierre Ménard, Emilie Kaufmann, and Michal Valko. "Fixed-confidence guarantees for Bayesian best-arm identification". 2019.

    Xuedong Shang, Emilie Kaufmann, and Michal Valko. "A simple dynamic bandit algorithm for hyper-parameter optimization". In: 6th ICML Workshop on AutoML. 2019.

    Thank you!