Meta online learning: experiments on a unit commitment problem (ESANN2014)


Transcript of Meta online learning: experiments on a unit commitment problem (ESANN2014)

Meta Online Learning: Experiments on a Unit Commitment Problem

Jialin Liu, Olivier Teytaud ([email protected], [email protected])

Black-box Noisy Optimization
• Objective function: fitness : R^d → R
• Optimum: θ* = argmin_{θ ∈ R^d} fitness(θ)

Some NOAs
• RSAES: Self-Adaptive Evolution Strategy with resampling;
• Fabian's algorithm: a first-order method using gradients estimated by finite differences [?, ?] (a minimal sketch follows this list);
• Noisy Newton's algorithm: a second-order method using a Hessian matrix, also approximated by finite differences [?].
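As a rough illustration (not the authors' implementation, which uses a more refined averaging and gain schedule), the sketch below shows how a first-order NOA in the spirit of Fabian's algorithm can estimate a gradient from noisy evaluations by symmetric finite differences; the names `fitness`, `sigma_n`, `step`, and `samples` are placeholders.

```python
import numpy as np

def finite_difference_step(fitness, theta, sigma_n, step, samples=1):
    """Hedged sketch of one Fabian-like iteration: estimate the gradient of a
    noisy fitness by symmetric finite differences with perturbation size sigma_n,
    then take a gradient step.  All names here are illustrative only."""
    d = len(theta)
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = sigma_n
        # Average a few noisy evaluations on each side to reduce the noise.
        plus = np.mean([fitness(theta + e) for _ in range(samples)])
        minus = np.mean([fitness(theta - e) for _ in range(samples)])
        grad[i] = (plus - minus) / (2.0 * sigma_n)
    return theta - step * grad
```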

Compare Solvers Early
• k_n ≤ n: lag
• Why this lag?
  (i) Comparing current recommendations → comparing good points → very close fitness values → very expensive.
  (ii) Algorithms' ranking is usually stable → save time by comparing "older" recommendations.

Solvers and Notations
µ: parent population size in ES
λ: population size in ES
d: search space dimension
n: generation index
σ_n: step size at generation n
r_n: resampling number at generation n

For all NOPAs (a short transcription in code follows below):
k_n = ⌈n^0.1⌉, r_n = n^3, s_n = 15n
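For concreteness, a minimal transcription of these NOPA schedules (assuming ⌈·⌉ in k_n denotes the ceiling):

```python
import math

def k(n):  # lag: k_n = ceil(n^0.1), index of the "older" recommendation used for comparison
    return math.ceil(n ** 0.1)

def r(m):  # checkpoints: r_m = m^3, iterations at which the comparison module runs
    return m ** 3

def s(m):  # s_m = 15 m, number of reward evaluations per solver at the m-th comparison
    return 15 * m
```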

Table 1: Solvers in experiments

Notation       Algorithm and parametrization
RSAES          λ = 10d, µ = 5d, r_n = 10n^2
Fabian1        σ_n = 10/n^0.49, a = 100
Fabian2        σ_n = 10/n^0.05, a = 100
Newton1        σ_n = 10/n, r_n = n^2
Newton2        σ_n = 100/n^4, r_n = n^2
P.12345        NOPA of the 5 solvers above
P.12345 + S.   P.12345 with information sharing
P.22           NOPA of 2 (identical) Fabian1
P.22 + S.      P.22 with information sharing
P.222          NOPA of 3 (identical) Fabian1
P.222 + S.     P.222 with information sharing

Some References

Abstract
• Online learning = real-time machine learning "on the fly".
• Meta online learning = combining several online learning algorithms from a given set (termed portfolio) of algorithms ≈ combining Noisy Optimization Algorithms (NOPA = noisy optimization portfolio algorithm).
• Goals: (i) mitigating the effect of a bad choice of online learning algorithm; (ii) parallelization; (iii) combining the strengths of different algorithms.
• This paper:
  - Portfolios are classical for combinatorial optimization; here we test portfolios for noisy optimization.
  - Recently, a methodology termed lag has been proposed for NOPA. We test the lag methodology experimentally on various problems.

Noisy Optimization Portfolio Algorithm (NOPA)
Iteration n of the portfolio {S_1, ..., S_M} containing M NOAs (a Python sketch follows the list):
• Initialization module: if n = 0, initialize all S_i, i ∈ {1, ..., M}.
• For i ∈ {1, ..., M}:
  - Update module: apply iterations of solver S_i until it has received at least n data samples.
  - Let θ_{i,n} be the current recommendation of solver S_i.
• Comparison module: if n = r_m for some m, then
  - for i ∈ {1, ..., M}, perform s_m evaluations of the (stochastic) reward R(θ_{i,k_n}) and let y_i be their average;
  - define i* ← argmin_{i ∈ {1,...,M}} y_i.
• Recommendation module: θ̃ = θ_{i*,n}
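Below is a minimal Python sketch of this loop. It assumes each solver exposes an `iterate()` method (run one iteration, return the number of reward samples consumed) and a `recommend()` method; these interface names and the bookkeeping details are illustrative, not taken from the paper. `k`, `r`, `s` are the schedules k_n, r_m, s_m defined earlier.

```python
import numpy as np

def nopa(solvers, R, budget, k, r, s):
    """Hedged sketch of the NOPA loop: run M solvers in lockstep and, at the
    checkpoints n = r_m, compare the lagged recommendations theta_{i, k_n}
    on s_m fresh samples of the stochastic reward R."""
    M = len(solvers)
    received = [0] * M                    # reward samples received by each solver
    history = [[] for _ in range(M)]      # past recommendations, kept to implement the lag
    i_star, m = 0, 1
    for n in range(1, budget + 1):
        for i in range(M):
            while received[i] < n:        # update module
                received[i] += solvers[i].iterate()
            history[i].append(solvers[i].recommend())
        if n == r(m):                     # comparison module at n = r_m
            kn = k(n)                     # lag: compare "older" recommendations theta_{i, k_n}
            y = [np.mean([R(history[i][kn - 1]) for _ in range(s(m))]) for i in range(M)]
            i_star = int(np.argmin(y))    # i* = argmin_i y_i, as in the comparison module
            m += 1
    return history[i_star][-1]            # recommendation module: theta_tilde = theta_{i*, n}
```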

Experiments

Table 2: Artificial problem R(θ) = −||θ − θ*||^2 + ||θ − θ*||^z × Gaussian. n: evaluation number. z: rate at which the noise variance decreases around the optimum.

z   Comparison of log(R(θ̃_n))/log(n) for d = 2        Comparison of log(R(θ̃_n))/log(n) for d = 5
0   Newton1 < RSAES ≈ P.12345* < ...                   Newton1 < RSAES ≈ P.12345* < ...
1   P.12345 < Fabian1 ≈ P.22* < ...                    Fabian1 < P.22* < P.222* < ...
2   P.12345* ≪ Fabian1 ≈ P.22* < ...                   Fabian1 < P.12345 < P.22* < ...

Discussion: NOPAs are usually not far from the best of their NOAs. In small dimension, with noise variance decreasing quickly to 0 around the optimum (z = 2), NOPA outperforms all its NOAs. (A sketch of this artificial objective follows.)
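As a worked example, the benchmark of Table 2 can be written directly as the following noisy reward; `theta_star` denotes the hidden optimum. For z = 2 the noise vanishes quadratically near the optimum (the regime where NOPA outperforms all its NOAs), while z = 0 gives constant-variance noise.

```python
import numpy as np

def artificial_reward(theta, theta_star, z, rng=None):
    """Noisy reward from Table 2:
    R(theta) = -||theta - theta*||^2 + ||theta - theta*||^z * Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    dist = np.linalg.norm(np.asarray(theta) - np.asarray(theta_star))
    return -dist ** 2 + dist ** z * rng.normal()
```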

Table 3: Stochastic Unit Commitment problems, conformant planning. St: number of stocks; T: number of time steps; d: dimension (d = St × T).

St, T, d     P.22           P.22 + S.      P.222          P.222 + S.     Best NOA       Worst NOA
3, 21, 63    0.61 ± 0.07    0.63 ± 0.03    0.63 ± 0.05    0.63 ± 0.07    0.49 ± 0.08    0.81 ± 0.05
4, 21, 84    0.75 ± 0.02    0.75 ± 0.03    0.79 ± 0.05    0.76 ± 0.03    0.69 ± 0.06    1.27 ± 0.06
5, 21, 105   0.53 ± 0.04    0.58 ± 0.08    0.58 ± 0.03    0.52 ± 0.05    0.58 ± 0.04    1.44 ± 0.16
6, 15, 90    0.40 ± 0.05    0.39 ± 0.06    0.37 ± 0.06    0.39 ± 0.06    0.38 ± 0.06    0.96 ± 0.13
6, 21, 126   0.53 ± 0.08    0.54 ± 0.08    0.55 ± 0.07    0.54 ± 0.07    0.54 ± 0.07    1.78 ± 0.37
8, 15, 120   0.53 ± 0.03    0.50 ± 0.05    0.53 ± 0.02    0.51 ± 0.05    0.51 ± 0.04    1.70 ± 0.10
8, 21, 168   0.69 ± 0.04    0.77 ± 0.09    0.73 ± 0.06    0.71 ± 0.04    0.71 ± 0.06    2.68 ± 0.02
7, 21, 147   0.70 ± 0.07    0.70 ± 0.05    0.70 ± 0.07    0.70 ± 0.07    0.69 ± 0.06    2.28 ± 0.08

Discussion: Given the same budget, a NOPA of identical solvers can outperform its NOAs. RSAES is usually the best NOA in small dimension, and variants of Fabian in large dimension.

Table 4: Approximate convergence rates log(−R(θ̃_n))/log(n) for Cart-Pole, a multimodal problem, using a neural network (NN). n: evaluation number.

Solver             2 neurons, d = 9           4 neurons, d = 17          8 neurons, d = 33
1 (RSAES)          -0.458033 ± 0.045014       -0.421535 ± 0.045643       -0.351726 ± 0.051705
2 (Fabian1)        0.002226 ± 5.29923e-05     0.002089 ± 1.57766e-04     0.00221 ± 8.14518e-05
3 (Fabian2)        0.002318 ± 9.80792e-05     0.002238 ± 1.14289e-04     0.00236 ± 1.51244e-04
4 (Newton1)        0.002229 ± 6.08973e-05     -0.030731 ± 0.111294       0.002247 ± 1.19829e-04
5 (Newton2)        0.00227 ± 5.2989e-05       0.002217 ± 7.80888e-05     0.002307 ± 9.96404e-05
6 (P.12345)        -0.408705 ± 0.068428       -0.3917 ± 0.071791         -0.320399 ± 0.050338
7 (P.12345 + S.)   -0.42743 ± 0.05709         -0.403707 ± 0.056173       -0.354043 ± 0.069576

Discussion: Fabian and Newton cannot solve this multimodal problem ⇒ one solver (RSAES) is much better than the others ⇒ the choice is easy for NOPA. (A sketch of the reported slope follows.)
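For reference, the rate in the caption of Table 4 can be computed as below; this is a minimal sketch, assuming the reward R of a recommendation is negative so that −R plays the role of a simple regret.

```python
import numpy as np

def convergence_slope(reward, n):
    """Slope log(-R(theta_tilde_n)) / log(n) as in Table 4.  More negative
    slopes mean faster convergence; values near 0 mean essentially no progress."""
    return np.log(-reward) / np.log(n)
```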

Conclusion
• Main conclusion:
  Usual: portfolios of algorithms for combinatorial optimization;
  New: portfolios of algorithms for noisy optimization.
• "Sharing" is not that helpful.
• A NOPA is sometimes better than its NOAs even when all NOAs are identical!
• We show mathematically [?] and empirically a log(M) shift when using M solvers, when working on the log-log scale (the usual scale in noisy optimization).
• A portfolio is approximately as efficient as its best solver, except when one iteration of one algorithm monopolizes most of the budget, as RSAES does in the unit commitment problem.