
Dualing GANs

Yujia Li1,∗, Alexander Schwing3, Kuan-Chieh Wang1,2 and Richard Zemel1,2

1University of Toronto 2Vector Institute 3University of Illinois at Urbana-Champaign ∗Now at DeepMind

Introduction

GAN training suffers from instability due to its saddle point formulation:

$$\max_\theta \min_w f(\theta, w)$$

Here $\theta$ and $w$ are the generator and discriminator parameters respectively, and $f$ is the GAN loss. Typical GAN training alternates gradient updates to $\theta$ and $w$:

$$\theta \rightarrow \theta + \eta_\theta \nabla_\theta f(\theta, w), \qquad w \rightarrow w - \eta_w \nabla_w f(\theta, w).$$

However, to solve the saddle point problem ideally, for each $\theta$ we want to solve for $w^*(\theta) = \mathrm{argmin}_w f(\theta, w)$ and then optimize $\max_\theta f(\theta, w^*(\theta))$. For any $w$ obtained from gradient updates we have $f(\theta, w) \ge f(\theta, w^*(\theta))$, so the outer optimization maximizes an upper bound, which leads to instability.
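For concreteness, here is a minimal Python sketch of this alternating scheme; the gradient callables `grad_theta_f` and `grad_w_f` are hypothetical placeholders for $\nabla_\theta f$ and $\nabla_w f$, not names from the paper:

```python
def alternating_gan_updates(theta, w, grad_theta_f, grad_w_f,
                            eta_theta=1e-3, eta_w=1e-3, steps=1000):
    """Standard alternating GAN training for max_theta min_w f(theta, w).

    This is the update scheme whose instability motivates the dual
    formulation proposed below.
    """
    for _ in range(steps):
        theta = theta + eta_theta * grad_theta_f(theta, w)  # ascent on theta
        w = w - eta_w * grad_w_f(theta, w)                  # descent on w
    return theta, w
```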

In this paper we propose to dualize the inner part $\min_w f(\theta, w)$ into $\max_\lambda g(\theta, \lambda)$, which is always a lower bound on $f(\theta, w^*(\theta))$, and to solve the much more stable maximization problem

$$\max_\theta \max_\lambda g(\theta, \lambda).$$

This formulation allows us to:
• Solve the instability problem for GANs with linear discriminators.
• Improve stability for GANs with nonlinear discriminators.

GANs with Linear Discriminators

We start from linear discriminators that rely on a scoring function $F(w, x) = w^\top x$. Any differentiable nonlinear feature $\phi(x)$ can be used in place of $x$. The discriminator is

$$D_w(x) = p_w(y = 1 \mid x) = \sigma(F(w, x)) = \frac{1}{1 + e^{-w^\top x}}.$$

The GAN loss on a batch of data $\{x_1, \dots, x_n\}$ and latent samples $\{z_1, \dots, z_n\}$ is

$$\min_w \frac{C}{2} \|w\|_2^2 + \frac{1}{2n} \sum_i \log\left(1 + e^{-w^\top x_i}\right) + \frac{1}{2n} \sum_i \log\left(1 + e^{w^\top G_\theta(z_i)}\right).$$
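As a sanity check, the primal loss can be evaluated directly; a minimal numpy sketch, where `X` holds the data batch and `G` the generated samples $G_\theta(z_i)$ (the names are illustrative):

```python
import numpy as np

def linear_gan_loss(w, X, G, C=1.0):
    """Primal GAN loss for a linear discriminator F(w, x) = w^T x.

    X: (n, d) batch of real data, G: (n, d) batch of generated samples.
    Uses logaddexp(0, t) = log(1 + e^t) for numerical stability.
    """
    reg = 0.5 * C * np.dot(w, w)
    real_term = 0.5 * np.mean(np.logaddexp(0.0, -(X @ w)))
    fake_term = 0.5 * np.mean(np.logaddexp(0.0, G @ w))
    return reg + real_term + fake_term
```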

The loss is convex in $w$, so we can derive the standard dual problem:

$$\max_\lambda g(\theta, \lambda) = -\frac{1}{2C} \left\| \sum_i \lambda_{x_i} x_i - \sum_i \lambda_{z_i} G_\theta(z_i) \right\|_2^2 + \frac{1}{2n} \sum_i H(2n\lambda_{x_i}) + \frac{1}{2n} \sum_i H(2n\lambda_{z_i}),$$

$$\text{s.t. } \forall i, \quad 0 \le \lambda_{x_i} \le \frac{1}{2n}, \quad 0 \le \lambda_{z_i} \le \frac{1}{2n}.$$

Here $H(u) = -u \log u - (1 - u) \log(1 - u)$ is the binary entropy, and the optimal $w^*$ can be obtained from the optimal solution $(\lambda^*_x, \lambda^*_z)$ of this dual problem as

$$w^* = \frac{1}{C} \left( \sum_i \lambda^*_{x_i} x_i - \sum_i \lambda^*_{z_i} G_\theta(z_i) \right).$$

Properties:
• The term $\left\| \sum_i \lambda_{x_i} x_i - \sum_i \lambda_{z_i} G_\theta(z_i) \right\|_2^2$ encourages moment matching.
• The entropy terms encourage the $\lambda$'s to be close to the mean.

For training we optimize $\max_\theta \max_\lambda g(\theta, \lambda)$, which is very stable.
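A minimal sketch of one way to solve this dual: projected gradient ascent on $\lambda$ with box constraints $[0, \frac{1}{2n}]$. The step size, iteration count, and solver choice here are illustrative assumptions; the paper's actual solver may differ:

```python
import numpy as np

def binary_entropy_grad(u):
    """H'(u) for the binary entropy H(u) = -u log u - (1 - u) log(1 - u)."""
    u = np.clip(u, 1e-12, 1 - 1e-12)
    return np.log((1 - u) / u)

def solve_dual(X, G, C=1.0, lr=0.05, steps=500):
    """Projected gradient ascent on g(theta, lambda) for fixed generator
    outputs G, then recovery of the primal optimum w*."""
    n = X.shape[0]
    lam_x = np.full(n, 1.0 / (4 * n))  # start in the interior of [0, 1/(2n)]
    lam_z = np.full(n, 1.0 / (4 * n))
    for _ in range(steps):
        v = X.T @ lam_x - G.T @ lam_z  # moment-matching residual
        grad_x = -(X @ v) / C + binary_entropy_grad(2 * n * lam_x)
        grad_z = (G @ v) / C + binary_entropy_grad(2 * n * lam_z)
        lam_x = np.clip(lam_x + lr * grad_x, 0.0, 1.0 / (2 * n))
        lam_z = np.clip(lam_z + lr * grad_z, 0.0, 1.0 / (2 * n))
    w_star = (X.T @ lam_x - G.T @ lam_z) / C  # w* = (1/C)(sum lam_x x - sum lam_z G)
    return lam_x, lam_z, w_star
```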

GANs with Non-Linear Discriminators

In general the scoring function $F(w, x)$ may be nonlinear in $w$ and is typically implemented by a neural network. In this case the GAN loss

$$f(\theta, w) = \frac{C}{2} \|w\|_2^2 + \frac{1}{2n} \sum_i \log\left(1 + e^{-F(w, x_i)}\right) + \frac{1}{2n} \sum_i \log\left(1 + e^{F(w, G_\theta(z_i))}\right)$$

is not convex in $w$ and is therefore hard to dualize directly.

Proposed solution: approximate $f$ locally around any point $w_k$ using a model function $m_{k,\theta}(s) \approx f(\theta, w_k + s)$, then dualize $m_{k,\theta}(s)$. The optimization problem for the discriminator becomes

$$\min_s m_{k,\theta}(s) \quad \text{s.t.} \quad \frac{1}{2} \|s\|_2^2 \le \Delta_k,$$

where $\frac{1}{2} \|s\|_2^2 \le \Delta_k$ is a trust-region constraint that ensures the quality of the approximation. The overall algorithm is shown below.

GAN optimization with model function

Initialize $\theta$, $w_0$, $k = 0$ and iterate:
1. Take one or a few gradient ascent steps on $f(\theta, w_k)$ w.r.t. $\theta$.
2. Find the step $s$ by solving $\min_s m_{k,\theta}(s)$ s.t. $\frac{1}{2} \|s\|_2^2 \le \Delta_k$.
3. Update $w_{k+1} \leftarrow w_k + s$.
4. $k \leftarrow k + 1$.
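A sketch of this loop in Python, assuming hypothetical helpers `generator_ascent_step` (step 1) and `solve_model_subproblem` (step 2, implemented by one of the two approximations below):

```python
def model_function_gan_training(theta, w, delta, generator_ascent_step,
                                solve_model_subproblem, iterations=1000):
    """Alternate generator ascent with trust-region model steps for the
    discriminator, following the algorithm box above."""
    for _ in range(iterations):
        theta = generator_ascent_step(theta, w)       # step 1: update generator
        s = solve_model_subproblem(theta, w, delta)   # step 2: min_s m(s), (1/2)||s||^2 <= delta
        w = w + s                                     # step 3: apply the step
    return theta, w
```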

We explore two approximations:

(A). Cost function linearization: Linearize $f$ as

$$m_{k,\theta}(s) = f(w_k, \theta) + \nabla_w f(w_k, \theta)^\top s.$$

We can solve for the optimal step analytically:

$$s^* = -\frac{\sqrt{2\Delta_k}}{\|\nabla_w f(w_k, \theta)\|_2} \nabla_w f(w_k, \theta).$$

This $s^*$ has the same form and direction as a gradient update used in standard GANs.
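The closed-form step is just a rescaled negative gradient; a minimal sketch, with `grad` standing for $\nabla_w f(w_k, \theta)$:

```python
import numpy as np

def cost_linearization_step(grad, delta_k):
    """Optimal trust-region step for the linearized cost:
    s* = -(sqrt(2 * delta_k) / ||grad||_2) * grad, so (1/2)||s*||^2 = delta_k."""
    return -np.sqrt(2.0 * delta_k) / (np.linalg.norm(grad) + 1e-12) * grad
```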

(B). Score function linearization: Linearize $F$ and keep the loss:

$$F(w_k + s, x) \approx \hat{F}(s, x) = F(w_k, x) + s^\top \nabla_w F(w_k, x), \quad \forall x.$$

The resulting model function is a more accurate approximation than (A):

$$m_{k,\theta}(s) = \frac{C}{2} \|w_k + s\|_2^2 + \frac{1}{2n} \sum_i \log\left(1 + e^{-F(w_k, x_i) - s^\top \nabla_w F(w_k, x_i)}\right) + \frac{1}{2n} \sum_i \log\left(1 + e^{F(w_k, G_\theta(z_i)) + s^\top \nabla_w F(w_k, G_\theta(z_i))}\right).$$

This $m$ is convex in $s$ and can be dualized. See the paper for details.
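To make (B) concrete, here is a sketch that evaluates $m_{k,\theta}(s)$ from cached scores and score gradients at $w_k$; the arrays `F_x`, `F_g`, `J_x`, `J_g` are assumed precomputed, and the dual solver itself is omitted:

```python
import numpy as np

def score_lin_model(s, w_k, F_x, F_g, J_x, J_g, C=1.0):
    """Score-linearized model m_{k,theta}(s).

    F_x: (n,) scores F(w_k, x_i);     J_x: (n, d) gradients grad_w F(w_k, x_i)
    F_g: (n,) scores F(w_k, G(z_i));  J_g: (n, d) gradients grad_w F(w_k, G(z_i))
    """
    reg = 0.5 * C * np.dot(w_k + s, w_k + s)
    real_term = 0.5 * np.mean(np.logaddexp(0.0, -(F_x + J_x @ s)))
    fake_term = 0.5 * np.mean(np.logaddexp(0.0, F_g + J_g @ s))
    return reg + real_term + fake_term
```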

With these approximations the dual is no longer exact, but solving the dual may still be better than taking gradient steps on the primal.

Experiments

Linear Discriminator, 5 Gaussians

[Figure: generated samples on the 5-Gaussians toy data.]

Linear Discriminator, MNIST

[Figure: samples from Dual GAN and Standard GAN, plus "MNIST Sensitivity to Hyperparams" — a histogram of discretized Inception scores (x-axis, 0–8) vs. normalized counts (y-axis), comparing gan and dual runs.]

Nonlinear Discriminator, MNIST

[Figure: MNIST samples from Score Lin., Cost Lin., and Standard GAN.]

Nonlinear Discriminator, CIFAR-10

Score Type             Std. GAN     Score Lin    Cost Lin     Real Data
Inception (end)        5.61±0.09    5.40±0.12    5.43±0.10    10.72±0.38
Our classifier (end)   3.85±0.08    3.52±0.09    4.42±0.09    8.03±0.07
Inception (avg)        5.59±0.38    5.44±0.08    5.16±0.37    -
Our classifier (avg)   3.64±0.47    3.70±0.27    4.04±0.37    -

[Figure: CIFAR-10 samples from Score Lin., Cost Lin., and Standard GAN.]