Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou...

11
Lipschitz Generative Adversarial Nets Zhiming Zhou 1 Jiadong Liang 2 Lantao Yu 3 Yong Yu 1 Zhihua Zhang 2 1 Shanghai Jiao Tong University 2 Peking University 2 Stanford University ICML, 2019 Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 1 / 11

Transcript of Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou...

Page 1: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Lipschitz Generative Adversarial Nets

Zhiming Zhou1 Jiadong Liang2 Lantao Yu3 Yong Yu1 Zhihua Zhang2

1Shanghai Jiao Tong University

2Peking University

2Stanford University

ICML, 2019

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 1 / 11

Page 2: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Generalized Formulation for GANs

minf ∈F

Ez∼Pz [φ(f (g(z)))] + Ex∼Pr [ψ(f (x))],

ming∈G

Ez∼Pz [ϕ(f (g(z)))],(1)

where

Pz : the source distribution in Rnz ;Pr : the target (real) distribution in Rnr ;g : the generative function Rnz → Rnr ;f : the discriminative function Rnr → R;G: the generative function space;F : the discriminative function space;

φ, ψ, ϕ: R→ R are loss metrics.

GAN : φ(x) = ψ(−x) = − log(σ(−x))

WGAN : φ(x) = ψ(−x) = x

LSGAN : φ(x) = ψ(−x) = (x + α)2

We denote the generation distribution by Pg .

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11

Page 3: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

The Gradient Uninformativeness

The problem that the gradient from the discriminator does not contain any informativeabout the real distribution.

A new perspective for the training instability / convergence issue of GANs.

For GANs with unrestricted F :

f ∗(x) = arg minf (x)∈R

Pg (x)φ(f (x)) + Pr (x)ψ(f (x)), ∀x . (2)

f ∗(x) is independently defined and only reflects the local densities Pr (x) and Pg (x);

∇x f ∗(x) does not reflect any information about the other distribution, if the supports oftwo distributions are disjoint.

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 3 / 11

Page 4: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

The Gradient Uninformativeness

Unrestricted GANs MUST suffer from this problem

Restricted GANs May suffer from this problem

GANs with W-Distance May suffer from this problem

Lipschitz GANs DO NOT suffer from this problem

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 4 / 11

Page 5: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Lipschitz Generative Adversarial Nets (LGANs)

minf ∈F

Ez∼Pz [φ(f (g(z)))] + Ex∼Pr [ψ(f (x))] + λ · k(f )2 ,

ming∈G

Ez∼Pz [ϕ(f (g(z)))]. (3)

We require φ and ψ to satisfy:{φ′(x) > 0,

φ′′(x) ≥ 0,and φ(x) = ψ(−x). (4)

Any increasing function with non-decreasing derivative.

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 5 / 11

k(f ): Lipschitz constant of f

Page 6: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Lipschitz Generative Adversarial Nets (LGANs)

Theoretically guaranteed properties:

The optimal discriminative function f ∗ exists;

If φ is strictly convex, then f ∗ is unique;

There exists a unique Nash equilibrium where Pr = Pg and k(f ∗) = 0;

Do not suffer from gradient uninformativeness;

For each generated sample, the gradient directly points towards a real sample.

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 6 / 11

Page 7: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Experiments: Gradient Uninformativeness

Gradient uninformativeness practically behaviors as noisy gradient.

(a) Disjoint Case (b) Overlapping Case

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 7 / 11

Page 8: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Experiments: ∇x f∗(x) in LGANs

∇x f ∗(x) directly point towards real samples.

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 8 / 11

Page 9: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Experiments: f ∗ is Unique

Uniqueness of f ∗ leads to stabilized discriminative functions.

(a) f (x) in WGAN. (b) f (x) in LGANs.

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 9 / 11

Page 10: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Experiments: Unsupervised Image Generation

LGANs, with different choices of φ(x), consistently outperform WGAN.

(a) Training curves on CIFAR. (b) Training curves on Tiny.

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 10 / 11

Page 11: Lipschitz Generative Adversarial Nets11-14-00)-11-15-10-4628-lipschitz_gener.pdf · Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 2 / 11. The Gradient Uninformativeness

Summary Poster: this evening, #19

Gradient uninformativeness

Unrestricted GANs MUST suffer from this problem

Restricted GANs May suffer from this problem

GANs with W-Distance May suffer from this problem

Lipschitz GANs DO NOT suffer from this problem

(5)

Lipschitz GANs:

Penalize the Lipschitz constant of f ;Set φ(x) to be any increasing function with non-decreasing derivative;If φ is strictly convex, then f ∗ is unique;The gradients directly point towards real samples.

Zhiming Zhou Lipschitz Generative Adversarial Nets ICML, 2019 11 / 11