Deep Generative Models I


Transcript of Deep Generative Models I

Page 1: Deep Generative Models I

Deep Generative Models I: Variational Autoencoders – Part II

Machine Perception

Otmar Hilliges

23 April 2020

Page 2: Deep Generative Models I

Last week(s)

Intro to generative modelling

VAE

Derivation of the ELBO


Page 3: Deep Generative Models I

This week

More generative models

Step I: Variational autoencoders:

• Training

• Representation learning & disentanglement

• Applications

Step II: Generative adversarial networks


Page 4: Deep Generative Models I

Generative Modelling

Given training data, generate new samples, drawn from “same” distribution

We want $p_{\text{model}}(x)$ to be similar to $p_{\text{data}}(x)$

[Figure: training samples $\sim p_{\text{data}}(x)$ next to generated samples $\sim p_{\text{model}}(x)$]

Page 5: Deep Generative Models I

Variational autoencoder

[Figure: VAE. The inference model maps $x$ to the latent $z$; the generative model maps $z$ back to $x$.]

Page 6: Deep Generative Models I

Recap: ELBO

$$\mathcal{L}(x^{(i)}, \theta, \phi) = \mathbb{E}_{z}\big[\log p_\theta(x^{(i)} \mid z)\big] - D_{KL}\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\big)$$

[Figure: VAE architecture. The encoder network $q_\phi(z \mid x)$ maps $x$ to $\mu_{z \mid x}$ and $\Sigma_{z \mid x}$, from which $z \mid x \sim \mathcal{N}(\mu_{z \mid x}, \Sigma_{z \mid x})$ is sampled; the decoder network $p_\theta(x \mid z)$ maps $z$ to $\mu_{x \mid z}$ and $\Sigma_{x \mid z}$, from which $\hat{x} \mid z \sim \mathcal{N}(\mu_{x \mid z}, \Sigma_{x \mid z})$ is sampled. The KL term makes the approximate posterior similar to the prior; the expectation term maximizes the reconstruction likelihood.]
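For the common choice of a diagonal Gaussian posterior $q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior, the KL term has a closed form (a standard result, not shown on the slide):

$$D_{KL}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big) = \frac{1}{2} \sum_{j} \big(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\big)$$

Only the reconstruction expectation then needs to be estimated by sampling.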

Page 7: Deep Generative Models I

Training VAEs

Training of VAEs is just backprop.

But wait, how do gradients flow through 𝑧?

Or any other random operation (requiring sampling)?

(see blackboard)
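The standard answer (from Kingma and Welling, and presumably what the blackboard develops) is the reparameterization trick: instead of sampling $z \sim \mathcal{N}(\mu, \sigma^2)$ directly, sample $\epsilon \sim \mathcal{N}(0, I)$ and set $z = \mu + \sigma \odot \epsilon$, so the randomness is external and gradients flow through $\mu$ and $\sigma$. A minimal PyTorch-style sketch, with names that are illustrative rather than from the lecture:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives entirely in eps, which does not depend on the
    encoder parameters, so backprop reaches mu and sigma.
    """
    std = torch.exp(0.5 * log_var)   # sigma = exp(log_var / 2)
    eps = torch.randn_like(std)      # eps ~ N(0, I)
    return mu + std * eps

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO: reconstruction term plus closed-form Gaussian KL."""
    # Reconstruction term (here assuming a Bernoulli likelihood over pixels).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form, summed over latent dimensions.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```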


Page 8: Deep Generative Models I


Page 9: Deep Generative Models I
Page 10: Deep Generative Models I

Generating Data

[Figure: decoder network $p_\theta(x \mid z)$ maps $z$ to $\mu_{x \mid z}$ and $\Sigma_{x \mid z}$, from which $\hat{x} \sim \mathcal{N}(\mu_{x \mid z}, \Sigma_{x \mid z})$ is sampled.]

At runtime, only the decoder network is used: sample $z \sim \mathcal{N}(0, I)$ from the prior and decode it.

[Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014]
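In code, generation is just a prior sample pushed through the decoder. A minimal sketch, assuming a trained `decoder` module such as the hypothetical one above:

```python
import torch

@torch.no_grad()
def generate(decoder, num_samples, latent_dim):
    """Draw z from the prior N(0, I) and decode it into new samples."""
    z = torch.randn(num_samples, latent_dim)   # z ~ N(0, I)
    return decoder(z)                          # e.g. mu_{x|z}, used as x_hat
```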

Page 11: Deep Generative Models I

Cross-modal Deep Variational Hand Pose Estimation

Adrian Spurr, Jie Song, Seonwook Park, Otmar Hilliges

CVPR 2018

Page 12: Deep Generative Models I

RGB hand pose estimation


Page 13: Deep Generative Models I

Assumption: Manifold of valid hand poses

Adapted from [Tagliasacchi et al. 2015]

Page 14: Deep Generative Models I

Cross-modal latent space – Inference model

Page 15: Deep Generative Models I

Cross-modal latent space – Generative model

Page 16: Deep Generative Models I


[Spurr et al.: CVPR ‘18]

Page 17: Deep Generative Models I

Cross-modal training objective

The original VAE considers only one modality → we need to modify it to handle multiple modalities.

Consider samples $x_i$ and $x_t$ from different modalities (e.g. RGB images and 3D joint positions).

Important: both inherently describe the same thing, in our case the hand pose.

Task: maximize $\log p_\theta(x_t)$ given $x_i$.


[Spurr et al.: CVPR ‘18]

Page 18: Deep Generative Models I

Cross-modal training objective derivation

$$\begin{aligned}
\log p(x_t) &= \int_z q(z \mid x_i)\, \log p(x_t)\, dz
= \int_z q(z \mid x_i)\, \log \frac{p(x_t)\, p(z \mid x_t)\, q(z \mid x_i)}{p(z \mid x_t)\, q(z \mid x_i)}\, dz && \text{(Bayes' rule)} \\
&= \int_z q(z \mid x_i)\, \log \frac{q(z \mid x_i)}{p(z \mid x_t)}\, dz + \int_z q(z \mid x_i)\, \log \frac{p(x_t)\, p(z \mid x_t)}{q(z \mid x_i)}\, dz && \text{(logarithms)} \\
&= D_{KL}\big(q(z \mid x_i) \,\|\, p(z \mid x_t)\big) + \int_z q(z \mid x_i)\, \log \frac{p(x_t \mid z)\, p(z)}{q(z \mid x_i)}\, dz && \text{(definition of KL)} \\
&\ge \int_z q(z \mid x_i)\, \log p(x_t \mid z)\, dz - \int_z q(z \mid x_i)\, \log \frac{q(z \mid x_i)}{p(z)}\, dz && \text{(}D_{KL} \ge 0;\ \log\tfrac{x}{y} = -\log\tfrac{y}{x}\text{)} \\
&= \mathbb{E}_{z \sim q(z \mid x_i)}\big[\log p(x_t \mid z)\big] - D_{KL}\big(q(z \mid x_i) \,\|\, p(z)\big) && \text{(}\mathbb{E}\text{ over } z\text{, KL)}
\end{aligned}$$


[Spurr et al.: CVPR ‘18]
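The resulting objective mirrors the standard ELBO, except that the encoder sees one modality ($x_i$, e.g. RGB) while the decoder reconstructs another ($x_t$, e.g. 3D joints). A hedged sketch of one such cross-modal term; the module names and the fixed-variance Gaussian likelihood are illustrative assumptions, not the exact setup of Spurr et al.:

```python
import torch

def cross_modal_elbo_loss(rgb_encoder, joint_decoder, x_rgb, x_joints):
    """Negative cross-modal ELBO: encode RGB, reconstruct 3D joints."""
    mu, log_var = rgb_encoder(x_rgb)          # q(z | x_i)
    std = torch.exp(0.5 * log_var)
    z = mu + std * torch.randn_like(std)      # reparameterized sample
    joints_pred = joint_decoder(z)            # p(x_t | z)
    # Gaussian likelihood with fixed variance reduces to squared error.
    recon = torch.sum((joints_pred - x_joints) ** 2)
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```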

Page 19: Deep Generative Models I

[Figure: t-SNE embedding comparisons: trained individually vs. trained cross-modally]

[Spurr et al.: CVPR ‘18]

Page 20: Deep Generative Models I

[Figure: posterior estimates, RGB (real) to 3D joints; columns show RGB input, ground truth, and prediction]

[Spurr et al.: CVPR ‘18]

Page 21: Deep Generative Models I

Synthetic hands: Latent Space Walk

[Synthetic samples RGB] [Synthetic samples depth]

[Spurr et al.: CVPR ‘18]

Page 22: Deep Generative Models I

Learning useful representations

Goal: Learn features that capture similarities and dissimilarities.

Requirement: An objective that defines a notion of utility (task-dependent).

Page 23: Deep Generative Models I

Autoencoders

[Figure: autoencoder, $X \xrightarrow{f} Z \xrightarrow{g} \hat{X}$]

Notion of utility: Ability to reconstruct pixels from code (features are a compressed representation of original data)
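As a concrete picture of this notion of utility, a minimal autoencoder trained purely for pixel reconstruction might look as follows; the layer sizes are arbitrary, for illustration only:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """f compresses X into a code Z; g reconstructs X_hat from Z."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                               nn.Linear(256, code_dim))
        self.g = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                               nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.g(self.f(x))

# The training signal is purely reconstruction: the code Z is only
# required to be a compressed representation of the original pixels.
loss_fn = nn.MSELoss()
```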

Page 24: Deep Generative Models I

Autoencoders

[Figure: encoder $f$ maps $X$ to the learned representation $Z$]

Entangled representation: individual dimensions of the code encode some unknown combination of features in the data.

Page 25: Deep Generative Models I

Goal: Disentangled Representations

[Figure: input → encoder → factorized code, with separate dimensions for digit and style]

Goal: Learn features that correspond to distinct factors of variation.

One notion of utility: statistical independence.

Page 26: Deep Generative Models I

Regular vs Variational Autoencoders

[Figure: t-SNE visualizations of the latent codes: AE (z-dim 50) vs. VAE (z-dim 50)]

Issue: the KL divergence regularizes the values of $z$, but the representations are still entangled.

Page 27: Deep Generative Models I

Unsupervised vs Semi-supervised


[image source: Babak Esmaeli]

[Kingma et al. Semi-Supervised Learning with Deep Generative Models. NIPS 2014]

[Figure: factorized code with separate digit and style dimensions]

Separate the label $y$ from the “nuisance” variable $z$

Page 28: Deep Generative Models I

Learning factorized representations (unsupervised)


[image source: Emile Mathieu]

Page 29: Deep Generative Models I

𝛽-VAE

Goal: Learn disentangled representation without supervision

Idea: Provide framework for automated discovery of interpretable factorised latent representations

Approach: a modification of the VAE that introduces an adjustable hyperparameter $\beta$, balancing latent channel capacity and independence constraints against reconstruction accuracy.


[Higgins et al., “Learning Basic Visual Concepts with a Constrained Variational Framework”, ICLR 2017]

Page 30: Deep Generative Models I


𝛽-VAE [Higgins et al., “Learning Basic Visual Concepts with a Constrained Variational Framework”, ICLR 2017]

Page 31: Deep Generative Models I

𝛽-VAE


[Higgins et al., “Learning Basic Visual Concepts with a Constrained Variational Framework”, ICLR 2017]

Assume that samples $x$ are generated by controlling the generative factors $v$ and $w$: the factors $v$ are conditionally independent, while the factors $w$ are conditionally dependent.

Page 32: Deep Generative Models I


𝛽-VAE – Training Objective [Higgins et al., “Learning Basic Visual Concepts with a Constrained Variational Framework”, ICLR 2017]

$$\max_{\phi, \theta}\; \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]\Big] \quad \text{subject to} \quad D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big) < \delta$$
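Following Higgins et al., rewriting this constrained problem as a Lagrangian under the KKT conditions yields the familiar $\beta$-VAE objective, where $\beta > 1$ puts extra pressure on the KL term and $\beta = 1$ recovers the standard VAE:

$$\mathcal{L}(\theta, \phi; \beta) = \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta\, D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big)$$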

Page 33: Deep Generative Models I


𝛽-VAE – Training Objective [Higgins et al., “Learning Basic Visual Concepts with a Constrained Variational Framework”, ICLR 2017]

Page 34: Deep Generative Models I

Entangled versus Disentangled Representations

[Figure: latent traversals, $\beta$-VAE vs. vanilla VAE]

Page 35: Deep Generative Models I

Disentangling Latent Hands for Image Synthesis and Pose Estimation

Linlin Yang, Angela Yao

CVPR 2019

Page 36: Deep Generative Models I

Disentangling latent factors


[Yang et al.: CVPR ‘19]

Page 37: Deep Generative Models I

Disentangling latent factors


[Yang et al.: CVPR ‘19]

$$\mathcal{L}(\phi_x, \phi_{y_1}, \phi_{y_2}, \theta_x, \theta_{y_1}, \theta_{y_2}) = \mathit{ELBO}_{dis}(x, y_1, y_2, \phi_{y_1}, \phi_{y_2}, \theta_x, \theta_{y_1}, \theta_{y_2}) + \mathit{ELBO}_{emb}(x, y_1, y_2, \phi_x)$$

where

$$\begin{aligned}
\log p(x, y_1, y_2) \ge \mathit{ELBO}_{dis} &= \lambda_x\, \mathbb{E}_{z \sim q_{\phi_{y_1}, \phi_{y_2}}}\big[\log p_{\theta_x}(x \mid z)\big]
+ \lambda_{y_1}\, \mathbb{E}_{z_{y_1} \sim q_{\phi_{y_1}}}\big[\log p_{\theta_{y_1}}(y_1 \mid z_{y_1})\big] \\
&\quad + \lambda_{y_2}\, \mathbb{E}_{z_{y_2} \sim q_{\phi_{y_2}}}\big[\log p_{\theta_{y_2}}(y_2 \mid z_{y_2})\big]
- \beta\, D_{KL}\big(q_{\phi_{y_1}, \phi_{y_2}}(z \mid y_1, y_2) \,\|\, p(z)\big)
\end{aligned}$$

and

$$\begin{aligned}
\mathit{ELBO}_{emb}(x, y_1, y_2, \phi_x) &= \lambda'_x\, \mathbb{E}_{z \sim q_{\phi_x}}\big[\log p_{\theta_x}(x \mid z)\big]
+ \lambda'_{y_1}\, \mathbb{E}_{z_{y_1} \sim q_{\phi_x}}\big[\log p_{\theta_{y_1}}(y_1 \mid z_{y_1})\big] \\
&\quad + \lambda'_{y_2}\, \mathbb{E}_{z_{y_2} \sim q_{\phi_x}}\big[\log p_{\theta_{y_2}}(y_2 \mid z_{y_2})\big]
- \beta'\, D_{KL}\big(q_{\phi_x}(z \mid x) \,\|\, p(z)\big)
\end{aligned}$$
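A hedged structural sketch of this dual-ELBO loss follows. The module names, the even split of the latent into $z_{y_1}$ and $z_{y_2}$, the Gaussian likelihoods, and the single shared weights `lam`/`beta` (instead of the separate $\lambda$, $\lambda'$, $\beta$, $\beta'$ above) are simplifying assumptions, not the authors' exact implementation:

```python
import torch

def kl_to_prior(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I))."""
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

def sample(mu, log_var):
    return mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

def dvae_loss(enc_y1, enc_y2, enc_x, dec_x, dec_y1, dec_y2,
              x, y1, y2, lam=1.0, beta=1.0):
    """Negative (ELBO_dis + ELBO_emb) with a factorized latent z = [z_y1, z_y2]."""
    # ELBO_dis: encode the factors y1, y2 into a disentangled latent z.
    mu1, lv1 = enc_y1(y1)
    mu2, lv2 = enc_y2(y2)
    z1, z2 = sample(mu1, lv1), sample(mu2, lv2)
    z = torch.cat([z1, z2], dim=-1)
    loss_dis = (lam * torch.sum((dec_x(z) - x) ** 2)       # reconstruct x
                + lam * torch.sum((dec_y1(z1) - y1) ** 2)  # reconstruct y1
                + lam * torch.sum((dec_y2(z2) - y2) ** 2)  # reconstruct y2
                + beta * (kl_to_prior(mu1, lv1) + kl_to_prior(mu2, lv2)))
    # ELBO_emb: embed x itself into the same factorized latent space.
    mu, lv = enc_x(x)
    zx = sample(mu, lv)
    zx1, zx2 = torch.chunk(zx, 2, dim=-1)  # assumes equal-sized factors
    loss_emb = (lam * torch.sum((dec_x(zx) - x) ** 2)
                + lam * torch.sum((dec_y1(zx1) - y1) ** 2)
                + lam * torch.sum((dec_y2(zx2) - y2) ** 2)
                + beta * kl_to_prior(mu, lv))
    return loss_dis + loss_emb
```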

Page 38: Deep Generative Models I

Latent space walk with disentangled $z$

Page 39: Deep Generative Models I

Latent space walk with disentangled $z$

Page 40: Deep Generative Models I

Limitations of VAEs


Tendency to generate blurry images.

Believed to be due to injected noise and weak inference models (the Gaussian assumption on latent samples is too simplistic, the model capacity too weak).

A more expressive model → substantially better results (e.g. Kingma et al. 2016, "Improving Variational Inference with Inverse Autoregressive Flow").

Page 41: Deep Generative Models I

Next

Deep Generative Modelling II:

Generative Adversarial Networks (GANs)
