Lecture 9: VAE Variants


VAE variants

Hao Dong

Peking University


VAE variants

• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)

The variants fall into three themes: representation learning, hierarchical representation learning, and temporal representation learning.


Convolutional Variational Autoencoder

• Limitations of the vanilla VAE
  • The weight matrix of a fully connected layer has size (input size × output size)
  • A VAE built only from fully connected layers therefore suffers from the curse of dimensionality when the input dimension is large (e.g., an image)

• Solution: use convolutional layers in the encoder and decoder (see the sketch below)

Image is modified from: Deep Clustering with Convolutional Autoencoder. NIPS 2017.

[Figure: a convolutional autoencoder. Annotation: fully connected layers here (independent variables).]
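To make the fix concrete, here is a minimal PyTorch sketch of a convolutional VAE for 28×28 grayscale images. This is not the lecture's own code; the layer sizes and architecture are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    """Minimal convolutional VAE for 28x28 grayscale images (e.g., MNIST)."""
    def __init__(self, latent_dim=16):
        super().__init__()
        # Encoder: two strided convolutions shrink 28x28 -> 7x7
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),   # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
        )
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_logvar = nn.Linear(64 * 7 * 7, latent_dim)
        # Decoder mirrors the encoder with transposed convolutions
        self.fc_dec = nn.Linear(latent_dim, 64 * 7 * 7)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 7 -> 14
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),   # 14 -> 28
        )

    def encode(self, x):
        h = self.enc(x).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        eps = torch.randn_like(mu)
        return mu + eps * torch.exp(0.5 * logvar)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_logits = self.dec(self.fc_dec(z).view(-1, 64, 7, 7))
        return x_logits, mu, logvar

def vae_loss(x_logits, x, mu, logvar):
    """Negative ELBO: reconstruction term + KL divergence to the prior N(0, I)."""
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld
```

Because convolutions share weights across spatial positions, the parameter count no longer scales with the full input dimensionality; only the small bottleneck still uses fully connected layers.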


Conditional Variational Autoencoder

• Training and inference with labelled data.

Learning structured output representation using deep conditional generative models. NeurIPS 2015.


Recap: Variational Autoencoder

Auto-Encoding Variational Bayes. Diederik P. Kingma, Max Welling. ICLR 2014.

• Recap: Setting up the objective
  • Maximise P(X)
  • Set Q(z) to be an arbitrary distribution


Recap: Variational Autoencoder

Auto-Encoding Variational Bayes. Diederik P. Kingma, Max Welling. ICLR 2014.

• Recap: Setting up the objective

The identity being optimised (the slide labels its terms encoder, ideal, reconstruction, and KLD):

$$\log P(X) - D_{KL}\big[Q(z|X)\,\|\,P(z|X)\big] = \mathbb{E}_{z\sim Q(z|X)}\big[\log P(X|z)\big] - D_{KL}\big[Q(z|X)\,\|\,P(z)\big]$$

Here Q(z|X) is the encoder, P(z|X) is the ideal (intractable) posterior, the expectation is the reconstruction term, and the final KLD regularises the encoder towards the prior P(z). The right-hand side is the ELBO.


Conditional Variational Autoencoder

Learning structured output representation using deep conditional generative models. NeurIPS 2015.

• Setting up the objective with labels
  • Maximise P(Y|X)
  • Set Q(z) to be an arbitrary distribution


Conditional Variational Autoencoder

Learning structured output representation using deep conditional generative models. NeurIPS 2015.

• Setting up the objective

The conditional objective mirrors the VAE identity above (terms again labelled encoder, ideal, reconstruction, and KLD on the slide):

$$\log P(Y|X) - D_{KL}\big[Q(z|X,Y)\,\|\,P(z|X,Y)\big] = \mathbb{E}_{z\sim Q(z|X,Y)}\big[\log P(Y|X,z)\big] - D_{KL}\big[Q(z|X,Y)\,\|\,P(z|X)\big]$$

Everything is now conditioned on the input X: the encoder Q(z|X,Y) sees both input and label, and the prior P(z|X) may depend on X.
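For concreteness, a minimal CVAE sketch in PyTorch, using the common MNIST-style setup where the conditioning label is a one-hot vector concatenated onto both the encoder and decoder inputs. The layer sizes and names are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Minimal conditional VAE: condition encoder and decoder on a one-hot label."""
    def __init__(self, x_dim=784, y_dim=10, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Linear(x_dim + y_dim, h_dim)
        self.fc_mu = nn.Linear(h_dim, z_dim)
        self.fc_logvar = nn.Linear(h_dim, z_dim)
        self.dec1 = nn.Linear(z_dim + y_dim, h_dim)
        self.dec2 = nn.Linear(h_dim, x_dim)

    def forward(self, x, y_onehot):
        # Encoder sees the data together with its label: Q(z | x, y)
        h = F.relu(self.enc(torch.cat([x, y_onehot], dim=1)))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Decoder sees the latent together with the label: P(x | z, y)
        x_logits = self.dec2(F.relu(self.dec1(torch.cat([z, y_onehot], dim=1))))
        return x_logits, mu, logvar

    @torch.no_grad()
    def sample(self, y_onehot):
        # At inference time, pick a label and draw z from the prior N(0, I)
        z = torch.randn(y_onehot.size(0), self.fc_mu.out_features)
        return torch.sigmoid(self.dec2(F.relu(self.dec1(torch.cat([z, y_onehot], dim=1)))))
```

At inference time one fixes the label y and samples z from the prior, so the same latent code can be decoded into different classes simply by changing y.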


Conditional Variational Autoencoder

• Training and inference without labelled data, i.e., the vanilla VAE.

Learning structured output representation using deep conditional generative models. NeurIPS 2015.


Before we start

• Disentangled / factorised representation
  • Each variable in the inferred latent representation is sensitive to only a single generative factor and relatively invariant to the other factors
  • Good interpretability and easy generalization to a variety of tasks

β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.


Before we start

• Unsupervised hierarchical representation learning

Rethinking Style and Content Disentanglement in Variational Autoencoders. ICLR 2018 Workshops.


β-VAE

• Unsupervised representation learning
  • Augment the original VAE framework with a single hyperparameter β that modulates the learning constraints
  • Impose a limit on the capacity of the latent information channel

β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
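Concretely, the β-VAE objective from the cited paper simply reweights the KL term of the ELBO:

$$\mathcal{L}(\theta, \phi; x, z, \beta) = \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \beta\, D_{KL}\big[q_\phi(z|x)\,\|\,p(z)\big]$$

Setting β = 1 recovers the vanilla VAE; β > 1 constrains the latent information channel more strongly and encourages disentangled factors, usually at some cost in reconstruction quality.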


β-VAE

• Discussion: Is it really unsupervised?

• It is unsupervised/self-supervised learning in the sense that it does not need any labelled data.

• It is not fully unsupervised learning: it works because of the inductive bias of the neural network model, and the hierarchical design introduces prior knowledge about the data.

β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.


IWAE (Importance Weighted Autoencoder)

Importance Weighted Autoencoders. ICLR 2016.

• Optimise a tighter lower bound than the VAE
  • The VAE only optimises a lower bound of log P(X): the ELBO from the recap identity above (encoder, ideal, reconstruction, and KLD terms), i.e., the training loss is −ELBO.


• Why "importance weighted": the IWAE bound draws k samples from the encoder and averages their importance weights inside the log,

$$\mathcal{L}_k = \mathbb{E}_{z_1,\dots,z_k \sim Q(z|X)}\left[\log \frac{1}{k}\sum_{i=1}^{k} w_i\right], \quad \text{where} \quad w_i = \frac{P(X, z_i)}{Q(z_i|X)}$$

are the importance weights. $\mathcal{L}_1$ is exactly the ELBO, and the bound tightens monotonically as k grows.
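A minimal sketch of an $\mathcal{L}_k$ estimator follows; the callables encode, decode_logp, and prior_logp are hypothetical placeholders standing in for a model's networks, not an actual API:

```python
import math
import torch

def iwae_bound(x, encode, decode_logp, prior_logp, k=5):
    """Monte-Carlo estimate of the IWAE bound L_k for a batch x.

    encode(x)         -> (mu, logvar) of the Gaussian posterior Q(z|X)
    decode_logp(x, z) -> log P(X|z) per example, shape (batch,)
    prior_logp(z)     -> log P(z) per example, shape (batch,)
    """
    mu, logvar = encode(x)
    log_ws = []
    for _ in range(k):
        # Reparameterised sample z ~ Q(z|X)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # log Q(z|X) for a diagonal Gaussian
        log_q = -0.5 * ((z - mu) ** 2 / logvar.exp() + logvar
                        + math.log(2 * math.pi)).sum(dim=1)
        # log importance weight: log P(X, z) - log Q(z|X)
        log_ws.append(decode_logp(x, z) + prior_logp(z) - log_q)
    log_w = torch.stack(log_ws)  # (k, batch)
    # log((1/k) * sum_i w_i), computed stably in log space
    return (torch.logsumexp(log_w, dim=0) - math.log(k)).mean()
```

With k = 1 this reduces to the usual single-sample ELBO estimate; larger k gives a tighter bound at the cost of k decoder evaluations per example.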


Ladder VAE

• To learn a hierarchical latent representation
  • Deep models with several layers of dependent stochastic variables are difficult to train
  • This limits the improvements obtained using these highly expressive models

LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.


[Figure: encode/decode paths of a standard hierarchical VAE (left) vs. the ladder VAE (right).]
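The distinctive mechanism in the ladder VAE (from the cited paper) is that each layer's approximate posterior combines the bottom-up encoder estimate $(\hat{\mu}_i, \hat{\sigma}_i^2)$ with the top-down generative prior $(\mu_{p,i}, \sigma_{p,i}^2)$ by precision weighting:

$$\sigma_{q,i}^2 = \frac{1}{\hat{\sigma}_i^{-2} + \sigma_{p,i}^{-2}}, \qquad \mu_{q,i} = \frac{\hat{\mu}_i\,\hat{\sigma}_i^{-2} + \mu_{p,i}\,\sigma_{p,i}^{-2}}{\hat{\sigma}_i^{-2} + \sigma_{p,i}^{-2}}$$

Inference and generation thus share the same top-down dependency structure, which makes the deeper stochastic layers easier to train.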


Progressive + Fade-in VAE

• Discussion

Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.

[Figure: a ladder-structured VAE with paired encoders and decoders at the high, middle, and low levels.]

• Can we directly train a hierarchical VAE with a ladder structure like that?


• Information shortcut problem: all information goes through the low-level path while the other paths are ignored; the model is lazy…

Progressive + Fade-in VAE

• Progressive + Fade-in

Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.

• Stage 1: enable the high-level path
• Stage 2: enable the high-level and middle-level paths
• Stage 3: enable all paths …
• During each transition, a fade-in coefficient α increases gradually from 0 to 1 (see the sketch below)
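A minimal sketch of the fade-in mechanic, following common progressive-growing practice; the blending scheme and names are illustrative assumptions rather than the paper's exact formulation:

```python
import torch

def fade_in(old_path: torch.Tensor, new_path: torch.Tensor, alpha: float) -> torch.Tensor:
    """Blend a newly enabled path into the model's output.

    alpha ramps from 0 to 1 during training, so the new path is introduced
    gradually instead of being switched on abruptly.
    """
    return (1.0 - alpha) * old_path + alpha * new_path

def alpha_at(step: int, fade_steps: int = 10_000) -> float:
    """Example linear schedule for the fade-in coefficient."""
    return min(1.0, step / fade_steps)
```

Because the new path initially contributes nothing (α = 0), the model cannot route all information through it at once, which is exactly what the staged schedule is meant to prevent.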

Progressive + Fade-in VAE

• Results

Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.

• High-level: background and foreground colors
• Middle-level: shape
• Low-level: …

Progressive + Fade-in VAE

• Results

Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.

[Figure: latent traversals at the high, middle, and low levels.]


VAE in speech

• Learning latent representations for style control and transfer in end-to-end speech synthesis

• An RNN is used as the encoder (see the sketch below)

Learning latent representations for style control and transfer in end-to-end speech synthesis. ICASSP 2019.
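As a rough illustration of the idea, here is a reference encoder that summarises a variable-length mel-spectrogram into the parameters of a Gaussian style latent. The layer types and sizes are assumptions for the sketch, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SpeechStyleEncoder(nn.Module):
    """Summarise a variable-length mel-spectrogram into q(z|X) = N(mu, sigma^2)."""
    def __init__(self, n_mels=80, hidden=256, z_dim=16):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        self.fc_mu = nn.Linear(hidden, z_dim)
        self.fc_logvar = nn.Linear(hidden, z_dim)

    def forward(self, mel):              # mel: (batch, frames, n_mels)
        _, h = self.rnn(mel)             # final hidden state: (1, batch, hidden)
        h = h.squeeze(0)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterised style latent; at synthesis time z can be sampled
        # or taken from a reference utterance to control or transfer style.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar
```

The RNN handles the variable number of frames, and the single final hidden state yields a global, utterance-level latent rather than a per-frame one.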


TD-VAE

• The state-space model is treated as a Markov chain over latent states

Temporal Difference VAE. ICLR 2019.
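Written out, the standard state-space factorisation behind this view is:

$$p(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t)$$

where the latent states $z_t$ form a Markov chain (with $p(z_1 \mid z_0)$ taken as the initial prior $p(z_1)$) and each observation $x_t$ depends only on the current state.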


Summary

• Representation learning
• Hierarchical representation learning
• Temporal representation learning

Thanks
