
On Implicit Regularization in β-VAEs

Abhishek Kumar, Ben Poole

Google Research, Brain Team

ICML 2020


β-VAE

[Figure: encoder-decoder diagram of the β-VAE]

How does the variational family regularize the learned generative model?

- Uniqueness of learned generative model (global regularization)

- Influencing the local geometry of the decoding model

- Deterministic approximation of β-VAE

- Empirical validation of theory and accuracy of approximations
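The object of study is the standard β-VAE objective: a reconstruction term plus a β-weighted KL to the prior. As a reference point, here is a minimal sketch of the per-example objective, assuming a mean-field Gaussian encoder and a fixed-variance Gaussian decoder; names and shapes are illustrative, not the authors' code:

```python
import jax, jax.numpy as jnp

def beta_vae_neg_elbo(x, enc_mu, enc_logvar, decode, key, beta=1.0, sigma_x=1.0):
    """Negative ELBO with the KL term weighted by beta.

    decode:  z -> mean of the Gaussian p(x|z)        (assumed model)
    enc_mu, enc_logvar: parameters of q(z|x) = N(enc_mu, diag(exp(enc_logvar)))
    """
    eps = jax.random.normal(key, enc_mu.shape)
    z = enc_mu + jnp.exp(0.5 * enc_logvar) * eps             # reparameterization
    recon = 0.5 * jnp.sum((x - decode(z)) ** 2) / sigma_x**2  # -log p(x|z) + const
    kl = 0.5 * jnp.sum(jnp.exp(enc_logvar) + enc_mu**2 - 1.0 - enc_logvar)
    return recon + beta * kl
```

β = 1 recovers the usual VAE; larger β strengthens the pull of $q(z \mid x)$ toward the prior.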


Latent variable models and Uniqueness

Fixed prior $p(z)$, conditional decoding model $p_\theta(x \mid z)$, marginal $p_\theta(x) = \int p(z)\, p_\theta(x \mid z)\, dz$.

There is a set of solutions (latent representations) that are equivalent in terms of marginal likelihood [1]: any bijection $r$ that leaves the prior invariant (i.e., $r(z) \sim p(z)$ when $z \sim p(z)$) yields a transformed decoder $p_\theta(x \mid r(z))$ with the same marginal $p_\theta(x)$.

Uniqueness: ignoring permutations and transformations that act separately on each latent dimension, there is no non-trivial prior-preserving transform $r$ such that the transformed model $p_\theta(x \mid r(z))$ attains the same (maximal) objective value.

[1] Locatello et al., Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, ICML 2019.


Uniqueness via variational family

When the true posterior is in the variational family $\mathcal{Q}$:

[Figure: the decoder $p_\theta(x \mid z)$ paired with its posterior $p_\theta(z \mid x)$, and the transformed decoder $p_\theta(x \mid r(z))$ paired with the transformed posterior]

If $p_\theta(z \mid x) \in \mathcal{Q}$ but the transformed posterior $p_\theta^r(z \mid x) \notin \mathcal{Q}$, the maximum ELBO for the transformed model will be less than for the untransformed model.
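To make the ELBO drop concrete, here is a small numerical illustration (my own construction, not from the slides): a Gaussian "true posterior" with diagonal covariance lies inside the mean-field family, so the best mean-field fit has KL = 0; after rotating the latents the posterior leaves the family and the best achievable KL becomes positive, so the transformed model's maximum ELBO is strictly smaller. The closed-form minimizer used here (match the mean, take each diagonal precision from $S^{-1}$) is a standard result:

```python
import jax.numpy as jnp

def best_meanfield_kl(S):
    """min over mean-field Gaussians q of KL(q || N(0, S)), means matched."""
    P = jnp.linalg.inv(S)
    D = jnp.diag(1.0 / jnp.diag(P))          # optimal diagonal covariance of q
    return 0.5 * (jnp.trace(P @ D) - S.shape[0]
                  + jnp.linalg.slogdet(S)[1] - jnp.linalg.slogdet(D)[1])

S = jnp.diag(jnp.array([1.0, 0.1]))          # diagonal posterior: inside the family
c, s = jnp.cos(jnp.pi / 4), jnp.sin(jnp.pi / 4)
U = jnp.array([[c, -s], [s, c]])             # prior-preserving rotation
print(best_meanfield_kl(S))                  # ~0: ELBO is tight
print(best_meanfield_kl(U @ S @ U.T))        # ~0.55 > 0: max ELBO drops
```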


Example: Isotropic Gaussian prior and orthogonal transforms ($r(z) = Uz$, $U^\top U = I$)

Isotropic Gaussian is invariant under orthogonal transformations

- Transforming the latents by an orthogonal transform leaves the marginal $p_\theta(x)$ unchanged

Restricting the variational family to mean-field can break this orthogonal "symmetry":

- If $q(z \mid x)$ is mean-field, the transformed posterior (the law of the rotated latents) will be mean-field if and only if (Darmois, 1953; Skitovitch, 1953)

(i) $q(z \mid x)$ is a factorized Gaussian, and

(ii) the variances of $q(z \mid x)$ are all equal (isotropic)

Models with a non-Gaussian factorized posterior will therefore not suffer from non-uniqueness with respect to orthogonal transforms.
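A quick check of the symmetry argument (illustrative only): rotating $z \sim N(0, \Sigma)$ gives covariance $U \Sigma U^\top$, so an isotropic Gaussian is unchanged while a factorized but anisotropic one picks up correlations, i.e., it leaves the mean-field family:

```python
import jax.numpy as jnp

c, s = jnp.cos(0.7), jnp.sin(0.7)
U = jnp.array([[c, -s], [s, c]])                       # an orthogonal transform

print(U @ jnp.eye(2) @ U.T)                            # identity again: symmetry intact
print(U @ jnp.diag(jnp.array([1.0, 0.1])) @ U.T)       # nonzero off-diagonals appear
```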


Uniqueness via variational family (when $p_\theta(z \mid x) \notin \mathcal{Q}$)

Choice of $\mathcal{Q}$ can lead to uniqueness even if $p_\theta(z \mid x) \notin \mathcal{Q}$.

$\mathcal{R}$: set of transforms w.r.t. which we want uniqueness (that leave the prior invariant)

$\bar{\mathcal{Q}}$: completion of $\mathcal{Q}$ by $\mathcal{R}$, $\bar{\mathcal{Q}} := \{\, q \circ r : q \in \mathcal{Q},\ r \in \mathcal{R} \,\}$

Two conditions:

1. $\mathcal{Q}$ is such that $\arg\min_{q \in \bar{\mathcal{Q}}} \mathrm{KL}\big(q \,\|\, p_\theta(z \mid x)\big) \in \mathcal{Q}$, and

2. this minimizer is unique (holds when $\bar{\mathcal{Q}}$ is convex).

Then transforming the latents with a non-trivial $r \in \mathcal{R}$ will reduce the β-VAE objective value.



Uniqueness via variational family (when $p_\theta(z \mid x) \notin \mathcal{Q}$)

Generative model of data: a fixed, known model $p(z)\, p(x \mid z)$

Train a VAE on this data:

- Decoder $p_\theta(x \mid z)$

- Encoder $q_\phi(z \mid x)$ (a concrete stand-in is sketched below)

More details in the paper on implications for disentanglement [1].

[1] Locatello et al., Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, ICML 2019.
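As a concrete stand-in for this experiment (the data model, network sizes, and training details below are my assumptions, not the slides'), one can draw data from a linear-Gaussian model and fit a linear β-VAE with a mean-field Gaussian posterior:

```python
import jax, jax.numpy as jnp

d_x, d_z, n = 5, 2, 2000
W_true = jax.random.normal(jax.random.PRNGKey(0), (d_x, d_z))
Z = jax.random.normal(jax.random.PRNGKey(1), (n, d_z))
X = Z @ W_true.T + 0.1 * jax.random.normal(jax.random.PRNGKey(2), (n, d_x))

params = dict(W=0.1 * jax.random.normal(jax.random.PRNGKey(3), (d_x, d_z)),  # decoder
              E=0.1 * jax.random.normal(jax.random.PRNGKey(4), (d_z, d_x)),  # encoder
              log_s=jnp.zeros(d_z))                    # mean-field posterior log-stds

def neg_elbo(p, x, key, beta=1.0, sigma_x=0.1):
    mu = p["E"] @ x
    z = mu + jnp.exp(p["log_s"]) * jax.random.normal(key, (d_z,))
    recon = 0.5 * jnp.sum((x - p["W"] @ z) ** 2) / sigma_x**2
    kl = 0.5 * jnp.sum(jnp.exp(2 * p["log_s"]) + mu**2 - 1.0 - 2 * p["log_s"])
    return recon + beta * kl

def batch_loss(p, key):
    keys = jax.random.split(key, n)
    return jnp.mean(jax.vmap(lambda x, k: neg_elbo(p, x, k))(X, keys))

@jax.jit
def step(p, key, lr=1e-3):
    grads = jax.grad(batch_loss)(p, key)
    return jax.tree_util.tree_map(lambda a, g: a - lr * g, p, grads)

for i in range(2000):
    params = step(params, jax.random.PRNGKey(10 + i))
```

With the mean-field posterior, reruns from different seeds should recover the decoder only up to per-latent sign flips and permutations, rather than up to an arbitrary rotation.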


Regularization: local geometry

How does the variational family regularize the local geometry of the generative model?

Assumption: the first two moments exist for $q(z \mid x)$.

Let $\mu(x) = \mathbb{E}_{q(z \mid x)}[z]$ and $\Sigma(x) = \mathrm{Cov}_{q(z \mid x)}[z]$.

Second-order Taylor approximation of $\log p_\theta(x \mid z)$ around $\mu(x)$:

$\log p_\theta(x \mid z) \approx \log p_\theta(x \mid \mu) + (z - \mu)^\top \nabla_z \log p_\theta(x \mid z)\big|_{z=\mu} + \tfrac{1}{2} (z - \mu)^\top H(x, \mu)\,(z - \mu)$

where the first-order term involves the Jacobian of the decoder and $H(x, \mu)$ is the Hessian of $\log p_\theta(x \mid z)$ in $z$ at $\mu$.


Regularization: local geometry

Taking the expectation w.r.t. $q(z \mid x)$ (the first-order term vanishes since $\mathbb{E}_q[z - \mu] = 0$), the Taylor approximation of the β-VAE objective reduces to

$\mathbb{E}_q[\log p_\theta(x \mid z)] - \beta\, \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big) \approx \log p_\theta(x \mid \mu) + \tfrac{1}{2} \mathrm{tr}\big(H(x, \mu)\, \Sigma(x)\big) - \beta\, \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)$

where $\Sigma(x)$ is the covariance of $q(z \mid x)$.
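A quick sanity check of this second-order reduction (a toy setup of my own; the decoder and sizes are assumptions): compare a Monte-Carlo estimate of $\mathbb{E}_q[\log p(x \mid z)]$ against $\log p(x \mid \mu) + \tfrac{1}{2}\mathrm{tr}(H\Sigma)$:

```python
import jax, jax.numpy as jnp

d_x, d_z = 4, 2
W = jax.random.normal(jax.random.PRNGKey(0), (d_x, d_z))
decode = lambda z: jnp.tanh(W @ z)                    # small nonlinear decoder

def log_p(x, z, sigma=0.5):                           # Gaussian observation model
    return -0.5 * jnp.sum((x - decode(z)) ** 2) / sigma**2

x = jnp.ones(d_x)
mu = jnp.array([0.3, -0.2])
Sigma = jnp.diag(jnp.array([0.05, 0.02]))             # posterior covariance

H = jax.hessian(log_p, argnums=1)(x, mu)              # Hessian of log p(x|z) in z
taylor = log_p(x, mu) + 0.5 * jnp.trace(H @ Sigma)

eps = jax.random.normal(jax.random.PRNGKey(1), (100_000, d_z))
zs = mu + eps @ jnp.linalg.cholesky(Sigma).T
mc = jnp.mean(jax.vmap(lambda z: log_p(x, z))(zs))
print(taylor, mc)                                     # close for small Sigma
```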


Regularization: local geometry

We can further reduce the approximation in terms of the Jacobian of the decoder: for a Gaussian observation model with mean $g_\theta(z)$ and precision $\Lambda$,

$H(x, \mu) \approx -J_g(\mu)^\top \Lambda\, J_g(\mu)$, with $J_g(\mu) = \partial g_\theta(z)/\partial z\big|_{z=\mu}$

(exact for ReLU and leaky-ReLU decoders, which are piecewise linear).
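The "exact for ReLU" claim is easy to verify numerically, since a (leaky-)ReLU network is piecewise linear and its second derivative vanishes almost everywhere (toy weights below are assumptions):

```python
import jax, jax.numpy as jnp

d_x, d_z = 4, 2
W1 = jax.random.normal(jax.random.PRNGKey(0), (8, d_z))
W2 = jax.random.normal(jax.random.PRNGKey(1), (d_x, 8))
decode = lambda z: W2 @ jax.nn.relu(W1 @ z)           # piecewise-linear decoder

Lam = jnp.eye(d_x) / 0.25                             # observation precision

def log_p(x, z):
    r = x - decode(z)
    return -0.5 * r @ Lam @ r

x, z = jnp.ones(d_x), jnp.array([0.3, -0.7])
H = jax.hessian(log_p, argnums=1)(x, z)
J = jax.jacfwd(decode)(z)                             # decoder Jacobian at z
print(jnp.allclose(H, -J.T @ Lam @ J, atol=1e-5))     # True
```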


Regularization: local geometry

$\Lambda$: diagonal for pixel-wise independent observation models

Maximizing the approximate objective over the posterior covariance minimizes, over $\Sigma(x)$,

$\tfrac{1}{2} \mathrm{tr}\big(J_g^\top \Lambda\, J_g\, \Sigma\big) + \tfrac{\beta}{2} \big(\mathrm{tr}\, \Sigma - \log\det \Sigma\big)$


Gaussian $p_\theta(x \mid z)$ and $q(z \mid x)$

For this special case, the optimal variational posterior covariance is given by

$\Sigma^*(x) = \beta\, \big(J_g(\mu)^\top \Lambda\, J_g(\mu) + \beta I\big)^{-1}$

A structure imposed on $\Sigma(x)$ therefore influences the Jacobian of the decoder:

- for diagonal (mean-field) $\Sigma^*(x)$: $J_g^\top \Lambda\, J_g$ must be diagonal, i.e., the columns of the decoder Jacobian are orthogonal under $\Lambda$

- for isotropic $\Sigma^*(x)$: $J_g^\top \Lambda\, J_g \propto I$

More details in the paper about its influence on metric properties of the learned manifold.
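A numerical check of the closed form (own construction): $\Sigma^* = \beta (J^\top \Lambda J + \beta I)^{-1}$ should attain the smallest value of the $\Sigma$-dependent part of the objective above:

```python
import jax, jax.numpy as jnp

beta = 0.5
J = jax.random.normal(jax.random.PRNGKey(0), (4, 2))
Lam = jnp.eye(4)
M = J.T @ Lam @ J

def sigma_objective(S):                               # Sigma-dependent terms
    return 0.5 * jnp.trace(M @ S) + 0.5 * beta * (jnp.trace(S)
                                                  - jnp.linalg.slogdet(S)[1])

S_opt = beta * jnp.linalg.inv(M + beta * jnp.eye(2))
for S in [S_opt, S_opt + 0.05 * jnp.eye(2), 0.8 * S_opt]:
    print(sigma_objective(S))                         # S_opt gives the smallest value
```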


MNIST

[Figure: empirical validation on MNIST]


Gaussian $p_\theta(x \mid z)$, $q(z \mid x)$, $p(z)$

The β-VAE objective approximates to

$\mathbb{E}_x\Big[ \underbrace{-\log p_\theta\big(x \mid \mu(x)\big)}_{\text{reconstruction error}} + \underbrace{\tfrac{\beta}{2}\, \|\mu(x)\|^2}_{\text{encoding norm}} + \underbrace{\tfrac{\beta}{2}\, \log\det\big(I + \tfrac{1}{\beta}\, J_g(\mu)^\top \Lambda\, J_g(\mu)\big)}_{\text{regularizer on the decoder Jacobian}} \Big]$

GRAE: the resulting deterministic, regularized-autoencoder objective above.


We upper bound the regularizer to make it more tractable (by Hadamard's inequality, with $J_i$ the $i$-th column of the decoder Jacobian):

$\log\det\big(I + \tfrac{1}{\beta}\, J^\top \Lambda\, J\big) \le \sum_i \log\big(1 + \tfrac{1}{\beta}\, J_i^\top \Lambda\, J_i\big)$

We minimize a stochastic approximation of this upper bound (sampling one column of the Jacobian per iteration).

GRAE≈: GRAE with the Jacobian regularizer replaced by this stochastic upper bound.
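Here is a sketch of a GRAE≈-style loss under the assumptions above (Gaussian likelihood with identity precision); the regularizer follows the slide's recipe of sampling one Jacobian column per iteration, via a jvp with a one-hot tangent. Shapes, networks, and scaling details are my assumptions, not the authors' code:

```python
import jax, jax.numpy as jnp

d_x, d_z, beta = 8, 3, 0.1
We = 0.1 * jax.random.normal(jax.random.PRNGKey(0), (d_z, d_x))
Wd = 0.1 * jax.random.normal(jax.random.PRNGKey(1), (d_x, d_z))

encode = lambda x: We @ x                             # deterministic encoder mean
decode = lambda z: Wd @ jnp.tanh(z)                   # toy decoder

def grae_loss(x, key):
    mu = encode(x)
    recon = 0.5 * jnp.sum((x - decode(mu)) ** 2)      # reconstruction error
    enc = 0.5 * beta * jnp.sum(mu ** 2)               # encoding norm
    i = jax.random.randint(key, (), 0, d_z)           # pick one latent coordinate
    e = jnp.zeros(d_z).at[i].set(1.0)
    _, Ji = jax.jvp(decode, (mu,), (e,))              # i-th Jacobian column
    # one-column estimate of sum_i log(1 + ||J_i||^2 / beta), unbiased over i
    reg = 0.5 * beta * d_z * jnp.log1p(jnp.sum(Ji ** 2) / beta)
    return recon + enc + reg

print(grae_loss(jnp.ones(d_x), jax.random.PRNGKey(2)))
```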


Comparison of objectives

[Figure: comparison of objective values during training]


Samples

[Figure: samples from β-VAE and GRAE≈ at β = 0.02, 0.06, 0.1]


Samples

[Figure: samples from β-VAE and GRAE≈ at β = 0.4, 0.6, 0.8]


Thanks

For more details: https://arxiv.org/abs/2002.00041