Deep generative learning_icml_part2


Description: Workshop on Deep Generative Learning at ICML 2014, part 2.

Transcript of Deep generative learning_icml_part2

  • 1. Stochastic Gradient Fisher Scoring — Ahn, Korattikara, Welling (2012). [Figure: large-gradient vs. small-gradient regimes; mixing issues.] Bernstein-von Mises theorem (a.k.a. the Bayesian CLT): θ0 is the true parameter, I_N the Fisher information at θ0. (Slide footer throughout: "vrijdag 4 juli 14", i.e. Friday 4 July 2014.)
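The Bernstein-von Mises statement the slide gestures at can be written out explicitly; the equation did not survive extraction, so the following is a reconstruction of the standard form:

```latex
% Bayesian CLT: as N grows, the posterior approaches a Gaussian centred
% at the true parameter theta_0 with covariance given by the inverse
% Fisher information, I_N = N * I(theta_0).
p(\theta \mid x_{1:N}) \;\approx\;
  \mathcal{N}\!\left(\theta \;\middle|\; \theta_0,\; I_N^{-1}\right),
\qquad I_N = N\, I(\theta_0).
```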

2–5. SGFS builds on Stochastic Gradient Langevin dynamics (SGLD). SGLD samples from the correct posterior only at low step size ε (low bias, high variance); a Markov chain for an approximate posterior can sample at any ε (low variance, high bias). 6–7. The SGFS trade-off: small ε gives low bias and high variance, large ε the reverse; a term in SGFS compensates for the subsampling noise. 8. The SGFS knob: burn in with large ε (low variance, fast, but high bias), then decrease ε over time towards exact sampling (high variance, slow, but low bias). [Figure: samples along the bias-variance trade-off.] 9–12. Demo: SGFS with ε = 2, and with ε = 0.4. [Demo figures.] 13–19. Stochastic Gradient Riemannian Langevin Dynamics (SGRLD) — Patterson & Teh (2013). Consider the Euclidean space of parameters θ = (μ, σ) of a normal distribution: the Euclidean distance between two parameters can be 1 while the densities p(x|θ) are very different, or 10 while the densities are almost identical. SGRLD therefore uses a position-specific metric G(θ), where G(θ) is positive semi-definite: the natural gradient follows changes in curvature, and the injected noise is aligned with it. 20–21. Stochastic Gradient Hamiltonian Monte Carlo — T. Chen, E. B. Fox, C. Guestrin (2014). An (over-)simplified explanation of Hamiltonian Monte Carlo (HMC): a Langevin update is one informative gradient step of size ε plus one random step of size √ε, which gives random-walk-type movement and bad mixing.
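The Langevin update just described — one gradient step of size ε plus one noise step of size √ε — is easy to sketch. Below is a minimal, illustrative SGLD sampler for the mean of Gaussian data under a flat prior; it is a toy sketch, not the slides' SGFS-preconditioned version, and all names and constants are made up for the example.

```python
import math
import random

random.seed(0)

# Toy data: N unit-variance Gaussian observations with unknown mean.
N = 1000
true_mean = 2.0
data = [random.gauss(true_mean, 1.0) for _ in range(N)]

def sgld_samples(data, steps=5000, batch=10, eps=1e-3):
    """Vanilla SGLD: drift = (eps/2) * stochastic gradient of the
    log-posterior, injected noise = N(0, eps). Flat prior, known
    unit observation variance, so grad log p(x|theta) = x - theta."""
    n = len(data)
    theta = 0.0
    out = []
    for _ in range(steps):
        mb = random.sample(data, batch)
        # Minibatch gradient of the log-likelihood, rescaled by n/batch.
        grad = (n / batch) * sum(x - theta for x in mb)
        theta += 0.5 * eps * grad + random.gauss(0.0, math.sqrt(eps))
        out.append(theta)
    return out

samples = sgld_samples(data)
burned = samples[1000:]
post_mean = sum(burned) / len(burned)
print(post_mean)  # roughly the data mean; spread inflated by minibatch noise
```

At this fixed ε the chain mixes quickly but the minibatch noise inflates the sample spread (the bias the slides describe); decreasing ε over time recovers the "knob" behaviour of slide 8.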
22–24. Stochastic Gradient Hamiltonian Monte Carlo (SGHMC): HMC allows multiple gradient steps per noise step, so it can make distant proposals with high acceptance probability. Naively using stochastic gradients in HMC does not work well; the authors use a correction term to cancel the effect of the noise in the gradients. (Talk tomorrow afternoon in Track C, Monte Carlo.)
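The "multiple gradient steps per noise step" idea, with a friction-style damping term standing in for the paper's correction, can be sketched as follows. This is an illustrative toy on the same Gaussian-mean model, not the authors' algorithm or constants; the friction value and step size are invented for the example.

```python
import math
import random

random.seed(0)

# Toy data: unknown mean of unit-variance Gaussian observations.
N = 1000
data = [random.gauss(2.0, 1.0) for _ in range(N)]

def sghmc_samples(data, steps=5000, batch=10, eps=1e-4, friction=0.1):
    """SGHMC-style sampler: a momentum variable carries the chain through
    many informative gradient steps, friction damps the momentum (which
    also soaks up minibatch-gradient noise), and matching Gaussian noise
    keeps the dynamics stochastic."""
    n = len(data)
    theta, v = 0.0, 0.0
    out = []
    for _ in range(steps):
        mb = random.sample(data, batch)
        # Stochastic gradient of the log-posterior (flat prior).
        grad = (n / batch) * sum(x - theta for x in mb)
        # Momentum update: gradient + friction + injected noise.
        v = (1.0 - friction) * v + eps * grad \
            + random.gauss(0.0, math.sqrt(2.0 * friction * eps))
        theta += v
        out.append(theta)
    return out

samples = sghmc_samples(data)
burned = samples[1000:]
post_mean = sum(burned) / len(burned)
```

Unlike the random-walk Langevin update, the momentum lets successive moves point in a consistent direction; without the friction term, the minibatch noise accumulated in `v` would blow the chain up — which is exactly the failure mode the slides warn about.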
25–32. Distributed SGLD — Ahn, Shahbaba, Welling (2014). The total of N data points is split across machines. Adaptive load balancing: longer trajectories are drawn on faster machines. 33–34. D-SGLD results. Wikipedia dataset: 4.6M articles, 811M tokens, vocabulary size 7702. PubMed dataset: 8.2M articles, 730M tokens, vocabulary size 39987. Model: Latent Dirichlet Allocation. (Talk tomorrow afternoon in Track C, Monte Carlo.) 35–36. A recap. Use an efficient proposal so that the Metropolis-Hastings test can be avoided: SGLD — Langevin dynamics with stochastic gradients; SGFS — preconditioning matrix based on the Fisher information at the mode; SGRLD — position-specific preconditioning matrix based on Riemannian geometry; SGHMC — avoids random walks by taking multiple gradient steps; D-SGLD — distributed version of the above algorithms. Alternatively: approximate the Metropolis-Hastings test using less data. 37. Why approximate the MH test?
(if gradient-based methods seem to work so well) Gradient-based proposals are not always available: parameter spaces of different dimensionality, distributions on constrained manifolds, discrete variables. High gradients may also catapult the sampler to low-density regions. 38–41. Metropolis-Hastings, in three steps; part of the acceptance test does not depend on the data (x). 42–46. Approximate Metropolis-Hastings: when the decision is uncertain, collect more data. How do we choose the "+" and "−" thresholds? 47–53. Approach 1: using confidence intervals — Korattikara, Chen, Welling (2014). Collect more data until the decision is clear (c is chosen as in a t-test for μ = μ0 vs. μ ≠ μ0). (Talk tomorrow afternoon in Track C, Monte Carlo.) Singh, Wick, McCallum (2012) inference in lar
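The confidence-interval test can be made concrete with a small sketch. The toy below is my own construction, not the paper's code: it runs a random-walk MH chain for the mean of Gaussian data, and each accept/reject decision reads data in batches, stopping early once a t-statistic against the threshold μ0 = log(u)/N is decisive (symmetric proposal, flat prior assumed).

```python
import math
import random

random.seed(1)

# Toy data: unknown mean of unit-variance Gaussian observations.
N = 2000
data = [random.gauss(1.0, 1.0) for _ in range(N)]

def log_lik(x, theta):
    return -0.5 * (x - theta) ** 2

def approx_mh_accept(theta, prop, data, batch=100, t_crit=2.0):
    """Sequential approximate MH test: accept iff the mean per-point
    log-likelihood difference exceeds mu0 = log(u)/N; look at more data
    only while the t-statistic is not yet decisive."""
    u = 1.0 - random.random()            # in (0, 1], avoids log(0)
    n_total = len(data)
    mu0 = math.log(u) / n_total          # symmetric proposal assumed
    order = random.sample(range(n_total), n_total)  # random data order
    diffs = []
    while True:
        for i in order[len(diffs):len(diffs) + batch]:
            diffs.append(log_lik(data[i], prop) - log_lik(data[i], theta))
        n = len(diffs)
        mean = sum(diffs) / n
        if n == n_total:
            return mean > mu0            # all data seen: exact MH decision
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        # Standard error with finite-population correction.
        se = math.sqrt(var / n) * math.sqrt(1.0 - (n - 1) / (n_total - 1))
        if se == 0.0 or abs(mean - mu0) / se > t_crit:
            return mean > mu0            # confident enough to decide early

# A short random-walk chain driven by the approximate test.
theta, samples = 0.0, []
for _ in range(2000):
    prop = theta + random.gauss(0.0, 0.05)
    if approx_mh_accept(theta, prop, data):
        theta = prop
    samples.append(theta)
post_mean = sum(samples[500:]) / len(samples[500:])
```

When the proposal is clearly good or clearly bad the test resolves after one batch; it only pays for a full data pass when the decision is genuinely close, which is the source of the speedup the approach promises (at the cost of a small, controllable error probability).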