Bayesian learning finalized (with high probability)

Everything’s random...

•Basic Bayesian viewpoint:

•Treat (almost) everything as a random variable

•Data/independent var: X vector

•Class/dependent var: Y

•Parameters: Θ

•E.g., mean, variance, correlations, multinomial params, etc.

•Use Bayes’ Rule to assess probabilities of classes

•Allows us to say: “It is very unlikely that the mean height is 2 light years”

Uncertainty over params

•Maximum likelihood treats parameters as (unknown) constants

• Job is just to pick the constants so as to maximize data likelihood

•Full-blown Bayesian modeling treats params as random variables

•PDF over parameter variables tells us how certain/uncertain we are about the location of that parameter

•Also allows us to express prior beliefs (probabilities) about params

Example: Coin flipping

•Have a “weighted” coin -- want to figure out θ=Pr[heads]

•Maximum likelihood:

•Flip coin a bunch of times, measure #heads; #tails

•Use estimator to return a single value for θ

•This is called a point estimate
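As a minimal sketch of the maximum likelihood recipe above: for a coin, the estimator is the observed fraction of heads. The function name and the counts are hypothetical, chosen only for illustration.

```python
def mle_heads_probability(heads: int, tails: int) -> float:
    """Maximum-likelihood point estimate of theta = Pr[heads]."""
    total = heads + tails
    if total == 0:
        raise ValueError("need at least one flip to form an estimate")
    return heads / total


# Hypothetical counts, not taken from the slides.
theta_hat = mle_heads_probability(heads=7, tails=3)
print(theta_hat)
```

Note that the result is a single number: it carries no information about how sure we are of it.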

Example: Coin flipping

•Have a “weighted” coin -- want to figure out θ=Pr[heads]

•Bayesian posterior estimation:

•Start w/ distribution over what θ might be

•Flip coin a bunch of times, measure #heads; #tails

•Update distribution, but never reduce to a single number

•Always keep around Pr[θ | data]: posterior estimate
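The update loop above can be sketched on a discretized θ. This is an illustrative approximation, not the method used on the slides: the 101-point grid, the uniform starting distribution, and the flip sequence are all assumptions.

```python
def update_grid_posterior(grid, weights, flip_is_heads):
    """One Bayes'-rule step: multiply by the flip's likelihood, then renormalize."""
    likelihoods = [theta if flip_is_heads else 1.0 - theta for theta in grid]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Candidate values of theta on an evenly spaced grid (illustrative discretization).
grid = [i / 100 for i in range(101)]
# Start with every candidate equally plausible.
posterior = [1.0 / len(grid)] * len(grid)

# Hypothetical flip sequence: True = heads, False = tails (7 heads, 3 tails).
flips = [True, True, False, True, True, False, True, True, True, False]
for flip in flips:
    posterior = update_grid_posterior(grid, posterior, flip)

# The whole distribution is kept; summaries are computed only on demand.
posterior_mean = sum(t * w for t, w in zip(grid, posterior))
most_probable = grid[posterior.index(max(posterior))]
```

The list `posterior` plays the role of Pr[θ | data]: it is updated after every flip and never collapsed to a single value.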

Example: Coin flipping

[Figure: distribution over θ after 0, 1, 5, 10, 20, 50, and 100 total flips]

How does it work?

•Think of parameters as just another kind of random variable

•Now your data distribution is Pr[X | Θ]

•This is the generative distribution

•A.k.a. observation distribution, sensor model, etc.

•What we want is some model of the parameters as a function of the data: Pr[Θ | X]

•Get there with Bayes’ rule: Pr[Θ | X] = Pr[X | Θ] Pr[Θ] / Pr[X]

What does that mean?

•Let’s look at the parts:

•Generative distribution: Pr[X | Θ]

•Describes how data is generated by the underlying process

•Usually easy to write down (well, easier than the other parts, anyway)

•Same old PDF/PMF we’ve been working with

•Can be used to “generate” new samples of data that “look like” your training data
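As a small illustration of “generating” data, here is a sketch that samples coin flips from a Bernoulli generative distribution. The value θ = 0.7, the sample size, and the seed are assumptions made for the example.

```python
import random


def generate_flips(theta, n, rng):
    """Draw n flips from the generative distribution Pr[heads | theta] = theta."""
    return [rng.random() < theta for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
flips = generate_flips(theta=0.7, n=1000, rng=rng)  # True = heads
heads_fraction = sum(flips) / len(flips)
print(heads_fraction)
```

With enough samples the fraction of heads should land near the θ that generated them.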

What does that mean?

•The parameter prior or a priori distribution: Pr[Θ]

•Allows you to say “this value of Θ is more likely than that one is...”

•Allows you to express beliefs/assumptions/preferences about the parameters of the system

•Also takes over when the data is sparse (small N)

• In the limit of large data, prior should “wash out”, letting the data dominate the estimate of the parameter

•Can let Pr[Θ] be “uniform” (a.k.a., “uninformative”) to minimize its impact
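The “wash out” behavior can be checked numerically. The sketch below compares a uniform prior with a hypothetical prior that favors small values of the heads probability, on small and large data sets with the same heads/tails ratio. The grid, both priors, and the counts are assumptions for illustration.

```python
def posterior_mean_on_grid(grid, prior_weights, heads, tails):
    """Posterior mean of theta: weight each grid value by prior x likelihood."""
    unnormalized = [
        w * theta**heads * (1.0 - theta)**tails
        for theta, w in zip(grid, prior_weights)
    ]
    total = sum(unnormalized)
    return sum(theta * u for theta, u in zip(grid, unnormalized)) / total


grid = [i / 200 for i in range(201)]
uniform_prior = [1.0 for _ in grid]                     # "uninformative"
skeptical_prior = [(1.0 - theta)**4 for theta in grid]  # hypothetical: heads assumed rare


def prior_gap(heads, tails):
    """Distance between the posterior means produced by the two priors."""
    a = posterior_mean_on_grid(grid, uniform_prior, heads, tails)
    b = posterior_mean_on_grid(grid, skeptical_prior, heads, tails)
    return abs(a - b)


gap_small_n = prior_gap(heads=7, tails=3)      # 10 flips: the priors disagree noticeably
gap_large_n = prior_gap(heads=140, tails=60)   # 200 flips: the data dominates
print(gap_small_n, gap_large_n)
```

The gap between the two posterior means shrinks as N grows, which is the sense in which the prior matters most when data is sparse.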

What does that mean?

•The data prior: Pr[X]

•Expresses the probability of seeing data set X independent of any particular model

•Huh?

•Can get it from the joint data/parameter model: Pr[X] = ∫ Pr[X, Θ] dΘ = ∫ Pr[X | Θ] Pr[Θ] dΘ

• In practice, often don’t need it explicitly (why?)
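One common answer to the “why?”: Pr[X] does not depend on the parameter, so it only rescales the posterior and never changes which parameter values are more probable than others. The sketch below approximates Pr[X] for a coin by a midpoint Riemann sum under a uniform prior on [0, 1], then shows that dividing by it leaves the ranking of candidates unchanged. The counts, the candidate values, and the integration scheme are assumptions for illustration.

```python
def likelihood(theta, heads, tails):
    """Pr[one particular flip sequence | theta] for a sequence with these counts."""
    return theta**heads * (1.0 - theta)**tails


def evidence(heads, tails, cells=1000):
    """Midpoint Riemann sum for Pr[X] = integral of Pr[X | theta] Pr[theta] d(theta)."""
    width = 1.0 / cells
    midpoints = [(i + 0.5) * width for i in range(cells)]
    prior_density = 1.0  # uniform prior on [0, 1]
    return sum(likelihood(t, heads, tails) * prior_density * width for t in midpoints)


heads, tails = 3, 1  # hypothetical data
pr_x = evidence(heads, tails)

candidates = [0.25, 0.5, 0.75]
unnormalized = [likelihood(t, heads, tails) * 1.0 for t in candidates]  # likelihood x prior
normalized = [u / pr_x for u in unnormalized]                           # posterior density

# Dividing by Pr[X] rescales every candidate by the same constant.
best_unnormalized = candidates[unnormalized.index(max(unnormalized))]
best_normalized = candidates[normalized.index(max(normalized))]
```

Pr[X] is still needed whenever the posterior must be a properly normalized density, for example to read off actual probabilities.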

What does that mean?

•Finally, the posterior (or a posteriori) distribution: Pr[Θ | X]

•Lit., “from what comes after” (Latin)

•Essentially, “What we believe about the parameter after we look at the data”

•As compared to the “prior” or “a priori” (lit., “from what is before”) parameter distribution

Example: coin flipping

•A (biased) coin lands heads-up w/ prob p and tails-up w/ prob 1-p

•Parameter of the system is p

•Goal is to find Pr[p | sequence of coin flips]

•(Technically, we want a PDF, f(p | flips))

•Q: what family of PDFs is appropriate?

Example: coin flipping

•We need a PDF that generates possible values of p ∈ [0,1]

•Commonly used distribution is the beta distribution: f(p | α,β) = p^(α−1) (1−p)^(β−1) / B(α,β)

•B(α,β) is the normalization constant (the “Beta function”)

•p is Pr[heads]; 1−p is Pr[tails]

The Beta Distribution

Image courtesy of Wikimedia Commons

Generative distribution

•f(p|α,β) is the prior distribution for p

•Parameters α and β are hyperparameters

•Govern shape of f()

•Still need the generative distribution: Pr[h,t|p]

•h,t: number of heads, tails

•Use a binomial distribution: Pr[h,t|p] = (h+t choose h) p^h (1−p)^t

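Posterior

•Now, by Bayes’ rule: f(p | h,t) ∝ Pr[h,t|p] f(p|α,β) ∝ p^(h+α−1) (1−p)^(t+β−1)

•This is again a beta distribution, with parameters α+h and β+t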
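Because a beta prior combined with binomial counts gives a beta posterior with parameters α+h and β+t (the standard conjugacy result), the update reduces to adding counts to the hyperparameters. A minimal sketch, with hypothetical function names and example values:

```python
def beta_posterior(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior + (heads, tails) -> Beta posterior."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior hyperparameters and flip counts.
prior_alpha, prior_beta = 2.0, 2.0
post_alpha, post_beta = beta_posterior(prior_alpha, prior_beta, heads=7, tails=3)
print(post_alpha, post_beta, beta_mean(post_alpha, post_beta))
```

One way to read the hyperparameters is as “pseudo-counts” of heads and tails seen before any real data arrives.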

Exercise

•Suppose you want to estimate the average air speed of an unladen (African) swallow

•Let’s say that airspeeds of individual swallows, x, are Gaussianly distributed with mean μ and variance 1: x ~ N(μ, 1)

•Let’s say, also, that we think the mean is “around” 50 kph, but we’re not sure exactly what it is: our uncertainty (variance) is 10 kph².

•Derive the posterior estimate of the mean airspeed.
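One way to check a derivation, sketched under assumptions the exercise leaves open: call the unknown mean μ, take the prior on it to be Gaussian with mean μ₀ = 50 and variance σ₀² = 10, and suppose n airspeeds x₁, ..., xₙ have been observed (n and the data are not given in the exercise). Completing the square in μ gives the standard Gaussian-Gaussian result:

```latex
\begin{align*}
f(\mu \mid x_1,\dots,x_n)
  &\propto \exp\!\left(-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right)
     \prod_{i=1}^{n} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) \\
  &\propto \exp\!\left(-\frac{(\mu-\mu_n)^2}{2\sigma_n^2}\right), \\
\frac{1}{\sigma_n^2} &= \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2},
\qquad
\mu_n = \sigma_n^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i} x_i}{\sigma^2}\right). \\
\intertext{With $\mu_0 = 50$, $\sigma_0^2 = 10$, $\sigma^2 = 1$:}
\sigma_n^2 &= \frac{10}{10n+1},
\qquad
\mu_n = \frac{50 + 10\sum_{i} x_i}{10n+1}.
\end{align*}
```

With n = 0 this returns the prior, and as n grows μₙ approaches the sample mean.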
