11.5 MAP Estimator - Binghamton

11.5 MAP Estimator

Recall that the “hit-or-miss” cost function gave the MAP estimator… it maximizes the a posteriori PDF.

Q: Given that the MMSE estimator is “the most natural” one… why would we consider the MAP estimator?

A: If x and θ are not jointly Gaussian, the form of the MMSE estimate requires integration to find the conditional mean.

MAP avoids this computational problem! Note: MAP doesn’t require this integration.

Trade “natural criterion” vs. “computational ease”

What else do you gain? More flexibility to choose the prior PDF

Notation and Form for MAP

Notation: $\hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid \mathbf{x})$ … the MAP estimate maximizes the posterior PDF.

“arg max” extracts the value of θ that causes the maximum.

Equivalent Form (via Bayes’ Rule): $\hat{\theta}_{MAP} = \arg\max_{\theta} \big[ p(\mathbf{x} \mid \theta)\, p(\theta) \big]$

Proof: Use

$p(\theta \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid \theta)\, p(\theta)}{p(\mathbf{x})}$

so that

$\hat{\theta}_{MAP} = \arg\max_{\theta} \dfrac{p(\mathbf{x} \mid \theta)\, p(\theta)}{p(\mathbf{x})} = \arg\max_{\theta} \big[ p(\mathbf{x} \mid \theta)\, p(\theta) \big]$

since $p(\mathbf{x})$ does not depend on θ.
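
A minimal numerical sketch (not from the slides): for an assumed DC-level-in-WGN model with a Gaussian prior, the MAP estimate can be found by maximizing p(x|θ)p(θ) over a grid of θ values, with no integration over θ. All parameter values here are hypothetical.

```python
import numpy as np

# Assumed model: x[n] = theta + w[n], w[n] ~ N(0, sigma2), prior theta ~ N(mu_t, sigma_t2)
rng = np.random.default_rng(0)
sigma2, mu_t, sigma_t2, N = 1.0, 0.0, 4.0, 20
theta_true = rng.normal(mu_t, np.sqrt(sigma_t2))
x = theta_true + rng.normal(0.0, np.sqrt(sigma2), N)

theta_grid = np.linspace(-10.0, 10.0, 4001)
# log p(x|theta) + log p(theta) on the grid (normalizing constants are irrelevant to the arg max)
loglike = -0.5 * np.sum((x[None, :] - theta_grid[:, None]) ** 2, axis=1) / sigma2
logprior = -0.5 * (theta_grid - mu_t) ** 2 / sigma_t2
theta_map = theta_grid[np.argmax(loglike + logprior)]
print(theta_true, theta_map)
```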

Vector MAP (not as straightforward as the vector extension for MMSE)

The obvious extension leads to problems:

Choose $\hat{\theta}_i$ to minimize $\mathcal{R}(\hat{\theta}_i) = E\{ C(\theta_i - \hat{\theta}_i) \}$, where the expectation is over $p(\mathbf{x}, \theta_i)$.

$\Rightarrow \; \hat{\theta}_i = \arg\max_{\theta_i} p(\theta_i \mid \mathbf{x})$ … a 1-D marginal conditioned on x.

Need to integrate to get it!!

$p(\theta_1 \mid \mathbf{x}) = \int \cdots \int p(\boldsymbol{\theta} \mid \mathbf{x})\, d\theta_2 \cdots d\theta_p$

Problem: The whole point of MAP was to avoid doing the integration needed in MMSE!!!

Is there a way around this? Can we find an Integration-Free Vector MAP?
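
A small sketch of that integration burden (hypothetical posterior, not from the slides): to get the element-wise MAP of θ1 we must first marginalize a 2-D posterior over θ2, here numerically on a grid.

```python
import numpy as np

# Hypothetical unnormalized joint posterior p(theta1, theta2 | x), evaluated on a grid
t1 = np.linspace(-5.0, 5.0, 401)
t2 = np.linspace(-5.0, 5.0, 401)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
joint = np.exp(-0.5 * (T1 ** 2 + 0.5 * (T2 - T1) ** 2))

# Marginalize over theta2 (crude Riemann sum), then maximize the 1-D marginal
marg1 = joint.sum(axis=1) * (t2[1] - t2[0])
theta1_map = t1[np.argmax(marg1)]
print(theta1_map)  # ~ 0.0 for this hypothetical posterior
```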

Circular Hit-or-Miss Cost Function (Not in Book)

First look at the p-dimensional cost function for this “troubling” version of a vector MAP:

It consists of p individual applications of 1-D “Hit-or-Miss”

[Figure: square region of half-width δ, centered at the origin of the (ε1, ε2) error plane]

$C(\varepsilon_1, \varepsilon_2) = \begin{cases} 0, & (\varepsilon_1, \varepsilon_2) \text{ in square} \\ 1, & (\varepsilon_1, \varepsilon_2) \text{ not in square} \end{cases}$

The corners of the square “let too much in” ⇒ use a circle!

[Figure: circular region of radius δ, centered at the origin of the (ε1, ε2) error plane]

$C(\boldsymbol{\varepsilon}) = \begin{cases} 0, & \|\boldsymbol{\varepsilon}\| \le \delta \\ 1, & \|\boldsymbol{\varepsilon}\| > \delta \end{cases}$

This actually seems more natural than the “square” cost function!!!
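
A tiny sketch (hypothetical δ and error values) of the two p = 2 cost functions written as indicator functions; it shows an error that the square’s corner “lets in” but the circle does not.

```python
import numpy as np

delta = 0.5  # hypothetical hit-or-miss tolerance

def C_square(e1, e2):
    # 0 if the error falls inside the square of half-width delta, else 1
    return 0.0 if (abs(e1) <= delta and abs(e2) <= delta) else 1.0

def C_circle(e1, e2):
    # 0 if the error falls inside the circle of radius delta, else 1
    return 0.0 if np.hypot(e1, e2) <= delta else 1.0

print(C_square(0.45, 0.45), C_circle(0.45, 0.45))  # 0.0 1.0: the square's corner "lets too much in"
```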

MAP Estimate using Circular Hit-or-Miss (Back to Book)

So… what vector Bayesian estimator comes from using this circular hit-or-miss cost function?

Can show that it is the following “Vector MAP”

$\hat{\boldsymbol{\theta}}_{MAP} = \arg\max_{\boldsymbol{\theta}} p(\boldsymbol{\theta} \mid \mathbf{x})$ … Does Not Require Integration!!!

That is… find the maximum of the joint conditional PDF over all the θi, conditioned on x.

How Do These Vector MAP Versions Compare?

In general: They are NOT the Same!!

Example: p = 2

[Figure: joint posterior p(θ1, θ2 | x), piecewise constant with heights 1/6, 1/3, and 1/6 over regions of the (θ1, θ2) plane; the θ1 axis is marked 1 through 5 and the θ2 axis is marked 1 and 2]

The vector MAP using the Circular Hit-or-Miss is: $\hat{\boldsymbol{\theta}} = [\,2.5 \;\; 0.5\,]^T$

To find the vector MAP using the element-wise maximization:

[Figure: marginal posteriors. p(θ1 | x) takes the values 1/6 and 1/3 over the θ1 axis (marked 1 through 5); p(θ2 | x) takes the values 1/3 and 2/3 over the θ2 axis (marked 1 and 2)]

Element-wise maximization gives: $\hat{\boldsymbol{\theta}} = [\,2.5 \;\; 1.5\,]^T$
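
A minimal discrete sketch of the same effect (hypothetical 2×2 posterior, not the PDF in the figure): the joint maximizer and the pair of marginal maximizers can disagree.

```python
import numpy as np

# Hypothetical discrete joint posterior p(theta1, theta2 | x) on a 2x2 grid;
# rows index theta1 in {0, 1}, columns index theta2 in {0, 1}
p = np.array([[0.40, 0.00],
              [0.25, 0.35]])

joint_map = np.unravel_index(np.argmax(p), p.shape)                      # circular hit-or-miss MAP
elementwise_map = (np.argmax(p.sum(axis=1)), np.argmax(p.sum(axis=0)))   # per-component marginal MAPs
print(joint_map, elementwise_map)  # (0, 0) vs (1, 0): NOT the same
```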

“Bayesian MLE”

Recall… As we keep getting good data, p(θ|x) becomes more concentrated as a function of θ. But… since:

$\hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid \mathbf{x}) = \arg\max_{\theta} \big[ p(\mathbf{x} \mid \theta)\, p(\theta) \big]$

… p(x|θ) should also become more concentrated as a function of θ.

[Figure: p(x|θ) sharply concentrated as a function of θ, plotted together with the nearly flat prior p(θ)]

• Note that the prior PDF is nearly constant where p(x|θ) is non-zero

• This becomes truer as N →∞, and p(x|θ) gets more concentrated

$\arg\max_{\theta} \big[ p(\mathbf{x} \mid \theta)\, p(\theta) \big] \;\approx\; \arg\max_{\theta} p(\mathbf{x} \mid \theta)$

MAP ≈ “Bayesian MLE”

Uses conditional PDF rather than the parameterized PDF
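
A quick sketch (using the Ex. 11.6 weighting formula with hypothetical values): the weight that the MAP/MMSE estimate places on the sample mean tends to 1 as N grows, so the estimate approaches the maximizer of p(x|θ) alone, i.e., the “Bayesian MLE.”

```python
# Assumed DC-level-in-WGN model with Gaussian prior A ~ N(mu_A, sigma_A2); values are hypothetical
sigma2, sigma_A2 = 1.0, 2.0
for N in (1, 10, 100, 1000, 10000):
    alpha = sigma_A2 / (sigma_A2 + sigma2 / N)  # weight on the sample mean in A_hat
    print(N, alpha)  # alpha -> 1: the prior's influence vanishes as N grows
```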

11.6 Performance Characterization

The performance of Bayesian estimators is characterized by looking at the estimation error: $\varepsilon = \theta - \hat{\theta}$

Here θ is random (due to the a priori PDF) and $\hat{\theta}$ is random (due to x).

Performance is characterized by the error’s PDF p(ε). We’ll focus on the mean and variance.

If ε is Gaussian then these tell the whole story. This will be the case for the Bayesian Linear Model (see Thm. 10.3).

We’ll also concentrate on the MMSE Estimator

Performance of Scalar MMSE Estimator

The estimator is: $\hat{\theta} = E\{\theta \mid \mathbf{x}\} = \int \theta\, p(\theta \mid \mathbf{x})\, d\theta$ … a function of x.

So the estimation error is: $\varepsilon = \theta - E\{\theta \mid \mathbf{x}\} = f(\mathbf{x}, \theta)$ … a function of two RVs.

General Result for a function of two RVs: Z = f (X, Y)

$E\{Z\} = \int\!\!\int f(x, y)\, p_{XY}(x, y)\, dx\, dy$

$\operatorname{var}\{Z\} = E\big\{ (Z - E\{Z\})^2 \big\} = \int\!\!\int \big( f(x, y) - E\{Z\} \big)^2 p_{XY}(x, y)\, dx\, dy$

So… applying the mean result gives:

$E_{\mathbf{x},\theta}\{\varepsilon\} = E_{\mathbf{x},\theta}\big\{ \theta - E\{\theta \mid \mathbf{x}\} \big\}$

$\qquad = E_{\mathbf{x}}\big[ E_{\theta \mid \mathbf{x}}\{ \theta - E\{\theta \mid \mathbf{x}\} \} \big]$   (see chart on “Decomposing Joint Expectations” in “Notes on 2 RVs”)

$\qquad = E_{\mathbf{x}}\big[ E_{\theta \mid \mathbf{x}}\{\theta\} - E_{\theta \mid \mathbf{x}}\{ E\{\theta \mid \mathbf{x}\} \} \big]$   (pass $E_{\theta \mid \mathbf{x}}$ through the terms)

$\qquad = E_{\mathbf{x}}\big[ E\{\theta \mid \mathbf{x}\} - E\{\theta \mid \mathbf{x}\} \big]$

$\qquad = E_{\mathbf{x}}\{0\} = 0$

Note: $E_{\theta \mid \mathbf{x}}\{\theta\}$ and $E\{\theta \mid \mathbf{x}\}$ are two notations for the same thing. The inner term above is evaluated as seen below: since $E\{\theta \mid \mathbf{x}\}$ does not depend on θ,

$E_{\theta \mid \mathbf{x}}\big\{ E\{\theta \mid \mathbf{x}\} \big\} = \int E\{\theta \mid \mathbf{x}\}\, p(\theta \mid \mathbf{x})\, d\theta = E\{\theta \mid \mathbf{x}\} \int p(\theta \mid \mathbf{x})\, d\theta = E\{\theta \mid \mathbf{x}\}$

$E\{\varepsilon\} = 0$ … i.e., the Mean of the Estimation Error (over data & parameter) is Zero!!!!

And… applying the variance result gives:

$\operatorname{var}\{\varepsilon\} = E\big\{ (\varepsilon - E\{\varepsilon\})^2 \big\} = E\{\varepsilon^2\}$   (use $E\{\varepsilon\} = 0$)

$\qquad = E_{\mathbf{x},\theta}\big\{ (\theta - \hat{\theta})^2 \big\}$

$\qquad = \int\!\!\int (\theta - \hat{\theta})^2\, p(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta$

$\qquad = Bmse(\hat{\theta})$

So… the MMSE estimation error has:

• mean = 0

• var = Bmse

So… when we minimize the Bmse we are minimizing the variance of the estimate.

If ε is Gaussian then $\varepsilon \sim \mathcal{N}\big( 0, Bmse(\hat{\theta}) \big)$

Ex. 11.6: DC Level in WGN w/ Gaussian Prior

We saw that

$Bmse(\hat{A}) = \dfrac{1}{N/\sigma^2 + 1/\sigma_A^2}$

with

$\hat{A} = \dfrac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}\,\bar{x} \;+\; \dfrac{\sigma^2/N}{\sigma_A^2 + \sigma^2/N}\,\mu_A$

The two weights are constants, and $\bar{x}$ is Gaussian because it is a linear combo of the jointly Gaussian data samples (if X is Gaussian then Y = aX + b is also Gaussian). So… $\hat{A}$ is Gaussian, and

$\varepsilon \sim \mathcal{N}\!\left( 0,\; \dfrac{1}{N/\sigma^2 + 1/\sigma_A^2} \right)$

Note: As N gets large this PDF collapses around 0. This estimate is “consistent in the Bayesian sense.”

Bayesian Consistency: For large N, $\hat{A} \approx A$ (regardless of the realization of A!)
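
A Monte Carlo sketch of the Ex. 11.6 results (all values hypothetical): draw A from its prior, form the MMSE estimate from the sample mean, and check that the error has mean near 0 and variance near the Bmse.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2, mu_A, sigma_A2, trials = 25, 1.0, 0.0, 4.0, 200_000

A = rng.normal(mu_A, np.sqrt(sigma_A2), trials)            # A ~ N(mu_A, sigma_A2)
xbar = A + rng.normal(0.0, np.sqrt(sigma2 / N), trials)    # sample mean of N noisy samples
w = sigma_A2 / (sigma_A2 + sigma2 / N)
A_hat = w * xbar + (1.0 - w) * mu_A                        # MMSE/MAP estimate from the slides
eps = A - A_hat

bmse = 1.0 / (N / sigma2 + 1.0 / sigma_A2)
print(eps.mean(), eps.var(), bmse)  # mean ~ 0, empirical variance ~ Bmse
```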

Performance of Vector MMSE Estimator

Vector estimation error: $\boldsymbol{\varepsilon} = \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$. The mean result is obvious.

Must extend the variance result:

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \operatorname{cov}\{\boldsymbol{\varepsilon}\} = E_{\mathbf{x},\boldsymbol{\theta}}\{ \boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T \} \;\triangleq\; \mathbf{M}_{\hat{\boldsymbol{\theta}}}$

Some new notation: $\mathbf{M}_{\hat{\boldsymbol{\theta}}}$ is the “Bayesian Mean Square Error Matrix.”

Look some more at this:

$\mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x},\boldsymbol{\theta}}\big\{ [\boldsymbol{\theta} - E\{\boldsymbol{\theta} \mid \mathbf{x}\}][\boldsymbol{\theta} - E\{\boldsymbol{\theta} \mid \mathbf{x}\}]^T \big\}$

$\qquad = E_{\mathbf{x}}\Big\{ E_{\boldsymbol{\theta} \mid \mathbf{x}}\big\{ [\boldsymbol{\theta} - E\{\boldsymbol{\theta} \mid \mathbf{x}\}][\boldsymbol{\theta} - E\{\boldsymbol{\theta} \mid \mathbf{x}\}]^T \big\} \Big\}$   (see chart on “Decomposing Joint Expectations”)

$\qquad = E_{\mathbf{x}}\{ \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} \}$

General vector results:

$E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$        $\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{ \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} \}$

Note: in general $\mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}}$ is a function of x.

The Diagonal Elements of $\mathbf{M}_{\hat{\boldsymbol{\theta}}}$ are Bmse’s of the Estimates

Why do we call the error covariance the “Bayesian MSE Matrix”? To see this:

$\big[ E\{ \boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T \} \big]_{ii} = \int\!\!\int \big[ \theta_i - E\{\theta_i \mid \mathbf{x}\} \big]^2\, p(\mathbf{x}, \boldsymbol{\theta})\, d\mathbf{x}\, d\theta_1 \cdots d\theta_p$

$\qquad = \int\!\!\int \big[ \theta_i - E\{\theta_i \mid \mathbf{x}\} \big]^2\, p(\mathbf{x}, \theta_i)\, d\mathbf{x}\, d\theta_i$   (integrate over all the other parameters… “marginalizing” the PDF)

$\qquad = Bmse(\hat{\theta}_i)$

Perf. of MMSE Est. for Jointly Gaussian Case

Let the data vector x and the parameter vector θ be jointly Gaussian.

Nothing new to say about the mean result: $E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$

Now… look at the Error Covariance (i.e., Bayesian MSq Matrix):

Recall the general result: $\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{ \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} \}$

Thm. 10.2 says that for jointly Gaussian vectors, $\mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}}$ does NOT depend on x, so

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{ \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} \} = \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}}$

Thm. 10.2 also gives the form as:

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{C}_{x\theta}$

Perf. of MMSE Est. for Bayesian Linear Model

Recall the model: $\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$, with $\boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{\mu}_\theta, \mathbf{C}_\theta)$ and $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_w)$.

Nothing new to say about the mean result: $E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$

Now… for the error covariance… this is nothing more than a special case of the jointly Gaussian case we just saw:

Result for the jointly Gaussian case:

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{C}_{x\theta}$

Evaluations for the Bayesian Linear Model:

$\mathbf{C}_{\theta x} = \mathbf{C}_\theta \mathbf{H}^T$        $\mathbf{C}_{xx} = \mathbf{H} \mathbf{C}_\theta \mathbf{H}^T + \mathbf{C}_w$

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} = \mathbf{C}_\theta - \mathbf{C}_\theta \mathbf{H}^T \big( \mathbf{H} \mathbf{C}_\theta \mathbf{H}^T + \mathbf{C}_w \big)^{-1} \mathbf{H} \mathbf{C}_\theta = \big( \mathbf{C}_\theta^{-1} + \mathbf{H}^T \mathbf{C}_w^{-1} \mathbf{H} \big)^{-1}$

Alternate Form … see (10.33)
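
A numerical sketch (random, hypothetical H, C_theta, C_w) checking that the two forms of the error covariance above agree, as guaranteed by the matrix inversion lemma.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 8, 3
H = rng.standard_normal((N, p))                 # hypothetical observation matrix
A = rng.standard_normal((p, p))
C_theta = A @ A.T + np.eye(p)                   # any valid (positive definite) prior covariance
C_w = 0.5 * np.eye(N)                           # hypothetical noise covariance

form1 = C_theta - C_theta @ H.T @ np.linalg.solve(H @ C_theta @ H.T + C_w, H @ C_theta)
form2 = np.linalg.inv(np.linalg.inv(C_theta) + H.T @ np.linalg.solve(C_w, H))
print(np.max(np.abs(form1 - form2)))            # ~ 0: the two forms match
```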

Summary of MMSE Est. Error Results

1. For all cases: the estimation error is zero mean: $E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$

2. Error Covariance for three “Nested” Cases:

General Case:

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{ \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} \}$, with $\big[ \mathbf{M}_{\hat{\boldsymbol{\theta}}} \big]_{ii} = Bmse(\hat{\theta}_i)$

Jointly Gaussian:

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{C}_{x\theta}$

Bayesian Linear (Jointly Gaussian & Linear Observation):

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} = \mathbf{C}_\theta - \mathbf{C}_\theta \mathbf{H}^T \big( \mathbf{H} \mathbf{C}_\theta \mathbf{H}^T + \mathbf{C}_w \big)^{-1} \mathbf{H} \mathbf{C}_\theta = \big( \mathbf{C}_\theta^{-1} + \mathbf{H}^T \mathbf{C}_w^{-1} \mathbf{H} \big)^{-1}$

Main Bayesian Approaches

MAP … “Hit-or-Miss” Cost Function

Estimate: $\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} p(\boldsymbol{\theta} \mid \mathbf{x})$

Easy to implement… but performance analysis is challenging.

MMSE … “Squared” Cost Function (In General: Nonlinear Estimate)

Estimate: $\hat{\boldsymbol{\theta}} = E\{ \boldsymbol{\theta} \mid \mathbf{x} \}$        Err. Cov.: $\mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{ \mathbf{C}_{\boldsymbol{\theta} \mid \mathbf{x}} \}$

Hard to implement… numerical integration.

Jointly Gaussian x and θ (Yields Linear Estimate)

Estimate: $\hat{\boldsymbol{\theta}} = E\{\boldsymbol{\theta}\} + \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \big( \mathbf{x} - E\{\mathbf{x}\} \big)$        Err. Cov.: $\mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{C}_{x\theta}$

Easier to implement… but $\mathbf{C}_{\theta x}$ can be hard to find.

Bayesian Linear Model (Yields Linear Estimate)

Estimate: $\hat{\boldsymbol{\theta}} = \boldsymbol{\mu}_\theta + \mathbf{C}_\theta \mathbf{H}^T \big( \mathbf{H} \mathbf{C}_\theta \mathbf{H}^T + \mathbf{C}_w \big)^{-1} \big( \mathbf{x} - \mathbf{H}\boldsymbol{\mu}_\theta \big)$        Err. Cov.: $\mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_\theta - \mathbf{C}_\theta \mathbf{H}^T \big( \mathbf{H} \mathbf{C}_\theta \mathbf{H}^T + \mathbf{C}_w \big)^{-1} \mathbf{H} \mathbf{C}_\theta$

“Easy” to implement… only need an accurate model: $\mathbf{C}_\theta$, $\mathbf{C}_w$, $\mathbf{H}$.

11.7 Example: Bayesian Deconvolution

This example shows the power of Bayesian approaches over classical methods in signal estimation problems (i.e., estimating the signal rather than some parameters).

[Block diagram: s(t) passes through h(t), then w(t) is added at a summer Σ to give x(t). s(t) is modeled as a zero-mean WSS Gaussian process with known ACF Rs(τ); h(t) is assumed known; w(t) is Gaussian bandlimited white noise with known variance; the measured data are samples of x(t).]

Goal: Observe x(t) & estimate s(t). Note: at the output, s(t) is smeared & noisy.

So… model as a D-T system.

Sampled-Data Formulation

$\begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix} = \begin{bmatrix} h[0] & 0 & \cdots & 0 \\ h[1] & h[0] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h[N-1] & h[N-2] & \cdots & h[N-n_s] \end{bmatrix} \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[n_s-1] \end{bmatrix} + \begin{bmatrix} w[0] \\ w[1] \\ \vdots \\ w[N-1] \end{bmatrix}$

Measured data vector x; known observation (convolution) matrix H; signal vector s to estimate; AWGN w with $\mathbf{C}_w = \sigma^2 \mathbf{I}$.

We have modeled s(t) as a zero-mean WSS process with known ACF… so s[n] is a D-T WSS process with known ACF Rs[m]… so the vector s has a known covariance matrix (Toeplitz & symmetric) given by:

$\mathbf{C}_s = \begin{bmatrix} R_s[0] & R_s[1] & R_s[2] & \cdots & R_s[n_s-1] \\ R_s[1] & R_s[0] & R_s[1] & \cdots & R_s[n_s-2] \\ R_s[2] & R_s[1] & R_s[0] & \cdots & R_s[n_s-3] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R_s[n_s-1] & R_s[n_s-2] & \cdots & R_s[1] & R_s[0] \end{bmatrix}$

Model for Prior PDF is then s ~ N(0,Cs)

s and w are independent

MMSE Solution for Deconvolution

We have the case of the Bayesian Linear Model… so:

$\hat{\mathbf{s}} = \mathbf{C}_s \mathbf{H}^T \big( \mathbf{H} \mathbf{C}_s \mathbf{H}^T + \sigma^2 \mathbf{I} \big)^{-1} \mathbf{x}$

Note that this is a linear estimate. The matrix applied to x is called “the Wiener filter.”

The performance of the filter is characterized by:

$\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\mathbf{s}}} = \big( \mathbf{C}_s^{-1} + \mathbf{H}^T \mathbf{H} / \sigma^2 \big)^{-1}$
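
A sketch of the deconvolution estimator (hypothetical impulse response h, signal ACF, and sizes): build the convolution matrix H, form the Wiener-filter estimate above, and compare its empirical error to the average of the predicted error covariance diagonal.

```python
import numpy as np

rng = np.random.default_rng(5)
ns, h, sigma = 100, np.array([1.0, 0.6, 0.3]), 0.5      # hypothetical smearing response and noise level
N = ns + len(h) - 1

# Convolution (observation) matrix: x[n] = sum_k h[n-k] s[k]
H = np.zeros((N, ns))
for n in range(N):
    for k in range(ns):
        if 0 <= n - k < len(h):
            H[n, k] = h[n - k]

# Hypothetical zero-mean WSS prior: Toeplitz Cs from an assumed exponential ACF
idx = np.arange(ns)
Cs = 2.0 * 0.9 ** np.abs(idx[:, None] - idx[None, :])

s = np.linalg.cholesky(Cs) @ rng.standard_normal(ns)     # draw s ~ N(0, Cs)
x = H @ s + sigma * rng.standard_normal(N)               # x = H s + w

W = Cs @ H.T @ np.linalg.inv(H @ Cs @ H.T + sigma**2 * np.eye(N))   # "Wiener filter" matrix
s_hat = W @ x
M = np.linalg.inv(np.linalg.inv(Cs) + H.T @ H / sigma**2)           # predicted error covariance

print(np.mean((s_hat - s) ** 2), np.trace(M) / ns)       # empirical MSE vs. average predicted Bmse
```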

Sub-Example: No Inverse Filtering, Noise Only

Direct observation of s with H = I… x = s + w

[Block diagram: s(t) and w(t) are added at a summer Σ to give x(t)]

Goal: Observe x(t) & “de-noise” s(t). Note: at the output, s(t) is observed with noise.

$\hat{\mathbf{s}} = \mathbf{C}_s \big( \mathbf{C}_s + \sigma^2 \mathbf{I} \big)^{-1} \mathbf{x}$        $\mathbf{C}_{\boldsymbol{\varepsilon}} = \mathbf{M}_{\hat{\mathbf{s}}} = \big( \mathbf{C}_s^{-1} + \mathbf{I} / \sigma^2 \big)^{-1}$

Note the dimensionality problem: the # of “parameters” equals the # of observations, so classical methods fail (they just give $\hat{\mathbf{s}} = \mathbf{x}$)… Bayesian methods can solve it!!

For insight… consider “single sample” case:

$\hat{s}[0] = \dfrac{R_s[0]}{R_s[0] + \sigma^2}\, x[0] = \dfrac{\eta}{\eta + 1}\, x[0], \qquad \eta = \dfrac{R_s[0]}{\sigma^2} \;\; (= \text{SNR})$

High SNR: $\hat{s}[0] \approx x[0]$ (data driven)        Low SNR: $\hat{s}[0] \approx 0$ (prior PDF driven)
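
A tiny sketch of the shrinkage factor η/(η+1) for a few hypothetical SNR values, showing the transition from prior-driven to data-driven behavior.

```python
# eta = Rs[0] / sigma^2 is the single-sample SNR; the de-noiser scales x[0] by eta / (eta + 1)
for eta in (0.1, 1.0, 10.0, 100.0):
    print(eta, eta / (eta + 1.0))  # low SNR -> ~0 (prior driven), high SNR -> ~1 (data driven)
```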

Sub-Sub-Example: Specific Signal Model

Direct observation of s with H = I… x = s + w

But here… the signal follows a specific random signal model

$s[n] = -a_1 s[n-1] + u[n]$, where u[n] is a white Gaussian “driving process.”

This is a 1st-order “auto-regressive” model: AR(1). Such a random signal has an ACF & PSD of

$R_s[k] = \dfrac{\sigma_u^2}{1 - a_1^2}\, (-a_1)^{|k|}$        $P_s(f) = \dfrac{\sigma_u^2}{\big| 1 + a_1 e^{-j2\pi f} \big|^2}$

See Figures 11.9 & 11.10 in the textbook.
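
A closing sketch (hypothetical a1, σ_u, σ, and length, not the textbook figures): build Cs from the AR(1) ACF above and apply the de-noising estimator s_hat = Cs (Cs + σ² I)^(-1) x to a simulated realization; the de-noised error should typically be smaller than the raw error.

```python
import numpy as np

rng = np.random.default_rng(4)
N, a1, sigma_u, sigma = 200, -0.95, 1.0, 1.0             # hypothetical AR(1) and noise parameters

# Generate an AR(1) realization: s[n] = -a1*s[n-1] + u[n]
s = np.zeros(N)
for n in range(1, N):
    s[n] = -a1 * s[n - 1] + sigma_u * rng.standard_normal()
x = s + sigma * rng.standard_normal(N)                   # direct observation: x = s + w

# Toeplitz prior covariance from the AR(1) ACF: Rs[k] = sigma_u^2/(1 - a1^2) * (-a1)^|k|
k = np.arange(N)
Cs = sigma_u**2 / (1.0 - a1**2) * (-a1) ** np.abs(k[:, None] - k[None, :])

s_hat = Cs @ np.linalg.solve(Cs + sigma**2 * np.eye(N), x)
print(np.mean((x - s) ** 2), np.mean((s_hat - s) ** 2))  # raw MSE vs. de-noised MSE
```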