
  • Chapter 5. Bayesian Statistics (II)

  • Bayesian for multi-parameter models

    The principle remains the same. The (joint) posterior distribution given data y is once again

    p(θ|y) ∝ π(θ) · p(y|θ)

    where θ = (θ1, . . . , θd) are the parameters of interest.

  • For illustration, consider the special case of θ = (θ1, θ2).

    1. The joint posterior distribution

    p(θ1, θ2|y) ∝ π(θ1, θ2) · p(y|θ1, θ2)

    2. The marginal posterior distribution of θ2

    p(θ2|y) = ∫ p(θ1, θ2|y) dθ1 ∝ ∫ π(θ1, θ2) · p(y|θ1, θ2) dθ1

    3. The conditional posterior distribution of θ1 given θ2 is

    p(θ1|θ2, y) = p(θ1, θ2|y) / p(θ2|y) ∝ π(θ1, θ2) · p(y|θ1, θ2)

    Note the difference with the joint posterior distribution: here θ2 is regarded as fixed and known.

    Remark: The following relation is useful for the simulation of the posterior distribution:

    p(θ1, θ2|y) = p(θ1|θ2, y) · p(θ2|y)
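    To make the remark concrete, here is a minimal Python sketch of the two-stage sampling it suggests. The bivariate-normal target and all variable names are illustrative assumptions, not from the notes; the point is only that drawing θ2 from its marginal and then θ1 from the conditional yields draws from the joint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy posterior: (theta1, theta2) standard bivariate normal with correlation rho.
# Then theta2 ~ N(0, 1) marginally, and theta1 | theta2 ~ N(rho*theta2, 1 - rho**2).
rho = 0.6
n_draws = 1000

theta2 = rng.normal(0.0, 1.0, size=n_draws)             # draw from p(theta2 | y)
theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho**2))  # draw from p(theta1 | theta2, y)

# (theta1, theta2) are now joint draws from p(theta1, theta2 | y)
print(np.corrcoef(theta1, theta2)[0, 1])                # should be close to rho
```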

  • Examples

    Normal model. Suppose that y = {y1, . . . , yn} are iid samples from N(θ, σ²) such that (θ, log(σ²)) has a flat prior, or equivalently

    π(θ, σ²) ∝ 1/σ².

    The joint posterior distribution p(θ, σ²|y):

    p(θ, σ²|y) ∝ (σ²)^(−1−n/2) · exp(−[n(θ − ȳ)² + (n − 1)s²] / (2σ²))

    where s² is the sample variance

    s² = (1/(n − 1)) · Σ_{i=1}^n (yi − ȳ)².

  • The marginal posterior distribution p(σ²|y).

    p(σ²|y) = ∫ p(θ, σ²|y) dθ
            ∝ ∫ (σ²)^(−1−n/2) · exp(−[n(θ − ȳ)² + (n − 1)s²] / (2σ²)) dθ
            = (σ²)^(−1−n/2) · exp(−(n − 1)s² / (2σ²)) · √(2πσ²/n)
            ∝ (σ²)^(−(n+1)/2) · exp(−(n − 1)s² / (2σ²))

    It follows that the posterior distribution of ((n − 1)s²/σ² | y) is χ²(n − 1).

  • The marginal posterior distribution p(θ|y).

    p(θ|y) = ∫ p(θ, σ²|y) dσ²
           ∝ ∫ (σ²)^(−1−n/2) · exp(−[n(θ − ȳ)² + (n − 1)s²] / (2σ²)) dσ²
           ∝ [n(θ − ȳ)² + (n − 1)s²]^(−n/2)
           ∝ [1 + ((θ − ȳ)/(s/√n))² · 1/(n − 1)]^(−n/2)

    It follows that the posterior distribution of ((θ − ȳ)/(s/√n) | y) is t(n − 1).

  • The conditional posterior distribution p(θ|σ², y).

    p(θ|σ², y) = N(ȳ, σ²/n)

    The conditional posterior distribution p(σ²|θ, y):

    (([(n − 1)s² + n(ȳ − θ)²] / σ²) | θ, y) is χ²(n)

    Remark: To simulate from the posterior distribution p(θ, σ²|y), one can first simulate σ² from the marginal posterior distribution p(σ²|y), then simulate θ from the conditional posterior distribution p(θ|σ², y).

  • Example. Suppose a stock’s daily return Y was recorded for n = 22 consecutive business days, with ȳ = 5% and s = 4%. Assume that the daily return Y follows N(θ, σ²) with prior π(θ, σ²) ∝ 1/σ². Find the 95% posterior interval for θ. Also use simulation to approximate E[θ/σ|y].

    Solution: Since ((θ − ȳ)/(s/√n) | y) is t(n − 1), the 95% posterior interval is (in %)

    ȳ ± t0.025(n − 1) · s/√n = 5 ± 2.080 · 4/√22 = [3.2, 6.8]

    For each draw of θ/σ, we (1) draw a sample of σ: draw a sample, say u, from χ²(n − 1), then let σ = √((n − 1)s²/u); (2) given σ, draw a sample θ from N(ȳ, σ²/n); (3) record θ/σ as a data point. The histogram of 1000 such draws is shown below; the sample average of θ/σ is 1.23.
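    A minimal numpy/scipy sketch of this recipe, using the summaries n = 22, ȳ = 5, s = 4 from the example (variable names are mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, ybar, s = 22, 5.0, 4.0          # data summaries from the example (in %)
n_draws = 1000

# (1) draw sigma: u ~ chi^2(n-1), then sigma^2 = (n-1) s^2 / u
u = rng.chisquare(n - 1, size=n_draws)
sigma = np.sqrt((n - 1) * s**2 / u)

# (2) given sigma, draw theta ~ N(ybar, sigma^2 / n)
theta = rng.normal(ybar, sigma / np.sqrt(n))

# (3) record theta / sigma
print("E[theta/sigma | y] ~", (theta / sigma).mean())   # about 1.2

# 95% posterior interval for theta from the t(n-1) result
half = stats.t.ppf(0.975, n - 1) * s / np.sqrt(n)
print("95% interval:", (ybar - half, ybar + half))      # about (3.2, 6.8)
```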

    [Figure: histogram of the 1000 draws of θ/σ.]

  • Multinomial model. Let Y = (Y1, . . . , Yd) be multinomial with parameter (n; θ1, . . . , θd) where

    θ1 + · · · + θd = 1.

    Consider the prior distribution (Dirichlet distribution)

    π(θ) ∝ ∏_{i=1}^d θi^(αi−1)

    restricted to non-negative θi’s with θ1 + · · · + θd = 1.

  • The joint posterior distribution p(θ|y).

    p(θ|y) ∝ π(θ) · p(y|θ) ∝ ∏_{i=1}^d θi^(αi−1) · ∏_{i=1}^d θi^(yi) = ∏_{i=1}^d θi^(αi+yi−1)

    That is, p(θ|y) is a Dirichlet distribution with parameter (α1 + y1, . . . , αd + yd).

    The marginal posterior distribution p(θ1|y).

    p(θ1|y) ∝ ∫_{Σ_{i=2}^d θi = 1−θ1} θ1^(α1+y1−1) · ∏_{i=2}^d θi^(αi+yi−1) dθ2 · · · dθ_{d−1}

    It follows that p(θ1|y) is Beta(α1 + y1, Σ_{i=2}^d [αi + yi]).

  • The conditional posterior distribution p(θ2, . . . , θd|θ1, y).

    p(θ2, . . . , θd|θ1, y) ∝ θ1^(α1+y1−1) · ∏_{i=2}^d θi^(αi+yi−1)

    restricted to {θ2 + · · · + θd = 1 − θ1}. It follows that

    ((θ2/(1 − θ1), . . . , θd/(1 − θ1)) | θ1, y) is Dirichlet(α2 + y2, . . . , αd + yd).

    Remark on simulation: One way to simulate (θ1, . . . , θd) from the posterior distribution is to simulate sequentially: θ1 from p(θ1|y), then θ2 from p(θ2|θ1, y), . . . , then θ_{d−1} from p(θ_{d−1}|θ1, . . . , θ_{d−2}, y), and finally set θd = 1 − (θ1 + · · · + θ_{d−1}). Note that all these conditional distributions are Beta distributions [up to a multiplicative constant]. Another way to simulate (θ1, . . . , θd) from the posterior Dirichlet distribution is to simulate xi from Gamma(αi + yi, 1/2) for each i = 1, . . . , d and let θi = xi/(x1 + · · · + xd).

  • Example. In late October 1988, a pre-election poll was conducted by CBS News of 1447 adults in the US to find out their preferences in the upcoming Presidential election. Out of the 1447 persons, y1 = 727 supported George Bush, y2 = 583 supported Michael Dukakis, and y3 = 137 supported other candidates or expressed no opinion. Assuming that the samples are randomly selected from the population, the data follow a multinomial distribution with parameters (n = 1447; θ1, θ2, θ3). The quantity of interest is θ1 − θ2.

    Solution: Assume a non-informative prior with α1 = α2 = α3 = 1. The posterior distribution for (θ1, θ2, θ3) is then Dirichlet(728, 584, 138). We will draw 1000 samples of (θ1, θ2, θ3) from the posterior Dirichlet distribution and compute θ1 − θ2 for each sample. We will simulate using two equivalent approaches (a code sketch follows below).

    • Using the conditional distribution decomposition. Simulate θ1 from Beta(728, 584 + 138). Given θ1, simulate u from Beta(584, 138) and let θ2 = (1 − θ1)u. Let θ3 = 1 − θ1 − θ2. Record θ1 − θ2.

    • Using the Gamma distribution. Simulate independent x1, x2, x3 from, respectively, Gamma(728, 1/2) = χ²(728 · 2), Gamma(584, 1/2) = χ²(584 · 2), and Gamma(138, 1/2) = χ²(138 · 2). Let θi = xi/(x1 + x2 + x3). Record θ1 − θ2.

    The histograms are attached below; the sample means are 0.099 and 0.100, respectively. None of the sample points of θ1 − θ2 are below zero.
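    A minimal numpy sketch of both approaches (variable names are mine; numpy's built-in rng.dirichlet would give the same result directly):

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000

# Method 1: conditional (Beta) decomposition
theta1 = rng.beta(728, 584 + 138, size=n_draws)
u = rng.beta(584, 138, size=n_draws)
theta2 = (1 - theta1) * u
theta3 = 1 - theta1 - theta2     # not needed for theta1 - theta2; mirrors the text
diff_decomp = theta1 - theta2

# Method 2: normalized Gamma draws; the common scale cancels in the
# normalization, so scale 2 (rate 1/2, the chi-square form) is arbitrary
x = rng.gamma(shape=np.array([728, 584, 138]), scale=2.0, size=(n_draws, 3))
theta = x / x.sum(axis=1, keepdims=True)
diff_gamma = theta[:, 0] - theta[:, 1]

print(diff_decomp.mean(), diff_gamma.mean())                # both near 0.10
print((diff_decomp < 0).mean(), (diff_gamma < 0).mean())    # essentially 0
```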

    [Figure: histograms of θ1 − θ2 from the two methods, titled "Use decomposition" and "Use Gamma distribution".]

  • Comparison of two populations

    Comparison of two proportions. Suppose Y1 has distribution B(n1; θ1), Y2 has distribution B(n2; θ2), and Y1 and Y2 are independent. We are interested in θ1 − θ2, given the data Y1 = y1 and Y2 = y2.

    Assume a non-informative prior π(θ1, θ2) ∝ 1 on [0, 1]². The joint posterior distribution p(θ1, θ2|y) is

    p(θ1, θ2|y) ∝ θ1^(y1) (1 − θ1)^(n1−y1) · θ2^(y2) (1 − θ2)^(n2−y2)

    Thus the posterior distributions of θ1 and θ2 are independent, and

    p(θ1|y) = Beta(y1 + 1, n1 − y1 + 1), p(θ2|y) = Beta(y2 + 1, n2 − y2 + 1)

    One can use simulation to draw samples of θ1 − θ2, or use normal approximations (when n1 and n2 are large) of θ1 − θ2.

  • Comparison of two normal means. Suppose x = (x1, . . . , x_{n1}) are iid samples from N(θ1, σ²), y = (y1, . . . , y_{n2}) are iid samples from N(θ2, σ²), and that the two samples are independent. We are interested in θ1 − θ2. All the parameters (θ1, θ2, σ) are unknown.

    Assume a non-informative prior π(θ1, θ2, σ²) ∝ 1/σ². The posterior is

    p(θ1, θ2, σ|x, y) ∝ (σ²)^(−1−n/2) · exp(−[n1(x̄ − θ1)² + n2(ȳ − θ2)² + (n − 2)sp²] / (2σ²))

    where

    n = n1 + n2, sp² = [(n1 − 1)sx² + (n2 − 1)sy²] / [(n1 − 1) + (n2 − 1)]

    Analogously, one has the marginal posterior distribution

    p(σ²|x, y) ∝ (σ²)^(−n/2) · exp(−(n − 2)sp² / (2σ²))

    or, equivalently,

    (((n − 2)sp²/σ²) | x, y) is χ²(n − 2).

    The conditional posterior distributions of θ1, θ2 given σ are independent, and

    p(θ1|σ, x, y) = N(x̄, σ²/n1), p(θ2|σ, x, y) = N(ȳ, σ²/n2).

    Remark on simulation: To draw samples of (θ1, θ2, σ), one can draw u from χ²(n − 2) and let σ² = (n − 2)sp²/u, then draw θ1, θ2 independently from N(x̄, σ²/n1) and N(ȳ, σ²/n2), respectively. If one is interested in θ1 − θ2, for each sample point of (θ1, θ2, σ) compute θ1 − θ2. If one is interested in θ1θ2, for each sample point compute θ1θ2. And so on and so forth.

    The theoretical posterior distribution of θ1 − θ2 can be obtained as follows. Note that the conditional posterior distribution of θ1 − θ2 given σ is

    p(θ1 − θ2|σ, x, y) = N(x̄ − ȳ, σ²[1/n1 + 1/n2]).

    Therefore

    p(θ1 − θ2, σ²|x, y) = p(θ1 − θ2|σ², x, y) · p(σ²|x, y)
    ∝ (σ²)^(−(n+1)/2) · exp(−[(1/n1 + 1/n2)^(−1) ((θ1 − θ2) − (x̄ − ȳ))² + (n − 2)sp²] / (2σ²))

    Integrating out σ², we have similarly

    (((θ1 − θ2) − (x̄ − ȳ)) / (sp · √(1/n1 + 1/n2)) | x, y) is t(n − 2)

  • Example. Who is a better hitter, Ted Williams (Boston Red Sox) or Joe DiMaggio (NY Yankees)? Their major league career statistics are given below.

    Player   At-bats   Hits   Batting Average   Home Runs   Home Run Average
    T.W.     7706      2654   .3444             521         .0676
    J.D.     6821      2214   .3246             361         .0529

    Find the posterior probability that Ted Williams is a better hitter than Joe DiMaggio.

    Solution: We consider the hits, and leave the home runs as an exercise. Let θ1 be the hit proportion for T.W. and θ2 that for J.D. Assume a non-informative prior π(θ1, θ2) ∝ 1. Then the posterior is

    p(θ1, θ2|y) ∝ θ1^2654 (1 − θ1)^5052 · θ2^2214 (1 − θ2)^4607

    We are interested in P(θ1 − θ2 > 0|y). We simulate 1000 draws of θ1 − θ2 [we simulate θ1 and θ2 independently from Beta(2655, 5053) and Beta(2215, 4608), respectively, and compute θ1 − θ2 for each (θ1, θ2)].

  • Below is the histogram of θ1 − θ2. Among the 1000 draws, 995 are positive. Therefore the posterior probability P(θ1 − θ2 > 0|y) ≈ 0.995.

    [Figure: histogram of the 1000 draws of θ1 − θ2 (T.W. − J.D.).]

    If we use the normal approximation, θ1 − θ2 is approximately distributed as

    N(2654/(2654 + 5052) − 2214/(2214 + 4607),
      2654 · 5052/[(2654 + 5052)²(2654 + 5052 + 1)] + 2214 · 4607/[(2214 + 4607)²(2214 + 4607 + 1)])
    = N(0.0198, 0.0078²).

    Its density is super-imposed on the histogram.
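    A minimal numpy/scipy sketch of this example (variable names mine); it reproduces the simulation estimate and the normal-approximation parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_draws = 1000

# Independent Beta posteriors from the flat prior
theta1 = rng.beta(2654 + 1, 7706 - 2654 + 1, size=n_draws)    # T.W.
theta2 = rng.beta(2214 + 1, 6821 - 2214 + 1, size=n_draws)    # J.D.
diff = theta1 - theta2

print("P(theta1 > theta2 | y) ~", (diff > 0).mean())          # about 0.995

# Normal approximation N(mean, var) for theta1 - theta2
m = 2654 / 7706 - 2214 / 6821
v = 2654 * 5052 / (7706**2 * 7707) + 2214 * 4607 / (6821**2 * 6822)
print("normal approx: mean %.4f, sd %.4f" % (m, np.sqrt(v)))  # 0.0198, 0.0078
print("P via normal approx:", stats.norm.cdf(m / np.sqrt(v))) # about 0.994
```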

  • Example. Does birth weight increase when a mother quits smoking? Below is a data set.

    Smokes                 Quit
    4.5 6.1 6.9 7.5 9.9    5.4 7.2
    5.4 6.4 6.9 7.6        6.6 7.3
    5.6 6.6 7.1 7.6        6.8 7.4
    5.9 6.6 7.1 7.8        6.8
    6.0 6.6 7.2 8.0        6.9

    Assume the birth weight of a baby whose mother smokes is N(θ1, σ²) and the birth weight of a baby whose mother once smoked but quit is N(θ2, σ²). Find the posterior probability of θ1 − θ2 > 0, and give a 95% posterior interval for θ1 − θ2.

    Solution: The data give n1 = 21, n2 = 8, and (for Smokes) x̄ = 6.824, sx = 1.093, (for Quit) ȳ = 6.800, sy = 0.589. The pooled estimate is

    sp² = [(n1 − 1)sx² + (n2 − 1)sy²] / (n1 + n2 − 2) = 0.9749, sp = 0.987

    To simulate θ1 − θ2, we first draw u from χ²(n − 2) and let σ² = (n − 2)sp²/u, and then simulate θ1 and θ2 independently from N(x̄, σ²/n1) and N(ȳ, σ²/n2). The histogram of 1000 draws is below. The 95% posterior interval from simulation is [−0.807, 0.863]. Out of these 1000 draws of θ1 − θ2, 499 are positive, so the posterior probability of θ1 − θ2 > 0 is approximately 0.499.

    Note that theoretically

    (((θ1 − θ2) − (x̄ − ȳ)) / (sp√(1/n1 + 1/n2)) | x, y) is t(n − 2).

    Therefore the theoretical 95% posterior interval is

    (x̄ − ȳ) ± t0.025(n − 2) · sp√(1/n1 + 1/n2) = [−0.818, 0.866]

    and

    P(θ1 − θ2 > 0|x, y) = P[t(n − 2) ≥ −(x̄ − ȳ)/(sp√(1/n1 + 1/n2))] = 0.523.
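    A minimal numpy/scipy sketch of both the simulation and the theoretical answers, using the summary statistics above (variable names mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n1, n2 = 21, 8
xbar, ybar, sp = 6.824, 6.800, 0.987
n = n1 + n2
n_draws = 1000

# Simulation: sigma^2 from its marginal posterior, then theta1, theta2 given sigma
u = rng.chisquare(n - 2, size=n_draws)
sigma2 = (n - 2) * sp**2 / u
theta1 = rng.normal(xbar, np.sqrt(sigma2 / n1))
theta2 = rng.normal(ybar, np.sqrt(sigma2 / n2))
diff = theta1 - theta2

print("P(theta1 - theta2 > 0) ~", (diff > 0).mean())               # about 0.5
print("95% interval (simulation):", np.percentile(diff, [2.5, 97.5]))

# Theoretical t(n-2) answers
se = sp * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, n - 2)
print("95% interval (theory):", (xbar - ybar - t_crit * se, xbar - ybar + t_crit * se))
print("P (theory):", 1 - stats.t.cdf(-(xbar - ybar) / se, n - 2))  # about 0.523
```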

    [Figure: histogram of the 1000 draws of θ1 − θ2 (Smokes − Quit).]

  • An example of a generalized linear model

    It is rare that multiparameter models allow simple calculation of the posterior distribution. Simulation is often the only available tool for data analysis. In this section we discuss in detail a two-parameter generalized linear model for a bioassay experiment.

    The problem and the data. In the development of drugs, acute toxicity tests or bioassays are commonly performed on animals. The animal responses are typically dichotomous: alive or dead, tumor or no tumor, and so on. The experiments are often administered by injecting various dose levels of the compound into batches of animals, which generates data of the form (xi, ni, yi), where xi is the dose level (often measured on a logarithmic scale), ni is the size of the batch of animals receiving dose xi, and yi is the number of animals with positive response. The specific real data set is shown below.

  • Dose xi (log g/ml)   Size of batch ni   Number of deaths yi
    −0.86                5                  0
    −0.30                5                  1
    −0.05                5                  3
     0.73                5                  5

    Statistical model. Assume that yi is Binomial(ni, θi), with θi the population death rate for animals receiving dose xi. We would like θi to depend on xi, and by definition θi ∈ [0, 1]. The following logistic regression model is adopted:

    logit(θi) = α + βxi

    where logit(θ) := log(θ/(1 − θ)). The inverse function of logit(·) is

    logit⁻¹(u) = e^u/(1 + e^u).

    Note that in this model the xi’s are explanatory variables and regarded as fixed.

    Prior and likelihood. We use a flat prior π(α, β) ∝ 1 and the likelihood

    p(yi|α, β) ∝ [logit⁻¹(α + βxi)]^(yi) · [1 − logit⁻¹(α + βxi)]^(ni−yi).

  • The posterior p(α, β|y). We have

    p(α, β|y) ∝ π(α, β) · ∏_{i=1}^4 p(yi|α, β) ∝ ∏_{i=1}^4 p(yi|α, β)

    Discretization of the posterior distribution. There is no analytical expression for the posterior distribution, and we will use simulation to obtain numerical summaries. Since the problem is only two-dimensional, it is reasonable to expect that simulating from a discretized approximation of the continuous posterior distribution will do a good job. We restrict the region to (α, β) ∈ [−2, 6] × [−5, 30]. The contour plot is shown below.

    The discretization is done on a uniform 400 × 700 grid. For each grid point, we compute the unnormalized posterior density. Afterwards we normalize these quantities so that their sum over all the grid points becomes one. In other words, we now have a discrete approximation of the posterior distribution.

    Remark. A very popular methodology for simulating from the posterior distribution is the so-called Markov chain Monte Carlo (MCMC) method. It is very different from the discretization method we used in this example. When the dimension gets higher, discretization obviously becomes much more difficult.

    [Figure 1: contour plot of the posterior distribution p(α, β|y) over (α, β) ∈ [−2, 6] × [−5, 30].]

  • Simulating from the discrete approximation of the posterior distribution.

    1. Draw α from its discrete marginal distribution p(α|y).
    2. Given α, draw β from the discrete conditional distribution p(β|α, y).
    3. Jitter the sampled α and β by adding a uniform random perturbation centered at zero with width equal to the spacing of the sampling grid.
    4. Repeat these three steps 1000 times to obtain 1000 samples of (α, β).

    The histogram is attached below.

    The quantities of interest. The sign of β is important. For all 1000 samples we have β > 0, which indicates the compound is harmful. Another quantity of interest is LD50 – the dose level at which the probability of death is 50%, i.e.

    α + β · LD50 = logit(0.5) = 0 ⇒ LD50 = −α/β.

    The histogram of LD50 is attached; a code sketch of the whole procedure follows below.
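    A minimal numpy sketch of the discretize-and-sample procedure (variable names mine; the data, the region, and the 400 × 700 grid are taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bioassay data
x = np.array([-0.86, -0.30, -0.05, 0.73])   # dose (log g/ml)
n = np.array([5, 5, 5, 5])                  # batch sizes
y = np.array([0, 1, 3, 5])                  # deaths

# Uniform 400 x 700 grid over [-2, 6] x [-5, 30]
alpha = np.linspace(-2.0, 6.0, 400)
beta = np.linspace(-5.0, 30.0, 700)
A, B = np.meshgrid(alpha, beta, indexing="ij")

# Unnormalized log posterior under the flat prior:
# sum_i [ y_i*eta_i - n_i*log(1 + exp(eta_i)) ], with eta_i = alpha + beta*x_i
eta = A[..., None] + B[..., None] * x                 # shape (400, 700, 4)
logp = (y * eta - n * np.log1p(np.exp(eta))).sum(-1)
post = np.exp(logp - logp.max())
post /= post.sum()                                    # discrete approximation

# Sample alpha from its marginal, beta from the conditional, then jitter
n_draws = 1000
ia = rng.choice(len(alpha), size=n_draws, p=post.sum(axis=1))
ib = np.array([rng.choice(len(beta), p=post[i] / post[i].sum()) for i in ia])
da, db = alpha[1] - alpha[0], beta[1] - beta[0]
a_s = alpha[ia] + rng.uniform(-da / 2, da / 2, n_draws)
b_s = beta[ib] + rng.uniform(-db / 2, db / 2, n_draws)

print("P(beta > 0) ~", (b_s > 0).mean())              # essentially 1
ld50 = -a_s / b_s                                     # meaningful since beta > 0
print("posterior mean of LD50 ~", ld50.mean())
```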

    [Figures: the 1000 posterior draws of (α, β) plotted over the contour plot; histogram of LD50.]