Section III: Gaussian distribution

Probability distributions (Binomial, Poisson)

Notation

Statistic            Sample     Population
mean                 Y          μ
Std deviation        S or SD    σ
proportion           P          π
mean difference      d          δ
Correlation coeff    r          ρ
rate (regression)    b          β
Num of obs           n          N

Densities - Percentiles

BMI=22 is the 88th percentile

[Figure: density of x=BMI (14 to 27), vertical axis 0% to 25%; 88% of the area lies below BMI=22]

Standard Z scores

Definition:

Z = (Y - mean)/SD, so Y = mean + Z SD

Z is how many SD units Y is above or below the mean.

Mean & SD may be sample values (Y, S) or population values (μ, σ) if the population values are known.

Survival data, mean=17.54, SD=11.68

 Y     Y - mean    Z = (Y - mean)/SD
 4     -13.54      -1.16
 6     -11.54      -0.99
 8      -9.54      -0.82
 8      -9.54      -0.82
12      -5.54      -0.47
14      -3.54      -0.30
15      -2.54      -0.22
17      -0.54      -0.05
19       1.46       0.13
22       4.46       0.38
24       6.46       0.55
34      16.46       1.41
45      27.46       2.35
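The Z-score table can be reproduced with a short script. This is a minimal sketch, assuming Python's standard-library `statistics` module (not part of the slides) and the sample SD that divides by n - 1.

```python
from statistics import mean, stdev

# Survival values copied from the table on this slide.
survival = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]

y_bar = mean(survival)   # sample mean
s = stdev(survival)      # sample SD (divides by n - 1)

# Z = (Y - mean) / SD for every observation
z_scores = [(y - y_bar) / s for y in survival]

for y, z in zip(survival, z_scores):
    print(f"{y:3d}  {y - y_bar:7.2f}  {z:6.2f}")
```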

Standard Gaussian (Normal): a distribution model

[Figure: standard Gaussian density, μ=0, σ=1, plotted for Z from -3.5 to 3.5; areas marked 16% (below Z=-1), 34% (-1 to 0), 34% (0 to 1), 16% (above Z=1)]

Selected Gaussian percentiles

   Z       lower area, P(< Z)
 -2.00      2.28%
 -1.96      2.50%
 -1.50      6.68%
 -1.00     15.87%
  0.00     50.00%
  1.00     84.13%
  1.50     93.32%
  1.96     97.50%
  2.00     97.72%

Gaussian percentiles

[Figure: standard Gaussian density, μ=0, σ=1, for Z from -3.5 to 3.5; the area below Z=1.5 is 0.933 = 93.3%]

EXCEL function =NORMSDIST(Z) gives the percentile from Z.
EXCEL function =NORMSINV(p) gives Z from the percentile.
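Outside EXCEL, the same two lookups can be sketched in Python. This assumes the standard-library `statistics.NormalDist` class (Python 3.8 or later), which the slides do not mention; `cdf` plays the role of NORMSDIST and `inv_cdf` the role of NORMSINV.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard Gaussian

# Percentile (lower area) from Z, like =NORMSDIST(Z)
for z in (-2.00, -1.96, -1.50, -1.00, 0.00, 1.00, 1.50, 1.96, 2.00):
    print(f"{z:6.2f}  {100 * std_normal.cdf(z):6.2f}%")

# Z from the percentile, like =NORMSINV(p)
area_at_1_5 = std_normal.cdf(1.5)
z_back = std_normal.inv_cdf(area_at_1_5)
print(f"area below Z=1.5: {area_at_1_5:.3f}, Z recovered: {z_back:.2f}")
```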

Example - SAT Verbal

Mean=μ=500, SD=σ=100

What is your percentile if Y=700?
Z = (700-500)/100 = 2.0, area = 0.977 = 97.7%

What score is the 80th percentile?
Z0.80 = 0.842, Y = 500 + 0.842 (100) = 584

What percent are between 450 and 500?
For Y=450, Z = (450-500)/100 = -0.5, area = 0.3085
For Y=500, Z = 0, area = 0.5000, so the area between is 0.5000 - 0.3085 = 0.1915 = 19%
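The three SAT answers can be checked numerically. A minimal sketch, assuming Python's `statistics.NormalDist` (an addition, not from the slides) and the Gaussian model with μ=500, σ=100 stated above:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # SAT Verbal model from the slide

# Percentile for a score of 700
pct_700 = sat.cdf(700)

# Score at the 80th percentile
score_80 = sat.inv_cdf(0.80)

# Fraction scoring between 450 and 500
between = sat.cdf(500) - sat.cdf(450)

print(f"{pct_700:.3f}  {score_80:.0f}  {between:.4f}")
```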

Example - Anesthesia

Effective dose, μ=50 mg/kg, σ=10 mg/kg
Lethal dose, μ=110 mg/kg, σ=20 mg/kg

Q1 - What dose will put 90% to sleep?

Q2 - What is the risk of death from this dose?

Example - Anesthesia (answers)

Effective dose, μ=50 mg/kg, σ=10 mg/kg
Lethal dose, μ=110 mg/kg, σ=20 mg/kg

Q1 - What dose will put 90% to sleep?
Z0.90 = 1.28, Y = 50 + 1.28 (10) = 62.8 mg/kg

Q2 - What is the risk of death from this dose?
Z = (62.8-110)/20 = -2.36, area < 1%
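Both anesthesia answers follow from one inverse lookup and one forward lookup. A sketch under the slide's assumption that effective and lethal doses are each Gaussian; `statistics.NormalDist` is an assumed tool, not something the slides use.

```python
from statistics import NormalDist

effective = NormalDist(mu=50, sigma=10)   # effective dose, mg/kg
lethal = NormalDist(mu=110, sigma=20)     # lethal dose, mg/kg

# Q1: the dose that puts 90% to sleep is the 90th percentile
# of the effective-dose distribution.
dose_90 = effective.inv_cdf(0.90)

# Q2: risk of death is the fraction of patients whose lethal
# dose lies below that dose.
risk = lethal.cdf(dose_90)

print(f"dose = {dose_90:.1f} mg/kg, risk of death = {100 * risk:.2f}%")
```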

Prediction intervals (not CI)

If μ and σ are known and the data are known to have a Gaussian distribution, the interval (μ - Zσ, μ + Zσ) is the (2k - 100)% prediction interval, where Z (Z > 0) is the kth percentile of the standard Gaussian.

For Z=2, (μ - 2σ, μ + 2σ) is (approximately) the 95% prediction interval.

This implies SD ≈ range/4 (extremes excluded).
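The link between Z and the coverage of (μ - Zσ, μ + Zσ) can be checked directly. A minimal sketch, assuming `statistics.NormalDist`; the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

def coverage(z):
    """Fraction of a Gaussian population inside (mu - z*sigma, mu + z*sigma)."""
    lower_tail = NormalDist().cdf(-z)   # area below -z
    return 1 - 2 * lower_tail           # remove both tails

for z in (1.0, 1.96, 2.0):
    print(f"Z = {z:4.2f}: {100 * coverage(z):.2f}% prediction interval")
```

Z=2 gives slightly more than 95%, which is why the slide calls the 2 SD interval approximate; Z=1.96 gives 95%.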

Normal dist - differences & sums

If Y1 and Y2 are independent and each has a normal distribution with the means and SDs below:

variable   mean   SD
Y1         µ1     σ1
Y2         µ2     σ2

then the difference & sum also have normal dists:

                   mean       SD
diff = Y1 - Y2     µ1 - µ2    sqrt(σ1² + σ2²)
sum  = Y1 + Y2     µ1 + µ2    sqrt(σ1² + σ2²)

Q: If σ1=σ2, what is the mean diff with 100% overlap?
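The rule for differences and sums can be sketched as a small helper. The function name `diff_and_sum` and the numbers passed to it are illustrative assumptions, not values from this slide; independence of Y1 and Y2 is required, as stated above.

```python
import math
from statistics import NormalDist

def diff_and_sum(mu1, sd1, mu2, sd2):
    """Distributions of Y1 - Y2 and Y1 + Y2 for independent normals."""
    sd = math.sqrt(sd1**2 + sd2**2)   # same SD for difference and sum
    return NormalDist(mu1 - mu2, sd), NormalDist(mu1 + mu2, sd)

# Illustrative values only (hypothetical, chosen for round numbers).
diff, total = diff_and_sum(110, 20, 50, 10)

print(f"diff: mean={diff.mean:.1f}, SD={diff.stdev:.2f}")
print(f"sum:  mean={total.mean:.1f}, SD={total.stdev:.2f}")
print(f"P(Y1 - Y2 < 0) = {diff.cdf(0):.4f}")
```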

[Figure: Difference of two normals]

Specificity & Sensitivity

For serum creatinine in normal adults: μ = 1.1 mg/dl, σ = 0.2 mg/dl
In one type of renal disease: μ = 1.7 mg/dl, σ = 0.4 mg/dl

If a cutoff value of 1.6 mg/dl is used:
Prob false pos = prob Y > 1.6 given normal
Prob false neg = prob Y < 1.6 given disease
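The two error probabilities can be evaluated if creatinine is taken to be Gaussian in both groups, which is an assumption made here for illustration. The sketch also takes 1.1 and 1.7 as the group means and 0.2 and 0.4 as the SDs, and uses `statistics.NormalDist`.

```python
from statistics import NormalDist

normal_adults = NormalDist(mu=1.1, sigma=0.2)   # mg/dl
renal_disease = NormalDist(mu=1.7, sigma=0.4)   # mg/dl
cutoff = 1.6                                    # mg/dl

# False positive: a normal adult falls above the cutoff.
false_pos = 1 - normal_adults.cdf(cutoff)

# False negative: a diseased patient falls below the cutoff.
false_neg = renal_disease.cdf(cutoff)

specificity = 1 - false_pos
sensitivity = 1 - false_neg

print(f"false pos = {false_pos:.4f}, specificity = {specificity:.4f}")
print(f"false neg = {false_neg:.4f}, sensitivity = {sensitivity:.4f}")
```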

Data transformations & logs

Some continuous variables follow the Gaussian on a transformed scale, not the original scale.

Statland implies that perhaps 80% of continuous lab test variables follow a Gaussian on either the original scale (50%) or a transformed scale, usually the log scale. (Clinical Decision Levels for Lab Tests, 2nd ed, 1987, Med Econ)

Example - Bilirubin

[Figure: two histograms, Bilirubin (umol/L, 0 to 350) and Log Bilirubin (log10 umol/L, 0.25 to 3.25)]

           Bilirubin (umol/L)    Log Bilirubin (log10 umol/L)
Mean       64.3                  1.55
Median     34.7                  1.54
SD         104.3                 0.456
n          216                   216

95% prediction intervals

           Original scale    log10 scale
Mean       64.3              1.55
SD         104.3             0.456
2 SD       208.6             0.912
Lower      -144.3            0.64
Upper      272.9             2.46

Geometric mean = 10^1.55 = 35.5 umol/L
Prediction interval (10^0.64, 10^2.46) or (4.3, 290)
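The arithmetic behind the two prediction intervals is short enough to script. A minimal sketch using only the summary statistics on this slide; the variable names are arbitrary.

```python
# Summary statistics from the bilirubin slides.
mean_orig, sd_orig = 64.3, 104.3    # original scale
mean_log, sd_log = 1.55, 0.456      # log10 scale

# Mean +/- 2 SD on each scale.
orig_interval = (mean_orig - 2 * sd_orig, mean_orig + 2 * sd_orig)
log_interval = (mean_log - 2 * sd_log, mean_log + 2 * sd_log)

# Back-transform the log-scale results with 10**x.
geometric_mean = 10 ** mean_log
back_transformed = tuple(10 ** bound for bound in log_interval)

print(f"original scale: ({orig_interval[0]:.1f}, {orig_interval[1]:.1f})")
print(f"geometric mean: {geometric_mean:.1f}")
print(f"back-transformed: ({back_transformed[0]:.1f}, {back_transformed[1]:.0f})")
```

The original-scale interval has an impossible negative lower limit, while the back-transformed interval stays positive.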

Normal probability plot: Bilirubin - original scale

[Figure: Normal plot - Bilirubin. Horizontal axis: observed Z = (Y-mean)/SD, from -1.0 to 5.0; vertical axis: Z assuming Gaussian, from -3.0 to 3.0]

Data is Gaussian if the plot is a straight line. The plot above is not a straight line, so the data are not Gaussian.

Normal probability plot: Bilirubin - log scale

[Figure: Normal plot - log Bilirubin. Horizontal axis: observed Z = (Y-mean)/SD, from -3.0 to 3.0; vertical axis: Z assuming Gaussian, from -3.0 to 3.0]

Data is Gaussian if the plot is a straight line, as above.
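One way to build the two axes of a normal probability plot is sketched below. The slides do not say how "Z assuming Gaussian" was computed, so the plotting position (i - 0.5)/n is an assumption, as is the helper name `normal_plot_points`. The bilirubin values are not in the transcript, so the survival data from the Z-score slide stands in as sample input.

```python
from statistics import NormalDist, mean, stdev

def normal_plot_points(values):
    """Return (observed Z, Z assuming Gaussian) pairs, sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    y_bar, s = mean(ordered), stdev(ordered)
    observed_z = [(y - y_bar) / s for y in ordered]
    # Assumed plotting position: the i-th smallest value sits at
    # percentile (i - 0.5)/n of a Gaussian.
    expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(observed_z, expected_z))

# Stand-in data: survival values from the Z-score slide.
sample = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]
points = normal_plot_points(sample)

for obs, exp in points:
    print(f"{obs:6.2f}  {exp:6.2f}")
```

Points that fall close to a straight line suggest the data are roughly Gaussian.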

Log transformation (cont.)

The distribution of ratios is much closer to Gaussian on the log scale.

The “inverse” of 3/1 is 1/3. This is symmetric only on the log scale:
Original: 100/1, 10/1, 1/1, 1/10, 1/100
Log:      2,     1,    0,   -1,   -2
This is true for OR, RR and HR.

Measures of growth & proliferation have distributions closer to the Gaussian on the log scale.

Data distributions that tend to be Gaussian on the log scale

Growth measures - bacterial CFU
Ab or Ag titers (IgA, IgG, …)
pH
Neurological stimuli (dB, Snellen units)
Steroids, hormones (Estrogen, Testosterone)
Cytokines (IL-1, MCP-1, …)
Liver function (Bilirubin, Creatinine)
Hospital length of stay (can be Poisson)

Quick Probability Theory

Mutually exclusive events: levels of one variable

Blood type    probability
A             30%
B             12%
AB            8%
O             50%

Probability A or O = 30% + 50% = 80%.

Mutually exclusive probabilities add. All (exhaustive) categories sum to 100%.

Probability - Independent events

The probabilities of two independent events multiply (two or more variables).

If 5% of pregnant women have gestational diabetes and 8% of pregnant women have pre-eclampsia, the probability of gestational diabetes and pre-eclampsia = 5% x 8% = 0.4% if independent.

Conditional probability

The probability of an event changes if made conditional on another event.

The probability (prevalence) of TB is 0.1% in the general population.

In Vietnamese immigrants, the TB probability is 4%.

Conditional on being a Vietnamese immigrant, the probability is 4%.

Conditional Probability & Bayes

[Figure: Venn diagram in a population of n=1,000,000. A = Vietnamese, n=5000; B = TB+, n=1000; A∩B, n=200]

Want prob TB|Vietnamese but can’t check all Vietnamese for TB

Conditional Prob & Bayes Rule

What is the TB prevalence in the Orange Co Vietnamese population? It is too hard to take a census of all Vietnamese.

Assume we know:
P(A) = prop in Orange Co who are Viet = 0.5%
P(B) = prop in Orange Co who have TB = 0.1%
P(A|B) = prop of those with TB who are Viet = 20%

Want P(B|A) = P(A|B) P(B) / P(A) = (0.2 x 0.001)/(0.005) = 0.04 = 4%
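The Bayes rule calculation above is a single line of arithmetic. A minimal sketch; the function name `bayes` is hypothetical and the inputs are the three proportions assumed known on this slide.

```python
def bayes(p_a_given_b, p_b, p_a):
    """P(B|A) from P(A|B), P(B) and P(A)."""
    return p_a_given_b * p_b / p_a

p_a = 0.005          # P(A): proportion who are Vietnamese
p_b = 0.001          # P(B): proportion who have TB
p_a_given_b = 0.20   # P(A|B): proportion of TB cases who are Vietnamese

p_b_given_a = bayes(p_a_given_b, p_b, p_a)
print(f"P(TB | Vietnamese) = {p_b_given_a:.2f}")
```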

Bayes rule for conditional probability (formula)

Probability of B given A = P(B|A)
= Joint probability of A and B / Probability of A = P(A ∩ B)/P(A)
= (Probability of A given B x Probability of B) / Probability of A

Bayes rule: P(B|A) = [P(A|B) P(B)] / P(A)

If A and B are independent, P(B|A) = P(B)

Also P(B) = ∑ P(B|Ai) P(Ai) (sum over all Ai, where the Ai are mutually exclusive and exhaustive)

Example: Bayes rule

A=Vietnamese, B=TB+

In a pop of 1,000,000, 5000 (0.5%=0.005) are Vietnamese = P(A), and 1000 (0.1%=0.001) have TB+ = P(B).

Of the 1000 with TB+, 200 (20%=0.20) are Vietnamese = P(A|B).

Want prob. of TB given Vietnamese = P(B|A).
P(B|A) = 0.20 (0.001)/0.005 = 0.04 = 4% = 200/5000

Can’t test all Viet for TB+, but can check all TB+ for Viet.

Bayes rule (graph)

[Figure: nested areas. 1,000,000 pop; A = 5000 Viet; B = 1000 TB+; A ∩ B = 200 Viet + TB+]

Conditional probability of TB+ given Vietnamese, B|A = 200/5000 = 4%

Check all TB+ for Viet rather than check all Viet for TB

Bayesian vs Frequentist

Bayesian computes
Prob(hypothesis | data) = Prob(data | hypothesis) x Prob(hypothesis) / Prob(data)
                        = data likelihood x prior probability / Prob(data)

If data (evidence) refutes a hypothesis, Prob(data | hypothesis) = 0, so Prob(hypothesis | data) = 0.

Frequentist computes Prob(data* | hypothesis) = p value
* p value is the prob of the observed data or more extreme data

Binomial distribution

Population: Positive = π = 0.30, negative = 1 - π = 0.70
Y = number of positive responses out of n trials

n=1
Y    probability
0    0.700
1    0.300

n=2
Y    probability
0    0.49 = 0.7 x 0.7
1    0.42 = 0.7 x 0.3 x 2
2    0.09 = 0.3 x 0.3

[Figure: bar charts of probability vs Y for n=1 and n=2]

Binomial (cont.)

n=3
Y    probability
0    0.343
1    0.441
2    0.189
3    0.027

n=4
Y    probability
0    0.2401
1    0.4116
2    0.2646
3    0.0756
4    0.0081

[Figure: bar charts of probability vs Y for n=3 and n=4]

General binomial formula

Probability of y positive out of n, where π is the prob of a single positive
= n!/[y!(n-y)!] π^y (1-π)^(n-y)

Mean = nπ, SD = √(nπ(1-π))

Ex: Prob of y=5 herpes cases out of n=50 teens if herpes incidence = π = 4% = 0.04
Prob = 50!/(5! 45!) (0.04)^5 (0.96)^45 = 0.0346 ≈ 3.5%

Can compute using “=BINOMDIST(y,n,π,0)” in EXCEL. For example, =BINOMDIST(5,50,0.04,0) is 0.0346
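The general binomial formula translates almost directly into code. A minimal sketch using `math.comb` from the Python standard library for n!/[y!(n-y)!]; the function name `binom_pmf` is hypothetical.

```python
from math import comb, sqrt

def binom_pmf(y, n, pi):
    """Probability of exactly y positives out of n, each with probability pi."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

# Herpes example: y=5 cases out of n=50 teens, pi=0.04
herpes = binom_pmf(5, 50, 0.04)

# Mean and SD of the number of cases for n=50, pi=0.04
mean_cases = 50 * 0.04
sd_cases = sqrt(50 * 0.04 * (1 - 0.04))

# The n=2, pi=0.30 table from the earlier binomial slide
table_n2 = [binom_pmf(y, 2, 0.30) for y in range(3)]

print(f"P(y=5) = {herpes:.4f}")
print(f"mean = {mean_cases:.1f}, SD = {sd_cases:.2f}")
print([round(p, 2) for p in table_n2])
```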

Binomial - fair coin example

For π=0.5, easy to compute. y = number of “heads” (success) out of n.
prob y out of n = n!/[y!(n-y)!] / 2^n

Ex: n=3, flip 3 fair coins, 2^3 = 8 possibilities

0+0+0 = 0 = y
0+0+1 = 1 = y
0+1+0 = 1 = y
1+0+0 = 1 = y
0+1+1 = 2 = y
1+0+1 = 2 = y
1+1+0 = 2 = y
1+1+1 = 3 = y

y        freq    prob
0        1       1/8
1        3       3/8
2        3       3/8
3        1       1/8
total    8       8/8

Pascal’s triangle

n    y: 0 to n “success”    2^n
-    1
1    1 1                    2
2    1 2 1                  4
3    1 3 3 1                8
4    1 4 6 4 1              16
5    1 5 10 10 5 1          32

For n=5, prob(y=2) is 10/32
prob(y≤2) is (1+5+10)/32 = 16/32
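Each row of the triangle is the list of binomial coefficients for that n, so the n=5 probabilities can be read off in code. A short sketch; `pascal_row` is a hypothetical helper built on `math.comb`.

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle: counts of ways to get y = 0..n successes."""
    return [comb(n, y) for y in range(n + 1)]

row5 = pascal_row(5)
total = 2 ** 5                      # number of equally likely outcomes

p_eq_2 = row5[2] / total            # prob(y = 2)
p_le_2 = sum(row5[:3]) / total      # prob(y <= 2)

print(row5, total, p_eq_2, p_le_2)
```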


Headache remedy success

The “old” headache remedy was successful π=50% of the time, a true “population” value well established after years of study.

A “new” remedy is tried in 10 persons and is successful in 7 of the 10 (70%).

Is this enough evidence to “prove” that the new remedy is better?

Hypothesis testing - Binomial

How likely is y=7 successes out of n=10 if π=0.5?
prob = 10!/(7!3!) / 2^10 = 120/1024 = 0.1172

How likely is y=7 or more (p value)?

y        probability
7        120/1024 = 0.1172
8        45/1024 = 0.0439
9        10/1024 = 0.0098
10       1/1024 = 0.0010
total    176/1024 = 0.1719 <- p value

How likely is observing y=70 successes out of n=100 if π=0.5 for each trial?

Prob(y=70) = [100!/(70! 30!)] / 2^100 = 2.32 x 10^-5

How likely is it to observe 70 or more successes out of 100?
pr(y=70) + pr(y=71) + … + pr(y=100) = 3.93 x 10^-5

This is a simple example of hypothesis testing. The probability of observing y=70 or more successes out of n=100 under the “null hypothesis” that the true population π=0.5 is called a one sided p value.
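The one sided p values for 7 of 10 and for 70 of 100 come from the same upper-tail sum. A minimal sketch for the π=0.5 case only; the function name `upper_tail_fair` is hypothetical, and the counts are kept as exact integers until the final division.

```python
from math import comb

def upper_tail_fair(y, n):
    """P(Y >= y) for n independent trials when pi = 0.5 (one sided p value)."""
    favourable = sum(comb(n, k) for k in range(y, n + 1))
    return favourable / 2**n

p_7_of_10 = upper_tail_fair(7, 10)
p_70_of_100 = upper_tail_fair(70, 100)

print(f"7 or more of 10:    p = {p_7_of_10:.4f}")
print(f"70 or more of 100:  p = {p_70_of_100:.2e}")
```

The same 70% success rate gives very different p values at n=10 and n=100.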

[Figure: bar chart of relative frequency (0.00 to 0.30) vs num of success = y (0 to 10); num success out of n=10, π=0.5]

Gaussian approximation to Binomial

OK for large n, π not near 0 or 1

[Figure: binomial distribution, probability (0.00 to 0.18) vs y from 0 to 18]

π=0.15, n=50, mean = 0.15(50) = 7.5, SD = √(50(0.15)(0.85)) = 2.52

Actual 2.5th percentile is between 2 & 3; Gaussian gives 7.5 - 2(2.5) = 2.5
Actual 97.5th percentile is between 12 and 13; Gaussian gives 7.5 + 2(2.5) = 12.5
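The comparison between the exact binomial percentiles and the Gaussian approximation can be sketched as follows. The helper `binom_cdf` is hypothetical; it sums the exact binomial probabilities for π=0.15, n=50.

```python
from math import comb, sqrt

n, pi = 50, 0.15
mean = n * pi
sd = sqrt(n * pi * (1 - pi))

def binom_cdf(y):
    """Exact P(Y <= y) for the binomial with the n and pi above."""
    return sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(y + 1))

# Gaussian approximation: mean +/- 2 SD
gauss_low, gauss_high = mean - 2 * sd, mean + 2 * sd

print(f"mean = {mean:.1f}, SD = {sd:.2f}")
print(f"Gaussian limits: {gauss_low:.1f} to {gauss_high:.1f}")
for y in (2, 3, 12, 13):
    print(f"P(Y <= {y:2d}) = {binom_cdf(y):.4f}")
```

The exact cumulative probability crosses 2.5% between y=2 and y=3 and crosses 97.5% between y=12 and y=13, in line with the Gaussian limits.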

Poisson distribution for count data

For a patient, y is a non-negative integer: 0, 1, 2, 3, …

Probability of “y” responses (or events) given mean μ = (μ^y e^-μ)/(y!)

(Note: μ^0 = 1 by definition)

For Poisson, if mean = μ then SD = √μ

Examples: number of colds in a season, number of neurons fired in 30 sec (firing rate)

Poisson example

Q: If the average number of colds in a single winter is μ=1.9, what is the probability that a given patient will have 4 colds in one winter?

A: (1.9)^4 e^-1.9 / (4x3x2x1) = 0.0812 ≈ 8%.

What is the probability of 4 or more? (Find for 0-3, subtract from 1.) prob ≈ 12.5%

Can compute in EXCEL with “=POISSON(y,mean,0)”.
=POISSON(4, 1.9, 0) gives 0.0812.
=POISSON(4, 1.9, 1) gives the cumulative probability of 4 or less (4,3,2,1,0), which is 0.9559.
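The colds example can also be worked without EXCEL. A minimal sketch that codes the Poisson formula directly; the function names `poisson_pmf` and `poisson_cdf` are hypothetical stand-ins for =POISSON(y,mean,0) and =POISSON(y,mean,1).

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """Probability of exactly y events when the mean is mu."""
    return mu**y * exp(-mu) / factorial(y)

def poisson_cdf(y, mu):
    """Cumulative probability of y or fewer events."""
    return sum(poisson_pmf(k, mu) for k in range(y + 1))

mu = 1.9                                   # average colds per winter
p_exactly_4 = poisson_pmf(4, mu)
p_4_or_less = poisson_cdf(4, mu)
p_4_or_more = 1 - poisson_cdf(3, mu)       # subtract P(0..3) from 1

print(f"{p_exactly_4:.4f}  {p_4_or_less:.4f}  {p_4_or_more:.4f}")
```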

Poisson distribution

[Figure: Poisson distribution, mean=1.9, SD=1.38; bar chart of probability (0.00 to 0.30) vs num colds (0 to 8)]

Poisson process

Mean rate of events is h events/unit (h = hazard rate). In T units, we expect μ = hT events on average. Can substitute this average (μ) into (μ^y e^-μ)/(y!) to get the probability of “y” events in T units.

Poisson process example

Example: Cancer clusters

Q: Given a cancer rate of h=3/1000 person-years, what is the expected number of cases in 2 years in a population of 1500?

A: Rate in 2 years is 2 x (3/1000) = h = 6/1000. Expected is μ = hT = 6/1000 x 1500 = 9 cases.

Q: What is the probability of observing exactly 15 cases?

A: μ=9, Probability = (9^15 e^-9)/15! = 0.019431 ≈ 2%.

Q: What is the probability of observing 15 or more cases in 1500 persons?

A: Plug in 0, 1, 2, …, 14 and add to get Q = probability of 14 or less. Probability is 1-Q = 1 - 0.958534 = 0.041466 ≈ 4%.

Can compute with “=POISSON(y,μ,0)” in EXCEL for the probability of y events with mean μ. =POISSON(y,μ,1) gives the cumulative probability of y or less.
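The cancer cluster answers can be reproduced step by step. A minimal sketch that codes the Poisson formula directly rather than calling a statistics library; the variable names are arbitrary.

```python
from math import exp, factorial

rate = 3 / 1000      # cases per person-year
years = 2
persons = 1500

# Expected number of cases: rate x years x persons
mu = rate * years * persons

def poisson_pmf(y, mu):
    """Probability of exactly y events when the mean is mu."""
    return mu**y * exp(-mu) / factorial(y)

p_exactly_15 = poisson_pmf(15, mu)

# P(15 or more) = 1 - P(14 or fewer)
p_14_or_less = sum(poisson_pmf(k, mu) for k in range(15))
p_15_or_more = 1 - p_14_or_less

print(f"expected cases = {mu:.0f}")
print(f"P(exactly 15) = {p_exactly_15:.4f}")
print(f"P(15 or more) = {p_15_or_more:.4f}")
```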

Summary: Descriptive stats for Normal, Binomial & Poisson

n = sample size

Distribution    mean    variance    SD           SE
Normal          µ       σ²          σ            σ/√n
Binomial        π       π(1-π)      √(π(1-π))    √(π(1-π)/n)
Poisson         µ       µ           √µ           √(µ/n)

SD = √variance, SE = SD/√n