Section III Gaussian distribution Probability distributions (Binomial, Poisson)


Page 1: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Section III: Gaussian distribution

Probability distributions (Binomial, Poisson)

Page 2: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Notation

Statistic           Sample     Population
Mean                Y          μ
Std deviation       S or SD    σ
Proportion          P          π
Mean difference     d          δ
Correlation coeff   r          ρ
Rate (regression)   b          β
Num of obs          n          N

Page 3: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Densities and percentiles: BMI = 22 is the 88th percentile

[Figure: density of x = BMI from 14 to 27 (vertical axis 0% to 25%); 88% of the area lies below BMI = 22.]

Page 4: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Standard Z scores

Definition: Z = (Y - mean) / SD, so Y = mean + Z x SD.

Z is how many SD units Y is above or below the mean.

The mean and SD may be sample values (Y, S), or population values (μ, σ) if the population values are known.

Page 5: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

  Y    Y - mean    Z = (Y - mean)/SD
  4     -13.54       -1.16
  6     -11.54       -0.99
  8      -9.54       -0.82
  8      -9.54       -0.82
 12      -5.54       -0.47
 14      -3.54       -0.30
 15      -2.54       -0.22
 17      -0.54       -0.05
 19       1.46        0.13
 22       4.46        0.38
 24       6.46        0.55
 34      16.46        1.41
 45      27.46        2.35

Survival data, mean = 17.54, SD = 11.68
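The Z-score table can be reproduced with a few lines of Python. This is a minimal sketch, assuming the SD on the slide is the sample SD (n - 1 in the denominator); the names `survival` and `z_score` are illustrative and not part of the original slides.

```python
from statistics import mean, stdev

# Survival values Y taken from the slide's table.
survival = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]

y_bar = mean(survival)   # sample mean
s = stdev(survival)      # sample SD (assumption: n - 1 denominator)

def z_score(y, center, spread):
    """How many SD units y lies above (+) or below (-) the mean."""
    return (y - center) / spread

z_values = [z_score(y, y_bar, s) for y in survival]

# Print the three columns: Y, Y - mean, Z.
for y, z in zip(survival, z_values):
    print(f"{y:3d} {y - y_bar:8.2f} {z:6.2f}")
```

With the sample SD, the computed mean and SD agree with the slide's 17.54 and 11.68 to two decimals.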

Page 6: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Standard Gaussian (Normal): a distribution model

[Figure: standard Gaussian density, μ = 0, σ = 1, for Z from -3.5 to 3.5. About 34% of the area lies between Z = -1 and Z = 0, 34% between Z = 0 and Z = 1, and 16% in each tail beyond Z = ±1.]

Page 7: Section III Gaussian distribution Probability distributions (Binomial, Poisson)
Page 8: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Selected Gaussian percentiles

   Z      lower area (P < Z)
 -2.00      2.28%
 -1.96      2.50%
 -1.50      6.68%
 -1.00     15.87%
  0.00     50.00%
  1.00     84.13%
  1.50     93.32%
  1.96     97.50%
  2.00     97.72%

Page 9: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Gaussian percentiles

[Figure: standard Gaussian density, μ = 0, σ = 1, for Z from -3.5 to 3.5; the area below Z = 1.5 is 0.933 = 93.3%.]

The EXCEL function =NORMSDIST(Z) gives the percentile from Z.
The EXCEL function =NORMSINV(p) gives Z from the percentile.
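Outside of EXCEL, the same two lookups can be sketched in Python. This assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the wrapper names `percentile_from_z` and `z_from_percentile` are illustrative only.

```python
from statistics import NormalDist

# Standard Gaussian: mean 0, SD 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def percentile_from_z(z):
    """Lower-tail area P(Z < z); plays the role of =NORMSDIST(z)."""
    return std_normal.cdf(z)

def z_from_percentile(p):
    """Z value with lower-tail area p; plays the role of =NORMSINV(p)."""
    return std_normal.inv_cdf(p)

print(round(percentile_from_z(1.5), 4))
print(round(z_from_percentile(0.975), 2))
```

The two functions are inverses of each other, so `z_from_percentile(percentile_from_z(z))` returns z up to rounding error.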

Page 10: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Example: SAT Verbal
Mean = μ = 500, SD = σ = 100

What is your percentile if Y = 700? Z = (700 - 500)/100 = 2.0, area = 0.977 = 97.7%.

What score is the 80th percentile? Z0.80 = 0.842, so Y = 500 + 0.842 (100) = 584.

What percent are between 450 and 500? For Y = 450, Z = (450 - 500)/100 = -0.5, area = 0.3085. For Y = 500, Z = 0, area = 0.5000, so the area between is 0.5000 - 0.3085 = 0.1915 ≈ 19%.
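The three SAT answers can be checked numerically. The sketch below assumes Python's `statistics.NormalDist` as a stand-in for the Z table; the variable names are illustrative.

```python
from statistics import NormalDist

# SAT Verbal model from the slide: mean 500, SD 100.
sat = NormalDist(mu=500, sigma=100)

pct_at_700 = sat.cdf(700)                  # percentile of a score of 700
score_80th = sat.inv_cdf(0.80)             # score at the 80th percentile
between_450_500 = sat.cdf(500) - sat.cdf(450)  # area between 450 and 500

print(f"{pct_at_700:.3f} {score_80th:.0f} {between_450_500:.4f}")
```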

Page 11: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Example: Anesthesia

Effective dose: μ = 50 mg/kg, σ = 10 mg/kg
Lethal dose: μ = 110 mg/kg, σ = 20 mg/kg

Q1: What dose will put 90% to sleep?

Q2: What is the risk of death from this dose?

Page 12: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Example: Anesthesia

Effective dose: μ = 50 mg/kg, σ = 10 mg/kg
Lethal dose: μ = 110 mg/kg, σ = 20 mg/kg

Q1: What dose will put 90% to sleep? Z0.90 = 1.28, so Y = 50 + 1.28 (10) = 62.8 mg/kg.

Q2: What is the risk of death from this dose? Z = (62.8 - 110)/20 = -2.36, area < 1%.
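Both anesthesia answers follow from one percentile lookup and one area lookup. A minimal sketch, assuming both dose distributions are Gaussian as the slide does, and using Python's `statistics.NormalDist` in place of the Z table:

```python
from statistics import NormalDist

# Dose models from the slide (mg/kg).
effective = NormalDist(mu=50, sigma=10)
lethal = NormalDist(mu=110, sigma=20)

# Q1: dose at the 90th percentile of the effective-dose distribution.
dose_90 = effective.inv_cdf(0.90)

# Q2: fraction of patients whose lethal dose lies below that dose.
risk_of_death = lethal.cdf(dose_90)

print(f"dose = {dose_90:.1f} mg/kg, risk = {risk_of_death:.4f}")
```

The computed risk is a little under 1%, consistent with the slide's "area < 1%".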

Page 13: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Prediction intervals (not CI)

If μ and σ are known and the data are known to have a Gaussian distribution, the interval (μ - Zσ, μ + Zσ) is the (2k - 100)% prediction interval, where Z > 0 is the kth percentile of the standard Gaussian.

For Z = 2, (μ - 2σ, μ + 2σ) is (approximately) the 95% prediction interval.

This implies SD ≈ range/4 (extremes excluded).
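The "2k - 100" rule can be checked directly: if Z is the kth percentile, the area between -Z and +Z is 2k - 100 percent. A small sketch, with `coverage` as a hypothetical helper name:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def coverage(z):
    """Area of the standard Gaussian between -z and +z (for z > 0)."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1.0, 1.96, 2.0):
    k = 100 * std_normal.cdf(z)           # Z is the kth percentile
    print(f"Z={z}: k={k:.2f}, 2k-100={2 * k - 100:.2f}, "
          f"coverage={100 * coverage(z):.2f}%")
```

For Z = 2 the coverage is about 95.4%, which is why the slide calls the 2 SD interval "approximately" 95%; Z = 1.96 gives 95% to two decimals.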

Page 14: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Normal distributions: differences and sums

If Y1 and Y2 are independent and each has a normal distribution with the means and SDs below:

variable   mean   SD
Y1         µ1     σ1
Y2         µ2     σ2

then the difference and the sum also have normal distributions:

                   mean       SD
diff = Y1 - Y2     µ1 - µ2    sqrt(σ1^2 + σ2^2)
sum  = Y1 + Y2     µ1 + µ2    sqrt(σ1^2 + σ2^2)

Q: If σ1 = σ2, what is the mean difference with 100% overlap?

Page 15: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Difference of two normals

Page 16: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Specificity & Sensitivity

For serum creatinine in normal adults, μ = 1.1 mg/dl, σ = 0.2 mg/dl.
In one type of renal disease, μ = 1.7 mg/dl, σ = 0.4 mg/dl.

If a cutoff value of 1.6 mg/dl is used:
Prob false pos = prob Y > 1.6 given normal
Prob false neg = prob Y < 1.6 given disease
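The two error probabilities can be computed as Gaussian tail areas. This sketch assumes the two numbers given for each group are the mean and SD (the symbols did not survive in the transcript) and that creatinine is Gaussian in both groups; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumption: first value is the mean, second the SD (mg/dl).
normal_adults = NormalDist(mu=1.1, sigma=0.2)
renal_disease = NormalDist(mu=1.7, sigma=0.4)
cutoff = 1.6

# False positive: a normal adult tests above the cutoff.
prob_false_pos = 1 - normal_adults.cdf(cutoff)
# False negative: a diseased patient tests below the cutoff.
prob_false_neg = renal_disease.cdf(cutoff)

specificity = 1 - prob_false_pos
sensitivity = 1 - prob_false_neg

print(f"false pos = {prob_false_pos:.4f}, false neg = {prob_false_neg:.4f}")
print(f"specificity = {specificity:.4f}, sensitivity = {sensitivity:.4f}")
```

Under these assumptions the cutoff is very specific (Z = 2.5 in the normal group) but not very sensitive (Z = -0.25 in the disease group).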

Page 17: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Data transformations & logs

Some continuous variables follow the Gaussian on a transformed scale, not the original scale. Statland implies that perhaps 80% of continuous lab test variables follow a Gaussian on either the original (50%) or a transformed scale, usually the log scale.

(Clinical Decision Levels for Lab Tests, 2nd ed, 1987, Med Econ)

Page 18: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Example: Bilirubin

          Bilirubin (umol/L)    Log bilirubin (log10 umol/L)
Mean           64.3                  1.55
Median         34.7                  1.54
SD            104.3                  0.456
n             216                    216

[Figure: histograms of bilirubin on the original scale (0 to 350 umol/L) and on the log10 scale (0.25 to 3.25).]

Page 19: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

95% prediction intervals

          Original scale    log10 scale
Mean           64.3            1.55
SD            104.3            0.456
2 SD          208.6            0.912
Lower        -144.3            0.64
Upper         272.9            2.46

Geometric mean = 10^1.55 = 35.5 umol/L
Prediction interval: (10^0.64, 10^2.46) or (4.3, 290) umol/L
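The back-transformation from the log10 scale can be sketched as follows, starting from the rounded summary statistics on the slide (not the raw data, which are not available here). Variable names are illustrative.

```python
# Summary statistics on the log10 scale, as given on the slide.
mean_log10 = 1.55
sd_log10 = 0.456

# Approximate 95% prediction interval on the log scale: mean +/- 2 SD.
lower_log10 = mean_log10 - 2 * sd_log10
upper_log10 = mean_log10 + 2 * sd_log10

# Back-transform to the original scale by raising 10 to each value.
geometric_mean = 10 ** mean_log10
lower = 10 ** lower_log10
upper = 10 ** upper_log10

print(f"geometric mean = {geometric_mean:.1f}")
print(f"interval = ({lower:.1f}, {upper:.0f})")
```

Unlike the original-scale interval, the back-transformed interval cannot go below zero, because 10 raised to any power is positive.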

Page 20: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Normal probability plot: Bilirubin, original scale

[Figure: normal plot of bilirubin. Horizontal axis: observed Z = (Y - mean)/SD, from -1.0 to 5.0. Vertical axis: Z assuming Gaussian, from -3.0 to 3.0.]

Data are Gaussian if the plot is a straight line. The plot above is not a straight line, so these data are not Gaussian.

Page 21: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Normal probability plot: Bilirubin, log scale

[Figure: normal plot of log bilirubin. Horizontal axis: observed Z = (Y - mean)/SD, from -3.0 to 3.0. Vertical axis: Z assuming Gaussian, from -3.0 to 3.0.]

Data are Gaussian if the plot is a straight line, as in the plot above.

Page 22: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Log transformation (cont.)

The distribution of ratios is much closer to Gaussian on the log scale.

The “inverse” of 3/1 is 1/3. This is symmetric only on the log scale:
Original: 100/1, 10/1, 1/1, 1/10, 1/100
Log: 2, 1, 0, -1, -2
This is true for OR, RR and HR.

Measures of growth & proliferation have distributions closer to the Gaussian on the log scale.

Page 23: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Data distributions that tend to be Gaussian on the log scale

Growth measures - bacterial CFU
Ab or Ag titers (IgA, IgG, …)
pH
Neurological stimuli (dB, Snellen units)
Steroids, hormones (Estrogen, Testosterone)
Cytokines (IL-1, MCP-1, …)
Liver function (Bilirubin, Creatinine)
Hospital length of stay (can be Poisson)

Page 24: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Quick Probability Theory

Mutually exclusive events: levels of one variable

Blood type   probability
A               30%
B               12%
AB               8%
O               50%

Probability A or O = 30% + 50% = 80%.
Mutually exclusive probabilities add. All (exhaustive) categories sum to 100%.

Page 25: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Probability: Independent events

The probabilities of two independent events multiply (two or more variables).

If 5% of pregnant women have gestational diabetes and 8% of pregnant women have pre-eclampsia, then the probability of gestational diabetes and pre-eclampsia = 5% x 8% = 0.4% if independent.

Page 26: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Conditional probability

The probability of an event changes if made conditional on another event.

The probability (prevalence) of TB is 0.1% in the general population.

In Vietnamese immigrants, TB probability is 4%.

Conditional on being a Vietnamese immigrant, probability is 4%.

Page 27: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Conditional Probability & Bayes

[Figure: Venn diagram in a population of n = 1,000,000. A = Vietnamese, n = 5000; B = TB+, n = 1000; A ∩ B, n = 200.]

Want prob TB|Vietnamese but can’t check all Vietnamese for TB

Page 28: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Conditional Prob & Bayes Rule

What is the TB prevalence in the Orange Co Vietnamese population? Too hard to take a census of all Vietnamese.

Assume we know:
P(A) = prop in Orange Co who are Viet = 0.5%
P(B) = prop in Orange Co who have TB = 0.1%
P(A|B) = prop of those with TB who are Viet = 20%

Want P(B|A) = P(A|B) P(B) / P(A) = (0.2 x 0.001)/(0.005) = 0.04 = 4%

Page 29: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Bayes rule for conditional probability (formula)

Probability of B given A = P(B|A)
= Joint probability of A and B / Probability of A = P(A ∩ B) / P(A)
= (Probability of A given B x Probability of B) / Probability of A

Bayes rule: P(B|A) = [P(A|B) P(B)] / P(A)

If A and B are independent, P(B|A) = P(B).
Also P(B) = ∑ P(B|Ai) P(Ai), summing over all mutually exclusive and exhaustive Ai.

Page 30: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Example: Bayes rule

A = Vietnamese, B = TB+

In a pop of 1,000,000, 5000 (0.5% = 0.005) are Vietnamese = P(A), and 1000 (0.1% = 0.001) have TB+ = P(B).

Of the 1000 with TB+, 200 (20% = 0.20) are Vietnamese = P(A|B).

Want prob. of TB given Vietnamese = P(B|A).
P(B|A) = 0.20 (0.001)/0.005 = 0.04 = 4% = 200/5000.

Can’t test all Viet for TB+, can check all TB+ for Viet.
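The Bayes calculation can be written out both ways, from the probabilities and from the raw counts. A minimal sketch; `bayes_posterior` is a hypothetical helper name, and the counts are those quoted in the example.

```python
def bayes_posterior(p_a_given_b, p_b, p_a):
    """Bayes rule: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Counts from the example.
population = 1_000_000
n_viet = 5000          # A
n_tb = 1000            # B
n_viet_and_tb = 200    # A and B

p_a = n_viet / population            # 0.005
p_b = n_tb / population              # 0.001
p_a_given_b = n_viet_and_tb / n_tb   # 0.20

p_b_given_a = bayes_posterior(p_a_given_b, p_b, p_a)
direct = n_viet_and_tb / n_viet      # same answer from counts

print(p_b_given_a, direct)
```

Both routes give 4%; the Bayes route only needs the TB+ cases to be checked for ethnicity, not the whole Vietnamese population to be tested.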

Page 31: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Bayes rule (graph)

[Figure: nested populations. 1,000,000 pop; A = 5000 Viet; B = 1000 TB+; A ∩ B = 200 Viet + TB+.]

Conditional probability of TB+ given Vietnamese (B|A) = 200/5000 = 4%

Check all TB+ for Viet rather than check all Viet for TB

Page 32: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Bayesian vs Frequentist

Bayesian computes
Prob(hypothesis | data) = [Prob(data | hypothesis) x Prob(hypothesis)] / Prob(data),
which is proportional to data likelihood x prior probability.

If data (evidence) refutes a hypothesis, Prob(data | hypothesis) = 0, so Prob(hypothesis | data) = 0.

Frequentist computes Prob(data* | hypothesis) = p value.
* The p value is the prob of the observed data or more extreme data.

Page 33: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Binomial distribution

Population: positive = π = 0.30, negative = 1 - π = 0.70
Y = number of positive responses out of n trials

n = 1
Y   probability
0   0.700
1   0.300

n = 2
Y   probability
0   0.49 = 0.7 x 0.7
1   0.42 = 0.7 x 0.3 x 2
2   0.09 = 0.3 x 0.3

[Figures: bar charts of probability versus Y for n = 1 and n = 2.]

Page 34: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Binomial (cont.)

n = 3
Y   probability
0   0.343
1   0.441
2   0.189
3   0.027

n = 4
Y   probability
0   0.2401
1   0.4116
2   0.2646
3   0.0756
4   0.0081

[Figures: bar charts of probability versus Y for n = 3 and n = 4.]

Page 35: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

General binomial formula

Probability of y positive out of n, where π is the prob of a single positive:
= n!/[y!(n-y)!] π^y (1-π)^(n-y)

Mean = nπ, SD = sqrt(nπ(1-π))

Ex: Prob of y = 5 herpes cases out of n = 50 teens if herpes incidence = π = 4% = 0.04:
Prob = 50!/(5! 45!) (0.04)^5 (0.96)^45 = 0.0346 ≈ 3.5%

Can compute using “=BINOMDIST(y,n,π,0)” in EXCEL. For example, =BINOMDIST(5,50,0.04,0) is 0.0346.
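The binomial formula translates directly into code. This is a minimal sketch using `math.comb` (Python 3.8+) for n!/[y!(n-y)!]; the function name `binom_pmf` is illustrative.

```python
from math import comb, sqrt

def binom_pmf(y, n, pi):
    """Probability of exactly y positives out of n, each with probability pi."""
    return comb(n, y) * pi ** y * (1 - pi) ** (n - y)

# Herpes example: y = 5 cases out of n = 50 with pi = 0.04.
p_herpes = binom_pmf(5, 50, 0.04)

# Mean and SD of the count for the same n and pi.
mean_count = 50 * 0.04
sd_count = sqrt(50 * 0.04 * (1 - 0.04))

print(f"{p_herpes:.4f} {mean_count:.1f} {sd_count:.2f}")
```

The same function reproduces the small tables for π = 0.30, for example `binom_pmf(1, 2, 0.3)` for the n = 2 row with Y = 1.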

Page 36: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Binomial: fair coin example

For π = 0.5, it is easy to compute. With y = number of “heads” (successes) out of n:
prob of y out of n = n!/[y!(n-y)!] / 2^n

Ex: n = 3, flip 3 fair coins, 2^3 = 8 possibilities

outcome      y
0+0+0   =    0
0+0+1   =    1
0+1+0   =    1
1+0+0   =    1
0+1+1   =    2
1+0+1   =    2
1+1+0   =    2
1+1+1   =    3

y       freq   prob
0        1     1/8
1        3     3/8
2        3     3/8
3        1     1/8
total    8     8/8

Page 37: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Pascal’s triangle

n    y: 0 to n “success”     2^n
-    1                        1
1    1  1                     2
2    1  2  1                  4
3    1  3  3  1               8
4    1  4  6  4  1           16
5    1  5  10  10  5  1      32

For n = 5, prob(y=2) is 10/32 and prob(y≤2) is (1+5+10)/32 = 16/32.
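Each row of the triangle is the list of binomial coefficients for that n, so the rows and the fair-coin probabilities can be generated in a few lines. A sketch, with `pascal_row` as an illustrative name:

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle: number of ways to get y = 0..n successes."""
    return [comb(n, y) for y in range(n + 1)]

# Print rows 0 to 5 with their totals (2**n).
for n in range(6):
    row = pascal_row(n)
    print(n, row, sum(row))

# Fair-coin probabilities for n = 5.
row5 = pascal_row(5)
total5 = sum(row5)
prob_y_eq_2 = row5[2] / total5
prob_y_le_2 = sum(row5[:3]) / total5
```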

Page 38: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Headache remedy success

The “old” headache remedy was successful π=50% of the time, a true “population” value well established after years of study.

A “new” remedy is tried in 10 persons and is successful in 7 of the 10 (70%).

Is this enough evidence to “prove” that the new remedy is better?

Page 39: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Hypothesis testing: Binomial

How likely is y = 7 successes out of n = 10 if π = 0.5?
prob = 10!/(7! 3!) / 2^10 = 120/1024 = 0.1172

How likely is y = 7 or more (p value)?

y        probability
7        120/1024 = 0.1172
8         45/1024 = 0.0439
9         10/1024 = 0.0098
10         1/1024 = 0.0010
total    176/1024 = 0.1719  <- p value

Page 40: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

How likely is observing y = 70 successes out of n = 100 if π = 0.5 for each trial?

Prob(y=70) = [100!/(70! 30!)] / 2^100 = 2.32 x 10^-5

How likely is it to observe 70 or more successes out of 100?
pr(y=70) + pr(y=71) + … + pr(y=100) = 3.93 x 10^-5

This is a simple example of hypothesis testing. The probability of observing y = 70 or more successes out of n = 100 under the “null hypothesis” that the true population π = 0.5 is called a one sided p value.
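The one sided p values for both remedy examples (7 or more of 10, and 70 or more of 100) can be computed by counting outcomes. This sketch is specific to the null value π = 0.5, where every sequence of outcomes is equally likely; `upper_tail_fair` is a hypothetical helper name.

```python
from math import comb

def upper_tail_fair(y, n):
    """One sided p value P(Y >= y) for n trials when pi = 0.5.

    Counts the sequences with y or more successes and divides by 2**n.
    """
    favorable = sum(comb(n, k) for k in range(y, n + 1))
    return favorable / 2 ** n

p_7_of_10 = upper_tail_fair(7, 10)        # 176/1024
p_70_of_100 = upper_tail_fair(70, 100)    # about 3.9e-05
prob_exactly_70 = comb(100, 70) / 2 ** 100

print(f"{p_7_of_10:.4f} {prob_exactly_70:.3g} {p_70_of_100:.3g}")
```

The observed success rate is 70% in both cases, but the p value drops from about 0.17 to about 0.00004 as n grows from 10 to 100.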

Page 41: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

[Figure: relative frequency versus number of successes y, out of n = 10 with π = 0.5.]

Page 42: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Gaussian approximation to Binomial
OK for large n, π not near 0 or 1

[Figure: binomial distribution, probability versus y for y = 0 to 18.]

π = 0.15, n = 50, mean = 0.15(50) = 7.5, SD = sqrt(50(0.15)(0.85)) = 2.52

Actual 2.5th percentile is between 2 and 3; Gaussian: 7.5 - 2(2.5) = 2.5
Actual 97.5th percentile is between 12 and 13; Gaussian: 7.5 + 2(2.5) = 12.5
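The claim about the actual percentiles can be checked against the exact binomial cumulative probabilities. A minimal sketch, with `binom_cdf` as an illustrative helper; note that the slide rounds the SD to 2.5 before forming the Gaussian limits.

```python
from math import comb, sqrt

def binom_cdf(y, n, pi):
    """Exact P(Y <= y) for a binomial count with n trials and probability pi."""
    return sum(comb(n, k) * pi ** k * (1 - pi) ** (n - k)
               for k in range(y + 1))

n, pi = 50, 0.15
mean = n * pi
sd = sqrt(n * pi * (1 - pi))

# Gaussian approximation to the 2.5th and 97.5th percentiles.
gauss_low = mean - 2 * sd
gauss_high = mean + 2 * sd

# Exact cumulative probabilities around those limits.
for y in (2, 3, 12, 13):
    print(y, round(binom_cdf(y, n, pi), 4))
print(round(gauss_low, 2), round(gauss_high, 2))
```

The cumulative probability crosses 2.5% between y = 2 and y = 3, and crosses 97.5% between y = 12 and y = 13, which is what the slide means by the actual percentiles lying "between" those values.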

Page 43: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Poisson distribution for count data

For a patient, y is a non-negative integer: 0, 1, 2, 3, …

Probability of “y” responses (or events) given mean μ = (μ^y e^-μ) / (y!)

(Note: μ^0 = 1 by definition)

For Poisson, if mean = μ then SD = √μ.

Examples: number of colds in a season, number of neurons fired in 30 sec (firing rate)

Page 44: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Poisson example

Q: If the average number of colds in a single winter is μ = 1.9, what is the probability that a given patient will have 4 colds in one winter?

A: (1.9)^4 e^-1.9 / (4 x 3 x 2 x 1) = 0.0812 ≈ 8%.

What is the probability of 4 or more? Find the probabilities for 0 to 3 and subtract their sum from 1: prob = 1 - 0.8747 = 0.1253 ≈ 12.5%.

Can compute in EXCEL with “=POISSON(y,mean,0)”.
=POISSON(4, 1.9, 0) gives 0.0812.
=POISSON(4, 1.9, 1) gives the cumulative probability of 4 or less (4, 3, 2, 1, 0), which is 0.9559.
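The same numbers can be obtained without EXCEL. A sketch using only the standard library; `poisson_pmf` and `poisson_cdf` are illustrative names that mirror the last argument (0 or 1) of the EXCEL function.

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """Probability of exactly y events when the mean is mu."""
    return mu ** y * exp(-mu) / factorial(y)

def poisson_cdf(y, mu):
    """Cumulative probability of y or fewer events."""
    return sum(poisson_pmf(k, mu) for k in range(y + 1))

mu = 1.9  # average number of colds in one winter
p_exactly_4 = poisson_pmf(4, mu)
p_4_or_less = poisson_cdf(4, mu)
p_4_or_more = 1 - poisson_cdf(3, mu)

print(f"{p_exactly_4:.4f} {p_4_or_less:.4f} {p_4_or_more:.4f}")
```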

Page 45: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Poisson distribution

[Figure: Poisson distribution, mean = 1.9, SD = 1.38; probability versus number of colds, 0 to 8.]

Page 46: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Poisson process

The mean rate of events is h events per unit (h = hazard rate). In T units, we expect μ = hT events on average. Can substitute this average (μ) into (μ^y e^-μ) / (y!) to get the probability of “y” events in T units.

Page 47: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Poisson process example

Example: Cancer clusters

Q: Given a cancer rate of h = 3/1000 person-years, what is the expected number of cases in 2 years in a population of 1500?

A: The rate over 2 years is 2 x (3/1000) = 6/1000 per person, so the expected number is μ = (6/1000) x 1500 = 9 cases. Equivalently, μ = hT with h = 3/1000 and T = 2 x 1500 = 3000 person-years.

Q: What is the probability of observing exactly 15 cases?
A: μ = 9, Probability = (9^15 e^-9)/15! = 0.019431 ≈ 2%.

Q: What is the probability of observing 15 or more cases in 1500 persons?
A: Plug in 0, 1, 2, …, 14 and add to get Q = probability of 14 or less. Probability is 1 - Q = 1 - 0.958534 = 0.041466 ≈ 4%.

Can compute with “=POISSON(y,μ,0)” in EXCEL for the probability of y events with mean μ. =POISSON(y,μ,1) gives the cumulative probability of y or less.
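The cancer cluster calculation combines the Poisson process step (μ = hT) with the Poisson probabilities. A minimal sketch, treating T as total person-years of follow-up; the function and variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """Probability of exactly y events when the mean is mu."""
    return mu ** y * exp(-mu) / factorial(y)

hazard = 3 / 1000            # cases per person-year
person_years = 2 * 1500      # 2 years of follow-up for 1500 persons
mu = hazard * person_years   # expected number of cases (about 9)

p_exactly_15 = poisson_pmf(15, mu)
p_14_or_less = sum(poisson_pmf(k, mu) for k in range(15))
p_15_or_more = 1 - p_14_or_less

print(f"mu = {mu:.1f}")
print(f"P(15) = {p_exactly_15:.6f}, P(15 or more) = {p_15_or_more:.6f}")
```

Summing the probabilities for 0 to 14 and subtracting from 1 avoids having to add an infinite upper tail.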

Page 48: Section III Gaussian distribution Probability distributions (Binomial, Poisson)

Summary: Descriptive stats for Normal, Binomial & Poisson

n = sample size

Distribution   mean   variance   SD              SE
Normal         µ      σ^2        σ               σ/√n
Binomial       π      π(1-π)     sqrt(π(1-π))    sqrt(π(1-π)/n)
Poisson        µ      µ          sqrt(µ)         sqrt(µ/n)

SD = √variance, SE= SD/√n