Section III Gaussian distribution Probability distributions (Binomial, Poisson)


Date posted: 24-Feb-2016



Slide 1

Section III: Gaussian distribution; Probability distributions (Binomial, Poisson)

Notation
  Statistic            Sample     Population
  Mean                 Ȳ          μ
  Std deviation        S or SD    σ
  Proportion           P          π
  Mean difference      d̄          δ
  Correlation coeff    r          ρ
  Rate (regression)    b          β
  Num of obs           n          N

Densities [figure]

Percentiles [figure]: BMI = 22 is the 88th percentile.

Standard Z scores
Definition: Z = (Y - mean)/SD, equivalently Y = mean + Z × SD.
Z is how many SD units Y is above or below the mean. The mean and SD may be sample values (Ȳ, S) or population values (μ, σ) if the population values are known.

Survival data, mean = 17.54, SD = 11.68
  Y     Y - mean   Z = (Y - mean)/SD
  4     -13.54     -1.16
  6     -11.54     -0.99
  8      -9.54     -0.82
  8      -9.54     -0.82
  12     -5.54     -0.47
  14     -3.54     -0.30
  15     -2.54     -0.22
  17     -0.54     -0.05
  19      1.46      0.13
  22      4.46      0.38
  24      6.46      0.55
  34     16.46      1.41
  45     27.46      2.35

Standard Gaussian (Normal): a distribution model [figure]
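As a check on the Z-score table, here is a minimal Python sketch (Python is our choice, not the slides') that recomputes the sample mean, sample SD and Z scores for the survival data using only the standard-library `statistics` module.

```python
from statistics import mean, stdev

# Survival data from the slide above
survival = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]

ybar = mean(survival)   # sample mean
s = stdev(survival)     # sample SD (n - 1 denominator)

# Z = (Y - mean) / SD for each observation
z_scores = [(y - ybar) / s for y in survival]

for y, z in zip(survival, z_scores):
    print(f"{y:>3}  {y - ybar:7.2f}  {z:6.2f}")

# Inverse: Y = mean + Z * SD recovers each original value
assert all(abs(ybar + z * s - y) < 1e-9 for y, z in zip(survival, z_scores))
```

The printed columns should match the table to two decimals.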

Selected Gaussian percentiles (Z and lower-tail area) [table]
Z = ±2: the interval (-2, +2) is approximately the 95% prediction interval. This implies SD ≈ range/4 (extremes excluded).

Normal distribution: differences & sums
If Y1 and Y2 have independent normal distributions with the means and SDs below:
  variable   mean   SD
  Y1         μ1     σ1
  Y2         μ2     σ2
then the difference and the sum also have normal distributions:
                    mean       SD
  diff = Y1 - Y2    μ1 - μ2    sqrt(σ1² + σ2²)
  sum  = Y1 + Y2    μ1 + μ2    sqrt(σ1² + σ2²)
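A short sketch using Python's `statistics.NormalDist` to confirm two points above: the area within ±2 SD is close to 95% (the exact 95% cut point is Z ≈ 1.96), and the SD of a difference or sum of independent normals is sqrt(σ1² + σ2²). The means and SDs for Y1 and Y2 are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()  # standard Gaussian: mean 0, SD 1

# Area between -2 and +2 (near 95%; exact 95% uses Z = 1.96)
central_2sd = z.cdf(2) - z.cdf(-2)
z_975 = z.inv_cdf(0.975)

# Hypothetical parameters for two independent normal variables
mu1, sd1 = 10.0, 3.0
mu2, sd2 = 8.0, 4.0

# Difference and sum: means subtract/add, variances always add
diff = NormalDist(mu1 - mu2, sqrt(sd1**2 + sd2**2))
total = NormalDist(mu1 + mu2, sqrt(sd1**2 + sd2**2))

print(round(central_2sd, 4), round(z_975, 2), diff.mean, diff.stdev, total.mean)
```

Note that the SD is the same for the difference and the sum; only the mean changes.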

Q: If μ1 = μ2, what is the mean difference? (100% overlap)

Difference of two normals [figure]

Specificity & Sensitivity
For serum creatinine in normal adults, μ = 1.1 mg/dl and σ = 0.2 mg/dl. In one type of renal disease, μ = 1.7 mg/dl and σ = 0.4 mg/dl. If a cutoff value of 1.6 mg/dl is used:
  Prob false pos = P(Y > 1.6 | normal)
  Prob false neg = P(Y < 1.6 | disease)

Data transformations & logs
Some continuous variables follow the Gaussian on a transformed scale, not the original scale. Statland suggests that perhaps 80% of continuous lab test variables follow a Gaussian on either the original scale (50%) or a transformed scale, usually the log scale. (Clinical Decision Levels for Lab Tests, 2nd ed, 1987, Med Econ)

Example: Bilirubin
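The false-positive and false-negative probabilities above are tail areas of two normal curves. This sketch computes them with Python's `statistics.NormalDist`, using the creatinine means, SDs and the 1.6 mg/dl cutoff from the slide.

```python
from statistics import NormalDist

normal = NormalDist(mu=1.1, sigma=0.2)   # creatinine, normal adults (mg/dl)
disease = NormalDist(mu=1.7, sigma=0.4)  # creatinine, one type of renal disease
cutoff = 1.6

false_pos = 1 - normal.cdf(cutoff)   # P(Y > 1.6 | normal), Z = 2.5
false_neg = disease.cdf(cutoff)      # P(Y < 1.6 | disease), Z = -0.25

specificity = 1 - false_pos
sensitivity = 1 - false_neg
print(f"false pos = {false_pos:.4f}, false neg = {false_neg:.4f}")
print(f"specificity = {specificity:.4f}, sensitivity = {sensitivity:.4f}")
```

With this cutoff the test is very specific (about 99.4%) but misses about 40% of diseased patients; moving the cutoff trades one error for the other.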

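Bilirubin, µmol/L [histogram]: Mean = 64.3, Median = 34.7, SD = 104.3, n = 216
Log bilirubin, log10 µmol/L [histogram]: Mean = 1.55, Median = 1.54, SD = 0.456, n = 216

95% prediction intervals
                       Original scale   log10 scale
  Mean                 64.3             1.55
  SD                   104.3            0.456
  2 SD                 208.6            0.912
  Lower (mean - 2 SD)  -144.3           0.64
  Upper (mean + 2 SD)  272.9            2.46

On the original scale the lower limit is negative, which is impossible for a concentration.
Geometric mean = 10^1.55 = 35.5 µmol/L
Prediction interval = (10^(1.55 - 0.912), 10^(1.55 + 0.912)) = (10^0.638, 10^2.462) ≈ (4.3, 290) µmol/L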
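This sketch reproduces the bilirubin calculations: the naive mean ± 2 SD interval on the original scale, and the log10-scale interval back-transformed with 10**x to give the geometric mean and a positive prediction interval. Only the summary statistics on the slide are used; the raw data are not available here.

```python
# Summary statistics from the bilirubin slide
raw_mean, raw_sd = 64.3, 104.3     # original scale, umol/L
log_mean, log_sd = 1.55, 0.456     # log10 scale

# Naive interval on the original scale: lower limit comes out negative
raw_lower = raw_mean - 2 * raw_sd
raw_upper = raw_mean + 2 * raw_sd

# Interval on the log10 scale, then back-transformed with 10**x
log_lower = log_mean - 2 * log_sd
log_upper = log_mean + 2 * log_sd
geo_mean = 10 ** log_mean
pi_lower, pi_upper = 10 ** log_lower, 10 ** log_upper

print(f"original scale: ({raw_lower:.1f}, {raw_upper:.1f})")
print(f"geometric mean = {geo_mean:.1f} umol/L")
print(f"back-transformed interval: ({pi_lower:.1f}, {pi_upper:.0f})")
```

The back-transformed interval is asymmetric around the geometric mean, which matches the right-skewed shape of the original data.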

Normal probability plot: Bilirubin, original scale [figure]

Data are Gaussian if the plot is a straight line; the plot above is not Gaussian.

Normal probability plot: Bilirubin, log scale [figure]

Data are Gaussian if the plot is a straight line, as above.

Log transformation (cont.)
The distribution of ratios is much closer to Gaussian on the log scale. The inverse of 3/1 is 1/3; a ratio and its inverse are symmetric only on the log scale:
  Original: 100/1, 10/1, 1/1, 1/10, 1/100
  Log10:    2, 1, 0, -1, -2
This is true for OR, RR and HR.
Measures of growth & proliferation have distributions closer to the Gaussian on the log scale.

Data distributions that tend to be Gaussian on the log scale:
  - Growth measures: bacterial CFU
  - Ab or Ag titers (IgA, IgG, ...)
  - pH
  - Neurological stimuli (dB, Snellen units)
  - Steroids, hormones (Estrogen, Testosterone)
  - Cytokines (IL-1, MCP-1, ...)
  - Liver & kidney function (Bilirubin, Creatinine)
  - Hospital length of stay (can be Poisson)

Quick Probability Theory
Mutually exclusive events: levels of one variable
  Blood type   Probability
  A            30%
  B            12%
  AB           8%
  O            50%
Probability of A or O = 30% + 50% = 80%. Mutually exclusive probabilities add. All (exhaustive) categories sum to 100%.

Probability: Independent events
The probabilities of two or more independent events (involving two or more variables) multiply. If 5% of pregnant women have gestational diabetes and 8% of pregnant women have pre-eclampsia, the probability of both gestational diabetes and pre-eclampsia = 5% × 8% = 0.4%, if the two are independent.
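The two probability rules above (addition for mutually exclusive events, multiplication for independent events) can be checked in a few lines of Python, using the blood-type and pregnancy percentages from the slides.

```python
# Blood type probabilities (mutually exclusive, exhaustive categories)
blood_type = {"A": 0.30, "B": 0.12, "AB": 0.08, "O": 0.50}

# Mutually exclusive events: probabilities add
p_a_or_o = blood_type["A"] + blood_type["O"]

# Exhaustive categories sum to 1 (100%)
total = sum(blood_type.values())

# Independent events: probabilities multiply
p_gdm = 0.05            # gestational diabetes
p_preeclampsia = 0.08   # pre-eclampsia
p_both = p_gdm * p_preeclampsia   # valid only if the events are independent

print(f"P(A or O) = {p_a_or_o:.2f}, total = {total:.2f}, P(both) = {p_both:.3f}")
```

If the two pregnancy complications were in fact associated, the true joint probability would differ from 0.4%; the product is only a model under independence.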

Conditional probability
The probability of an event changes if it is made conditional on another event. The probability (prevalence) of TB is 0.1% in the general population. In Vietnamese immigrants, the TB probability is 4%: conditional on being a Vietnamese immigrant, the probability is 4%.

Conditional Probability & Bayes [figure: Venn diagram]
  N = 1,000,000
  A = Vietnamese, n = 5000
  B = TB+, n = 1000
  A and B, n = 200
We want P(TB | Vietnamese), but we can't check all Vietnamese for TB.

Conditional Prob & Bayes Rule
What is the TB prevalence in the Orange Co Vietnamese population? It is too hard to take a census of all Vietnamese.
Assume we know:
  P(A) = proportion in Orange Co who are Vietnamese = 0.5%
  P(B) = proportion in Orange Co who have TB = 0.1%
  P(A|B) = proportion of those with TB who are Vietnamese = 20%
We want P(B|A) = P(A|B) P(B)/P(A) = (0.2 × 0.001)/0.005 = 0.04 = 4%.

Bayes rule for conditional probability (formula)
Probability of B given A = P(B|A) = joint probability of A and B / probability of A
  = P(A ∩ B)/P(A)
  = (probability of A given B × probability of B) / probability of A

Bayes rule: P(B|A)=[ P(A|B)P(B)] / P(A)

If A and B are independent, P(B|A) = P(B). Also, P(B) = Σ P(B|Ai) P(Ai), summing over all mutually exclusive, exhaustive categories Ai.

Example: Bayes rule
A = Vietnamese, B = TB+. In a population of 1,000,000, 5000 (0.5% = 0.005) are Vietnamese = P(A), and 1000 (0.1% = 0.001) are TB+ = P(B). Of the 1000 who are TB+, 200 (20% = 0.20) are Vietnamese = P(A|B).
We want the probability of TB given Vietnamese, P(B|A):
  P(B|A) = 0.20 × 0.001/0.005 = 0.04 = 4% = 200/5000
We can't test all Vietnamese for TB, but we can check all TB+ cases for Vietnamese ethnicity.

Bayes rule (graph) [figure]
  1,000,000 population; 1000 TB+; 5000 Vietnamese; 200 Vietnamese and TB+
The conditional probability of TB+ given Vietnamese = 200/5000 = 4%. Check all TB+ for Vietnamese rather than checking all Vietnamese for TB.

Bayesian vs Frequentist
Bayesian computes
  Prob(hypothesis | data) = Prob(data | hypothesis) P(hypothesis) / Prob(data),
which is proportional to data likelihood × prior probability.
If the data (evidence) refute a hypothesis, Prob(data | hypothesis) = 0, so Prob(hypothesis | data) = 0.
Frequentist computes Prob(data* | hypothesis) = p value.
  *The p value is the probability of the observed data or more extreme data, given the hypothesis.

Binomial distribution
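Here is a sketch of the TB example in Python: Bayes rule gives P(TB | Vietnamese) from quantities that are feasible to measure, and it agrees with the direct count 200/5000. The last step checks the law of total probability by splitting the population into Vietnamese and non-Vietnamese.

```python
# Counts from the slide's example population
N = 1_000_000
n_viet = 5_000          # A = Vietnamese
n_tb = 1_000            # B = TB+
n_viet_and_tb = 200     # A and B

p_a = n_viet / N                      # P(A) = 0.005
p_b = n_tb / N                        # P(B) = 0.001
p_a_given_b = n_viet_and_tb / n_tb    # P(A|B) = 0.20, from checking TB+ cases

# Bayes rule: P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a

# Direct count, only possible if every Vietnamese person were tested
direct = n_viet_and_tb / n_viet

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b_given_not_a = (n_tb - n_viet_and_tb) / (N - n_viet)
p_b_total = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

print(f"P(TB | Vietnamese) = {p_b_given_a:.3f} (direct count {direct:.3f})")
print(f"P(TB) via total probability = {p_b_total:.4f}")
```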

Population: positive = π = 0.30, negative = 1 - π = 0.70
Y = number of positive responses out of n trials

n = 1
  Y   probability
  0   0.700
  1   0.300

n = 2
  Y   probability
  0   0.49 = 0.7 × 0.7
  1   0.42 = 0.7 × 0.3 × 2
  2   0.09 = 0.3 × 0.3

Binomial (cont.)

n = 3
  Y   probability
  0   0.343
  1   0.441
  2   0.189
  3   0.027

n = 4
  Y   probability
  0   0.2401
  1   0.4116
  2   0.2646
  3   0.0756
  4   0.0081
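The tables above follow from the probability rules earlier in this section: each sequence of independent trials has probability equal to a product, and sequences with the same Y are mutually exclusive, so their probabilities add. This brute-force sketch (a teaching device, not how one would compute large n) rebuilds the tables for π = 0.30 by listing every sequence.

```python
from collections import defaultdict
from itertools import product

def binomial_table(n, pi):
    """P(Y = y) for y positives in n trials, by listing every sequence."""
    table = defaultdict(float)
    for seq in product((0, 1), repeat=n):        # 1 = positive, 0 = negative
        prob = 1.0
        for outcome in seq:
            prob *= (pi if outcome else 1 - pi)  # independent trials multiply
        table[sum(seq)] += prob                  # same Y: exclusive, so add
    return dict(table)

for n in range(1, 5):
    row = binomial_table(n, 0.30)
    print(n, {y: round(p, 4) for y, p in sorted(row.items())})
```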

General binomial formula
Probability of y positives out of n, where π is the probability of a single positive:
  P(Y = y) = n!/[y!(n-y)!] π^y (1-π)^(n-y)

Mean = nπ, SD = sqrt(nπ(1-π))
Ex: probability of y = 5 herpes cases out of n = 50 teens if herpes incidence π = 4% = 0.04:
  Prob = 50!/(5! 45!) (0.04)^5 (0.96)^45 ≈ 0.0346 (about 3.5%)

This can be computed in Excel with =BINOMDIST(y, n, π, 0). For example, =BINOMDIST(5,50,0.04,0) returns about 0.0346.
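Outside Excel, the general formula is a one-liner in Python with `math.comb` (Python 3.8+). This sketch evaluates the herpes example and the binomial mean and SD; `binom_pmf` and `binom_mean_sd` are hypothetical helper names of our own, and the first should match Excel's =BINOMDIST(y, n, π, 0).

```python
from math import comb, sqrt

def binom_pmf(y, n, pi):
    """n!/[y!(n-y)!] * pi**y * (1 - pi)**(n - y)"""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

def binom_mean_sd(n, pi):
    """Binomial mean n*pi and SD sqrt(n*pi*(1 - pi))."""
    return n * pi, sqrt(n * pi * (1 - pi))

# Herpes example: y = 5 cases among n = 50 teens, incidence pi = 0.04
p5 = binom_pmf(5, 50, 0.04)
mean_y, sd_y = binom_mean_sd(50, 0.04)
print(f"P(Y=5) = {p5:.4f}; mean = {mean_y:.1f}, SD = {sd_y:.2f}")
```

With an expected count of only 2 cases, observing 5 is unusual but not extremely rare.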

Binomial: fair coin example
For π = 0.5 it is easy to compute. With y = number of heads (successes) out of n:
  prob(y out of n) = n!/[y!(n-y)!] / 2^n

Ex: n = 3, flip 3 fair coins: 2^3 = 8 possibilities
  0+0+0 = 0 = y
  0+0+1 = 1 = y
  0+1+0 = 1 = y
  1+0+0 = 1 = y
  0+1+1 = 2 = y
  1+0+1 = 2 = y
  1+1+0 = 2 = y
  1+1+1 = 3 = y

  y      freq   prob
  0      1      1/8
  1      3      3/8
  2      3      3/8
  3      1      1/8
  total  8      8/8

Pascal's triangle
  n   y = 0 to n successes   2^n
  1   1 1                    2
  2   1 2 1                  4
  3   1 3 3 1                8
  4   1 4 6 4 1              16
  5   1 5 10 10 5 1          32
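The coin-flip enumeration and Pascal's triangle can both be generated programmatically. In this sketch, exact fractions keep the probabilities as 1/8, 3/8, ...; `pascal_row` is a hypothetical helper that builds each row by adding adjacent entries of the previous row, so row n holds the counts n!/[y!(n-y)!] and sums to 2^n.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 = 8 equally likely outcomes of 3 fair coin flips
flips = list(product((0, 1), repeat=3))     # 1 = head (success)
freq = Counter(sum(f) for f in flips)
probs = {y: Fraction(freq[y], len(flips)) for y in sorted(freq)}
print(probs)

def pascal_row(n):
    """Row n of Pascal's triangle: number of ways to get y = 0..n successes."""
    row = [1]
    for _ in range(n):
        # each entry is the sum of the two entries above it
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

for n in range(1, 6):
    row = pascal_row(n)
    print(n, row, sum(row))     # the row sums to 2**n
```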

For n = 5, prob(y = 2) is 10/32, and prob(y ≤ 2) is (1 + 5 + 10)/32 = 16/32.

Headache remedy success
The old headache remedy was successful π = 50% of the time, a true population value well established after years of study. A new remedy is tried in 10 persons and is successful in 7 of the 10 (70%). Is this enough evidence to prove that the new remedy is better?

Hypothesis testing: Binomial
How likely is y = 7 successes out of n = 10 if π = 0.5?
  prob = 10!/(7! 3!) / 2^10 = 120/1024 = 0.1172
How likely is y = 7 or more (the p value)?
  y      probability
  7      120/1024 = 0.1172
  8       45/1024 = 0.0439
  9       10/1024 = 0.0098
  10       1/1024 = 0.0010
  total  176/1024 = 0.1719
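The p value above is an upper-tail sum of binomial probabilities. This sketch computes the same one-sided exact p value in Python with exact fractions, so the result can be compared directly with 176/1024; `upper_tail_p` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def upper_tail_p(y_obs, n, pi=Fraction(1, 2)):
    """One-sided exact binomial p value: P(Y >= y_obs | pi)."""
    return sum(comb(n, y) * pi**y * (1 - pi) ** (n - y)
               for y in range(y_obs, n + 1))

p_exact = upper_tail_p(7, 10)
print(p_exact, float(p_exact))    # 176/1024 reduces to 11/64
```

Because p ≈ 0.17 is well above conventional thresholds such as 0.05, 7 successes out of 10 is not strong evidence against π = 0.5 under this one-sided test.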