Chapt 2. Variation How to: summarize/display random data


Chapt 2. Variation

How to: summarize/display random data

appreciate variation due to randomness

Data summaries.

single observation y (number, curve, image,...)

sample $y_1, \ldots, y_n$

statistic $s(y_1, \ldots, y_n)$


Features: location

scale (spread)

Sample moments

$\bar{y} = (y_1 + \cdots + y_n)/n$, average

$s^2 = \sum_{j=1}^n (y_j - \bar{y})^2/(n-1)$, sample variance

Order statistics

$y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$

minimum, maximum, median, range

quartiles, quantiles

$p \times 100\%$ trimmed average

IQR, MAD $= \mathrm{median}\{|y_i - \mathrm{median}(y_j)|\}$
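The summaries on this slide can be computed with Python's standard library. This is a minimal sketch on made-up numbers. The helper names `trimmed_mean` and `mad` are illustrative, the trimming rule (drop a fraction p of the observations from each end) is one common reading of the "p 100% trimmed average", and quartile conventions differ between packages.

```python
import statistics

def trimmed_mean(values, p):
    """Average after dropping a fraction p of the observations from each end."""
    ordered = sorted(values)
    k = int(p * len(ordered))              # number trimmed from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def mad(values):
    """Median absolute deviation: median{|y_i - median(y)|}."""
    centre = statistics.median(values)
    return statistics.median(abs(v - centre) for v in values)

# Made-up sample with one large value; not data from the slides.
y = [2.0, 3.5, 4.0, 4.5, 5.0, 6.5, 7.0, 8.0, 9.5, 30.0]

ybar = statistics.mean(y)                   # (y_1 + ... + y_n)/n
s2 = statistics.variance(y)                 # divisor n - 1
q1, med, q3 = statistics.quantiles(y, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1
sample_range = max(y) - min(y)

print(ybar, s2, med, iqr, sample_range, trimmed_mean(y, 0.1), mad(y))
```

Here the average is 8.0 while the median is 5.75 and the 10% trimmed average is 6.0, so the single large value pulls the average upward much more than the other two.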


Bad data

Outlier - observation unusual compared to the others

Resistance

Trimmed average

Example (Midwife birth data). Hours in labor by day

n = 95

$\bar{y} = 7.57$ hr, $s^2 = 12.97$ hr$^2$

min, med, max = 1.5, 7.5, 19 hr

quartiles 4.95, 9.75 hr
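Resistance can be illustrated by corrupting a single observation and watching how each summary moves. The numbers below are made up for the illustration (they are not the birth data); the last value is deliberately mis-recorded as 105 instead of 10.5.

```python
import statistics

# Made-up values, not the birth data.
clean = [4.0, 5.5, 6.0, 7.5, 8.0, 9.0, 10.5]
bad = clean[:-1] + [105.0]          # one outlier: 10.5 mis-recorded as 105

mean_shift = statistics.mean(bad) - statistics.mean(clean)
median_shift = statistics.median(bad) - statistics.median(clean)

print("change in average:", mean_shift)
print("change in median: ", median_shift)
```

The average moves by 13.5 while the median does not move at all, which is what is meant by calling the median a resistant summary.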


Graphs. Indispensable in data analysis

Histogram

disjoint bins $[L + (k-1)\delta,\ L + k\delta)$, bin width $\delta$

Plot count, $n_k$, or proportion $n_k/n$

EDF

$\#\{j : y_j \le y\}/n$

Estimates CDF, $\mathrm{Prob}\{Y \le y\}$

Scatter plot $(u_j, v_j)$

Parallel boxplots - location, scale, shape, outliers, comparative

median, quartiles, $1.5 \times$ IQR
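The histogram counts and the EDF are simple to compute directly. The sketch below assumes bins that start at a left endpoint `left` and share a common width `width`; both names, and the data, are chosen for the illustration.

```python
import math

def histogram_counts(values, left, width):
    """Counts n_k for bins [left + (k-1)*width, left + k*width), k = 1, 2, ..."""
    counts = {}
    for v in values:
        k = math.floor((v - left) / width) + 1   # index of the bin containing v
        counts[k] = counts.get(k, 0) + 1
    return counts

def edf(values, y):
    """Empirical distribution function: #{j : y_j <= y} / n."""
    return sum(1 for v in values if v <= y) / len(values)

data = [1.5, 2.0, 2.5, 4.0, 4.5, 7.0]            # made-up observations
counts = histogram_counts(data, left=0.0, width=2.0)
proportions = {k: c / len(data) for k, c in counts.items()}

print(counts)
print(edf(data, 2.5))
```

Because each bin is closed on the left and open on the right, the value 2.0 falls in the second bin rather than the first.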


Random sample

$Y_1, \ldots, Y_n$ independent with CDF $F$

Mean

$E(Y) = \int y\,dF(y)$ ($= \int y f(y)\,dy$ if density $f$)

$p$ quantile

$y_p = F^{-1}(p)$

Laplace (continuous)

$f(y) = \exp\{-|y-\eta|/\tau\}/(2\tau)$, $-\infty < y < \infty$

Poisson (discrete)

$\mathrm{Prob}(Y=y) = f(y) = \theta^y \exp\{-\theta\}/y!$, $y = 0, 1, 2, \ldots$

Count of daily arrivals: Poisson

Hours of labor: gamma
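As a hedged illustration of the quantile definition and of the two distributions named here, the sketch below writes the Laplace distribution with a location `eta` and scale `tau` and the Poisson with mean `theta`. These parameter names are assumptions, since the symbols did not survive in the transcript.

```python
import math

def laplace_cdf(y, eta=0.0, tau=1.0):
    """CDF of the Laplace distribution with location eta and scale tau."""
    z = (y - eta) / tau
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def laplace_quantile(p, eta=0.0, tau=1.0):
    """p quantile y_p = F^{-1}(p), for 0 < p < 1."""
    if p < 0.5:
        return eta + tau * math.log(2.0 * p)
    return eta - tau * math.log(2.0 * (1.0 - p))

def poisson_pmf(y, theta):
    """Prob(Y = y) = theta^y exp(-theta) / y! for y = 0, 1, 2, ..."""
    return theta ** y * math.exp(-theta) / math.factorial(y)

y90 = laplace_quantile(0.9, eta=1.0, tau=2.0)
print(y90, laplace_cdf(y90, eta=1.0, tau=2.0))   # the CDF at y_p returns p

theta = 3.0
total = sum(poisson_pmf(y, theta) for y in range(60))
mean = sum(y * poisson_pmf(y, theta) for y in range(60))
print(total, mean)                               # close to 1 and to theta
```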


Gamma

$f(y) = \lambda^\kappa y^{\kappa-1} \exp\{-\lambda y\}/\Gamma(\kappa)$, $y > 0$

Many examples of useful distributions will be given in these opening chapters

Some discrete, some continuous
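A quick numerical check of a gamma density, assuming the rate/shape form $\lambda^\kappa y^{\kappa-1}\exp\{-\lambda y\}/\Gamma(\kappa)$ with the names `lam` and `kappa` chosen here. The midpoint rule is used only to keep the sketch free of external libraries.

```python
import math

def gamma_pdf(y, lam, kappa):
    """Gamma density with rate lam and shape kappa (zero for y <= 0)."""
    if y <= 0:
        return 0.0
    return lam ** kappa * y ** (kappa - 1) * math.exp(-lam * y) / math.gamma(kappa)

def midpoint_integral(g, a, b, steps=20000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

lam, kappa = 0.5, 2.0     # illustrative parameter values
total = midpoint_integral(lambda y: gamma_pdf(y, lam, kappa), 0.0, 100.0)
mean = midpoint_integral(lambda y: y * gamma_pdf(y, lam, kappa), 0.0, 100.0)

print(total)   # close to 1
print(mean)    # close to kappa/lam = 4
```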


SF Chron 01/26/09


Sampling variation.

"the data $y_1, \ldots, y_n$ will be regarded as the observed values of random variables" - probabilities defined

"ask how we would expect $s(y_1, \ldots, y_n)$ to behave on average, ..., understand the properties of $S = S(Y_1, \ldots, Y_n)$"

$Y_1, \ldots, Y_n$ sample from a distribution with mean $\mu$, variance $\sigma^2$

Sample moment $\bar{Y}$; $E(\bar{Y}) = nE(Y_j)/n = \mu$, unbiased

$E(X + Y) = E(X) + E(Y)$


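$\mathrm{var}(\bar{Y}) = \sigma^2/n$

$\mathrm{var}(X+Y) = \mathrm{var}(X) + \mathrm{var}(Y)$, if uncorrelated

$\mathrm{var}(aX) = a^2\,\mathrm{var}(X)$

$\sum_j (Y_j - \mu)^2 = \sum_j (Y_j - \bar{Y} + \bar{Y} - \mu)^2$

$= \sum_j (Y_j - \bar{Y})^2 + n(\bar{Y} - \mu)^2$, since the cross term vanishes

$n\sigma^2 = E\{\sum_j (Y_j - \bar{Y})^2\} + \sigma^2$

$E(S^2) = \sigma^2$, unbiased

Birth data. $n = 95$, $\bar{y} = 7.57$ hr, $s^2/n = 0.137$ hr$^2$, so $s/\sqrt{n} = 0.37$ hr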
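The unbiasedness and variance results can be checked by simulation. The sketch below uses normal samples with illustrative values of the mean, standard deviation and sample size (none of them from the slides). It then repeats the birth-data arithmetic from the quoted summaries $s^2 = 12.97$ hr$^2$ and $n = 95$.

```python
import math
import random
from statistics import fmean

def sample_variance(values):
    """s^2 with divisor n - 1."""
    centre = fmean(values)
    return sum((v - centre) ** 2 for v in values) / (len(values) - 1)

random.seed(1)
mu, sigma, n, reps = 5.0, 2.0, 10, 5000   # illustrative values only

means, variances = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(fmean(sample))
    variances.append(sample_variance(sample))

avg_of_means = fmean(means)               # compare with mu = 5
var_of_means = sample_variance(means)     # compare with sigma^2/n = 0.4
avg_of_s2 = fmean(variances)              # compare with sigma^2 = 4
print(avg_of_means, var_of_means, avg_of_s2)

# Birth data summaries quoted on the slides: s^2 = 12.97 hr^2, n = 95
est_var_of_mean = 12.97 / 95              # s^2/n, in hr^2
std_error = math.sqrt(est_var_of_mean)    # s/sqrt(n), in hr
print(round(est_var_of_mean, 3), round(std_error, 2))
```

With these summaries $s^2/n$ is about 0.137 hr$^2$, and its square root, the estimated standard error of the average, is about 0.37 hr.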


Probability plot. Checking probability model

plot $y_{(j)}$ versus $F^{-1}(j/(n+1))$

For normal take $F = \Phi$

from table or statistical package

Normal prob plot "works" if $\mu$, $\sigma$ unknown

For $N(\mu, \sigma^2)$, $E(Y_{(j)}) = \mu + \sigma E(Z_{(j)})$
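The coordinates of a normal probability plot can be produced with the standard library's `statistics.NormalDist`, which provides the inverse of the standard normal CDF. A minimal sketch on made-up data; the function name `normal_plot_points` is illustrative, and the plotting itself is left to whatever graphics package is at hand.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Pairs (Phi^{-1}(j/(n+1)), y_(j)) for j = 1, ..., n."""
    ordered = sorted(values)                 # order statistics y_(1) <= ... <= y_(n)
    n = len(ordered)
    std_normal = NormalDist()                # mean 0, standard deviation 1
    return [(std_normal.inv_cdf(j / (n + 1)), y_j)
            for j, y_j in enumerate(ordered, start=1)]

points = normal_plot_points([7.2, 3.1, 5.0, 9.4, 6.3])   # made-up data
for x, y in points:
    print(f"{x:7.3f}  {y:5.1f}")
```

If the data are roughly normal the points fall near a straight line whose intercept and slope estimate the mean and standard deviation, which is why the plot can be used without knowing either.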


Tools for approximation

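Weak law of large numbers.

$\bar{Y} \to \mu$ in probability as $n \to \infty$

$\bar{Y}$ is a consistent estimate of $\mu$

Definition.

$\{S_n\} \to S$ in probability if for any $\varepsilon > 0$

$\Pr(|S_n - S| > \varepsilon) \to 0$

as $n \to \infty$

If $S = s_0$, constant, and $h(s)$ is continuous at $s_0$ then

$h(S_n) \to h(s_0)$ in probability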
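The definition of convergence in probability can be made concrete with a small Monte Carlo experiment. The sketch below assumes exponential data with mean 1 and a tolerance of 0.2, both chosen for the illustration, and estimates the probability that the average misses the mean by more than the tolerance for increasing sample sizes.

```python
import random

random.seed(3)
mu, eps, reps = 1.0, 0.2, 500     # illustrative choices

def exceed_prob(n):
    """Monte Carlo estimate of Pr(|Ybar_n - mu| > eps) for exponential(1) data."""
    hits = 0
    for _ in range(reps):
        ybar = sum(random.expovariate(1.0) for _ in range(n)) / n
        if abs(ybar - mu) > eps:
            hits += 1
    return hits / reps

probs = {n: exceed_prob(n) for n in (10, 100, 1000)}
print(probs)
```

The estimated probabilities shrink toward zero as the sample size grows, as the weak law says they should.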


Central limit theorem.

$\sqrt{n}(\bar{Y} - \mu)/\sigma \to Z \sim N(0,1)$ in distribution as $n \to \infty$

Definition.

$\{Z_n\}$ converges in distribution to $Z$ if

$\Pr(Z_n \le z) \to \Pr(Z \le z)$

as $n \to \infty$ at every $z$ for which $\Pr(Z \le z)$ is continuous

The CLT provides an approximation for "large" $n$
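A simulation gives a feel for how good the normal approximation is at a moderate sample size. The sketch assumes exponential data with mean and standard deviation 1 and a sample size of 50 (illustrative choices), and compares the simulated probability that the standardized average is at most 1 with the standard normal value.

```python
import math
import random
from statistics import NormalDist

random.seed(2)
n, reps = 50, 5000
mu = sigma = 1.0          # mean and standard deviation of the exponential(1) distribution

def standardized_mean():
    """One draw of Z_n = sqrt(n) * (Ybar - mu) / sigma."""
    ybar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (ybar - mu) / sigma

z_values = [standardized_mean() for _ in range(reps)]
z = 1.0
empirical = sum(1 for v in z_values if v <= z) / reps   # estimate of Pr(Z_n <= z)
normal_approx = NormalDist().cdf(z)                     # Pr(Z <= z) for Z ~ N(0,1)

print(empirical, normal_approx)
```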


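Average as an estimate of $\mu$.

If $X$ is $N(\mu, \sigma^2)$ then $(X - \mu)/\sigma$ is $N(0,1)$

Writing $Z_n = \sqrt{n}(\bar{Y} - \mu)/\sigma$

$\bar{Y} = \mu + n^{-1/2}\sigma Z_n$

Indicates how the efficiency of $\bar{Y}$ depends on $n$ and $\sigma$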


Covariance and correlation.

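$\mathrm{cov}(X,Y) = \sigma_{xy} = E[\{X-E(X)\}\{Y-E(Y)\}]$

sample covariance

$C_{xy} = \sum_{j=1}^n (X_j - \bar{X})(Y_j - \bar{Y})/(n-1)$

$C_{xy} \to \sigma_{xy}$ in probability

correlation

$\rho = \mathrm{cov}(X,Y)/\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}$, $-1 \le \rho \le 1$

$R = C_{xy}/\sqrt{C_{xx} C_{yy}}$

$R \to \rho$ in probability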
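The sample covariance and correlation follow directly from their definitions. A minimal sketch on made-up pairs; the function names `sample_cov` and `sample_corr` are illustrative.

```python
import math

def sample_cov(x, y):
    """C_xy = sum (x_j - xbar)(y_j - ybar) / (n - 1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """R = C_xy / sqrt(C_xx * C_yy), which lies between -1 and 1."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up pairs
y = [2.0, 1.0, 4.0, 3.0, 5.0]

c_xy = sample_cov(x, y)
r = sample_corr(x, y)
print(c_xy, r)
```

Note that `sample_cov(x, x)` is just the sample variance of `x`, so the same helper supplies both the numerator and the two terms of the denominator.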


R = -.340


Some more distributions.

Cauchy

$f(y) = 1/[\pi\{1 + (y - \theta)^2\}]$, $-\infty < y < \infty$

distribution of $\bar{Y}$ same as that of $Y_1$

no moments, long tails

Uniform

$F(u) = 0$, $u \le 0$

$= u$, $0 < u \le 1$

$= 1$, $1 < u$

$E(U) = 1/2$, center of gravity
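The Cauchy claim that the average has the same distribution as a single observation can be probed by simulation. The sketch below assumes the standard Cauchy (location 0, scale 1), generates draws by inverting the CDF, and compares the interquartile range of single draws with that of averages of 50 draws. The IQR is used because the variance does not exist.

```python
import math
import random
import statistics

random.seed(4)

def cauchy():
    """Standard Cauchy draw by inverting the CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (random.random() - 0.5))

def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

reps, n = 4000, 50
singles = [cauchy() for _ in range(reps)]
averages = [sum(cauchy() for _ in range(n)) / n for _ in range(reps)]

iqr_single, iqr_average = iqr(singles), iqr(averages)
print(iqr_single, iqr_average)    # both near 2, the IQR of the standard Cauchy
```

Averaging 50 observations does not tighten the spread at all, in contrast with distributions that have a finite variance.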


Exponential

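$f(y) = 0$, $y < 0$

$= \lambda\exp\{-\lambda y\}$, $y \ge 0$

Pareto

$F(y) = 0$, $y < a$

$= 1 - (y/a)^{-\gamma}$, $y \ge a$; $a, \gamma > 0$

Poisson process

Times of events $y_{(1)}, y_{(2)}, y_{(3)}, \ldots$

$y_{(1)}, y_{(2)} - y_{(1)}, y_{(3)} - y_{(2)}, \ldots$ i.i.d. exponential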
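A Poisson process can be simulated by accumulating independent exponential gaps between successive events. A minimal sketch, with an illustrative rate of 2 events per unit time and a time horizon of 1000; the function name and its arguments are chosen for the example.

```python
import random

def poisson_process_times(rate, horizon, rng):
    """Event times in (0, horizon] built from i.i.d. exponential gaps."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)     # gap until the next event
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(0)
times = poisson_process_times(rate=2.0, horizon=1000.0, rng=rng)

# Gaps: first event time, then differences between successive event times.
gaps = [b - a for a, b in zip([0.0] + times, times)]
mean_gap = sum(gaps) / len(gaps)

print(len(times), mean_gap)    # count near rate * horizon, mean gap near 1/rate
```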


Chi-squared distribution

$Z_1, Z_2, \ldots, Z_\nu$ IN(0,1)

$W = \sum_{j=1}^{\nu} Z_j^2$

$E(W) = \nu$, $\mathrm{var}(W) = 2\nu$
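The chi-squared construction as a sum of squared independent standard normal variables is easy to simulate. The sketch below uses 5 degrees of freedom, chosen for the illustration, and compares the simulated mean and variance with the degrees of freedom and twice the degrees of freedom.

```python
import random
from statistics import fmean

random.seed(5)
nu, reps = 5, 20000       # degrees of freedom and number of replicates (illustrative)

# Each W is a sum of nu squared independent N(0,1) variables.
w = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(reps)]

w_mean = fmean(w)
w_var = sum((v - w_mean) ** 2 for v in w) / (reps - 1)

print(w_mean, w_var)      # compare with nu = 5 and 2 * nu = 10
```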

Multinomial

page 47

$p$ classes with probabilities $\pi_1, \ldots, \pi_p$ adding to 1


Linear combination

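$L = a + \sum_j b_j Y_j$

$E(L) = a + \sum_j b_j \mu_j$

If independent

$\mathrm{var}(L) = \sum_j b_j^2 \sigma_j^2$

If $\{Y_j\}$ are IN($\mu_j, \sigma_j^2$), then $L$ is

$N(a + \sum_j b_j \mu_j,\ \sum_j b_j^2 \sigma_j^2)$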
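The mean and variance of a linear combination of independent variables can be wrapped in a small helper. As a check, taking the constant to be 0 and every coefficient to be 1/n recovers the familiar results for the average. The function name and the numbers are illustrative.

```python
def linear_combination_moments(a, b, mu, sigma2):
    """Mean and variance of L = a + sum_j b_j Y_j for independent Y_j
    with means mu[j] and variances sigma2[j]."""
    mean = a + sum(bj * mj for bj, mj in zip(b, mu))
    var = sum(bj ** 2 * s2j for bj, s2j in zip(b, sigma2))
    return mean, var

# The average of n = 4 independent variables, each with mean 3 and variance 2.
n = 4
mean, var = linear_combination_moments(0.0, [1 / n] * n, [3.0] * n, [2.0] * n)
print(mean, var)    # mean 3 and variance 2/4 = 0.5
```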


Moment-generating function

$M_Y(t) = E(\exp\{tY\})$, $t$ real

$X$, $Y$ independent

$M_{X+Y}(t) = M_X(t) M_Y(t)$

For $N(\mu, \sigma^2)$

$M(t) = \exp\{t\mu + t^2\sigma^2/2\}$

The normal is determined by its moments
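For reference, the normal moment-generating function can be obtained by completing the square in the exponent. The derivation below is a standard sketch, writing $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$ for the parameters of two independent normal variables (notation introduced here). Its last line combines the product rule for independent variables with the normal form.

```latex
\begin{align*}
M(t) &= \int_{-\infty}^{\infty} e^{ty}\,
        \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\} dy \\
     &= \exp\left\{t\mu + \tfrac{1}{2}t^2\sigma^2\right\}
        \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\left\{-\frac{(y-\mu-t\sigma^2)^2}{2\sigma^2}\right\} dy \\
     &= \exp\left\{t\mu + \tfrac{1}{2}t^2\sigma^2\right\}, \\[6pt]
M_{X+Y}(t) &= M_X(t)\,M_Y(t)
      = \exp\left\{t(\mu_1+\mu_2) + \tfrac{1}{2}t^2(\sigma_1^2+\sigma_2^2)\right\}.
\end{align*}
```

The second line uses the identity $ty - (y-\mu)^2/(2\sigma^2) = t\mu + t^2\sigma^2/2 - (y-\mu-t\sigma^2)^2/(2\sigma^2)$, and the remaining integral is 1 because its integrand is a normal density. The final expression is the moment-generating function of $N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$, so the sum of independent normal variables is again normal.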