Topic 4: Statistical Inference


Outline

• Statistical inference: confidence intervals and significance tests

• Statistical inference for β1

• Statistical inference for β0

• Tower of Pisa example

Theory for Statistical Inference

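• Xi iid Normal(μ, σ²), parameters unknown

$$\bar{X} = \frac{\sum X_i}{n}, \qquad s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}, \qquad s^2(\bar{X}) = \frac{s^2}{n}, \qquad s(\bar{X}) = \frac{s}{\sqrt{n}}$$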
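The quantities on this slide can be computed directly. The following is a minimal sketch using only the Python standard library; the sample values are hypothetical and chosen purely for illustration.

```python
import math
import statistics

# Hypothetical sample (illustrative values only, not from the slides).
x = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]

n = len(x)
x_bar = statistics.mean(x)       # sample mean
s2 = statistics.variance(x)      # sample variance, divisor n - 1
se_mean = math.sqrt(s2 / n)      # s(X-bar) = s / sqrt(n), the standard error of the mean

print(f"n = {n}, mean = {x_bar:.4f}, s^2 = {s2:.4f}, s(mean) = {se_mean:.4f}")
```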

Theory for Statistical Inference

• Consider the variable

$$t = \frac{\bar{X} - \mu}{s(\bar{X})}$$

• t is distributed as t(n-1)

• Use this distribution in inference for μ: confidence intervals and significance tests

Confidence Intervals

$$\bar{X} \pm t_c\, s(\bar{X})$$

where tc = t(1-α/2, n-1), the (1-α/2)100th percentile of the t distribution with n-1 degrees of freedom

• 1-α is the confidence level

Confidence Intervals

• $\bar{X}$ is the sample mean (center of the interval)

• $s(\bar{X})$ is the estimated standard deviation of $\bar{X}$, sometimes called the standard error of the mean

• $t_c\, s(\bar{X})$ is the margin of error and describes the precision of the estimate
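As an illustration of the interval (sample mean plus or minus the margin of error), here is a sketch in standard-library Python. The sample is hypothetical, and the helper functions `t_pdf`, `t_cdf`, and `t_quantile` are illustrative names introduced here; they approximate the t distribution numerically because the standard library has no built-in t quantile.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|] (steps must be even), using symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Quantile by bisection on t_cdf; the search range [-100, 100] is an arbitrary choice."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical sample (illustrative values only, not from the slides).
x = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]
alpha = 0.05

n = len(x)
x_bar = statistics.mean(x)                   # center of the interval
se = statistics.stdev(x) / math.sqrt(n)      # standard error of the mean
tc = t_quantile(1 - alpha / 2, n - 1)        # critical value t(1 - alpha/2, n - 1)
margin = tc * se                             # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"tc = {tc:.4f}, margin of error = {margin:.4f}")
print(f"{100 * (1 - alpha):.0f}% CI for the mean: ({lower:.4f}, {upper:.4f})")
```

In practice a statistics package would supply the t quantile; the numerical helpers are only there to keep the sketch self-contained.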

Confidence Intervals

• The procedure is such that (1-α)100% of the time, the true mean will be contained in the interval

• We do not know whether a single interval is one that contains the mean or not

• Confidence describes the “long-run” behavior of the procedure

• If the data are non-Normal, the procedure is only approximate (central limit theorem)
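The “long-run” interpretation can be illustrated by simulation. The sketch below repeatedly draws Normal samples, builds a 95% interval from each, and records how often the interval contains the true mean. The true parameters, sample size, number of repetitions, and seed are all hypothetical choices; the critical value 2.262 is t(0.975, 9) taken from a standard t table.

```python
import math
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0     # hypothetical true mean and standard deviation
n = 10                    # sample size per simulated data set
tc = 2.262                # t(0.975, 9) from a standard t table
reps = 2000               # number of simulated samples

covered = 0
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.mean(x)
    margin = tc * statistics.stdev(x) / math.sqrt(n)
    # Count the interval if it contains the true mean.
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

coverage = covered / reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The reported fraction should be close to 0.95, although any single interval either contains μ or does not.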

Significance tests

$$H_0: \mu = \mu_0 \quad \text{vs} \quad H_a: \mu \ne \mu_0$$

$$t^* = \frac{\bar{X} - \mu_0}{s(\bar{X})}$$

Reject H0 if |t*| ≥ tc, where tc = t(1-α/2, n-1)

P = Prob(|t| ≥ |t*|), where t ~ t(n-1)
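The test can be sketched in standard-library Python as follows. The sample and the null value `mu0` are hypothetical, and `t_pdf` and `t_cdf` are illustrative helpers that approximate the t distribution numerically (a statistics package would normally provide these).

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|] (steps must be even), using symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Hypothetical sample and null value (illustrative only, not from the slides).
x = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]
mu0 = 4.5
alpha = 0.05

n = len(x)
x_bar = statistics.mean(x)
se = statistics.stdev(x) / math.sqrt(n)

t_star = (x_bar - mu0) / se                          # test statistic
p_value = 2 * (1 - t_cdf(abs(t_star), n - 1))        # two-sided P-value
reject = p_value < alpha                             # same decision as |t*| > tc

print(f"t* = {t_star:.4f}, P = {p_value:.4f}, reject H0: {reject}")
```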

Significance tests

• Under H0, t* will have distribution t(n-1)

• P(reject H0 | H0 true) = α (Type I error)

• Under Ha, t* will have a noncentral t(n-1) distribution

• P(do not reject H0 | Ha true) = β (Type II error)

• Type II error is related to the power of the test (power = 1 − β)

NOTE: IN THIS COURSE USE α = .05 UNLESS SPECIFIED OTHERWISE

Theory for β1 Inference

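$$b_1 \sim N\left(\beta_1,\ \sigma^2(b_1)\right), \quad \text{where } \sigma^2(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$

$$t^* = \frac{b_1 - \beta_1}{s(b_1)}, \quad \text{where } s(b_1) = \sqrt{\frac{s^2}{\sum (X_i - \bar{X})^2}}$$

Under H0, t* ~ t(n-2)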

Confidence Interval for β1

b1 ± tc s(b1), where tc = t(1-α/2, n-2), the (1-α/2)100th percentile of the t distribution with n-2 degrees of freedom

• 1-α is the confidence level
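As a worked check, the interval can be computed from the slope estimate and standard error reported in the SAS output for the Tower of Pisa example later in these slides. This assumes n = 13 observations (so n-2 = 11 degrees of freedom); the sample size is not stated in the slides, but it is consistent with the reported limits. The critical value t(0.975, 11) ≈ 2.201 is taken from a standard t table.

```latex
b_1 \pm t_c\, s(b_1)
  = 9.31868 \pm 2.201 \times 0.30991
  = 9.31868 \pm 0.68211
  \approx (8.6366,\ 10.0008)
```

This agrees with the reported 95% confidence limits for year up to rounding.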

Significance tests for β1

$$H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \ne 0$$

$$t^* = \frac{b_1 - 0}{s(b_1)}$$

Reject H0 if |t*| ≥ tc, where tc = t(1-α/2, n-2)

P = Prob(|t| ≥ |t*|), where t ~ t(n-2)
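For the Tower of Pisa example, the test statistic can be checked by hand from the SAS output shown later in these slides. The comparison with the critical value assumes 11 degrees of freedom (n = 13, which the slides do not state explicitly), with t(0.975, 11) ≈ 2.201 from a standard t table.

```latex
t^* = \frac{b_1 - 0}{s(b_1)} = \frac{9.31868}{0.30991} \approx 30.07,
\qquad |t^*| = 30.07 \ge 2.201 \;\Rightarrow\; \text{reject } H_0
```

This matches the reported t Value of 30.07 and the reported P-value of <.0001.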

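Theory for β0 Inference

$$b_0 \sim N\left(\beta_0,\ \sigma^2(b_0)\right), \quad \text{where } \sigma^2(b_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right]$$

$$t^* = \frac{b_0 - \beta_0}{s(b_0)}$$

For s(b0), replace σ² by s² and take the square root

Under H0, t* ~ t(n-2)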

Confidence Interval for β0

b0 ± tc s(b0), where tc = t(1-α/2, n-2), the (1-α/2)100th percentile of the t distribution with n-2 degrees of freedom

• 1-α is the confidence level

Significance tests for β0

$$H_0: \beta_0 = 0 \quad \text{vs} \quad H_a: \beta_0 \ne 0$$

$$t^* = \frac{b_0 - 0}{s(b_0)}$$

Reject H0 if |t*| ≥ tc, where tc = t(1-α/2, n-2)

P = Prob(|t| ≥ |t*|), where t ~ t(n-2)

Notes

• The normality of b0 and b1 follows from the fact that each of these is a linear combination of the Yi, each of which is an independent normal

• For b1 see KNNL p42

• For b0 try this as an exercise
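For b1, the linear-combination argument can be sketched as follows. This is the standard derivation written out briefly; the weights k_i are notation introduced here, and the textbook's presentation may differ in detail.

```latex
b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
    = \sum k_i Y_i,
\qquad k_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}
\quad \left(\text{since } \sum (X_i - \bar{X})\,\bar{Y} = 0\right)

\sigma^2(b_1) = \sum k_i^2\, \sigma^2(Y_i)
              = \sigma^2 \sum k_i^2
              = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}
```

Because the Y_i are independent normal variables and the k_i are fixed constants, b1 is itself normal, with the variance given in the β1 theory slide.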

Notes

• Usually the CI and significance test for β0 are not of interest

• If the εi are not normal but are relatively symmetric, then the CIs and significance tests are reasonable approximations

Notes

• These procedures can easily be modified to produce one-sided confidence intervals and significance tests

• Because $\sigma^2(b_1) = \sigma^2 / \sum (X_i - \bar{X})^2$, we can make this quantity small by making $\sum_{i=1}^{n} (X_i - \bar{X})^2$ large.
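The effect of the spread of the X values on the variance of b1 can be seen with a small comparison. In this sketch the two designs, the error variance, and the helper name `sxx` are all hypothetical; both designs use six observations on the same range of X.

```python
def sxx(xs):
    """Sum of squared deviations of the X values about their mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

sigma2 = 1.0  # hypothetical error variance

# Two hypothetical designs with n = 6 on the interval [0, 10].
evenly_spaced = [0, 2, 4, 6, 8, 10]
at_endpoints = [0, 0, 0, 10, 10, 10]

for name, design in [("evenly spaced", evenly_spaced), ("at endpoints", at_endpoints)]:
    var_b1 = sigma2 / sxx(design)   # sigma^2(b1) = sigma^2 / sum (Xi - Xbar)^2
    print(f"{name}: Sxx = {sxx(design):.1f}, var(b1) = {var_b1:.5f}")
```

Placing the X values farther from their mean increases the sum of squared deviations and so reduces the variance of the slope estimate.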

SAS Proc Reg

proc reg data=a1;
  model lean=year / clb;
run;

clb option generates confidence intervals

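Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | 95% Confidence Limit (lower) | 95% Confidence Limit (upper) |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | -61.12088 | 25.12982 | -2.43 | 0.0333 | -116.43124 | -5.81052 |
| year | 1 | 9.31868 | 0.30991 | 30.07 | <.0001 | 8.63656 | 10.00080 |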

The 95% confidence limits are the last two values in each row. The CI for the intercept is uninteresting.
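The formulas from the earlier slides can be checked against this output. The sketch below is in Python rather than SAS so that it is self-contained. The data values are an assumption: the slides do not list them, and the numbers used here are the lean measurements as commonly reported for this textbook example (years 75 to 87), which reproduce the estimates in the output above. The critical value 2.201 is t(0.975, 11) from a standard t table.

```python
import math

# ASSUMED data (not listed in the slides): lean of the Tower of Pisa by year,
# as commonly reported for this textbook example.
year = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(year)
x_bar = sum(year) / n
y_bar = sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in year)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, lean))

b1 = sxy / sxx                  # slope estimate
b0 = y_bar - b1 * x_bar         # intercept estimate

sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(year, lean))
s2 = sse / (n - 2)              # estimate of sigma^2

s_b1 = math.sqrt(s2 / sxx)                          # s(b1)
s_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))   # s(b0)

tc = 2.201                      # t(0.975, n - 2 = 11) from a standard t table

for name, b, s in [("Intercept", b0, s_b0), ("year", b1, s_b1)]:
    t_star = b / s              # test statistic for H0: parameter = 0
    print(f"{name}: estimate = {b:.5f}, s.e. = {s:.5f}, t = {t_star:.2f}, "
          f"95% CI = ({b - tc * s:.5f}, {b + tc * s:.5f})")
```

Because the critical value is rounded to three decimals, the computed limits may differ from the SAS limits in the last decimal places.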

Review

• What is the default value of α that we will use in this class?

• What is the default confidence level that we use in this class?

• Suppose you could choose the X’s. How would you choose them if you wanted a precise estimate of the slope? intercept? both?