Stat 31, Section 1, Last Time Hypothesis Testing –Careful about 1-sided vs. 2-sided Connection:...

Post on 16-Dec-2015


Stat 31, Section 1

Last Time

• Hypothesis Testing

– Careful about 1-sided vs. 2-sided

• Connection: CIs - Hypo Tests

• 3 Traps of Hypo Testing

– Statistically Sign’t ≠ Really Sign’t

– Non-sign’t ≠ Nothing there

– In many tests, will find some sign’t

• T Distribution (handles unknown σ)

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 450-471, 485-504

Approximate Reading for Next Class:

Pages 536-549

Midterm II

Coming on Tuesday, April 10

Think about:

• Sheet of Formulas

– Again single 8 ½ x 11 sheet

– New, since now more formulas

• Redoing HW…

• Asking about those not understood

• Will schedule Extra Office Hours

Sec. 7.1: Deeper look at Inference

Recall: “inference” = CIs and Hypo Tests

Main Issue: In the sampling distribution

$\bar{X} \sim N(\mu, \sigma/\sqrt{n})$

σ is usually unknown, so replace it with an estimate, s.

For n large, this should be “OK”, but what about:

• n small?

• How large is n “large”?

Unknown SD

Approach: Account for “extra variability in the approximation”

Mathematics: Assume the individual observations are Normal:

$X_i \sim N(\mu, \sigma)$

I.e.

• Data have a mound-shaped histogram

• Recall: averages are generally Normal

• But now must focus on the individuals

Unknown SD

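Then

$\bar{X} \sim N(\mu, \sigma/\sqrt{n})$, so $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

Replace σ by s, then

$\frac{\bar{X} - \mu}{s/\sqrt{n}}$

has a distribution named:

“t-distribution with n-1 degrees of freedom”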
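A quick simulation makes the “extra variability” concrete. This is a minimal Python sketch, assuming Normal individuals with μ = 0 and σ = 1 (an arbitrary choice; any Normal population gives the same picture); `simulated_tail_rate` is a hypothetical helper. If (X̄ − μ)/(s/√n) were exactly N(0, 1), about 5% of the simulated values would fall beyond ±1.96.

```python
import math
import random


def simulated_tail_rate(n: int, reps: int = 20000, cutoff: float = 1.96,
                        seed: int = 0) -> float:
    """Fraction of simulated samples with |(Xbar - mu)/(s/sqrt(n))| > cutoff.

    Assumes the individuals are N(mu=0, sigma=1), as in the t-distribution setup.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # sample SD
        hits += abs(xbar - mu) / (s / math.sqrt(n)) > cutoff  # True counts as 1
    return hits / reps


for n in (4, 30):
    print(f"n = {n:2d}: fraction beyond 1.96 = {simulated_tail_rate(n):.3f}")
```

With n = 4 the fraction should come out near 0.14 rather than 0.05; with n = 30 it should be close to 0.06. That gap is what the t-distribution with n − 1 degrees of freedom accounts for.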

t - Distribution

Notes:

1. n is a parameter (like μ or σ) that controls the “added variability from approximation”

t - Distribution

Notes:

2. Careful: set “degrees of freedom” = n – 1 (not n)

• Easy to forget later

• Good to add to sheet of notes for exam

t - Distribution

Notes:

3. Must work with the standardized version of X̄,

i.e. $\frac{\bar{X} - \mu}{s/\sqrt{n}}$

• Can no longer plug the mean and SD into EXCEL formulas

• In the text this was already done, since it is needed for Normal table calc’ns

t - Distribution

Notes:

4. Calculate t probs, i.e. areas, using TDIST & TINV

Caution: these are set up differently from NORMDIST & NORMINV

See Class Example 26: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg26.xls

EXCEL Functions

Summary:

Normal:

• NORMDIST: plug in cutoff, get out (left) area

• NORMINV: plug in (left) area, get out cutoff

(but TDIST is set up really differently)

EXCEL Functions

t distribution, 1 tail:

• TDIST: plug in cutoff, get out (right-tail) area

EXCEL notes:

– no explicit inverse

– backwards from Normal…

EXCEL Functions

t distribution, 2 tails:

• TDIST: plug in cutoff, get out (2-tail) area

• TINV: plug in (2-tail) area, get out cutoff

(EXCEL note: this one has the inverse)
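To make the two Excel conventions concrete, here is a minimal Python sketch. The names `tdist` and `tinv` are hypothetical stand-ins that mimic the argument order of Excel's TDIST(x, df, tails) and TINV(p, df); the t probabilities come from Simpson's-rule integration of the t density, not from Excel or a statistics library, so treat the results as approximations good to several decimals.

```python
import math


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Left area P{T <= x} for T ~ t(df): 0.5 plus Simpson's rule over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        # t density c * (1 + u^2/df)^(-(df+1)/2), computed on the log scale
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def tdist(x: float, df: float, tails: int) -> float:
    """Like Excel TDIST: cutoff in; right-tail (tails=1) or 2-tail (tails=2) area out."""
    if x < 0:
        raise ValueError("like Excel's TDIST, x must be >= 0")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return tails * (1.0 - t_cdf(x, df))


def tinv(p: float, df: float) -> float:
    """Like Excel TINV: 2-tail area p in; cutoff c with P{|T| > c} = p out."""
    lo, hi = 0.0, 1.0
    while tdist(hi, df, 2) > p:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(tdist(1.7, 3, 1), 3), round(tinv(0.05, 3), 3))
```

Note the asymmetry this reproduces: `tdist` takes a cutoff and returns a right-tail or 2-tail area, while `tinv` only understands 2-tail areas.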

EXCEL Functions

Note: when you need to invert the 1-tail TDIST, use twice the area: a 1-tail area of A corresponds to a 2-tail area of 2A, so put 2A into TINV.
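In code, the doubling rule reads as follows. This minimal Python sketch uses a hypothetical `tinv_two_tail` that mirrors Excel's TINV convention (2-tail area in, cutoff out), with probabilities from Simpson's-rule integration of the t density rather than from Excel itself; the hypothetical `one_tail_cutoff` applies the rule by passing 2A.

```python
import math


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Left area P{T <= x} for T ~ t(df): 0.5 plus Simpson's rule over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def tinv_two_tail(p: float, df: float) -> float:
    """Cutoff c with P{|T| > c} = p (Excel's TINV convention), by bisection."""
    lo, hi = 0.0, 64.0  # assumes c < 64, which holds unless df is tiny and p very small
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * (1 - t_cdf(mid, df)) > p:
            lo = mid  # 2-tail area still too big: the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2


def one_tail_cutoff(a: float, df: float) -> float:
    """Cutoff c with P{T > c} = a: ask the 2-tail inverse for area 2a."""
    return tinv_two_tail(2 * a, df)


c = one_tail_cutoff(0.05, 12)
print(f"c = {c:.3f}, right-tail area = {1 - t_cdf(c, 12):.4f}")
```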

t - Distribution

HW: C21

For T ~ t, with degrees of freedom:

(a) 3 (b) 12 (c) 150 (d) N(0,1)

Find:

i. P{T > 1.7} (0.094, 0.057, 0.046, 0.045)

ii. P{T < 2.14} (0.939, 0.973, 0.983, 0.984)

iii. P{T < -0.74} (0.256, 0.237, 0.230, 0.230)

iv. P{T > -1.83} (0.918, 0.954, 0.965, 0.966)

t - Distribution

HW: C21

v. P{|T| > 1.18} (0.323, 0.261, 0.240, 0.238)

vi. P{|T| < 2.39} (0.903, 0.966, 0.982, 0.983)

vii. P{|T| < -2.74} (0, 0, 0, 0)

viii. C so that 0.05 = P{|T| > C}

(3.18, 2.18, 1.98, 1.96)

ix. C so that 0.99 = P{|T| < C}

(5.84, 3.05, 2.61, 2.58)
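Each item above can be rewritten as a left area F(x) = P{T ≤ x} or, for viii and ix, as a 2-tail cutoff of the kind TINV returns. This is a minimal Python sketch for checking answers, not the course's spreadsheet: t probabilities come from Simpson's-rule integration of the t density, column (d) uses the standard library's `statistics.NormalDist`, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Left area P{T <= x} for T ~ t(df): 0.5 plus Simpson's rule over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def left_area(x: float, df: Optional[float]) -> float:
    """F(x) = P{T <= x}; df=None stands for column (d), N(0,1)."""
    return NormalDist().cdf(x) if df is None else t_cdf(x, df)


def two_tail_cutoff(p: float, df: Optional[float]) -> float:
    """C with P{|T| > C} = p, by bisection (what TINV(p, df) gives for the t case)."""
    lo, hi = 0.0, 64.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * (1 - left_area(mid, df)) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hw_c21(df: Optional[float]) -> dict:
    def F(x: float) -> float:
        return left_area(x, df)

    return {
        "i": 1 - F(1.7),                    # P{T > 1.7}
        "ii": F(2.14),                      # P{T < 2.14}
        "iii": F(-0.74),                    # P{T < -0.74}
        "iv": 1 - F(-1.83),                 # P{T > -1.83}
        "v": 2 * (1 - F(1.18)),             # P{|T| > 1.18}, by symmetry
        "vi": 2 * F(2.39) - 1,              # P{|T| < 2.39}
        "vii": 0.0,                         # |T| is never below a negative number
        "viii": two_tail_cutoff(0.05, df),  # 0.05 = P{|T| > C}
        "ix": two_tail_cutoff(0.01, df),    # 0.99 = P{|T| < C}, i.e. P{|T| > C} = 0.01
    }


for df in (3, 12, 150, None):
    answers = hw_c21(df)
    print(df, {k: round(v, 3) for k, v in answers.items()})
```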

t - Distribution

Application 1: Confidence Intervals

Recall: the CI is

$\bar{X} \pm m$

with margin of error m from NORMINV or CONFIDENCE.

Using TINV? Careful, need to standardize

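t - Distribution

Using TINV? Careful, need to standardize

|X̄ − μ| is the distance (# spaces) between X̄ and μ on the number line

Need to work this into a form that can use TINV:

$0.95 = P\{\mu \text{ covered by } \bar{X} \pm m\} = P\{\bar{X} - m \le \mu \le \bar{X} + m\} = P\{|\bar{X} - \mu| \le m\} = P\left\{\frac{|\bar{X} - \mu|}{s/\sqrt{n}} \le \frac{m}{s/\sqrt{n}}\right\}$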

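t - Distribution

The standardized quantity $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ has the $t_{n-1}$ distribution

So want:

$P\left\{|T| \le \frac{m}{s/\sqrt{n}}\right\} = 0.95$, where $T \sim t_{n-1}$

i.e. want:

$\frac{m}{s/\sqrt{n}} = \mathrm{TINV}(0.05, n-1)$, so $m = \mathrm{TINV}(0.05, n-1)\,\frac{s}{\sqrt{n}}$

giving the CI

$\bar{X} \pm m = \bar{X} \pm \mathrm{TINV}(0.05, n-1)\,\frac{s}{\sqrt{n}}$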

t - Distribution

Terminology:

TINV(0.05,n-1) is called a critical value

(from connection between CIs and Tests)

HW: 7.19

t - Distribution

Class Example 27, Part I: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg27.xls

Old textbook problem 7.24:

In a study of DDT poisoning, researchers fed several rats a measured amount. They measured the “absolutely refractory period” required for a nerve to recover after a stimulus. Measurements on 4 rats gave:


1.6 1.7 1.8 1.9

a) Find the mean refractory period, and the standard error of the mean

b) Give a 95% CI for the mean “absolutely refractory period” for all rats of this strain
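Parts (a) and (b) can be checked with a minimal Python sketch. It is not the course's Class Example 27 spreadsheet: the 95% critical value, TINV(0.05, 3) in Excel terms, is found by bisection on t probabilities from Simpson's-rule integration of the t density, and `t_critical` and `t_interval` are hypothetical helpers.

```python
import math
import statistics


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Left area P{T <= x} for T ~ t(df): 0.5 plus Simpson's rule over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(conf: float, df: float) -> float:
    """Cutoff with central area conf, i.e. TINV(1 - conf, df), by bisection."""
    lo, hi = 0.0, 64.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * (1 - t_cdf(mid, df)) > 1 - conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(data: list[float], conf: float = 0.95):
    """Mean, standard error s/sqrt(n), and the t confidence interval Xbar +- m."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    m = t_critical(conf, n - 1) * se  # margin of error
    return xbar, se, (xbar - m, xbar + m)


rats = [1.6, 1.7, 1.8, 1.9]
xbar, se, (low, high) = t_interval(rats)
print(f"mean = {xbar:.3f}, SE = {se:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

This should print mean 1.750, SE 0.0645 and a 95% CI of about (1.545, 1.955). Using 1.96 instead of the t critical value (about 3.18 with 3 degrees of freedom) would make the interval far too narrow for n = 4.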

t - Distribution

Confidence Interval HW:

7.5, 7.7

And now for something completely different…

Two issues:

• What do professional statisticians think about EXCEL?

• Why are the EXCEL functions so poorly organized?

And now for something completely different…

Professional Statisticians Dislike Excel:

Very poor handling of numerics

Unacceptable?!?

Jeff Simonoff Example: http://www.stern.nyu.edu/~jsimonof/classes/1305/pdf/excelreg.pdf

And now for something completely different…

A similar example:

Class Example 28: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg28.xls

Problem 1: Excel doesn’t keep enough significant digits (relative to other software)

[single precision vs. double precision]
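The kind of digit loss at issue can be reproduced in any floating-point arithmetic. The minimal Python sketch below is not Excel's code: it assumes a variance routine built on the one-pass shortcut (sum of squares minus (sum)²/n) and compares it with the two-pass formula (subtract the mean first) on three numbers near 10⁹ whose sample variance is exactly 1. Python floats are double precision, so it also shows that even double precision is not enough when an algorithm cancels away most of the significant digits.

```python
def one_pass_variance(xs: list[float]) -> float:
    """Shortcut (sum of x^2 - (sum of x)^2 / n) / (n - 1): prone to cancellation."""
    n = len(xs)
    total = sum(xs)
    return (sum(x * x for x in xs) - total * total / n) / (n - 1)


def two_pass_variance(xs: list[float]) -> float:
    """Subtract the mean first, then square: keeps the significant digits."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [1e9 + 1, 1e9 + 2, 1e9 + 3]  # true sample variance is exactly 1
print(one_pass_variance(data), two_pass_variance(data))
```

On IEEE-754 doubles this prints `0.0 1.0`: the squares near 10¹⁸ are stored only to the nearest multiple of 128, so the shortcut returns 0 with no warning at all.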

And now for something completely different…

Problem 2: Excel doesn’t warn when troubles are encountered…

• All software has this problem sometimes

• But it is easy to provide warnings…

• “Competent software does this…”

And now for something completely different…

More discussion of Excel accuracy issues:

http://www.bus.ualberta.ca/eerkut/TMSSdraft3.html

By Erhan Erkut, University of Alberta:

http://www.bus.ualberta.ca/eerkut/

And now for something completely different…

Why are the EXCEL functions so poorly organized?

E.g. NORMDIST uses left areas, TDIST uses right or 2-sided areas

E.g. NORMINV uses left areas, TINV uses 2-sided areas

More to come…

And now for something completely different…

Why are the EXCEL functions so poorly organized?

Looks like a programmer was handed a statistics text, and told “turn these into functions”…

Problem: that organization was good for table lookups, but looks clunky now…

And now for something completely different…

Fun personal story:

• Colin Bell at Microsoft heard about “complaints from statisticians on EXCEL”

• Decided to “try to fix these”

• Contacted Jeff Simonoff about numerics

• Asked Jeff to work with him

• Jeff refused; he doesn’t like or use EXCEL

And now for something completely different…

Fun personal story:

• Jeff told Colin about me

• Colin asked me

• I agreed about the numerical problems, but said I had bigger objections about organization

• Colin asked me to write these up

And now for something completely different…

Fun personal story:

• I said I was too busy, but…

• I would soon teach a similar course.

• I offered to send an email every time I noted an organizational inconsistency

• Over the semester, I sent around 30 emails about all of these

And now for something completely different…

Fun personal story:

• Colin agreed with each of the points made

• Colin approached the statistical people at Microsoft

• They agreed that the organization could have been done better

And now for something completely different…

Fun personal story:

• But for “backwards compatibility” reasons, they refused to change anything

• Colin apologetically archived all my emails…

And now for something completely different…

How much should we worry?

• Organization is a pain, but you can live with it

(OK to complain when you feel like it)

• Usually (except for weird rounding) numerical issues don’t arise, but you need to be aware of the potential!

t - Distribution

Application 2: Hypothesis Tests

Idea: Calculate P-values using TDIST

t – Distribution Hypo Testing

E.g. Old Textbook Example 7.26

For the above DDT poisoning example, suppose that the mean “absolutely refractory period” is known to be 1.3. DDT poisoning should slow nerve recovery, and so increase this period. Do the data give good evidence for this supposition?

t – Distribution Hypo Testing

E.g. Old Textbook Example 7.26

Let μ = population mean absolutely refractory period for poisoned rats.

$H_0: \mu = 1.3$

$H_A: \mu > 1.3$

$\bar{X} = 1.75$ (from before)

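t – Distribution Hypo Testing

E.g. Old Textbook Example 7.26

P-value = P{what saw or more conclusive | H0 – HA Bdry}

$= P\{\bar{X} \ge 1.75 \mid \mu = 1.3\} = P\left\{\frac{\bar{X} - 1.3}{s/\sqrt{n}} \ge \frac{1.75 - 1.3}{s/\sqrt{n}} \,\middle|\, \mu = 1.3\right\} = P\left\{t_3 \ge \frac{1.75 - 1.3}{s/\sqrt{n}}\right\} = \mathrm{TDIST}\left(\frac{1.75 - 1.3}{s/\sqrt{n}}, 3, 1\right)$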

t – Distribution Hypo Testing

E.g. Old Textbook Example 7.26

From Class Example 27, part 2: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg27.xls

P-value = 0.003

Interpretation: very strong evidence, for either yes-no or gray-level
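The P-value of 0.003 can be re-derived in a minimal Python sketch: compute the t statistic, then take the right-tail area of the t distribution with 3 degrees of freedom. The t probabilities come from Simpson's-rule integration of the t density, standing in for Excel's TDIST, and `one_sample_t` is a hypothetical helper.

```python
import math
import statistics


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Left area P{T <= x} for T ~ t(df): 0.5 plus Simpson's rule over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def one_sample_t(data: list[float], mu0: float) -> tuple[float, int]:
    """t statistic (Xbar - mu0)/(s/sqrt(n)) and its degrees of freedom n - 1."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    return (statistics.mean(data) - mu0) / se, n - 1


t, df = one_sample_t([1.6, 1.7, 1.8, 1.9], mu0=1.3)
p_value = 1 - t_cdf(t, df)  # right-tail area, as TDIST(t, 3, 1) would give
print(f"t = {t:.2f} on {df} df, P-value = {p_value:.4f}")
```

This should report t ≈ 6.97 on 3 df and a P-value of about 0.0030.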

t – Distribution Hypo Testing

Variations:

• For “opposite direction” hypotheses, $H_A: \mu < \mu_0$:

P-value $= P\{T \le t\}$, where t is the observed t statistic

Then use symmetry, i.e. put $-t$ into TDIST.

t – Distribution Hypo Testing

Variations:

• For 2-sided hypotheses:

Use the 2-tailed version of TDIST, with |t| as the cutoff.
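All three versions of H_A fit in one small function. This is a minimal Python sketch with a hypothetical `t_p_value` helper; t probabilities again come from Simpson's-rule integration of the t density in place of Excel's TDIST, and the "less" branch uses the same symmetry as putting −t into TDIST.

```python
import math


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Left area P{T <= x} for T ~ t(df): 0.5 plus Simpson's rule over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_p_value(t: float, df: float, alternative: str) -> float:
    """P-value for an observed t statistic; alternative: 'greater', 'less', 'two-sided'."""
    if alternative == "greater":    # H_A: mu > mu0, right-tail area
        return 1 - t_cdf(t, df)
    if alternative == "less":       # H_A: mu < mu0; symmetry: P{T <= t} = P{T >= -t}
        return 1 - t_cdf(-t, df)
    if alternative == "two-sided":  # H_A: mu != mu0, like TDIST(|t|, df, 2)
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(alt, round(t_p_value(6.97, 3, alt), 4))
```

For a positive t the "greater" and "less" P-values add to 1, and the 2-sided P-value is twice the "greater" one.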

t – Distribution Hypo Testing

HW: 7.13

7.16 (0.04), 7.17, 7.21 a, f

Interpret P-values:

(i) yes-no

(ii) gray-level