Post on 16-Dec-2015
Stat 31, Section 1, Last Time
• Hypothesis Testing
– Careful about 1-sided vs. 2-sided
• Connection: CIs - Hypo Tests
• 3 Traps of Hypo Testing
– Statistically Sign’t ≠ Really Sign’t
– Non-sign’t ≠ Nothing there
– In many tests, will find some sign’t
• T Distribution (handles unknown σ)
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 450-471, 485-504
Approximate Reading for Next Class:
Pages 536-549
Midterm II
Coming on Tuesday, April 10
Think about:
• Sheet of Formulas
– Again single 8 ½ x 11 sheet
– New, since now more formulas
• Redoing HW…
• Asking about those not understood
• Will schedule Extra Office Hours
Sec. 7.1: Deeper look at Inference
Recall: “inference” = CIs and Hypo Tests
Main Issue: In sampling distribution $\bar{X} - \mu \sim N(0, \sigma/\sqrt{n})$
Usually $\sigma$ is unknown, so replace with an estimate, $s$.
For n large, should be “OK”, but what about:
• n small?
• How large is n “large”?
Unknown SD
Approach: Account for “extra variability in the approximation”
Mathematics: Assume individual $X_i \sim N(\mu, \sigma)$
I.e.
• Data have mound shaped histogram
• Recall averages generally normal
• But now must focus on individuals
Unknown SD
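Then $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$, so $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Replace $\sigma$ by $s$, then $\frac{\bar{X} - \mu}{s/\sqrt{n}}$
has a distribution named:
“t-distribution with n-1 degrees of freedom”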
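A quick simulation can make the “extra variability” concrete. The sketch below is only an illustration, not one of the class examples: it draws many normal samples of size n = 4 (the values of mu, sigma, the seed and the number of repetitions are arbitrary assumptions), standardizes each sample mean once with the true σ and once with the estimate s, and counts how often the result lands beyond ±1.96.

```python
import math
import random

random.seed(0)  # fixed seed (arbitrary choice) so the run is repeatable

def z_statistic(sample, mu, sigma):
    """Standardize the sample mean using the true SD sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

def t_statistic(sample, mu):
    """Standardize the sample mean using the sample SD s (divides by n - 1)."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu) / (s / math.sqrt(n))

# Illustrative settings, not taken from the slides
mu, sigma, n, reps = 10.0, 2.0, 4, 20000

beyond_z = 0
beyond_t = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    if abs(z_statistic(sample, mu, sigma)) > 1.96:
        beyond_z += 1
    if abs(t_statistic(sample, mu)) > 1.96:
        beyond_t += 1

frac_z = beyond_z / reps  # fraction beyond +-1.96 when sigma is known
frac_t = beyond_t / reps  # fraction beyond +-1.96 when s replaces sigma
print(frac_z, frac_t)
```

With σ known, the fraction beyond ±1.96 should come out near 5%. With s in its place it should be clearly larger (roughly 14% for n = 4), which is the heavier-tailed behavior that the t-distribution with n – 1 = 3 degrees of freedom accounts for.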
t - Distribution
Notes:
1. n is a parameter (like $\mu$, $\sigma$, $p$) that controls “added variability from approximation”
2. Careful: set “degrees of freedom” = n – 1 (not n)
• Easy to forget later
• Good to add to sheet of notes for exam
3. Must work with standardized version of $\bar{X}$, i.e. $\frac{\bar{X} - \mu}{s/\sqrt{n}}$
• No longer can plug mean and SD into EXCEL formulas
• In text this was already done, since need this for Normal table calc’ns
4. Calculate t probs, i.e. areas, using TDIST & TINV
Caution: these are set up differently from NORMDIST & NORMINV
See Class Example 26:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg26.xls
EXCEL Functions
Summary:
Normal:
            plug in:   get out:
NORMDIST:   cutoff     area
NORMINV:    area       cutoff
(but TDIST is set up really differently)
t distribution, 1 tail:
            plug in:   get out:
TDIST:      cutoff     area
EXCEL notes:
- no explicit inverse
- backwards from Normal…
t distribution, 2 tail:
            plug in:   get out:
TDIST:      cutoff     area
TINV:       area       cutoff
(EXCEL note: this one has the inverse)
Note: when need to invert the 1-tail TDIST, use TINV with twice the area:
1-tail Area = A corresponds to 2-tail Area = 2 A
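The plug in / get out conventions above can be mimicked outside EXCEL. The following is a minimal sketch in plain Python, assuming nothing beyond the standard library: the names `tdist` and `tinv` are hypothetical stand-ins that copy the EXCEL conventions (cutoff in, right-tail or 2-tail area out; 2-tail area in, cutoff out), and the areas are found by simple numerical integration of the t density rather than by whatever method EXCEL uses.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P{T > x}, using Simpson's rule for the area between 0 and |x|."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    middle = total * h / 3  # P{0 < T < |x|}
    return 0.5 - middle if x >= 0 else 0.5 + middle

def tdist(x, df, tails):
    """TDIST-style: cutoff x >= 0 in, right-tail (tails=1) or 2-tail (tails=2) area out."""
    if x < 0 or tails not in (1, 2):
        raise ValueError("this sketch needs x >= 0 and tails equal to 1 or 2")
    return tails * t_upper_tail(x, df)

def tinv(area, df):
    """TINV-style: 2-tail area in, cutoff C with P{|T| > C} = area out (bisection)."""
    if not 0 < area < 1:
        raise ValueError("area must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while tdist(hi, df, 2) > area:  # widen until the cutoff is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

one_tail = tdist(1.7, 3, 1)        # P{T > 1.7}, 3 degrees of freedom
two_tail = tdist(1.7, 3, 2)        # P{|T| > 1.7}
cutoff_2tail = tinv(0.05, 3)       # C with P{|T| > C} = 0.05
cutoff_1tail = tinv(2 * 0.05, 3)   # invert 1-tail area A = 0.05 by passing 2A
print(one_tail, two_tail, cutoff_2tail, cutoff_1tail)
```

For 3 degrees of freedom, `tdist(1.7, 3, 1)` is about 0.094 and `tinv(0.05, 3)` is about 3.18. The last assignment shows the “twice the area” step for a 1-tail cutoff.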
t - Distribution
HW: C21
For T ~ t, with degrees of freedom:
(a) 3 (b) 12 (c) 150 (d) N(0,1)
Find:
i. P{T> 1.7} (0.094, 0.057, 0.046, 0.045)
ii. P{T < 2.14} (0.939, 0.973, 0.983, 0.984)
iii. P{T < -0.74} (0.256, 0.237, 0.230, 0.230)
iv. P{T > -1.83} (0.918, 0.954, 0.965, 0.966)
v. P{|T| > 1.18} (0.323, 0.261, 0.240, 0.238)
vi. P{|T| < 2.39} (0.903, 0.966, 0.982, 0.983)
vii. P{|T| < -2.74} (0, 0, 0, 0)
viii. C so that 0.05 = P{|T| > C}
(3.18, 2.17, 1.98, 1.96)
ix. C so that 0.99 = P{|T| < C}
(5.84, 3.05, 2.61, 2.58)
t - Distribution
Application 1: Confidence Intervals
Recall: $\bar{X} \pm m$
margin of error $m$ from NORMINV or CONFIDENCE
Using TINV? Careful need to standardize
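$0.95 = P\{\mu \text{ covered by } [\bar{X} - m,\ \bar{X} + m]\}$
$= P\{\bar{X} - m \le \mu \le \bar{X} + m\}$
$= P\{|\bar{X} - \mu| \le m\}$   (# spaces on number line)
$= P\left\{ \left| \frac{\bar{X} - \mu}{s/\sqrt{n}} \right| \le \frac{m}{s/\sqrt{n}} \right\}$   (need to work $s/\sqrt{n}$ in, to use TINV)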
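Since $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ has the t distribution (n – 1 degrees of freedom):
$0.95 = P\left\{ \left| \frac{\bar{X} - \mu}{s/\sqrt{n}} \right| \le \frac{m}{s/\sqrt{n}} \right\}$
So want: $\frac{m}{s/\sqrt{n}} = \mathrm{TINV}(0.05,\ n-1)$
i.e. want: $m = \mathrm{TINV}(0.05,\ n-1) \cdot \frac{s}{\sqrt{n}}$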
t - Distribution
Terminology:
TINV(0.05,n-1) is called a critical value
(from connection between CIs and Tests)
HW: 7.19
t - Distribution
Class Example 27, Part I:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg27.xls
Old text book problem 7.24:
In a study of DDT poisoning, researchers fed several rats a measured amount. They measured the “absolutely refractory period” required for a nerve to recover after a stimulus. Measurements on 4 rats gave:
1.6 1.7 1.8 1.9
a) Find the mean refractory period, and the standard error of the mean
b) Give a 95% CI for the mean “absolutely refractory period” for all rats of this strain
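Parts a) and b) can also be checked outside the spreadsheet. The sketch below is not the class spreadsheet itself, only a plain-Python illustration under stated assumptions: the helper names `t_upper_tail` and `t_critical` are hypothetical, and `t_critical(0.05, n - 1)` plays the role of TINV(0.05, n-1) by numerically integrating the t density and then bisecting.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P{T > x}, using Simpson's rule for the area between 0 and |x|."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    middle = total * h / 3  # P{0 < T < |x|}
    return 0.5 - middle if x >= 0 else 0.5 + middle

def t_critical(two_tail_area, df):
    """Cutoff C with P{|T| > C} = two_tail_area, found by bisection."""
    lo, hi = 0.0, 1.0
    while 2 * t_upper_tail(hi, df) > two_tail_area:  # bracket the cutoff
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, df) > two_tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [1.6, 1.7, 1.8, 1.9]       # measurements on the 4 rats
n = len(data)
xbar = statistics.mean(data)      # a) mean refractory period
s = statistics.stdev(data)        # sample SD (divides by n - 1)
se = s / math.sqrt(n)             # a) standard error of the mean
crit = t_critical(0.05, n - 1)    # critical value, n - 1 = 3 degrees of freedom
m = crit * se                     # margin of error
ci = (xbar - m, xbar + m)         # b) 95% CI
print(xbar, se, ci)
```

Under these assumptions the mean is 1.75, the standard error is about 0.065, and the 95% interval runs from roughly 1.54 to 1.96.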
t - Distribution
Confidence Interval HW:
7.5, 7.7
And now for something completely different…
Two issues:
• What do professional statisticians think
about EXCEL?
• Why are the EXCEL functions so poorly
organized?
Professional Statisticians Dislike Excel:
Very poor handling of numerics
Unacceptable?!?
Jeff Simonoff Example:
http://www.stern.nyu.edu/~jsimonof/classes/1305/pdf/excelreg.pdf
A similar example:
Class Example 28:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg28.xls
Problem 1: Excel doesn’t keep enough
significant digits (relative to other
software)
[single precision vs. double precision]
Problem 2: Excel doesn’t warn when
troubles are encountered…
• All software has this problem sometimes
• But is easy to provide warnings…
• “Competent software does this…”
More discussion of Excel accuracy issues:
http://www.bus.ualberta.ca/eerkut/TMSSdraft3.html
By Erhan Erkut, University of Alberta:
http://www.bus.ualberta.ca/eerkut/
Why are the EXCEL functions so poorly
organized?
E.g. NORMDIST uses left areas
TDIST uses right or 2-sided areas
E.g. NORMINV uses left areas
TINV uses 2-sided areas
More to come…
Why are the EXCEL functions so poorly
organized?
Looks like programmer was handed a
statistics text, and told “turn these into
functions”…
Problem: organization was good for table
look ups, but looks clunky now…
Fun personal story:
• Colin Bell at Microsoft heard about
“complaints from statisticians on EXCEL”
• Decided to “try to fix these”
• Contacted Jeff Simonoff about numerics
• Asked Jeff to work with him
• Jeff refused, doesn’t like or use EXCEL
• Jeff told Colin about me
• Colin asked me
• I agreed about numerical problems, but
said I had bigger objections about
organization
• Colin asked me to write these up
• I said I was too busy, but…
• I would teach (similar course) soon.
• I offered to send an email, every time I
noted an organizational inconsistency
• Over the semester, I sent around 30
emails about all of these
• Colin agreed with each of the points
made
• Colin approached the statistical people
at Microsoft
• They agreed that organization could
have been done better
• But for “backwards compatibility”
reasons, refused to change anything
• Colin apologetically archived all my
emails…
How much should we worry:
• Organization is a pain, but you can live
with it
(OK to complain when you feel like it)
• Usually (except for weird rounding)
numerical issues don’t arise, but need to
be aware of potential!
t - Distribution
Application 2: Hypothesis Tests
Idea: Calculate P-values using TDIST
t – Distribution Hypo Testing
E.g. Old Textbook Example 7.26
For the above DDT poisoning example, suppose that the mean “absolutely refractory period” for unpoisoned rats is known to be 1.3. DDT poisoning should slow nerve recovery, and so increase this period. Do the data give good evidence for this supposition?
Let $\mu$ = population mean absolutely refractory period for poisoned rats.
$H_0: \mu = 1.3$
$H_A: \mu > 1.3$
$\bar{X} = 1.75$ (from before)
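P-value = P{what saw or more conclusive | H0 – HA Bdry}
$= P\{\bar{X} \ge 1.75 \mid \mu = 1.3\}$
$= P\left\{ \frac{\bar{X} - \mu}{s/\sqrt{n}} \ge \frac{1.75 - 1.3}{s/\sqrt{n}} \,\Big|\, \mu = 1.3 \right\}$
$= P\left\{ t_3 \ge \frac{1.75 - 1.3}{s/\sqrt{n}} \right\} = \mathrm{TDIST}\left( \frac{1.75 - 1.3}{s/\sqrt{n}},\ 3,\ 1 \right)$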
From Class Example 27, part 2:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg27.xls
P-value = 0.003
Interpretation: very strong evidence, for either yes-no or gray-level
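The P-value above can be reproduced without EXCEL. The sketch below is a plain-Python illustration, not the class spreadsheet: `t_upper_tail` is a hypothetical helper that stands in for the 1-tail TDIST by numerically integrating the t density, and the data are the four rat measurements from Class Example 27.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P{T > x}, using Simpson's rule for the area between 0 and |x|."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    middle = total * h / 3  # P{0 < T < |x|}
    return 0.5 - middle if x >= 0 else 0.5 + middle

data = [1.6, 1.7, 1.8, 1.9]   # measurements on the 4 poisoned rats
mu0 = 1.3                     # H0 - HA boundary value of the mean
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)    # sample SD (divides by n - 1)

# Standardize, then take the right-tail area with n - 1 = 3 degrees of freedom,
# the same quantity as TDIST(t_stat, 3, 1)
t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)
print(t_stat, p_value)
```

The standardized statistic comes out near 6.97, and the right-tail area near 0.003, matching the spreadsheet value.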
Variations:
• For “opposite direction” hypotheses, $H_A: \mu < \mu_0$:
P-value = $P\{t_{n-1} \le t\}$, for the observed standardized statistic $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Then use symmetry, i.e. put $-t$ into TDIST.
• For 2-sided hypotheses:
Use 2-tailed version of TDIST.
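The three cases can be collected in one small function. This is a sketch under assumptions, not a definitive implementation: `right_area` is a hypothetical stand-in for the 1-tail TDIST (it accepts only cutoffs ≥ 0, as TDIST does), and `p_value` with its labels "greater", "less" and "two-sided" is an illustrative interface, not something from the slides.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def right_area(x, df, steps=2000):
    """P{T > x} for x >= 0 (like 1-tail TDIST), by Simpson's rule on [0, x]."""
    if x < 0:
        raise ValueError("like TDIST, this sketch needs x >= 0")
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def p_value(t, df, alternative):
    """P-value for an observed standardized statistic t."""
    if alternative == "greater":      # HA: mu > mu0, right-tail area
        return right_area(t, df) if t >= 0 else 1 - right_area(-t, df)
    if alternative == "less":         # HA: mu < mu0, symmetry: put -t in
        return right_area(-t, df) if t <= 0 else 1 - right_area(t, df)
    if alternative == "two-sided":    # HA: mu != mu0, both tails
        return 2 * right_area(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Example: a statistic of -1.7 with 3 degrees of freedom
p_less = p_value(-1.7, 3, "less")
p_two = p_value(-1.7, 3, "two-sided")
print(p_less, p_two)
```

When the statistic points away from the alternative (for example a positive t with HA: μ < μ0), the one-sided P-value is one minus the tail area, which is why the function branches on the sign of t.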
t – Distribution Hypo Testing
HW: 7.13
7.16 (0.04), 7.17, 7.21 a, f
Interpret P-values:
(i) yes-no
(ii) gray-level