Math 141
Lecture 15: Z-tests and t-tests

Albyn Jones
Library 304, jones@reed.edu
www.people.reed.edu/~jones/courses/141

Albyn Jones Math 141


The Z-test

Suppose we have a single observation Z from a N(µ,1) distribution. We reject H0 : µ = 0 if we get the relatively rare outcomes in the tails: for α = .05, if |Z| > 1.96.

[Figure: standard normal density with the alpha = 0.05 rejection region in the two tails, cut off at qnorm(.025) = −1.96 and qnorm(.975) = 1.96. Axes: X, Density.]


The Z-test for X̄

Suppose that X1, X2, . . . , Xn are IID RV’s with mean µ and SD σ. To test H0 : µ = µ0, reason as follows:

If H0 were true, X̄ would have approximately a N(µ0, σ²/n) distribution.

Therefore, if we standardize, we have approximately a standard normal RV:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

Reject H0 : µ = µ0 at α = .05 if |Z| > 1.96.


The Z-test: Continued

Note: the condition for rejection,

$$|Z| = \left| \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \right| > 1.96,$$

is equivalent to

$$|\bar{X} - \mu_0| > 1.96\,\sigma/\sqrt{n}.$$

In other words, reject if X̄ is more than 1.96 SE’s from µ0.
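
As a minimal sketch of this rule, the R code below computes Z for a small made-up sample and checks both forms of the rejection condition. The data vector `x`, the null value `mu0`, and the "known" SD `sigma` are illustrative assumptions, not values from the lecture.

```r
# Hypothetical data: assumed for illustration only
x     <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0, 5.5)
mu0   <- 5.0   # null value (assumed)
sigma <- 0.3   # SD treated as known (assumed)

n    <- length(x)
xbar <- mean(x)
se   <- sigma / sqrt(n)      # standard error of the sample mean
Z    <- (xbar - mu0) / se    # standardized test statistic

# Two algebraically equivalent forms of the rejection condition
reject_z    <- abs(Z) > qnorm(0.975)
reject_xbar <- abs(xbar - mu0) > qnorm(0.975) * se
```

Using `qnorm(0.975)` rather than a hard-coded 1.96 makes it easy to switch to another α.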


Choices for α

Here is a table with standard normal quantiles corresponding to common choices for α:

α      z_{1−α/2}   R
.10    1.64        qnorm(.95)
.05    1.96        qnorm(.975)
.01    2.58        qnorm(.995)


The Z-test: P-Values

It is easy to compute p-values for the Z-test: just twice the area past the Z-score:

p = 2 * pnorm(-abs(Z))

Remember: the p-value is also the smallest α for which we would reject with a size α rejection region!

SOP: compute the p-value and compare to your favorite α.
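
A short sketch of that procedure follows. The Z-scores are simply the rounded critical values from the earlier table, chosen for illustration, and `alpha` is an assumed choice.

```r
# Z-scores equal to the rounded critical values for alpha = .10, .05, .01
Z <- c(1.64, 1.96, 2.58)

# Two-sided p-value: twice the area beyond |Z|
p <- 2 * pnorm(-abs(Z))

# Compare each p-value with a chosen alpha (assumed here to be 0.05)
alpha  <- 0.05
reject <- p < alpha

print(round(p, 2))
```

Rounded to two decimals, the p-values recover the α that each critical value corresponds to.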


Confidence intervals

Since a 95% CI consists of the set of values for µ that we would not reject at significance level .05, the CI has a simple form. Suppose X ∼ N(µ,1), and we observe X = 2.

[Figure: density curve illustrating the 95% CI for Z=2: (2−1.96, 2+1.96), with endpoints L = .04 and U = 3.96. Horizontal axis: x.]


Standard choices for CI’s

Here is a table with confidence intervals corresponding to typical choices for α:

Level   z_{1−α/2}   Interval
90%     1.64        X̄ ± 1.64 σ/√n
95%     1.96        X̄ ± 1.96 σ/√n
99%     2.58        X̄ ± 2.58 σ/√n

In practice, you will often see people replace 1.96 with 2.
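
The intervals in the table can be computed as in the sketch below. The summary values `xbar`, `sigma`, and `n` are made up for illustration.

```r
# Hypothetical summary values: assumed for illustration only
xbar  <- 10    # observed sample mean
sigma <- 2     # SD treated as known
n     <- 25    # sample size

level <- c(0.90, 0.95, 0.99)
alpha <- 1 - level
z     <- qnorm(1 - alpha / 2)    # standard normal quantiles
se    <- sigma / sqrt(n)         # standard error of the mean

ci <- data.frame(level = level,
                 lower = xbar - z * se,
                 upper = xbar + z * se)
print(ci)
```

Higher confidence levels use larger quantiles and so give wider intervals.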


The Z-test for a proportion

Sample proportions are sample means, and thus approximately normally distributed. If we have n trials, each with probability p of success, then p̂ is approximately normally distributed with mean p and SD √(pq/n), where q = 1 − p. Test H0 : p = p0 with

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}},$$

and reject at significance level α = .05 if |Z| > 1.96. Most sources recommend using the observed p̂ in the SE:

$$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
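
To see how much the choice of SE matters, the sketch below computes both versions of Z. The counts `x` and `n` and the null value `p0` are assumptions made up for illustration.

```r
# Hypothetical counts: assumed for illustration only
x  <- 60     # observed successes
n  <- 100    # trials
p0 <- 0.5    # null value

phat <- x / n

# SE computed under H0
z_null <- (phat - p0) / sqrt(p0 * (1 - p0) / n)

# SE computed from the observed proportion
z_obs <- (phat - p0) / sqrt(phat * (1 - phat) / n)

# Two-sided p-values for each version
p_null <- 2 * pnorm(-abs(z_null))
p_obs  <- 2 * pnorm(-abs(z_obs))
```

Because p(1 − p) is largest at p = 1/2, the observed-p̂ SE is never larger than the null SE when p0 = 1/2, so `z_obs` is at least as large in magnitude as `z_null` in that case.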


The CI for a proportion based on the Normal Approximation

The Z-based CI for a sample mean

$$\bar{X} \pm 1.96\,\sigma/\sqrt{n}$$

becomes

$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

The margin of error published by pollsters usually uses 1/2 for p̂ in the SE (the worst case):

$$\hat{p} \pm 1.96\,\frac{1}{2\sqrt{n}}$$
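
As a rough illustration, the sketch below compares the two margins of error for a made-up poll. The sample size `n` and observed proportion `phat` are assumptions.

```r
# Hypothetical poll: assumed for illustration only
n    <- 1000
phat <- 0.42

moe_obs   <- 1.96 * sqrt(phat * (1 - phat) / n)   # SE from the observed proportion
moe_worst <- 1.96 / (2 * sqrt(n))                 # worst case: p-hat = 1/2 in the SE

ci <- c(phat - moe_obs, phat + moe_obs)           # 95% CI using the observed SE
```

With n = 1000 the worst-case margin is about 3 percentage points, and the observed-SE margin can only be smaller.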


Notes on testing proportions

The primary reason to use the Z-test for a sample proportion is ease of computation: you can often do the arithmetic mentally, or on any calculator.

For any serious purpose, especially with small to moderate sample sizes, there are more precise tests available. In R, binom.test or prop.test are better options!
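
A minimal sketch of calling both functions is shown below. The counts `x` and `n` and the null value `p0` are made up for illustration.

```r
# Hypothetical counts: assumed for illustration only
x  <- 60
n  <- 100
p0 <- 0.5

exact <- binom.test(x, n, p = p0)   # exact binomial test
score <- prop.test(x, n, p = p0)    # chi-squared test, continuity-corrected by default

exact$p.value
score$p.value
exact$conf.int                      # 95% CI from the exact test
```

Both calls return an object of class "htest", so the p-value and confidence interval are extracted the same way in each case.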


The t-test

The most obvious difference between the t-test and the Z-test is whether we know the SE or not:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \qquad t_{n-1} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$

The subscript (n − 1) is the degrees of freedom and s is the estimated SD:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$


The t-test: Conditions

The t statistic t_{n−1} will have an exact t distribution if the data X1, X2, . . . , Xn are IID normally distributed RV’s.

We use the t-test when we have at least approximately normally distributed data, and we estimated the SE using the sample SD.

If the sample size n is large, the t and Z distributions are indistinguishable.

In older texts you will see the t-test referred to as a small sample test.


The t distribution

[Figure: Normal Density and t densities. Curves: Normal, t_1, t_2, t_3, t_10. Axes: Z, density.]


History

In the 19th century, the use of the normal approximation was standard, though usually to construct confidence intervals.

William Gosset, brewmaster at Guinness Brewery, realized that with small samples (n = 2, 3, 4), the Z statistics seemed to be much less reliable than theory predicted. He managed to guess the t distribution by doing simulations.

He published under the pseudonym Student, hence the common nomenclature Student’s t statistic.


Confidence Intervals

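The Z-quantile based CI for a sample mean is

$$\bar{X} \pm \mathrm{qnorm}(1 - \alpha/2)\,\frac{\sigma}{\sqrt{n}}$$

The corresponding t-quantile based CI is

$$\bar{X} \pm \mathrm{qt}(1 - \alpha/2,\ n - 1)\,\frac{s}{\sqrt{n}}$$

For moderate sample sizes (∼ 50), this will be approximately

$$\bar{X} \pm 2s/\sqrt{n}$$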


t quantiles for a 95% CI

df     qt(.975,df)
2      4.30
5      2.57
30     2.04
50     2.01
100    1.98
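
The table can be reproduced directly in R, as in the sketch below. The normal quantile column `z` is added here only for comparison and is not part of the original table.

```r
df <- c(2, 5, 30, 50, 100)
tq <- qt(0.975, df)    # t quantiles for a 95% CI

# Side-by-side with the normal quantile the t quantiles approach
tab <- data.frame(df = df,
                  t  = round(tq, 2),
                  z  = round(qnorm(0.975), 2))
print(tab)
```

The t quantile shrinks toward the normal value 1.96 as the degrees of freedom grow.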


R functions: t.test()

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)


Example: Darwin’s Zea Mays data

Charles Darwin’s data on the differences of heights of pairs of plants (cross-pollinated minus self-pollinated), in eighths of an inch.

49, −67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, −48


t.test()

Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)

t.test(Diffs)
-----------------------------------------
t = 2.148, df = 14, p-value = 0.0497
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  0.03119332 41.83547335
sample estimates:
mean of x
 20.93333
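
The same numbers can be recovered by hand from the formulas on the earlier slides. The sketch below is one way to do so; the variable names are illustrative.

```r
Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)

n    <- length(Diffs)
xbar <- mean(Diffs)
s    <- sd(Diffs)        # sample SD, divisor n - 1
se   <- s / sqrt(n)      # estimated standard error

t_stat <- (xbar - 0) / se                              # H0: mu = 0
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)             # two-sided p-value
ci     <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se # 95% CI
```

The lower end of the interval is barely above 0, which matches a p-value just under .05.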


Alternative: the Sign Test

> binom.test(sum(Diffs > 0), length(Diffs))

number of successes = 13, number of trials = 15, p-value = 0.007385
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5953973 0.9834241
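
The sign test p-value can also be computed directly from the binomial distribution. The sketch below assumes the two-sided p-value is obtained by doubling the upper tail, which is valid here because the null distribution Binomial(15, 1/2) is symmetric.

```r
Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)

n_pos <- sum(Diffs > 0)    # number of positive differences
n     <- length(Diffs)

# Two-sided p-value: twice P(X >= n_pos) for X ~ Binomial(n, 1/2)
p_sign <- 2 * pbinom(n_pos - 1, n, 0.5, lower.tail = FALSE)
```

The sign test uses only the signs of the differences, so the sizes of the two negative values play no role.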


Question

If the data are normally distributed, the t-test is exact, and can be shown to be the most powerful test.

Why is the p-value for the sign test (0.007) so much smaller than for the t-test (0.050)?


Answer

[Figure: normal Q-Q plot titled "Darwin's Data". Axes: Theoretical Quantiles, Sample Quantiles.]
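
A plot of this kind can be drawn with base R, as in the sketch below. The reference line from `qqline` is an addition that may not be in the original figure.

```r
Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)

# Normal Q-Q plot: sample quantiles against standard normal quantiles
qq <- qqnorm(Diffs, main = "Darwin's Data")
qqline(Diffs)    # reference line through the quartiles (assumed addition)
```

The two large negative differences (−67 and −48) lie far from the other thirteen values, which suggests the normality assumption behind the t-test is doubtful for these data.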


R functions

function             R
density              dt(x,df)
CDF: P(T ≤ x)        pt(x,df)
quantiles            qt(p,df)
random numbers       rt(n,df)
p-value              2*pt(-abs(t),df)


Summary

Most often you will see the t-test used: we usually must estimate the variance of the data!

The t-test is accurate with small samples if the data are approximately normally distributed! If the sample size is large, the Z and t tests and CI’s agree.

Always look at the 95% CI!
