Math 141 Lecture 15: Z-tests and t-tests
Albyn Jones
Library [email protected]
www.people.reed.edu/~jones/courses/141
Albyn Jones Math 141
The Z-test
Suppose we have a single observation Z from a N(µ,1) distribution. We reject H0 : µ = 0 if we get one of the relatively rare outcomes in the tails: for α = .05, if |Z| > 1.96.
[Figure: standard normal density (x-axis: X, y-axis: Density), alpha = 0.05, with cutoffs qnorm(.025) = −1.96 and qnorm(.975) = 1.96]
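The rejection rule above can be sketched in a few lines of R. The observed value z_obs is a made-up number used only for illustration; it does not come from the lecture.

```r
alpha  <- 0.05
cutoff <- qnorm(1 - alpha / 2)   # upper cutoff, about 1.96
z_obs  <- 2.3                    # hypothetical observation, for illustration only
reject <- abs(z_obs) > cutoff    # TRUE means we reject H0: mu = 0
reject
```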
The Z-test for X̄
Suppose that X1, X2, ..., Xn are IID RV’s with mean µ and SD σ. To test H0 : µ = µ0, reason as follows:
If H0 were true, X̄ would have approximately a N(µ0, σ²/n) distribution.
Therefore, if we standardize, we have approximately a standard normal RV:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
Reject H0 : µ = µ0 at α = .05 if |Z| > 1.96.
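A minimal sketch of this test in R, assuming σ is known. Base R has no built-in Z-test for a mean, so z_test below is a hypothetical helper, and the sample x is made up for illustration.

```r
# Hypothetical helper (not a base R function): Z-test for a mean with known sigma.
z_test <- function(x, mu0, sigma) {
  n <- length(x)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))   # standardize the sample mean
  list(z = z, reject_05 = abs(z) > qnorm(0.975))
}

x   <- c(10.2, 9.7, 10.9, 10.4, 10.1, 10.6)  # made-up sample
res <- z_test(x, mu0 = 10, sigma = 0.5)
res$z          # about 1.55
res$reject_05  # FALSE: |Z| does not exceed 1.96
```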
The Z-test: Continued
Note: the condition for rejection,
$$|Z| = \left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| > 1.96,$$
is equivalent to
$$|\bar{X} - \mu_0| > 1.96\,\sigma/\sqrt{n}.$$
In other words, reject if X̄ is more than 1.96 SE’s from µ0.
Choices for α
Here is a table with standard normal quantiles corresponding to common choices for α:
α      Z_{1−α/2}   R
.10    1.64        qnorm(.95)
.05    1.96        qnorm(.975)
.01    2.58        qnorm(.995)
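The quantiles in this table can be reproduced in one vectorized call. This is a small sketch; the variable names are chosen here for illustration.

```r
alpha  <- c(0.10, 0.05, 0.01)
z_crit <- qnorm(1 - alpha / 2)   # two-sided critical values
round(z_crit, 2)                 # 1.64 1.96 2.58
```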
The Z-test: P-Values
It is easy to compute p-values for the Z-test: just twice the area past the Z-score:
p = 2 * pnorm(-abs(Z))
Remember: the p-value is also the smallest α for which we would reject with a size α rejection region!
SOP: compute the p-value and compare to your favorite α.
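As a sketch of that procedure, with a made-up Z-score: comparing the p-value to α gives the same decision as comparing |Z| to the critical value.

```r
z     <- 2.1                     # hypothetical Z-score, for illustration only
alpha <- 0.05
p     <- 2 * pnorm(-abs(z))      # two-sided p-value, about 0.036

# The two decision rules agree:
p < alpha                        # TRUE
abs(z) > qnorm(1 - alpha / 2)    # TRUE
```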
Confidence intervals
Since a 95% CI consists of the set of values for µ that we would not reject at significance level .05, the CI has a simple form. Suppose X ∼ N(µ,1), and we observe X = 2.
[Figure: 95% CI for Z=2: (2−1.96, 2+1.96), with endpoints L = .04 and U = 3.96]
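The endpoints for the observed value X = 2 can be checked directly. A short sketch; the variable names are illustrative.

```r
x_obs <- 2
ci    <- x_obs + c(-1, 1) * qnorm(0.975)   # observation plus or minus 1.96 (SD is 1)
round(ci, 2)                               # 0.04 3.96
```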
Standard choices for CI’s
Here is a table with confidence intervals corresponding to typical choices for α:
Level   Z_{1−α/2}   Interval
90%     1.64        X̄ ± 1.64 σ/√n
95%     1.96        X̄ ± 1.96 σ/√n
99%     2.58        X̄ ± 2.58 σ/√n
In practice, you will often see people replace 1.96 with 2.
The Z-test for a proportion
Sample proportions are sample means, and thus approximately normally distributed. If we have n trials, each with probability p of success, then p̂ is approximately normally distributed with mean p and SD √(pq/n), where q = 1 − p. Test H0 : p = p0 with
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}},$$
and reject at significance level α = .05 if |Z| > 1.96. Most sources recommend using the observed p̂ in the SE:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
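Both versions of the statistic are easy to compute side by side. The counts below (60 successes in 100 trials, tested against p0 = 0.5) are made up for illustration.

```r
successes <- 60      # hypothetical count
n         <- 100     # hypothetical number of trials
p0        <- 0.5
p_hat     <- successes / n

z_null <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)         # SE under H0
z_hat  <- (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)   # SE from the observed p_hat
c(z_null = z_null, z_hat = z_hat)                        # 2.00 and about 2.04
```

With these numbers the two statistics fall on either side of nothing important (both exceed 1.96), but near the cutoff the choice of SE can change the decision.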
The CI for a proportion based on the Normal Approximation
The Z-based CI for a sample mean
$$\bar{X} \pm 1.96\,\sigma/\sqrt{n}$$
becomes
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
The margin of error published by pollsters usually uses 1/2 for p̂ in the SE (the worst case):
$$\hat{p} \pm 1.96\,\frac{1}{2\sqrt{n}}$$
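A sketch comparing the two margins of error for a hypothetical poll (n = 1000 respondents, observed proportion 0.52; both numbers are invented for illustration):

```r
n     <- 1000    # hypothetical sample size
p_hat <- 0.52    # hypothetical observed proportion

moe_hat   <- 1.96 * sqrt(p_hat * (1 - p_hat) / n)   # uses the observed p_hat
moe_worst <- 1.96 / (2 * sqrt(n))                   # worst case, p_hat = 1/2
c(moe_hat = moe_hat, moe_worst = moe_worst)         # both about 0.031

p_hat + c(-1, 1) * moe_hat                          # normal-approximation 95% CI
```

Because p(1 − p) is largest at p = 1/2, the worst-case margin is never smaller than the one based on p̂.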
Notes on testing proportions
The primary reason to use the Z-test for a sample proportion is ease of computation: you can often do the arithmetic mentally, or on any calculator.
For any serious purpose, especially with small to moderate sample sizes, there are more precise tests available. In R, binom.test or prop.test are better options!
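A minimal sketch of calling both functions, using made-up counts (60 successes in 100 trials) and a null value of 0.5:

```r
successes <- 60    # hypothetical count
n         <- 100   # hypothetical number of trials

exact  <- binom.test(successes, n, p = 0.5)   # exact binomial test
approx <- prop.test(successes, n, p = 0.5)    # chi-squared test, continuity-corrected by default

exact$p.value
exact$conf.int
approx$p.value
```

Both return an object of class "htest", so the p-value and confidence interval are extracted the same way as for t.test().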
The t-test
The most obvious difference between the t-test and the Z-test is whether we know the SE or not:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \qquad t_{n-1} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
The subscript (n − 1) is the degrees of freedom and s is the estimated SD:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
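The t statistic can be computed by hand from these formulas and checked against t.test(). The sample x and the null value mu0 below are made up for illustration.

```r
x   <- c(5.1, 4.9, 5.6, 5.8, 5.3)   # made-up sample
mu0 <- 5                            # hypothetical null value
n   <- length(x)

s      <- sqrt(sum((x - mean(x))^2) / (n - 1))   # estimated SD, same as sd(x)
t_stat <- (mean(x) - mu0) / (s / sqrt(n))        # t statistic with n - 1 df

t_stat
t.test(x, mu = mu0)$statistic                    # same value
```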
The t-test: Conditions
The t statistic tn−1 will have an exact t distribution if the data X1, X2, ..., Xn are IID normally distributed RV’s.
We use the t-test when we have at least approximately normally distributed data, and we estimate the SE using the sample SD.
If the sample size n is large, the t and Z distributions are practically indistinguishable.
In older texts you will see the t-test referred to as a small sample test.
The t distribution
[Figure: Normal Density and t densities; curves for Normal, t_1, t_2, t_3, t_10 (x-axis: Z, y-axis: density)]
History
In the 19th century, the use of the normal approximation was standard, though usually to construct confidence intervals.
William Gosset, brewmaster at the Guinness Brewery, realized that with small samples (n = 2, 3, 4), the Z statistics seemed to be much less reliable than theory predicted. He managed to guess the t distribution by doing simulations.
He published under the pseudonym Student, hence the common nomenclature Student’s t statistic.
Confidence Intervals
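The Z-quantile based CI for a sample mean is
$$\bar{X} \pm \mathrm{qnorm}(1 - \alpha/2)\,\frac{\sigma}{\sqrt{n}}$$
The corresponding t-quantile based CI is
$$\bar{X} \pm \mathrm{qt}(1 - \alpha/2,\ n - 1)\,\frac{s}{\sqrt{n}}$$
For moderate sample sizes (∼ 50), this will be approximately
$$\bar{X} \pm 2s/\sqrt{n}$$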
t quantiles for a 95% CI
df qt(.975,df)
2 4.30
5 2.57
30 2.04
50 2.01
100 1.98
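The table can be reproduced with a single vectorized call to qt(), shown here next to the normal quantile for comparison. A small sketch; the variable names are illustrative.

```r
df     <- c(2, 5, 30, 50, 100)
t_crit <- qt(0.975, df)     # t quantiles for a 95% CI
round(t_crit, 2)            # 4.30 2.57 2.04 2.01 1.98
round(qnorm(0.975), 2)      # 1.96, the limit as df grows
```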
R functions: t.test()
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)
Example: Darwin’s Zea Mays data
Charles Darwin’s data on the differences of heights of pairs of plants (cross-pollinated minus self-pollinated), in eighths of an inch.
49, −67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, −48
t.test()
Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)
t.test(Diffs)
-----------------------------------------
t = 2.148, df = 14, p-value = 0.0497
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  0.03119332 41.83547335
sample estimates:
mean of x
 20.93333
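The same numbers can be reproduced by hand from the formulas for the t statistic, the p-value, and the t-quantile based CI. This is a sketch of the arithmetic, not of how t.test() is implemented internally.

```r
Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)
n  <- length(Diffs)
se <- sd(Diffs) / sqrt(n)                     # estimated SE of the mean

t_stat <- (mean(Diffs) - 0) / se              # test H0: mu = 0
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value
ci     <- mean(Diffs) + c(-1, 1) * qt(0.975, n - 1) * se

round(t_stat, 3)   # 2.148
round(p_val, 4)    # 0.0497
ci                 # matches the interval reported by t.test(Diffs)
```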
Alternative: the Sign Test
> binom.test(sum(Diffs > 0), length(Diffs))
number of successes = 13, number of trials = 15, p-value = 0.007385
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5953973 0.9834241
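The sign test p-value can also be computed directly from the binomial distribution: under H0 the number of positive differences is Binomial(15, 0.5), and the two-sided p-value doubles the upper tail. A sketch of that arithmetic:

```r
Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)
n     <- length(Diffs)
n_pos <- sum(Diffs > 0)                       # 13 positive differences

# P(X >= 13) for X ~ Binomial(15, 0.5), doubled for a two-sided test
p_sign <- 2 * pbinom(n_pos - 1, n, 0.5, lower.tail = FALSE)
round(p_sign, 6)                              # 0.007385
```

Doubling the tail is valid here because the Binomial(n, 0.5) distribution is symmetric.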
Question
If the data are normally distributed, the t-test is exact, and can be shown to be the most powerful test.
Why is the p-value for the sign test (0.0074) so much smaller than for the t-test (0.0497)?
Answer
[Figure: normal Q-Q plot titled "Darwin's Data" (x-axis: Theoretical Quantiles, y-axis: Sample Quantiles)]
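One way such a normal Q-Q plot could be drawn in base R is sketched below. The plotting options used on the slide are not known, so the reference line added by qqline() is an assumption.

```r
Diffs <- c(49, -67, 8, 16, 6, 23, 28, 41,
           14, 29, 56, 24, 75, 60, -48)

# qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(Diffs, main = "Darwin's Data")
qqline(Diffs)      # reference line through the quartiles (an addition, not from the slide)

sort(Diffs)[1:3]   # -67 -48 6: the two negative differences sit far below the rest
```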
R functions
function            R
density             dt(x, df)
CDF: P(T ≤ x)       pt(x, df)
quantiles           qt(p, df)
random numbers      rt(n, df)
p-value             2*pt(-abs(t), df)
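A short sketch exercising each function in the table, using df = 14 and t = 2.148 from the Darwin example; the seed and the number of random draws are arbitrary choices.

```r
df <- 14
t  <- 2.148

dt(0, df)                      # density at 0
pt(t, df)                      # P(T <= 2.148)
q  <- qt(0.975, df)            # 97.5% quantile, about 2.14
set.seed(1)                    # arbitrary seed, for reproducibility only
r  <- rt(5, df)                # five random draws
p  <- 2 * pt(-abs(t), df)      # two-sided p-value, about 0.0497

pt(q, df)                      # 0.975: pt() and qt() are inverses
```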
Summary
Most often you will see the t-test used: we usually must estimate the variance of the data!
The t-test is accurate with small samples if the data are approximately normally distributed! If the sample size is large, the Z and t tests and CI’s agree.
Always look at the 95% CI!