
Lecture 10: t-Test

S. Massa, Department of Statistics, University of Oxford

20 January 2016


Motivation

In lecture 7 the heights of 198 British men were analysed.

$X_1, \dots, X_n \sim N(\mu, \sigma^2)$.

From the data

$\bar{x} = 1732$ mm, $s = 68.8$ mm.

We want to construct a confidence interval for the true average height $\mu$. Recall the z-statistic computed from the sample:

$$Z := \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

If you pay close attention, you may notice that we do not know the true population standard deviation $\sigma$.


Motivation

We do not know the population standard deviation $\sigma$; we simply replaced it with the sample standard deviation $s$.

Notice that the sample standard deviation of a sample $X_1, \dots, X_n$ is given by

$$S := \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}.$$

This is only an estimate of $\sigma$: due to chance variation it is sometimes too low and sometimes too high.

So the quantity that we computed for $Z$ is not really $Z$ but something else:

$$T := \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$


Motivation

$$Z := \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}, \qquad T := \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

- On average the two are not too different.

- But the denominator of $T$ is also a random variable: the distribution of $T$ will be more dispersed than that of $Z$.

- This implies that, if you treat $T$ as if it were $Z$, you underestimate the probabilities of extreme observations, or equivalently you reject a true null hypothesis more often than dictated by the significance level.

- As a result any confidence intervals you compute will be too narrow.
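
The extra dispersion of $T$ can be checked with a small simulation. The following is only a sketch under assumed settings that are not part of the lecture (samples of size 5 from a standard normal, 20000 repetitions, a fixed seed); it counts how often $|Z|$ and $|T|$ exceed the normal cutoff 1.96.

```python
import math
import random
import statistics

def tail_fractions(n=5, trials=20000, cutoff=1.96, mu=0.0, sigma=1.0, seed=0):
    """Return the fractions of simulated samples with |Z| > cutoff and |T| > cutoff."""
    rng = random.Random(seed)
    z_hits = 0
    t_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)              # sample standard deviation (divides by n - 1)
        z = (x_bar - mu) / (sigma / math.sqrt(n)) # uses the true sigma
        t = (x_bar - mu) / (s / math.sqrt(n))     # uses the estimate s
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / trials, t_hits / trials

frac_z, frac_t = tail_fractions()
print(f"fraction with |Z| > 1.96: {frac_z:.3f}")
print(f"fraction with |T| > 1.96: {frac_t:.3f}")
```

With such a small sample the fraction for $T$ should come out clearly larger than the roughly 5% seen for $Z$, which is the sense in which treating $T$ as $Z$ gives intervals that are too narrow.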


Motivation

$$T := \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

- So if we don't know the population standard deviation $\sigma$, we cannot compute $Z$; instead we compute $T$.

- Is this a problem?

- $T$ is also computed from the data, so it is a statistic.

- But for $T$ to be a useful statistic, we need to know its distribution.


Student’s t-Distribution

- William Gosset computed the distribution of the t-statistic while working for the Guinness brewery, trying to choose the best-yielding barley variety; he was concerned with small sample sizes.

- He published it under the pseudonym Student, as it was deemed confidential information by the brewery.

- The t-distribution has a single parameter called the number of degrees of freedom; this is equal to the sample size minus 1.

- For large samples, typically more than 50, the sample variance is very accurate.

- In this situation, for all practical purposes, the t-statistic behaves identically to the z-statistic.

Assumption: $X_1, \dots, X_n$ are independent samples from $N(\mu, \sigma^2)$, and hence $\bar{X}$ is normally distributed.


Probability density function

Figure: Probability density functions of t-distributions with 1, 2, 5, 10 and 50 degrees of freedom, and of the standard normal.


Table of t

Table: Critical values of the t-statistic at the 95% level. For a very large number of degrees of freedom, they are identical to those of the Z-statistic.

degrees of    (a) one-tailed test    (b) two-tailed test
freedom       critical value         critical value

1             6.31                   12.7
2             2.92                   4.30
4             2.13                   2.78
10            1.81                   2.23
50            1.68                   2.01
∞             1.64                   1.96

Do the bottom values of the table look familiar?
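
For readers without a table to hand, the critical values can be recomputed numerically. The sketch below is not the lecture's method: it assumes only the standard density of the t-distribution, integrates it with Simpson's rule, and inverts the result by bisection. The function names `t_pdf`, `t_cdf` and `t_quantile` are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x], plus 0.5 by symmetry."""
    if x == 0:
        return 0.5
    h = x / steps                      # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Value t >= 0 with P(T <= t) = p, for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:           # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 4, 10, 50):
    one_tailed = t_quantile(0.95, df)   # P(T > t) = 0.05
    two_tailed = t_quantile(0.975, df)  # P(|T| > t) = 0.05
    print(f"df = {df:>2}: one-tailed {one_tailed:.2f}, two-tailed {two_tailed:.2f}")
```

The printed values can be compared with the table above; as the number of degrees of freedom grows they approach the normal values 1.64 and 1.96.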


Confidence Intervals: One-Sample

- Suppose you have observations $x_1, \dots, x_n$ from a Normal distribution $N(\mu, \sigma^2)$.

- Both $\mu$ and $\sigma^2$ are unknown and you want to compute a confidence interval for $\mu$.

- Look up the t-table and find the row for the appropriate number of degrees of freedom, $n - 1$.

- For a two-sided 95% confidence interval, $t$ is such that $P(T < t) = 0.975$.

- The confidence interval is then given by:

$$\left(\bar{x} - t \times \frac{s}{\sqrt{n}},\ \bar{x} + t \times \frac{s}{\sqrt{n}}\right).$$


Example: Heights of British Men

- For the heights of British men example we used the distribution of the Z-statistic.

- We should have used the t-distribution with 198 − 1 = 197 degrees of freedom.

- Due to the large sample size, the critical values are almost identical to those of the normal distribution, so the error in our confidence interval was not very large.


Example: Kidney Dialysis I

Example 1

The level of phosphate, in mg/dl, in the blood of a patient undergoing dialysis treatment was measured on six consecutive visits:

5.6, 5.1, 4.6, 4.8, 5.7, 6.4.

Construct a symmetric 99% confidence interval for the mean phosphate level.

The sample size is $n = 6$. We can compute the sample mean and sample variance as follows:

$$\bar{x} = \frac{1}{6}(5.6 + 5.1 + 4.6 + 4.8 + 5.7 + 6.4) \approx 5.4 \text{ mg/dl},$$

$$s^2 = \frac{1}{5}\left[(5.6 - 5.4)^2 + (5.1 - 5.4)^2 + (4.6 - 5.4)^2 + (4.8 - 5.4)^2 + (5.7 - 5.4)^2 + (6.4 - 5.4)^2\right] \approx (0.67 \text{ mg/dl})^2.$$
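
The same summary statistics can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative. Note that the unrounded mean is about 5.37, which the slide rounds to 5.4.

```python
import statistics

phosphate = [5.6, 5.1, 4.6, 4.8, 5.7, 6.4]   # mg/dl, the six visits above

n = len(phosphate)
x_bar = statistics.fmean(phosphate)          # sample mean
s_squared = statistics.variance(phosphate)   # sample variance (divides by n - 1)
s = statistics.stdev(phosphate)              # sample standard deviation

print(f"n = {n}, mean = {x_bar:.2f} mg/dl, s^2 = {s_squared:.3f}, s = {s:.2f} mg/dl")
```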


Example: Kidney Dialysis II

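The number of degrees of freedom is $n - 1 = 5$. Thus, the symmetric confidence interval will be

$$\left(5.4 - t\,\frac{0.67}{\sqrt{6}},\ 5.4 + t\,\frac{0.67}{\sqrt{6}}\right) \text{ mg/dl},$$

where $t$ is chosen so that, for a $T$ variable with 5 degrees of freedom, $|T|$ has probability 0.01 of being bigger than $t$ (that is, $T$ has probability 0.005 of being bigger than $t$).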


Using the t-Table

- We want to find $t$ so that $P(-t \le T \le t) = 0.99$.

- We look at the official table and locate the row corresponding to 5 degrees of freedom.

- The critical value is in the final column and is $t = 4.03$.


Example: Kidney Dialysis III

- Having identified the critical value to be 4.03 we compute the confidence interval as

(4.3 mg/dl, 6.5 mg/dl).

- If we repeated the experiment many times, about 99% of the intervals constructed in this way would contain the true mean.

- If we had used the Z-distribution, the critical value would have been 2.6, resulting in the much narrower interval

(4.7 mg/dl, 6.1 mg/dl).

- Here you can immediately see that in small samples the difference between Z and T is significant.
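
The two intervals can be reproduced as follows. This is a sketch that uses the rounded summary values from the example (mean 5.4, standard deviation 0.67, $n = 6$) and the critical values quoted above rather than computing them; the helper name `mean_confidence_interval` is illustrative.

```python
import math

def mean_confidence_interval(x_bar, s, n, critical_value):
    """Symmetric interval x_bar +/- critical_value * s / sqrt(n)."""
    half_width = critical_value * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Critical values as quoted in the lecture (99% level, two-sided).
T_CRIT_DF5 = 4.03   # t-distribution with 5 degrees of freedom
Z_CRIT = 2.6        # standard normal, rounded as in the lecture

t_interval = mean_confidence_interval(5.4, 0.67, 6, T_CRIT_DF5)
z_interval = mean_confidence_interval(5.4, 0.67, 6, Z_CRIT)

print(f"t-based: ({t_interval[0]:.1f}, {t_interval[1]:.1f}) mg/dl")
print(f"z-based: ({z_interval[0]:.1f}, {z_interval[1]:.1f}) mg/dl")
```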


Hypothesis Testing I

Continuing with the kidney dialysis example, suppose that 4 mg/dl is a dangerously low phosphate level. We want to test whether the patient's mean level equals this value, at the 1% significance level (99% confidence level).

Two-sided test

$H_0 : \mu = 4.0$ mg/dl,
$H_1 : \mu \neq 4.0$ mg/dl.

Under the null hypothesis $\mu = 4.0$, and therefore

$$T := \frac{\bar{X} - 4.0}{S/\sqrt{n}} \sim t_5 \quad \text{(under the null hypothesis)}$$

has the t-distribution with $n - 1 = 5$ degrees of freedom.


Hypothesis Testing II

We compute

$$t_{\text{obs}} = \frac{\bar{x} - 4.0}{s/\sqrt{n}} = \frac{5.4 - 4.0}{0.67/\sqrt{6}} = 5.12.$$

At the 99% level, we already know the theoretical critical value to be 4.03. Since the observed value is $t_{\text{obs}} = 5.12 > 4.03$, we reject the null hypothesis.


Hypothesis Testing III

One-sided test

$H_0 : \mu = 4.0$ mg/dl,
$H_1 : \mu > 4.0$ mg/dl.

In this case we look for the one-sided critical value. The table gives critical values for the two-sided test, so we need the value of $t$ such that $P(-t < T < t) = 1 - 2 \times 1\% = 0.98$.

The theoretical value is $t = 3.36$. From the structure of our alternative hypothesis, we reject the null if the observed value $t_{\text{obs}}$ is more than 3.36. Since $t_{\text{obs}} = 5.12 > 3.36$, we again reject. Because 3.36 is smaller than the two-sided value 4.03, with the one-sided test we would have been more likely to reject the null hypothesis.
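
Both the two-sided and the one-sided test can be sketched in a few lines. The code below assumes the rounded summary statistics from the example and takes the critical values from the lecture rather than computing them; the constant names are illustrative.

```python
import math

x_bar, s, n = 5.4, 0.67, 6   # rounded summary statistics from the example
mu_0 = 4.0                   # mean under the null hypothesis, in mg/dl

t_obs = (x_bar - mu_0) / (s / math.sqrt(n))

# Critical values for 5 degrees of freedom, as quoted in the lecture.
T_TWO_SIDED_1PCT = 4.03      # P(|T| > t) = 0.01
T_ONE_SIDED_1PCT = 3.36      # P(T > t) = 0.01, i.e. P(-t < T < t) = 0.98

reject_two_sided = abs(t_obs) > T_TWO_SIDED_1PCT   # H1: mu != 4.0
reject_one_sided = t_obs > T_ONE_SIDED_1PCT        # H1: mu > 4.0

print(f"t_obs = {t_obs:.2f}")
print(f"reject H0 (two-sided): {reject_two_sided}")
print(f"reject H0 (one-sided): {reject_one_sided}")
```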


Z or T?

For testing and constructing a confidence interval for the true mean of a population:

1. Know σ? If yes, use Z. If no, go to 2.
2. Estimating a proportion? If yes, use Z. If no, go to 3.
3. Large sample? If yes, use Z. If no, use T.

Figure: Flowchart for deciding whether to use Z or T.


Paired t-Test

The percentage of aggregated blood platelets in the blood of 11 individuals was measured before and after they smoked a cigarette.

Before  After  Difference

25      27       2
25      29       4
27      37      10
44      56      12
30      46      16
67      82      15
53      57       4
53      80      27
52      61       9
60      59      -1
28      43      15

It seems there was more clotting after the cigarette. Can we test this hypothesis?


Paired t-test: Standard Error

Suppose that a random individual has a normally distributed “Before” score $X_i$. Smoking a cigarette adds an independent random, normally distributed effect $D_i$. We want to know if $D_i$ tends to be positive on average, so we want to compare the sample mean of the $D_i$'s to their standard error.

Notice that the standard error we should use is that of the $D_i$'s, not that of the raw scores. The latter would be much greater, since the “After” score $X_i + D_i$ satisfies

$$\operatorname{var}(X + D) = \operatorname{var}(X) + \operatorname{var}(D) > \operatorname{var}(D).$$
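
The variance identity for independent $X$ and $D$ can be illustrated by simulation. This is only a sketch: the means and standard deviations below are hypothetical values chosen for illustration, not estimates from the platelet data.

```python
import random
import statistics

rng = random.Random(1)
n = 20_000

before = [rng.gauss(45.0, 15.0) for _ in range(n)]   # hypothetical "Before" scores X
effect = [rng.gauss(10.0, 8.0) for _ in range(n)]    # hypothetical independent effects D
after = [x + d for x, d in zip(before, effect)]      # "After" scores X + D

var_x = statistics.variance(before)
var_d = statistics.variance(effect)
var_after = statistics.variance(after)

print(f"var(X) + var(D) = {var_x + var_d:.1f}")
print(f"var(X + D)      = {var_after:.1f}")
print(f"var(D)          = {var_d:.1f}")
```

Up to simulation noise the first two printed numbers should agree, and both are much larger than var(D), which is why working with the differences gives a smaller standard error.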


Paired sample t-test: Standard Error

$H_0 : \mu_d = 0$,
$H_1 : \mu_d > 0$.

Sample mean and sample standard deviation of the differences are

$\bar{d} = 10.3$, $s_d = 7.98$.

The observed value of the T-statistic is

$$t_{\text{obs}} = \frac{10.3 - 0}{7.98/\sqrt{11}} = 4.28.$$

We are doing a one-sided test.

We want $P(T > t) = 0.05$. Since the table is giving us $t$ so that $P(|T| > t) = \alpha$, we want the $t$ that corresponds to $\alpha = 2 \times 0.05 = 0.1$.


Paired t-Test

Now there are 11 measurements, so there are 10 degrees of freedom. So look at the first column of the table.

The theoretical value is 1.81. The observed value is 4.28 (> 1.81), therefore we reject the null hypothesis.

Conclusion: There is a significant increase in blood clotting after smoking a cigarette.
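
The whole paired test can be reproduced from the Before/After table. This is a minimal sketch: the critical value 1.81 is taken from the lecture rather than computed, and the variable names are illustrative. Because the slide rounds the mean difference to 10.3 before dividing, the unrounded statistic may differ from 4.28 in the last digit.

```python
import math
import statistics

# (before, after) percentages of aggregated platelets for the 11 individuals
pairs = [(25, 27), (25, 29), (27, 37), (44, 56), (30, 46), (67, 82),
         (53, 57), (53, 80), (52, 61), (60, 59), (28, 43)]

differences = [after - before for before, after in pairs]
n = len(differences)

d_bar = statistics.fmean(differences)        # mean difference
s_d = statistics.stdev(differences)          # standard deviation of the differences
t_obs = (d_bar - 0) / (s_d / math.sqrt(n))   # test statistic under H0: mu_d = 0

T_CRIT_ONE_SIDED_5PCT_DF10 = 1.81            # critical value quoted in the lecture
reject_null = t_obs > T_CRIT_ONE_SIDED_5PCT_DF10

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t_obs = {t_obs:.2f}, reject H0: {reject_null}")
```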


The critical region


Paired t-Test: Discussion

- Essentially we compared the sample means of two samples.

- Our goal was to understand whether the true mean of one sample was greater than the true mean of the other.

- In the next lecture we will see more about comparing the means and distributions of two samples.

- In the paired test, the data are structured in pairs.

- This will not always be the case.

- We will see that this experimental design results in more effective hypothesis testing (power of a test).


Does Population Size Matter?

- In our analysis of the heights of 198 married men we ignored the population size. What if these 198 men were all the men in the UK?

- In that case there would be no sampling error at all.

- What if the total population were 300 or 400? There should be less error than if there were 20 million.

- Indeed the sampling error does depend on the size of the population, but this effect decays very quickly as the population grows.


Correction Factor

- Suppose that we sample $n$ from a population of $N$.

- So far we have used the following formula for the standard error:

$$\mathrm{SE} = \sqrt{\operatorname{var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}.$$

- This is based on the premise that we are sampling from an infinite population.

- Usually sampling is performed from a finite population and without replacement.

- In this case, if a significant proportion of the population (more than 5%) is sampled, we need to use the correction factor

$$\text{correction factor} = \sqrt{\frac{N - n}{N - 1}},$$

where $N$ is the population size and $n$ the sample size.


Correction Factor

So the standard error becomes

$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}.$$

Let's have a look at how the correction factor behaves with a fixed sample size of 200.

N             200   300     1000    10000
Corr. Factor  0     0.5783  0.8949  0.9900
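
The correction factors for a sample size of 200 can be recomputed directly from the formula. A minimal sketch follows; the function name `finite_population_correction` is illustrative.

```python
import math

def finite_population_correction(n, N):
    """Return sqrt((N - n) / (N - 1)) for a sample of n from a population of N."""
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 0 < n <= N")
    return math.sqrt((N - n) / (N - 1))

sample_size = 200
for population_size in (200, 300, 1000, 10000):
    factor = finite_population_correction(sample_size, population_size)
    print(f"N = {population_size:>5}: correction factor = {factor:.4f}")
```

The factor is 0 when the whole population is sampled and approaches 1 quickly as $N$ grows, which is why it is usually ignored for large populations.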


Confidence Intervals from Small Populations

- How does this affect our confidence intervals?

- In the heights of British men example, we calculated

$\bar{x} = 1732$ mm, $s = 68.8$ mm.

- Assuming a large population we computed the 95% confidence interval as

$$1732 \pm 1.96 \times \frac{68.8}{\sqrt{198}} = (1722, 1742) \text{ mm}.$$

- Let's see how this changes if we now assume that these were sampled from a village with total population 300:

$$1732 \pm 1.96 \times \frac{68.8}{\sqrt{198}} \sqrt{\frac{102}{299}} = (1726, 1738) \text{ mm},$$

a narrower confidence interval.
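
Both intervals can be checked numerically. The sketch below uses the summary values from the heights example and the normal critical value 1.96; the helper `z_interval` and its optional `population_size` argument are illustrative, not part of the lecture.

```python
import math

def z_interval(x_bar, s, n, z_crit=1.96, population_size=None):
    """Interval x_bar +/- z_crit * SE, with an optional finite population correction."""
    standard_error = s / math.sqrt(n)
    if population_size is not None:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    half_width = z_crit * standard_error
    return x_bar - half_width, x_bar + half_width

large_population = z_interval(1732, 68.8, 198)
village = z_interval(1732, 68.8, 198, population_size=300)

print(f"large population: ({large_population[0]:.0f}, {large_population[1]:.0f}) mm")
print(f"village of 300:   ({village[0]:.0f}, {village[1]:.0f}) mm")
```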


Measurement Bias vs Random error

- All measurements are prone to error.
- We can split this error into two types:

1. measurement bias, or systematic error, and
2. random error.

Definition 2 (Random error)

This is the part of the error that is due to chance fluctuations. Random error is on average zero, and as the sample size increases its contribution vanishes.


Measurement Bias vs Random error

Example 3

Suppose that you are measuring length with a metal stick. The stick's length fluctuates due to changes in temperature. But this should on average cancel out, as sometimes the temperature will be too high, and sometimes too low. Also, purely by chance, sometimes your measurements will be too high, and sometimes too low.


Definition 4 (Bias)

Bias, or systematic error, is not due to chance alone, but rather due to the measuring procedure itself. Bias cannot be removed by increasing the sample size, but only by improving the measurement procedure.

Example 5

Suppose that you are measuring length with a stick which is 99 cm long, but you think it is 1 m long. All your measurements will be off by about 1 cm per metre, before any random variation is added.

Although random errors "tend" to be normally distributed, and are fairly well understood, bias is application specific.
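
The difference between the two kinds of error can be seen in a small simulation of the stick example. This is a sketch under assumed values that are not in the lecture: a true length of 2 m, normally distributed random error with standard deviation 5 mm, and a fixed seed.

```python
import random
import statistics

def mean_measurement(n, true_length_m=2.0, stick_length_m=0.99, noise_sd_m=0.005, seed=0):
    """Average of n noisy readings taken with a stick wrongly believed to be 1 m long."""
    rng = random.Random(seed)
    # Each stick length is recorded as 1 m, so every reading is inflated
    # by the factor 1 / stick_length_m: this is the bias.
    biased_value = true_length_m / stick_length_m
    readings = [biased_value + rng.gauss(0.0, noise_sd_m) for _ in range(n)]
    return statistics.fmean(readings)

for n in (10, 1000, 100000):
    print(f"n = {n:>6}: mean reading = {mean_measurement(n):.4f} m (true length 2.0000 m)")
```

As the number of readings grows the average settles down, so the random error vanishes, but it settles near 2.02 m rather than 2 m: the bias remains however large the sample.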


Bias in surveys I

Statisticians often use surveys to gather data about the general population. These suffer from the following types of bias:

Selection bias: You are not sampling from a truly random subset of the population. How do you pick a random sample of 1000 people from the 64 million people in the UK?

Ascertainment bias: You are not sampling from a representative subset of the population. In medical sciences, subjects cannot be identified if they are not diagnosed. Sometimes mild cases of diseases are not diagnosed and therefore are systematically under-represented in surveys.


Bias in surveys II

Non-response bias: The subjects who choose to respond to the survey may not represent the general population. Example: when polling about some type of illegal behaviour, people engaging in it could refrain from responding.

Response bias: Subjects may choose to give answers they perceive as more acceptable. Example: how many racists will declare themselves so in a poll?


Recap

- When estimating a population mean, if we don't know $\sigma$ and estimate it using $S$, we use the t-statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

- This has the t-distribution: a family of distributions parameterised by the number of degrees of freedom,

d.f. $= n - 1$, where $n$ is the sample size.

- Use the t-table for $(1 - \alpha) \times 100\%$ confidence intervals:

$$\bar{x} \pm t_{1-\alpha/2}\,\frac{s}{\sqrt{n}}, \quad \text{where } P(|T| < t_{1-\alpha/2}) = 1 - \alpha.$$


Recap: Z or T?

For testing and constructing a confidence interval for the true mean of a population:

1. Know σ? If yes, use Z. If no, go to 2.
2. Estimating a proportion? If yes, use Z. If no, go to 3.
3. Large sample? If yes, use Z. If no, use T.

Figure: Flowchart for deciding whether to use Z or T.


Recap

- In the paired t-test, the data are naturally structured in pairs.

- Bias is the systematic error in our measurements that cannot be removed without changing the procedure.

- Surveys suffer from:
  - Selection/Ascertainment bias;
  - Non-response bias;
  - Response bias.