Test of Significance Presenter: Shib Sekhar Datta Moderator: M S Bharambe.



Framework of presentation

• Introduction to test of significance
• A
• B
• C
• D
• E
• F
• G
• I
• J
• K
• L
• References


Normal curve

Because a two-sided test is symmetrical, you can also use a confidence interval to test a two-sided hypothesis.

[Figure: normal curve with central area C and an area of α/2 in each tail]

In a two-sided test, C = 1 – α, where C is the confidence level and α is the significance level.
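The link between a two-sided test and a confidence interval can be sketched in Python. This is a minimal illustration, assuming a z-test with a known population SD; the function name and the numbers are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test_and_ci(xbar, mu0, sigma, n, alpha=0.05):
    """Return (p_value, ci_low, ci_high) for a two-sided z-test with known sigma."""
    se = sigma / sqrt(n)
    z = (xbar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # leaves alpha/2 in each tail
    return p_value, xbar - z_star * se, xbar + z_star * se

# Illustrative numbers only (not from the slides).
p, low, high = two_sided_z_test_and_ci(xbar=52.0, mu0=50.0, sigma=5.0, n=25)
reject_by_p = p <= 0.05                    # decision from the P-value
reject_by_ci = not (low <= 50.0 <= high)   # decision from the C = 95% interval
print(reject_by_p, reject_by_ci)           # the two decisions agree
```

Rejecting H0 at level α corresponds to the hypothesized value falling outside the confidence interval of level C = 1 – α.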


Tests of Significance

Is it due to chance, or something else?

This kind of question can be answered using tests of significance.

Since many journal articles use such tests, it is a good idea to understand what they mean.


Why should we test significance?

• We test a SAMPLE to draw conclusions about a POPULATION

• If two SAMPLES (group means) are different, can we be certain that the POPULATIONS (from which the samples were drawn) are also different?

• Is the difference obtained TRUE or SPURIOUS?

• Will another set of samples also be different?

• What are the chances that the difference obtained is spurious?

• The above questions can be answered by a statistical test.


Issues in significance tests

• Testing many hypotheses

• One-tailed or two-tailed?

• A statistically significant result may not be important

• Sample size

• If the chance model is wrong, the test result can be meaningless

• A significant test does not identify the cause


Four Key Points: Repeated Samples

1. Plotting the means of repeated samples will produce a normal distribution: it will be more peaked than when raw data are plotted (as shown in Figure 9.1)


Four Key Points (cont’d)

2. The larger the sample sizes, the more peaked the distribution and the closer the means of the samples to the population mean (shown in Figure 9.2)


Four Key Points (cont’d)

3. The greater the variability in the population, the greater the variation in the samples

4. When sample sizes are above 100, even if a variable in the population is not normally distributed, the means will be approximately normally distributed when repeated samples are plotted

– E.g., the weights of a population of males and females will be bimodal, but if we drew repeated samples, the sample mean weights would be normally distributed
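The four key points can be checked with a small simulation. The sketch below assumes a made-up bimodal population (two groups with different mean weights); the population parameters, function names and seed are illustrative choices, not values from the slides.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_weight():
    """One draw from a made-up bimodal population (two groups with different means)."""
    return rng.gauss(60, 8) if rng.random() < 0.5 else rng.gauss(80, 8)

def sample_means(sample_size, repeats=2000):
    """Means of `repeats` independent samples of the given size."""
    return [mean(draw_weight() for _ in range(sample_size)) for _ in range(repeats)]

raw = [draw_weight() for _ in range(2000)]
means_25 = sample_means(25)
means_100 = sample_means(100)

# The spread of the means shrinks roughly like population SD / sqrt(sample size),
# so the distribution of means is more peaked than the raw data.
print(round(stdev(raw), 1), round(stdev(means_25), 1), round(stdev(means_100), 1))
```

Plotting a histogram of `means_100` would show a single bell-shaped peak even though the raw weights have two.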


One- and Two-Tailed Tests

• If the direction of a relationship is predicted, the appropriate test will be one-tailed

– If the direction of the relationship is not predicted, use a two-tailed test

• Example:

– One-tailed: Females are less approving of violence than are males

– Two-tailed: There is a gender difference in the acceptance of violence

[Note: No prediction about which gender is more approving]


Statistical test

Hypothesis testing for a one-sample problem consists of three parts:

I) Null hypothesis (H0): The unknown population average μ is equal to a specified value μ0

II) Alternative hypothesis (Ha): The unknown population average μ is not equal to μ0

This is the hypothesis that the researcher would like to validate, and it is chosen before collecting the data!

There are three possible alternative hypotheses – choose only one!

1. Ha: μ < μ0 (one-sided test)

2. Ha: μ ≠ μ0 (two-sided test)

3. Ha: μ > μ0 (one-sided test)


Five Percent Probability Rejection Area: One- and Two-Tailed Tests


Significance doesn’t rule out chance 100%

• When a result is statistically significant at the 5% level, there is at most a 5% chance of obtaining something as extreme as, or more extreme than, your observation under the null hypothesis.

• Even if nothing is happening (the null hypothesis is true), you can expect about 5 significant results in 100 tests, just by chance.
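A short calculation makes the point concrete. This sketch assumes the tests are independent and that every null hypothesis is true; the function name is illustrative.

```python
alpha = 0.05

def prob_at_least_one_false_positive(num_tests, alpha=alpha):
    """Chance of one or more 'significant' results when every null is true,
    assuming the tests are independent."""
    return 1 - (1 - alpha) ** num_tests

expected_false_positives = 100 * alpha   # expected count of significant results in 100 tests
print(expected_false_positives)
for k in (1, 10, 100):
    print(k, round(prob_at_least_one_false_positive(k), 3))
```

With 100 such tests, at least one "significant" result is almost certain, which is why testing many hypotheses appears among the issues listed earlier.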


One-tailed vs Two-tailed

• Checking if the sample average is too big or too small before making an alternative hypothesis is a form of data snooping.

• A coin is tossed 100 times, and there are 61 heads. The null hypothesis says the coin is fair, so EV = 50, SE = 5, and z = (61 – 50) / 5 = 2.2.

• If the alternative hypothesis says the coin is biased towards the heads, then the P-value is roughly the area under the normal curve to the right of 2.2: 1.4%.


• If the alternative hypothesis is that the coin is biased, but this bias can be either way, then the P-value should be the area to the left of –2.2 plus the area to the right of 2.2: 2.8%.

• A one-tailed test is OK if it was known before the experiment that the coin is either fair or biased towards the heads.

• If a reported test is one-tailed, and you think it should be two-tailed, just double the P-value.
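The coin example can be reproduced with the normal curve from Python's standard library. This is a sketch using the normal approximation from the slides; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, heads, p_fair = 100, 61, 0.5
expected = n * p_fair                       # EV = 50
se = sqrt(n * p_fair * (1 - p_fair))        # SE = 5
z = (heads - expected) / se                 # 2.2

one_tailed = 1 - NormalDist().cdf(z)        # area to the right of z
two_tailed = 2 * one_tailed                 # both tails, by symmetry
print(round(z, 1), round(one_tailed, 3), round(two_tailed, 3))
```

The two-tailed P-value is exactly twice the one-tailed value because the normal curve is symmetric.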


• Suppose an investigator uses a two-tailed z-test with alpha at 0.05 (5%) and gets z = 1.85, so P ≈ 6%. Most journals won’t publish this result, since it is not “statistically significant”.

• The investigator could do a better experiment, but this is hard. A simple way out is to do a one-tailed test, which halves the P-value to about 3%. Choosing the tail after seeing the data in this way is the data snooping described earlier.


Was the result important?

• To compare WISC vocabulary scores for big-city and rural children, investigators took a simple random sample of 2,500 big-city kids, and an independent simple random sample of 2,500 rural kids. Big-city: average = 26, SD = 10. Rural: average = 25, SD = 10.

• Two-sample z-test. SE for difference ≈ 0.3, z = 1 / 0.3 ≈ 3.3, P = 5 / 10,000.

• The difference is highly significant, and the investigators use this to support a proposal to pour money into rural schools.


• The z-test only tells us that the difference of 1 point is very hard to explain away with chance variation. How important is this difference?
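The arithmetic of the WISC example can be sketched as follows, assuming independent samples so that the SEs combine by the square-root rule. The slides round the SE for the difference to 0.3 before dividing; without that rounding z comes out slightly larger (about 3.5), and the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

n_city, mean_city, sd_city = 2500, 26, 10
n_rural, mean_rural, sd_rural = 2500, 25, 10

se_city = sd_city / sqrt(n_city)            # 0.2
se_rural = sd_rural / sqrt(n_rural)         # 0.2
se_diff = sqrt(se_city**2 + se_rural**2)    # about 0.28, rounded to 0.3 on the slide
z = (mean_city - mean_rural) / se_diff
p_one_tailed = 1 - NormalDist().cdf(z)
print(round(se_diff, 2), round(z, 1), p_one_tailed)
```

The tiny P-value reflects the large samples; the difference itself is still only 1 point on the test.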


P-value and sample size

• The P-value of a test depends on the sample size.

• With a large sample, even a small difference can be “statistically significant”, i.e., hard to explain by the luck of the draw. This doesn’t necessarily make it important.

• Conversely, an important difference may not be statistically significant if the sample is too small.
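To see the dependence on sample size, the sketch below keeps a 1-point difference and an SD of 10 fixed and varies only the number of subjects per group. The sample sizes 25 and 250 are hypothetical; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_p(diff, sd, n_per_group):
    """One-tailed P-value for a difference of two sample averages (equal SD and n)."""
    se_diff = sd * sqrt(2 / n_per_group)
    return 1 - NormalDist().cdf(diff / se_diff)

# Same 1-point difference and SD of 10; only the sample size changes.
for n in (25, 250, 2500):
    print(n, round(two_sample_p(1, 10, n), 4))
```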


The Role of the Chance Model

A test of significance asks the question, “Is the difference due to chance?”

But the test cannot be done until the word “chance” has been given a precise meaning.


Does the difference prove the point?

• In an experiment of throwing a die: a die is rolled, and the subject tries to make it show 6 spots.

• In 720 trials, 6 spots turned up 143 times. If the die is fair, and the subject has no bias, expect 120 sixes, with an SE of 10 under this hypothesis.

• z = (143 – 120) / 10 = 2.3, P ≈ 1%.
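The expected count, its SE and the P-value for the die example can be sketched as below, assuming independent rolls of a fair die (a 1-in-6 chance of a six on each roll) and a one-tailed test; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

trials, sixes, p_six = 720, 143, 1 / 6
expected = trials * p_six                        # 120
se = sqrt(trials * p_six * (1 - p_six))          # 10
z = (sixes - expected) / se                      # 2.3
p_value = 1 - NormalDist().cdf(z)                # one-tailed, about 1%
print(round(expected), round(se), round(z, 1), round(p_value, 3))
```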


• A test of significance tells you that there is a difference, but can’t tell you the cause of the difference.

• A test of significance does not check the design of the study.


How to test statistical significance?

1. State the null hypothesis
2. Set alpha (level of significance)
3. Identify the variables to be analyzed
4. Identify the groups to be compared
5. Choose a test
6. Calculate the test statistic
7. If necessary, calculate the degrees of freedom
8. Find out the P value
9. Compare with alpha
10. Decision on hypothesis: fail to reject or reject
11. Calculate the CI of the difference
12. Calculate power if required


How to interpret P?

• If P < alpha (0.05), the difference is statistically significant.

• If P > alpha, the difference between groups is not statistically significant, or the difference could not be detected.

• If P > alpha, calculate the power:

– If power < 80%: the difference could not be detected; repeat the study after making up the shortfall in the number of study subjects.

– If power ≥ 80%: the difference between groups is not statistically significant.


Statistical Significance

• The null hypothesis: Dependent and independent variables are statistically unrelated

• If a relationship between an independent variable and a dependent variable is statistically non-significant

– Null hypothesis is not rejected

– Research hypothesis is not supported

• If a relationship between an independent variable and a dependent variable is statistically significant

– Null hypothesis is rejected

– Research hypothesis is supported


Tests of Significance

Hypotheses

– In a test of significance, we set up two hypotheses.

• The null hypothesis or H0.

• The alternative hypothesis or Ha.

– The null hypothesis (H0) is the statement being tested.

• Usually we want to show evidence that the null hypothesis is not true.

• It is often called the “currently held belief” or “statement of no effect” or “statement of no difference.”

– The alternative hypothesis (Ha) is the statement of what we want to show is true instead of H0.

• The alternative hypothesis can be one-sided or two-sided, depending on the statement of the question of interest.

– Hypotheses are always about parameters of populations, never about statistics from samples.

• It is often helpful to think of null and alternative hypotheses as opposite statements about the parameter of interest.


Tests of Significance

Test Statistics

– A test statistic measures the compatibility between the null hypothesis and the data.

• An extreme test statistic (far from 0) indicates the data are not compatible with the null hypothesis.

• A common test statistic (close to 0) indicates the data are compatible with the null hypothesis.


Tests of Significance

Significance Level (α)

– The significance level (α) is the point at which we say the P-value is small enough to reject H0.

– If the P-value is as small as α or smaller, we reject H0, and we say that the data are statistically significant at level α.

– Significance levels are related to confidence levels through the rule C = 1 – α

– Common significance levels (α’s) are

• 0.10 corresponding to confidence level 90%

• 0.05 corresponding to confidence level 95%

• 0.01 corresponding to confidence level 99%


Tests of Significance

• Steps for Testing a Population Mean (with σ known)

1. State the null hypothesis: H0: μ = μ0

2. State the alternative hypothesis: Ha: μ ≠ μ0 (could also be Ha: μ > μ0 or Ha: μ < μ0)

3. State the level of significance

• Assume α = 0.05 unless otherwise stated

4. Calculate the test statistic: z = (x̄ – μ0) / (σ / √n)


• Steps for Testing a Population Mean (with σ known)

5. Find the P-value:

• For a one- or two-sided test:

– t-test: Consult the t distribution table and get the standard value for the specific degrees of freedom

– z-test: No need to consult the table each time, as the standard value is independent of degrees of freedom

– At the 5% level of significance (two-tailed), Z = 1.96

– At the 1% level of significance (two-tailed), Z = 2.58
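These standard values can be recovered from the normal curve directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is illustrative. It also shows the one-tailed 5% value, which is smaller than 1.96.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Standard normal cut-off that leaves a total tail area of alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.05), 2))                    # two-tailed 5%: 1.96
print(round(z_critical(0.01), 2))                    # two-tailed 1%: 2.58
print(round(z_critical(0.05, two_tailed=False), 3))  # one-tailed 5%: 1.645
```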


• Steps for Testing a Population Mean (with σ known)

6. Reject or fail to reject H0 based on comparison of the P-value with α (equivalently, of the test statistic with its critical value)

• If the P-value is less than or equal to α, reject H0.

• If the P-value is greater than α, fail to reject H0.

7. State your conclusion

• Your conclusion should reflect your original statement of the hypotheses.

• Furthermore, your conclusion should be stated in terms of the alternative hypothesis

• For example, if Ha: μ ≠ μ0 as stated previously

– If H0 is rejected, “There is significant statistical evidence that the population mean is different from μ0.”

– If H0 is not rejected, “There is not significant statistical evidence that the population mean is different from μ0.”


Example : Arsenic

– “A factory that discharges waste water into the sewage system is required to monitor the arsenic levels in its waste water and report the results to the Environmental Protection Agency (EPA) at regular intervals. Sixty beakers of waste water from the discharge are obtained at randomly chosen times during a certain month. The measurement of arsenic is in nanograms per liter for each beaker of water obtained.”

(From Graybill, Iyer and Burdick, Applied Statistics, 1998).

– Suppose the EPA wants to test if the average arsenic level exceeds 30 nanograms per liter at the 0.05 level of significance.


Making a test of significance

Follow these steps:

1. Set up the null hypothesis H0 – the hypothesis you want to test.

2. Set up the alternative hypothesis Ha – what we accept if H0 is rejected.

3. Compute the value t* of the test statistic.

4. Compute the observed significance level P. This is the probability, calculated assuming that H0 is true, of getting a test statistic as extreme or more extreme than the observed one in the direction of the alternative hypothesis.

5. State a conclusion. You could choose a significance level α. If the P-value is less than or equal to α, you conclude that the null hypothesis can be rejected at level α; otherwise you conclude that the data do not provide enough evidence to reject H0.


Examples of Significance Tests

• Arsenic Example

– Information given (arsenic in nanograms per liter, 60 beakers):

37.6 56.7 5.1 3.7 3.5 15.7

20.7 81.3 37.5 15.4 10.6 8.3

23.2 9.5 7.9 21.1 40.6 35

19.4 38.8 20.9 8.6 59.2 6.2

24 33.8 21.6 15.3 6.6 87.7

4.8 10.7 182.2 17.6 15.3 37.6

152 63.5 46.9 17.4 17.4 26.1

21.5 3.2 45.2 12 128.5 23.5

24.1 36.2 48.9 16.5 24.1 33.2

25.6 33.6 12.2 9.9 14.5 30

Sample mean: x̄ = 30.83

Assume it is known that σ = 34.

Sample size: n = 60.


• Arsenic Example

1. State the null hypothesis: H0: μ = 30

2. State the alternative hypothesis: Ha: μ > 30 (from “exceeds”)

3. State the level of significance: α = 0.05 for a one-tailed test


Z Test Statistic

Tests using the z-statistic are called z-tests.

z = (observed – expected) / SE


Observed Significance Level

• z is the number of SEs an observed value is away from its expected value, based on the null hypothesis.

• The observed z-statistic is –3. The chance of getting a sample average 3 SEs or more below its expected value is about 1 in 1,000.

This is an observed significance level, denoted by P, and referred to as a P-value.


• Arsenic Example

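4. Calculate the test statistic.

z = (x̄ – μ0) / (σ / √n) = (30.83 – 30) / (34 / √60) = 0.83 / 4.39 = 0.19

5. The z value (0.19) is less than the critical value at the 5% level of significance (1.645 for a one-tailed test; it is also less than the two-tailed value of 1.96).

6. Decision: the evidence fails to reject the null hypothesis. Hence, the alternative hypothesis is not supported.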


Arsenic Example

7. State the conclusion.

There is not significant statistical evidence that the average arsenic level exceeds 30 nanograms per liter at the 0.05 level of significance.
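The arsenic calculation can be sketched end to end in Python, using the summary values quoted on the slides (x̄ = 30.83, σ = 34, n = 60) and a one-tailed z-test; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 30.83, 30, 34, 60, 0.05   # values quoted on the slides

se = sigma / sqrt(n)                     # about 4.39
z = (xbar - mu0) / se                    # about 0.19
p_value = 1 - NormalDist().cdf(z)        # one-tailed, because Ha is "exceeds 30"
reject_h0 = p_value <= alpha
print(round(se, 2), round(z, 2), round(p_value, 2), reject_h0)
```

The P-value is far above 0.05, so H0 is not rejected, in line with the stated conclusion.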


Types of Error

• Type I Error: When we reject H0 (accept Ha) when in fact H0 is true.

• Type II Error: When we accept H0 (fail to reject H0) when in fact Ha is true.

                          Truth about the population
Decision based on
the sample                H0 is true            Ha is true

Reject H0                 Type I error          Correct decision

Accept H0                 Correct decision      Type II error


Power and Error

• Significance and Type I Error

– The significance level α of any fixed level test is the probability of a Type I Error: That is, α is the probability that the test will reject the null hypothesis, H0, when H0 is in fact true.

• Power and Type II Error

– The power of a fixed level test against a particular alternative is 1 minus the probability of a Type II Error for that alternative (1 - β).


Power

• Power: The probability that a fixed level α significance test will reject H0 when a particular alternative value of the parameter is true is called the power of the test to detect that alternative.

• In other words, power is the probability that the test will reject H0 when the alternative is true (when the null really should be rejected.)

• Ways To Increase Power:

– Increase α

– Increase the sample size – this is what you will typically want to do

– Decrease σ
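The three ways of increasing power can be checked numerically. The sketch below assumes a one-sided z-test with known σ; μ0, σ and n follow the arsenic example, while the alternative value of 40 is hypothetical and chosen only for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = mu0 against the alternative mu = mu1 > mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # how many SEs the alternative lies above mu0
    return 1 - NormalDist().cdf(z_alpha - shift)

# mu0, sigma and n follow the arsenic example; the alternative mu1 = 40 is hypothetical.
base = power_one_sided_z(30, 40, 34, 60)
print(round(base, 2))
print(power_one_sided_z(30, 40, 34, 60, alpha=0.10) > base)   # larger alpha
print(power_one_sided_z(30, 40, 34, 240) > base)              # larger sample
print(power_one_sided_z(30, 40, 17, 60) > base)               # smaller sigma
```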


Significance Level

• The observed significance level is the chance of getting a test statistic as extreme as, or more extreme than, the observed one.

• It is computed on the basis that the null hypothesis is right.

• The smaller it is, the stronger the evidence against the null.


The common practice of testing hypotheses

• The common practice of testing hypotheses mixes the reasoning of significance tests and decision rules as follows:

– State H0 and Ha

– Think of the problem as a decision problem, so that the probabilities of Type I and Type II errors are relevant.

– Because of Step 1, Type I errors are more serious. So choose an α (significance level) and consider only tests with probability of Type I error no greater than α.

– Among these tests, select one that makes the probability of a Type II error as small as possible (that is, power as large as possible.) If this probability is too large, you will have to take a larger sample to reduce the chance of an error.


The t-test

• The z-test is good when the sample size (number of draws) is large enough.

• For small samples, the t-test should be used, provided the histogram of the contents is not too different from the normal curve.


The t Distribution: t-Test Groups and Pairs

Used often for experimental data

t-test used when:

• Sample size is small (e.g., < 30)

• Dependent variable measured at ratio level

• Random assignment to treatment/control groups

• Treatment has two levels only

• Sample statistic normally distributed


The t-test represents the ratio between the difference in means between two groups and the standard error of the difference. Thus:

t = (difference between the means) / (standard error of the difference)


Two t-Tests: Between- and Within-Subject Design

• Between-subjects:

Used in an experimental design, with an experimental and a control group, where the groups have been independently established

• Within-subjects:

In these designs the same person is subjected to different treatments and a comparison is made between the two treatments.


Example

• A weight is known to weigh 70 g.

• Five measurements were made: 78, 83, 68, 72, 88. Assuming that a calibrated instrument was used for measurement, the fluctuation is due to chance variation.

• Do the measurements fluctuate around 70, or do they indicate some bias in the measurement procedure?


Summary for the t-test

Assumptions:

(a) The data are likely drawn from a normally distributed population.

(b) The SD of the population is unknown.

(c) The number of observations is small.

(d) The histogram of the contents does not look too different from the normal curve.


Two Sample Tests

To compare two sample averages, the test statistic has the form

z = (observed – expected) / SE

except that now these terms apply to the difference of the sample averages.

Need to find the SE for the difference.


Summary for the t-test

The t-statistic has the same form as the z-statistic:

t = (observed – expected) / SE

except that the SE is computed on the basis of the SD of the data.

The degrees of freedom are (sample size – 1).
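Applied to the earlier example of the 70 g weight with five measurements, the t-statistic can be sketched as below. The cut-off 2.776 is the usual two-tailed 5% value from a t table for 4 degrees of freedom; choosing a two-tailed test here is an assumption, since the slides do not state the alternative.

```python
from math import sqrt
from statistics import mean, stdev

measurements = [78, 83, 68, 72, 88]
known_weight = 70

n = len(measurements)
se = stdev(measurements) / sqrt(n)            # SE from the sample SD (n - 1 divisor)
t = (mean(measurements) - known_weight) / se
df = n - 1

t_critical = 2.776   # two-tailed 5% cut-off for df = 4, from a standard t table
print(round(t, 2), df, abs(t) > t_critical)
```

With t ≈ 2.16 on 4 degrees of freedom, the statistic falls short of the two-tailed 5% cut-off, so on this test the measurements are compatible with chance fluctuation around 70 g.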


The Chi Square Test

Used for frequencies in discrete categories.

(a) To check if the observed frequencies are like hypothesized frequencies.

(b) To check if two variables are independent in the population.


Expected frequency for a cell

= (row total × column total) / N

The χ² statistic is:

χ² = sum of [ (obs freq – exp freq)² / exp freq ]

where the sum is taken over all the cells.


Is the die loaded?

A gambler is accused of using a loaded die, but he pleads innocent. The results of the last 60 throws are summarised below:

Value    Observed frequency
1        4
2        6
3        17
4        16
5        8
6        9


• In our example, the χ²-statistic is

χ² = (4 – 10)²/10 + (6 – 10)²/10 + (17 – 10)²/10 + (16 – 10)²/10 + (8 – 10)²/10 + (9 – 10)²/10 = 14.2

Large χ² values: the observed frequencies are far from the expected frequencies: evidence against the null hypothesis.

Small χ² values: they are close to each other: evidence for the null hypothesis.


• What’s the P-value, i.e., under the null hypothesis, what’s the chance of getting a χ²-statistic that is as large as, or larger than, the observed statistic?

• Find the degrees of freedom (df) and the chi-square value, then look up P
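For the die example, the statistic, the degrees of freedom and the P-value can be sketched as follows. The helper `chi2_upper_tail_df5` is an illustrative name; it uses a closed-form expression for the upper tail that holds for 5 degrees of freedom only, so that no table or external library is needed. It is a sketch, not a general-purpose routine.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

observed = [4, 6, 17, 16, 8, 9]                 # faces 1 to 6, 60 throws
expected = sum(observed) / len(observed)        # 10 per face if the die is fair

chi2 = sum((obs - expected) ** 2 / expected for obs in observed)
df = len(observed) - 1                          # 5

def chi2_upper_tail_df5(x):
    """P(chi-square with 5 df >= x), using the closed form for odd df."""
    r = sqrt(x)
    density = exp(-x / 2) / sqrt(2 * pi)        # standard normal density at r
    return 2 * (1 - NormalDist().cdf(r)) + 2 * density * (r + r ** 3 / 3)

p_value = chi2_upper_tail_df5(chi2)
print(round(chi2, 1), df, round(p_value, 3))
```

The P-value comes out between 1% and 2%, so a fair die would rarely produce frequencies this far from 10 per face.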


Conclusion

• A test of significance answers a specific question:

How easy is it to explain the difference between the data and what is expected on the null hypothesis, on the basis of chance variation alone?

• It does not measure the size of a difference or its importance.

• It will not identify the cause of the difference.


Choosing a stat test……

1. Data type

2. Distribution of data

3. Analysis type (goal)

4. No. of groups

5. Design


Correlation

How do we know a variable is normally distributed?

• Histogram

• Mean, median and mode are almost equal

• Kolmogorov–Smirnov test




Thank You