CHAPTER 6 Statistical Inference & Hypothesis Testing

• 6.1 - One Sample: Mean μ, Variance σ², Proportion π

• 6.2 - Two Samples: Means, Variances, Proportions (μ1 vs. μ2, σ1² vs. σ2², π1 vs. π2)

• 6.3 - Multiple Samples: Means, Variances, Proportions (μ1, …, μk; σ1², …, σk²; π1, …, πk)


Binary Response: P(Success) = π, where I = 1 means Yes (Success) and I = 0 means No (Failure).

"Test of Homogeneity": TWO POPULATIONS, random binary variable I
• Question: "Do you like Brussels sprouts?", with π = P(Yes to Brussels sprouts) in each population.
• Sample 1 of size n1 is drawn from population 1, and Sample 2 of size n2 from population 2.
• Null Hypothesis H0: π1 = π2, "No difference in liking Brussels sprouts between the two populations."
• Alternative Hypothesis HA: π1 ≠ π2, "There is a difference in liking Brussels sprouts between the two populations."

"Test of Independence": ONE POPULATION, two random binary variables I and J
• Questions: "Do you like olives?" (I) and "Do you like anchovies?" (J), with π1 = P(Yes to olives) and π2 = P(Yes to anchovies).
• Null Hypothesis H0: π1 = π2, "No association exists between liking olives and anchovies."
• Alternative Hypothesis HA: π1 ≠ π2, "An association exists between liking olives and anchovies."

(Assume "large" sample sizes.)

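Recall… (one proportion): If nπ ≥ 15 and n(1 − π) ≥ 15, then via the Normal Approximation to the Binomial, the number of successes X in a sample of size n satisfies

$$X \sim N\left(n\pi,\ \sqrt{n\pi(1-\pi)}\right), \qquad \hat{\pi} = \frac{X}{n} \sim N\left(\pi,\ \sqrt{\frac{\pi(1-\pi)}{n}}\right),$$

where N(μ, σ) denotes a normal distribution with mean μ and standard deviation σ.

Problem: the s.e. depends on the unknown π!! Solution: for a confidence interval, use π̂ in its place; else (critical region and p-value), use the null value π0 specified by H0.

Now take two samples: Sample 1 of size n1 with X1 = # Successes, and Sample 2 of size n2 with X2 = # Successes. What is the Sampling Distribution of π̂1 − π̂2?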
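The large-sample condition and the standard error above can be sketched in R, the language of the `chisq.test` call used later in this chapter. This is a minimal sketch under one loud assumption: since π is unknown in practice, the helper plugs in the sample proportion π̂ = X/n. The names `normal_approx_ok` and `se_prop` are hypothetical.

```r
# Hypothetical helper: rule of thumb n*pi >= 15 and n*(1 - pi) >= 15.
# Assumption: the unknown pi is replaced by the sample proportion x / n.
normal_approx_ok <- function(x, n, threshold = 15) {
  pi_hat <- x / n
  (n * pi_hat >= threshold) && (n * (1 - pi_hat) >= threshold)
}

# Standard error of pi_hat evaluated at a proportion p:
# p = pi_hat for a confidence interval, p = pi0 (from H0) for a test.
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

normal_approx_ok(42, 60)   # TRUE: 42 successes and 18 failures, both >= 15
normal_approx_ok(5, 60)    # FALSE: only 5 successes
se_prop(0.7, 60)           # s.e. evaluated at pi_hat = 0.7 with n = 60
```

Note that `n * pi_hat` is just the observed count of successes, so the check reduces to "at least 15 successes and at least 15 failures".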

Sample 1, size n1: If n1π1 ≥ 15 and n1(1 − π1) ≥ 15, then via the Normal Approximation to the Binomial,

$$\hat{\pi}_1 = \frac{X_1}{n_1} \sim N\left(\pi_1,\ \sqrt{\frac{\pi_1(1-\pi_1)}{n_1}}\right).$$

Sample 2, size n2: If n2π2 ≥ 15 and n2(1 − π2) ≥ 15, then via the Normal Approximation to the Binomial,

$$\hat{\pi}_2 = \frac{X_2}{n_2} \sim N\left(\pi_2,\ \sqrt{\frac{\pi_2(1-\pi_2)}{n_2}}\right).$$

Recall from section 4.1 (Discrete Models): Mean(X − Y) = Mean(X) − Mean(Y), and if X and Y are independent, Var(X − Y) = Var(X) + Var(Y).

Since the two samples are independent, the Sampling Distribution of π̂1 − π̂2 is

$$\hat{\pi}_1 - \hat{\pi}_2 \sim N\left(\pi_1 - \pi_2,\ \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}\right),$$

whose second argument is the standard error.
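The mean and standard error of π̂1 − π̂2 can be checked by simulation. In this sketch the "true" values π1 = 0.7, π2 = 0.4, n1 = 60, n2 = 40 are assumptions chosen for illustration; it draws many pairs of independent binomial samples and compares the empirical mean and standard deviation of π̂1 − π̂2 with π1 − π2 and the standard-error formula above.

```r
# Monte Carlo check of the sampling distribution of pi_hat1 - pi_hat2.
# Assumed "true" parameters (illustrative only):
set.seed(1)
pi1 <- 0.7; n1 <- 60
pi2 <- 0.4; n2 <- 40
reps <- 100000

# Each replicate: two independent samples, one proportion from each.
diffs <- rbinom(reps, n1, pi1) / n1 - rbinom(reps, n2, pi2) / n2

# Standard error from the formula: sqrt(Var(pi_hat1) + Var(pi_hat2)).
se_formula <- sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)

c(sim_mean = mean(diffs), theory_mean = pi1 - pi2,
  sim_sd = sd(diffs), theory_sd = se_formula)
```

A histogram of `diffs` (for example `hist(diffs)`) should look approximately bell-shaped, as the Normal Approximation predicts.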

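This raises a similar problem as in "one proportion" inference: the standard error depends on the unknown π1 and π2!

• For a confidence interval, replace π1 and π2 by π̂1 and π̂2, respectively.

• For a critical region and p-value, replace π1 and π2 by… ???? Under the Null Hypothesis H0: π1 = π2, the two proportions are equal, so replace their common value by a "pooled" estimate:

$$\hat{p}_{\text{pooled}} = \frac{X_1 + X_2}{n_1 + n_2}.$$

With p̂ = p̂_pooled, this gives the standard error estimate

$$\widehat{\text{s.e.}}_0 = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

and, because π1 − π2 = 0 under H0, the "Null Distribution"

$$\hat{\pi}_1 - \hat{\pi}_2 \sim N\left(0,\ \widehat{\text{s.e.}}_0\right).$$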
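The two replacement rules above (separate estimates for a confidence interval, a pooled estimate for a test) can be combined in one R sketch. The function name `two_prop_inference` and its returned fields are hypothetical, and the interval uses the usual estimate ± z·s.e. form with the unpooled standard error, which the text implies but does not write out.

```r
# Hypothetical helper: large-sample inference for pi1 - pi2.
two_prop_inference <- function(x1, n1, x2, n2, conf_level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  est <- p1 - p2

  # Confidence interval: estimate pi1 and pi2 separately by p1 and p2.
  se_ci <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z_crit <- qnorm(1 - (1 - conf_level) / 2)
  ci <- est + c(-1, 1) * z_crit * se_ci

  # Test of H0: pi1 = pi2, using the pooled estimate of the common value.
  p_pooled <- (x1 + x2) / (n1 + n2)
  se_0 <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
  z <- (est - 0) / se_0
  p_value <- 2 * pnorm(-abs(z))   # two-sided, HA: pi1 != pi2

  list(est = est, ci = ci, p_pooled = p_pooled, se_0 = se_0,
       z = z, p_value = p_value)
}
```

For example, `two_prop_inference(42, 60, 16, 40)` applies both rules to the male/female data in the example that follows.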

Example: Two Proportions (of "Success")

• Study Question: "Is there an association between liking Bruce Willis movies and gender, or not?" Test of Homogeneity or Independence?

• Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question "Do you like Bruce Willis movies?" Let the discrete random variable X = "# Successes" (i.e., "Yes" responses) in each gender sample, and use these data to test…

• Data: Sample 1) n1 = 60 males, X1 = 42; Sample 2) n2 = 40 females, X2 = 16.

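• Null Hypothesis H0: P("Yes" among Males) = P("Yes" among Females), i.e., H0: π1 = π2 (equivalently π1 − π2 = 0), where π = P(Success) in each gender population. "No association exists."

• Analysis via Z-test: Point estimates π̂ = X/n give π̂1 = 42/60 = 0.7 and π̂2 = 16/40 = 0.4, so π̂1 − π̂2 = 0.3. NOTE: This is > 0.

Pooled estimate: p̂_pooled = (42 + 16)/(60 + 40) = 58/100 = 0.58. Therefore,

$$\widehat{\text{s.e.}}_0 = \sqrt{\hat{p}_{\text{pooled}}(1-\hat{p}_{\text{pooled}})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(0.58)(0.42)\left(\frac{1}{60} + \frac{1}{40}\right)} = 0.10075,$$

$$p\text{-value} = 2\,P(\hat{\pi}_1 - \hat{\pi}_2 \ge 0.3) = 2\,P\left(Z \ge \frac{0.3 - 0}{0.10075}\right) = 2\,P(Z \ge 2.9777) = .0029 < .05 = \alpha, \text{ so REJECT } H_0.$$

Conclusion: A significant association exists at the .05 level between "liking Bruce Willis movies" and gender, with the proportion of males who like them estimated to be 0.30 (30 percentage points) higher than that of females.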
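The worked Z-test can be reproduced step by step in R as a check on the arithmetic. As a cross-check, the sketch also calls base R's `prop.test()` without the continuity correction; for two groups its reported X-squared statistic equals the square of this pooled Z statistic, and its p-value matches the two-sided Z-test p-value.

```r
# Worked example: n1 = 60 males with X1 = 42 "Yes", n2 = 40 females with X2 = 16 "Yes".
x <- c(42, 16)
n <- c(60, 40)

p_hat <- x / n                                                 # 0.7 and 0.4
p_pooled <- sum(x) / sum(n)                                    # 58 / 100 = 0.58
se_0 <- sqrt(p_pooled * (1 - p_pooled) * (1 / n[1] + 1 / n[2]))  # approx. 0.10075
z <- (p_hat[1] - p_hat[2] - 0) / se_0                          # approx. 2.978
p_value <- 2 * pnorm(-abs(z))                                  # approx. 0.0029

# Cross-check: prop.test() without Yates' continuity correction.
pt <- prop.test(x, n, correct = FALSE)
round(c(z = z, p_value = p_value, z_squared = z^2,
        prop_test_stat = unname(pt$statistic), prop_test_p = pt$p.value), 4)
```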

Test of Homogeneity (between two populations)

• Study Question: "Is there an association between liking Bruce Willis movies and gender, or not?"

• The two populations are Males and Females; each respondent answers "Do you like Bruce Willis movies?" (I = 1 Yes, I = 0 No), with π = P(Yes to Bruce Willis movies) in each population.

• Null Hypothesis H0: π1 = π2, "No difference in liking Bruce Willis movies between the two populations."

• Alternative Hypothesis HA: π1 ≠ π2, "There is a difference in liking Bruce Willis movies between the two populations."

• The calculations are those of the Z-test above: π̂1 − π̂2 = 0.7 − 0.4 = 0.3 > 0, p̂_pooled = 58/100 = 0.58, s.e.0 = 0.10075, z = 0.3/0.10075 ≈ 2.9777, and p-value = 2 P(Z ≥ 2.9777) = .0029 < .05, so REJECT H0.

Conclusion: A significant association exists at the .05 level; "liking Bruce Willis movies" and gender are dependent, with the proportion of males who like them estimated to be 0.30 (30 percentage points) higher than that of females.

Test of Homogeneity or Independence?

Viewed as a Test of Independence (one population, two binary variables): I = "Do you like Bruce Willis?" (I = 1 Yes, I = 0 No) and J = "Gender: Male?" (Males vs. Females).

• π1 = P(Yes to Bruce), π2 = P(Yes to Male)

• Null Hypothesis H0: π1 = π2, "No association exists between liking Bruce and Male," i.e., "Liking Bruce" and "Gender" are statistically independent.

• Alternative Hypothesis HA: π1 ≠ π2, "An association exists between liking Bruce and Male," i.e., "Liking Bruce" and "Gender" are statistically dependent.

~ ALTERNATE METHOD ~

Observed:

            Males   Females   Total
Yes           42       16       58
No            18       24       42
Total         60       40      100

Expected (under H0):

            Males     Females    Total
Yes        E11 = ?    E12 = ?      58
No         E21 = ?    E22 = ?      42
Total        60         40        100

Recall Probability Tables from Chapter 3…

Under the null hypothesis, the binary variable I is statistically independent of the binary variable J, e.g., P("I = 1" ∩ "J = 1") = P("I = 1") P("J = 1").

Probability Table:

              J = 1        J = 2
I = 1         π11          π12          π11 + π12
I = 2         π21          π22          π21 + π22
              π11 + π21    π12 + π22    1

Contingency Table of expected counts, with row totals R1, R2, column totals C1, C2, and grand total n:

              J = 1    J = 2
I = 1         E11      E12      R1
I = 2         E21      E22      R2
              C1       C2       n

Dividing every entry by n gives the corresponding Probability Table:

              J = 1    J = 2
I = 1         E11/n    E12/n    R1/n
I = 2         E21/n    E22/n    R2/n
              C1/n     C2/n     1

Therefore, under independence,

$$\frac{E_{11}}{n} = \frac{R_1}{n} \cdot \frac{C_1}{n}, \quad \text{i.e.,} \quad E_{11} = \frac{R_1\, C_1}{n}, \text{ etc.}$$

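H0: π1 = π2 where π = P(Success) in each gender population. "No association exists."

Check: Does the expected table satisfy the null hypothesis? Yes: the proportion of "Yes" responses is the same in each gender, 34.8/60 = 23.2/40 = 58/100 = 0.58.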


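Expected values under H0:

$$E_{11} = \frac{(58)(60)}{100} = 34.8, \quad E_{12} = \frac{(58)(40)}{100} = 23.2, \quad E_{21} = \frac{(42)(60)}{100} = 25.2, \quad E_{22} = \frac{(42)(40)}{100} = 16.8.$$

In general,

$$E_{ij} = \frac{R_i\, C_j}{n}, \qquad i = 1, 2, \ldots, \#\text{rows}; \quad j = 1, 2, \ldots, \#\text{cols}.$$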
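The general rule E_ij = R_i C_j / n is an outer product of the row totals and column totals, divided by n. A short R sketch (the object names `observed` and `expected` are illustrative) builds the expected table for the example and cross-checks it against the `$expected` component returned by base R's `chisq.test()`.

```r
# Observed 2 x 2 table from the example (rows: Yes/No, columns: Males/Females).
observed <- matrix(c(42, 16,
                     18, 24),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Yes", "No"), c("Males", "Females")))

# E_ij = R_i * C_j / n: outer product of row totals and column totals, over n.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected
```

The result should match the hand calculation: 34.8 and 23.2 in the "Yes" row, 25.2 and 16.8 in the "No" row, with the same margins as the observed table.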

"Chi-squared" Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}},$$

which, under H0, approximately follows a χ² distribution with "degrees of freedom" df = (# rows − 1)(# cols − 1), = 1 for a 2 × 2 table.

Expected (under H0):

            Males   Females   Total
Yes          34.8     23.2      58
No           25.2     16.8      42
Total          60       40     100

Comparing the Observed and Expected (under H0) tables cell by cell:

$$\chi^2 = \frac{(42-34.8)^2}{34.8} + \frac{(16-23.2)^2}{23.2} + \frac{(18-25.2)^2}{25.2} + \frac{(24-16.8)^2}{16.8} = 8.867 \text{ on 1 df}, \qquad p = \ ?$$

Because 8.867 is much greater than the α = .05 critical value of 3.841, it follows that p << .05. More precisely, 7.879 < 8.867 < 9.141; hence .0025 < p < .005.

The actual p-value = .0029, the same as that found using the Z-test!

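In R:

```r
Yes <- c(42, 16)
No <- c(18, 24)
Bruce <- rbind(Yes, No)             # 2 x 2 table: rows Yes/No, columns Males/Females
chisq.test(Bruce, correct = FALSE)  # Pearson's test without Yates' continuity correction
```

Output:

```
    Pearson's Chi-squared test

data:  Bruce
X-squared = 8.867, df = 1, p-value = 0.002904
```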
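The hand calculation and the table look-up in the text can also be reproduced without `chisq.test`. This sketch computes the statistic directly from the formula, gets df from the table dimensions, and uses base R's `pchisq` and `qchisq` for the p-value and for the three critical values quoted above (upper-tail areas .05, .005, .0025 on 1 df).

```r
observed <- matrix(c(42, 16, 18, 24), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum over all cells of (Observed - Expected)^2 / Expected.
chi_sq <- sum((observed - expected)^2 / expected)           # approx. 8.867
df <- (nrow(observed) - 1) * (ncol(observed) - 1)            # 1 for a 2 x 2 table
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)       # approx. 0.0029

# Critical values used in the text to bracket the p-value.
crit <- qchisq(c(0.95, 0.995, 0.9975), df = df)              # approx. 3.841, 7.879, 9.141
c(chi_sq = chi_sq, p_value = p_value, crit)
```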



NOTE: (Z-score)² = (2.9777)² ≈ 8.867 = χ² on 1 df. Connection between Z-test and Chi-squared test!
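The connection can be verified numerically. This sketch computes the pooled two-proportion Z statistic and the uncorrected Pearson Chi-squared statistic for the same 2 × 2 table; the identity Z² = χ² holds exactly (up to floating-point rounding) for the pooled, two-sided Z-test, which is also why the two p-values agree.

```r
x <- c(42, 16); n <- c(60, 40)   # "Yes" counts and sample sizes (Males, Females)

# Pooled two-proportion Z statistic.
p_pooled <- sum(x) / sum(n)
z <- (x[1] / n[1] - x[2] / n[2]) / sqrt(p_pooled * (1 - p_pooled) * sum(1 / n))

# Same data as a 2 x 2 table (rows Yes/No, columns Males/Females).
table_2x2 <- rbind(Yes = x, No = n - x)
chi <- chisq.test(table_2x2, correct = FALSE)

c(z_squared = z^2, chi_squared = unname(chi$statistic),
  p_z = 2 * pnorm(-abs(z)), p_chi = chi$p.value)
```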

• 2 × 2 Chi-squared Test is only valid if:
   – the Null Hypothesis is H0: π1 − π2 = 0 (for a one-sided alternative or a nonzero null value, use the Z-test!), and
   – Expected Values ≥ 5, in order to avoid "spurious significance" due to a possibly inflated Chi-squared value.

• Paired version of the 2 × 2 Chi-squared Test = McNemar Test.

• Categorical data: contingency table with any number of rows and columns.
   – The formal Null Hypothesis is difficult to write mathematically in terms of π1, π2, … ("Test of Independence" or "Test of Homogeneity").
   – Informal H0: "No association exists between rows and columns."
   – Validity condition: at least 80% of Expected Values ≥ 5.

• See notes for other details and comments, including the "Goodness-of-Fit" Test.
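The expected-value rules of thumb above can be sketched as a small R check. The helper name `chisq_validity` is hypothetical; it reads the expected counts from the `$expected` component of base R's `chisq.test()` (suppressing the small-count warning that `chisq.test()` itself may emit) and applies the 2 × 2 rule or the 80% rule depending on the table size. It does not check the null-hypothesis condition, which depends on the question being asked.

```r
# Hypothetical helper: expected-count rules of thumb for the Chi-squared test.
chisq_validity <- function(observed) {
  expected <- suppressWarnings(chisq.test(observed, correct = FALSE))$expected
  if (all(dim(observed) == c(2, 2))) {
    all(expected >= 5)              # 2 x 2 table: every expected value >= 5
  } else {
    mean(expected >= 5) >= 0.80     # r x c table: at least 80% of expected values >= 5
  }
}

chisq_validity(matrix(c(42, 16, 18, 24), nrow = 2, byrow = TRUE))  # Bruce Willis data
chisq_validity(matrix(c(3, 1, 2, 4), nrow = 2))                    # small counts
```

For the Bruce Willis table the smallest expected value is 16.8, so the check passes; the small table has expected values of 2 and 3 and fails.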
