What is a Hypothesis?mba.teipir.gr/files/presentation.pdf · 2013. 2. 14. · What is a Hypothesis?...

What is a Hypothesis?

A hypothesis is a claim (assumption) about a population parameter:

population mean

population proportion

Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 9-1

Example: The mean monthly cell phone bill in this city is μ = $42

Example: The proportion of adults in this city with cell phones is π = 0.68

The Null Hypothesis, H0

States the claim or assertion to be tested

Example: The average number of TV sets in U.S. homes is equal to three (H0: μ = 3)

Is always about a population parameter, not about a sample statistic

Correct: H0: μ = 3. Incorrect: H0: X̄ = 3

The Null Hypothesis, H0 (continued)

Begin with the assumption that the null hypothesis is true

Similar to the notion of innocent until proven guilty

Refers to the status quo or historical value

Always contains the “=”, “≤” or “≥” sign

May or may not be rejected

The Alternative Hypothesis, H1

Is the opposite of the null hypothesis

e.g., The average number of TV sets in U.S. homes is not equal to 3 (H1: μ ≠ 3)

Challenges the status quo


Never contains the “=”, “≤” or “≥” sign

May or may not be proven

Is generally the hypothesis that the researcher is trying to prove

The Hypothesis Testing Process

Claim: The population mean age is 50. H0: μ = 50, H1: μ ≠ 50

Sample the population and find sample mean.


[Figure: a sample is drawn from the population]

The Hypothesis Testing Process (continued)

[Figure: sampling distribution of X̄, centered at μ = 50 if H0 is true, with a sample mean of 20 marked far out in the tail]

If it is unlikely that you would get a sample mean of this value when in fact μ = 50 were the population mean, then you reject the null hypothesis that μ = 50.

The Test Statistic and Critical Values

If the sample mean is close to the assumed population mean, the null hypothesis is not rejected.

If the sample mean is far from the assumed population mean, the null hypothesis is rejected.

How far is “far enough” to reject H0?

The critical value of a test statistic creates a “line in the sand” for decision making: it answers the question of how far is far enough.
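As a rough illustration of how a critical value draws that line, the sketch below runs a two-tail Z test of H0: μ = 50 in Python. The population standard deviation (sigma = 12), the sample size (n = 36), α = 0.05 and the function name `two_tail_z_test` are assumptions made for illustration; they do not come from the slides. The sample mean of 20 echoes the value marked in the sampling-distribution figure.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z_stat, critical_value, reject) for H0: mu = mu0 vs H1: mu != mu0.

    Assumes sigma, the population standard deviation, is known.
    """
    z_stat = (sample_mean - mu0) / (sigma / sqrt(n))
    # Upper critical value: leaves alpha/2 in each tail of the standard normal.
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    # "Far enough" means the test statistic falls beyond either critical value.
    reject = abs(z_stat) > critical_value
    return z_stat, critical_value, reject


# Hypothetical inputs: sigma = 12 and n = 36 are assumed for illustration.
z_stat, critical_value, reject = two_tail_z_test(20, 50, sigma=12, n=36)
print(round(z_stat, 1), round(critical_value, 2), reject)
```

With these assumed inputs the test statistic lies far beyond the critical value, so H0 would be rejected; a sample mean close to 50 would leave it inside the non-rejection region.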

The Test Statistic and Critical Values (continued)

[Figure: sampling distribution of the test statistic. The critical values separate a central region of non-rejection from a region of rejection in each tail, which is “too far away” from the mean of the sampling distribution.]

Possible Errors in Hypothesis Test Decision Making

Type I Error

Reject a true null hypothesis

Considered a serious type of error

The probability of a Type I Error is α

Called level of significance of the test

Set by researcher in advance

Type II Error

Failure to reject a false null hypothesis

The probability of a Type II Error is β


Possible Hypothesis Test Outcomes

| Decision | Actual Situation: H0 True | Actual Situation: H0 False |
| --- | --- | --- |
| Do Not Reject H0 | No Error (Probability 1 - α) | Type II Error (Probability β) |
| Reject H0 | Type I Error (Probability α) | No Error (Probability 1 - β) |


The confidence coefficient (1-α) is the probability of not rejecting H0 when it is true.

The confidence level of a hypothesis test is (1-α)*100%.

The power of a statistical test (1-β) is the probability of rejecting H0 when it is false.

Type I & II Error Relationship

Type I and Type II errors cannot happen at the same time

A Type I error can only occur if H0 is true

A Type II error can only occur if H0 is false

If Type I error probability (α) increases, then Type II error probability (β) decreases

Factors Affecting Type II Error

All else equal,

β increases when the difference between the hypothesized parameter and its true value decreases

β increases when α decreases

β increases when σ increases

β increases when n decreases
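These effects can be checked numerically. The sketch below computes β for a two-tail Z test with known σ and then changes one factor at a time. The function name `type_ii_error` and every numeric input (μ0 = 50, true mean 53, σ = 10, n = 25, α = 0.05) are assumptions chosen for illustration, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """Beta for a two-tail Z test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Shift of the test statistic's distribution, in standard-error units.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Probability that the test statistic lands in the non-rejection region.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Baseline with assumed, illustrative inputs.
base = type_ii_error(mu0=50, mu_true=53, sigma=10, n=25, alpha=0.05)

# Change one factor at a time, all else equal.
smaller_gap = type_ii_error(50, 51, 10, 25, 0.05)    # true value closer to 50
smaller_alpha = type_ii_error(50, 53, 10, 25, 0.01)  # stricter significance level
larger_sigma = type_ii_error(50, 53, 20, 25, 0.05)   # more variable population
smaller_n = type_ii_error(50, 53, 10, 9, 0.05)       # fewer observations

for name, beta in [("base", base), ("smaller gap", smaller_gap),
                   ("smaller alpha", smaller_alpha),
                   ("larger sigma", larger_sigma), ("smaller n", smaller_n)]:
    print(f"{name}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Each of the four variations yields a larger β, and therefore lower power, than the baseline.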

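Level of Significance and the Rejection Region

Level of significance = α

H0: μ = 3

H1: μ ≠ 3

This is a two-tail test because there is a rejection region in both tails

[Figure: sampling distribution centered at 0 with a critical value on each side; the tail beyond each critical value is a rejection region of area α/2]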

χ² Test of Independence

Similar to the χ² test for equality of more than two proportions, but extends the concept to contingency tables with r rows and c columns


H0: The two categorical variables are independent

(i.e., there is no relationship between them)

H1: The two categorical variables are dependent

(i.e., there is a relationship between them)
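The slides here state only the hypotheses, so the following sketch assumes the standard Pearson χ² statistic, the sum of (observed - expected)² / expected over all cells, with (r - 1)(c - 1) degrees of freedom. The function name and the 2 x 3 table of counts are hypothetical and invented for illustration.

```python
def chi_square_independence(observed):
    """Return (chi2_stat, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2_stat = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            f_exp = row_totals[i] * col_totals[j] / grand_total
            chi2_stat += (f_obs - f_exp) ** 2 / f_exp
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2_stat, dof


# Hypothetical 2 x 3 contingency table, invented for illustration.
table = [[20, 30, 50],
         [30, 30, 40]]
chi2_stat, dof = chi_square_independence(table)
print(round(chi2_stat, 4), dof)
```

H0 would be rejected if the statistic exceeded the upper-tail χ² critical value for the chosen α and these degrees of freedom; that critical value comes from a χ² table and is not computed here.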

Wilcoxon Rank-Sum Test: Hypothesis and Decision Rule

M1 = median of population 1; M2 = median of population 2

Test statistic = T1 (sum of ranks from the smaller sample)

| | Two-Tail Test | Left-Tail Test | Right-Tail Test |
| --- | --- | --- | --- |
| H0 | M1 = M2 | M1 ≥ M2 | M1 ≤ M2 |
| H1 | M1 ≠ M2 | M1 < M2 | M1 > M2 |
| Decision rule | Reject H0 if T1 ≤ T1L or if T1 ≥ T1U | Reject H0 if T1 ≤ T1L | Reject H0 if T1 ≥ T1U |

Wilcoxon Rank-Sum Test: Small Sample Example

Sample data are collected on the capacity rates (% of capacity) for two factories.

Are the median operating rates for two factories the same?

For factory A, the rates are 71, 82, 77, 94, 88

For factory B, the rates are 85, 82, 92, 97

Test for equality of the population medians at the 0.05 significance level

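Wilcoxon Rank-Sum Test: Small Sample Example (continued)

Ranked capacity values:

| Capacity, Factory A | Capacity, Factory B | Rank, Factory A | Rank, Factory B |
| --- | --- | --- | --- |
| 71 | | 1 | |
| 77 | | 2 | |
| 82 | | 3.5 | |
| | 82 | | 3.5 |
| | 85 | | 5 |
| 88 | | 6 | |
| | 92 | | 7 |
| 94 | | 8 | |
| | 97 | | 9 |
| Rank Sums: | | 20.5 | 24.5 |

Tie in 3rd and 4th places: both values of 82 receive the average rank 3.5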
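The ranking in this example can be reproduced with a short Python sketch. The factory data are taken from the slides; the helper name `average_ranks` is an illustrative choice rather than a library function.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # 1-based position of the first copy
        count = ordered.count(v)              # number of tied copies
        rank_of[v] = first + (count - 1) / 2  # average of the tied positions
    return [rank_of[v] for v in values]


factory_a = [71, 82, 77, 94, 88]
factory_b = [85, 82, 92, 97]

# Rank the combined sample, then split the ranks back by factory.
ranks = average_ranks(factory_a + factory_b)
rank_sum_a = sum(ranks[:len(factory_a)])
rank_sum_b = sum(ranks[len(factory_a):])

# T1 is the rank sum of the smaller sample (factory B, with 4 values).
t1 = rank_sum_b if len(factory_b) < len(factory_a) else rank_sum_a
print(rank_sum_a, rank_sum_b, t1)  # prints: 20.5 24.5 24.5
```

The decision then depends on comparing T1 with the lower and upper critical values T1L and T1U, which come from a table that is not reproduced in this excerpt.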

Wilcoxon Signed Ranks Test

A nonparametric test for two related populations

Steps:

1. For each of n sample items, compute the difference, Di, between two measurements

2. Ignore + and – signs and find the absolute values, |Di|

3. Omit zero differences, so sample size is n’

4. Assign ranks Ri from 1 to n’ (give average rank to ties)

5. Reassign + and – signs to the ranks Ri

6. Compute the Wilcoxon test statistic W as the sum of the positive ranks

Wilcoxon Signed Ranks Test Statistic

The Wilcoxon signed ranks test statistic is the sum of the positive ranks:

$$W = \sum_{i=1}^{n'} R_i^{(+)}$$

For small samples (n’ < 20), use Table E.9 for the critical value of W

Wilcoxon Signed Ranks Test Statistic (continued)

For samples of n’ > 20, W is approximately normally distributed with

$$\mu_W = \frac{n'(n'+1)}{4}$$

$$\sigma_W = \sqrt{\frac{n'(n'+1)(2n'+1)}{24}}$$

Wilcoxon Signed Ranks Test

The large sample Wilcoxon signed ranks Z test statistic is

$$Z_{STAT} = \frac{W - \frac{n'(n'+1)}{4}}{\sqrt{\frac{n'(n'+1)(2n'+1)}{24}}}$$

To test for no median difference in the paired values:

H0: MD = 0

H1: MD ≠ 0
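The six steps listed earlier and the Z formula can be sketched in Python as follows. The function name and the paired differences are hypothetical, invented only to show the arithmetic. With n’ = 5 the normal approximation would not actually apply, so the Z value here only illustrates the formula.

```python
from math import sqrt


def wilcoxon_signed_ranks(differences):
    """Return (W, n_prime, z_stat) for a list of paired differences Di."""
    # Step 3: omit zero differences.
    nonzero = [d for d in differences if d != 0]
    n_prime = len(nonzero)
    # Steps 2 and 4: rank the absolute differences, averaging tied ranks.
    ordered = sorted(abs(d) for d in nonzero)

    def rank(value):
        first = ordered.index(value) + 1
        return first + (ordered.count(value) - 1) / 2

    # Steps 5 and 6: W is the sum of the ranks of the positive differences.
    w = sum(rank(abs(d)) for d in nonzero if d > 0)
    mu_w = n_prime * (n_prime + 1) / 4
    sigma_w = sqrt(n_prime * (n_prime + 1) * (2 * n_prime + 1) / 24)
    z_stat = (w - mu_w) / sigma_w
    return w, n_prime, z_stat


# Hypothetical paired differences, invented for illustration only.
w, n_prime, z_stat = wilcoxon_signed_ranks([3, -1, 0, 5, -2, 4])
print(w, n_prime, round(z_stat, 3))
```

The sketch assumes at least one nonzero difference; an input containing only zeros would leave n’ = 0 and the Z formula undefined.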

Kruskal-Wallis Rank Test

Tests the equality of more than 2 population medians

Use when the normality assumption for one-way ANOVA is violated

Assumptions:

The samples are random and independent

Variables have a continuous distribution

The data can be ranked

Populations have the same variability

Populations have the same shape

Kruskal-Wallis Test Procedure

Obtain rankings for each value

In event of tie, each of the tied values gets the average rank

Sum the rankings for data from each of the c groups

Compute the H test statistic

Kruskal-Wallis Test Procedure (continued)

The Kruskal-Wallis H-test statistic (with c – 1 degrees of freedom):

$$H = \left[ \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} \right] - 3(n+1)$$

where:

n = sum of sample sizes in all groups

c = number of groups

Tj = sum of ranks in the jth group

nj = number of values in the jth group (j = 1, 2, … , c)
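A minimal Python sketch of the procedure: rank the pooled values, sum the ranks per group, and apply the H formula. The function name and the three small groups are hypothetical, and the sketch uses the formula without any correction for ties.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees_of_freedom) for a list of c groups of values.

    Uses the H formula without a tie correction.
    """
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)  # sum of sample sizes in all groups

    def rank(value):
        # Tied values receive the average of the ranks they occupy.
        first = pooled.index(value) + 1
        return first + (pooled.count(value) - 1) / 2

    # Sum over the c groups of T_j^2 / n_j, where T_j is the group's rank sum.
    total = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1


# Hypothetical data for three groups, invented for illustration.
h, dof = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), dof)
```

Here the rank sums are 6, 15 and 24 with n = 9, so H = (12 / 90)(279) - 30 = 7.2 with 2 degrees of freedom.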

Correlation vs. Regression

A scatter plot can be used to show the relationship between two variables

Correlation analysis is used to measure the strength of the association (linear relationship) between two variables

Correlation is only concerned with strength of the relationship

No causal effect is implied with correlation

Scatter plots were first presented in Ch. 2

Correlation was first presented in Ch. 3

Introduction to Regression Analysis

Regression analysis is used to:

Predict the value of a dependent variable based on the value of at least one independent variable

Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to predict or explain

Independent variable: the variable used to predict or explain the dependent variable

Simple Linear Regression Model

Only one independent variable, X

Relationship between X and Y is described by a linear function

Changes in Y are assumed to be related to changes in X

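Types of Relationships

[Figure: scatter plots of Y against X showing linear relationships and curvilinear relationships]

[Figure (continued): scatter plots of Y against X showing strong relationships and weak relationships]

[Figure (continued): scatter plots of Y against X showing no relationship]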


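Simple Linear Regression Model (continued)

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where:

Yi = dependent variable

β0 = population Y intercept

β1 = population slope coefficient

Xi = independent variable

εi = random error term

β0 + β1Xi is the linear component; εi is the random error component

[Figure: the population regression line with intercept β0 and slope β1. For a given Xi, the observed value of Y differs from the predicted value of Y on the line by εi, the random error for this Xi value.]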

Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line

$$\hat{Y}_i = b_0 + b_1 X_i$$

where:

Ŷi = estimated (or predicted) Y value for observation i

b0 = estimate of the regression intercept

b1 = estimate of the regression slope

Xi = value of X for observation i

Measures of Variation

SST = total sum of squares (Total Variation)

Measures the variation of the Yi values around their mean Ȳ

SSR = regression sum of squares (Explained Variation)

Variation attributable to the relationship between X and Y

SSE = error sum of squares (Unexplained Variation)

Variation in Y attributable to factors other than X

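Measures of Variation (continued)

$$SST = \sum (Y_i - \bar{Y})^2$$

$$SSR = \sum (\hat{Y}_i - \bar{Y})^2$$

$$SSE = \sum (Y_i - \hat{Y}_i)^2$$

[Figure: for an observation at Xi, the vertical distance from Yi to the mean Ȳ (total variation) is split into the distance from the predicted value Ŷi to Ȳ (explained) and the distance from Yi to Ŷi (unexplained)]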

Coefficient of Determination, r²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

The coefficient of determination is also called r-squared and is denoted as r²

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

note: $0 \le r^2 \le 1$
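The measures of variation and the coefficient of determination can be checked numerically with a short Python sketch. This excerpt does not give formulas for b0 and b1, so the sketch assumes the standard least-squares estimates; the function name and the five data points are hypothetical and invented for illustration.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of Y on X; returns b0, b1 and the sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Standard least-squares slope and intercept (assumed, not from the slides).
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]                      # predicted values
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
    return {"b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse,
            "r2": ssr / sst}


# Hypothetical data, invented for illustration.
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in fit.items():
    print(f"{name} = {value:.2f}")
```

For these points the fit gives b0 = 2.2 and b1 = 0.6, with SST = 6, SSR = 3.6 and SSE = 2.4, so r² = 0.6 and SST = SSR + SSE.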

Examples of Approximate r² Values

r² = 1

Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X

[Figure: scatter plots of Y against X with all points exactly on a straight line]

0 < r² < 1

Weaker linear relationships between X and Y: Some but not all of the variation in Y is explained by variation in X

[Figure: scatter plots of Y against X with points scattered around a straight line]

r² = 0

No linear relationship between X and Y: The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)

[Figure: scatter plot of Y against X with no linear pattern]
