Source: mba.teipir.gr/files/presentation.pdf · 2013. 2. 14.
What is a Hypothesis?
A hypothesis is a claim (assumption) about a population parameter:
population mean
Example: The mean monthly cell phone bill in this city is μ = $42
population proportion
Example: The proportion of adults in this city with cell phones is π = 0.68
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 9-1
The Null Hypothesis, H0
States the claim or assertion to be tested
Example: The average number of TV sets in U.S. homes is equal to three (H0: μ = 3)
Is always about a population parameter, not about a sample statistic
H0: μ = 3 (valid), not H0: X̄ = 3 (not valid)
The Null Hypothesis, H0 (continued)
Begin with the assumption that the null hypothesis is true
Similar to the notion of innocent until proven guilty
Refers to the status quo or historical value
Always contains the “=”, “≤” or “≥” sign
May or may not be rejected
The Alternative Hypothesis, H1
Is the opposite of the null hypothesis
e.g., The average number of TV sets in U.S. homes is not equal to 3 (H1: μ ≠ 3)
Challenges the status quo
Never contains the “=”, “≤” or “≥” sign
May or may not be proven
Is generally the hypothesis that the researcher is trying to prove
The Hypothesis Testing Process
Claim: The population mean age is 50. H0: μ = 50, H1: μ ≠ 50
Sample the population and find the sample mean.
[Figure: a sample is drawn from the population]
The Hypothesis Testing Process (continued)
[Figure: sampling distribution of X̄, centered at μ = 50 if H0 is true, with the observed sample mean X̄ = 20 far out in the tail]
If it is unlikely that you would get a sample mean of this value (X̄ = 20) when in fact μ = 50 were the population mean, then you reject the null hypothesis that μ = 50.
The Test Statistic and Critical Values
If the sample mean is close to the assumed population mean, the null hypothesis is not rejected.
If the sample mean is far from the assumed population mean, the null hypothesis is rejected.
How far is “far enough” to reject H0?
The critical value of a test statistic creates a “line in the sand” for decision making: it answers the question of how far is far enough.
The Test Statistic and Critical Values (continued)
[Figure: sampling distribution of the test statistic, with a region of non-rejection in the center, bounded by the critical values, and a region of rejection in each tail (“too far away” from the mean of the sampling distribution)]
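The "line in the sand" can be sketched in a few lines of Python. This is a minimal illustration that assumes the test statistic follows a standard normal distribution (a Z test), which the slides have not specified at this point; the function names are hypothetical.

```python
from statistics import NormalDist

def two_tail_critical_values(alpha):
    """Return (lower, upper) critical values for a standard normal test statistic."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)  # area alpha/2 lies beyond each value
    return -upper, upper

def decide(z_stat, alpha=0.05):
    """Reject H0 when the statistic falls in either rejection region."""
    lower, upper = two_tail_critical_values(alpha)
    return "reject H0" if z_stat <= lower or z_stat >= upper else "do not reject H0"

lower, upper = two_tail_critical_values(0.05)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96
print(decide(2.5))   # far from the mean: reject H0
print(decide(0.8))   # close to the mean: do not reject H0
```

The values 2.5 and 0.8 are made-up test statistics chosen only to land on either side of the critical value.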
Possible Errors in Hypothesis Test Decision Making
Type I Error
Reject a true null hypothesis
Considered a serious type of error
The probability of a Type I Error is α
Called level of significance of the test
Set by researcher in advance
Type II Error
Failure to reject a false null hypothesis
The probability of a Type II Error is β
Possible Errors in Hypothesis Test Decision Making (continued)
Possible Hypothesis Test Outcomes

                    Actual Situation
Decision            H0 True               H0 False
Do Not Reject H0    No Error              Type II Error
                    Probability 1 - α     Probability β
Reject H0           Type I Error          No Error
                    Probability α         Probability 1 - β
Possible Errors in Hypothesis Test Decision Making (continued)
The confidence coefficient (1-α) is the probability of not rejecting H0 when it is true.
The confidence level of a hypothesis test is (1-α)*100%.
The power of a statistical test (1-β) is the probability of rejecting H0 when it is false.
Type I & II Error Relationship
Type I and Type II errors cannot happen at the same time
A Type I error can only occur if H0 is true
A Type II error can only occur if H0 is false
If Type I error probability (α) increases, then Type II error probability (β) decreases
Factors Affecting Type II Error
All else equal,
β increases when the difference between the hypothesized parameter and its true value decreases
β increases when α decreases
β increases when σ increases
β increases when n decreases
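How each factor moves β can be checked numerically. The sketch below assumes a two-tail Z test of a mean with known σ; that setup, the function name, and every number other than the hypothesized mean of 50 are assumptions made for illustration. β is computed as the probability that the test statistic lands in the non-rejection region when the true mean differs from the hypothesized one.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a two-tail Z test of H0: mu = mu0 when the true mean is mu_true."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Distance between true and hypothesized mean, in standard errors
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Probability that Z_STAT falls between the two critical values
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)

base = type_ii_error(mu0=50, mu_true=52, sigma=6, n=36)
closer_truth = type_ii_error(mu0=50, mu_true=51, sigma=6, n=36)
smaller_alpha = type_ii_error(mu0=50, mu_true=52, sigma=6, n=36, alpha=0.01)
larger_sigma = type_ii_error(mu0=50, mu_true=52, sigma=9, n=36)
smaller_n = type_ii_error(mu0=50, mu_true=52, sigma=6, n=16)

# Each change should give a larger beta than the baseline
for label, beta in [("baseline", base), ("smaller difference", closer_truth),
                    ("smaller alpha", smaller_alpha), ("larger sigma", larger_sigma),
                    ("smaller n", smaller_n)]:
    print(f"{label}: beta = {beta:.3f}")
```

The power of the test in each scenario is simply 1 - β.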
Level of Significance and the Rejection Region
Level of significance = α
H0: μ = 3
H1: μ ≠ 3
[Figure: sampling distribution centered at 0, with critical values marking a rejection region of area α/2 in each tail]
This is a two-tail test because there is a rejection region in both tails
χ² Test of Independence
Similar to the χ² test for equality of more than two proportions, but extends the concept to contingency tables with r rows and c columns
H0: The two categorical variables are independent (i.e., there is no relationship between them)
H1: The two categorical variables are dependent (i.e., there is a relationship between them)
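As a rough illustration of the test of independence, the sketch below computes the usual χ² statistic, the sum over all cells of (observed - expected)² / expected, for an r × c table of counts. The formula is the standard one and is not spelled out on these slides; the function name and the counts are hypothetical.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            # Expected count if the two variables are independent (H0 true)
            f_exp = row_totals[i] * col_totals[j] / n
            stat += (f_obs - f_exp) ** 2 / f_exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2 x 2 table of counts
table = [[20, 30],
         [30, 20]]
stat, df = chi_square_independence(table)
print(stat, df)  # 4.0 1
```

With 1 degree of freedom the 0.05 critical value of the χ² distribution is 3.841, so for this made-up table H0 (independence) would be rejected. The same function works for any number of rows and columns.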
Wilcoxon Rank-Sum Test: Hypothesis and Decision Rule
M1 = median of population 1; M2 = median of population 2
Test statistic = T1 (sum of ranks from the smaller sample)

Test               H0          H1          Decision rule
Two-Tail Test      M1 = M2     M1 ≠ M2     Reject H0 if T1 ≤ T1L or if T1 ≥ T1U
Left-Tail Test     M1 ≥ M2     M1 < M2     Reject H0 if T1 ≤ T1L
Right-Tail Test    M1 ≤ M2     M1 > M2     Reject H0 if T1 ≥ T1U
Wilcoxon Rank-Sum Test: Small Sample Example
Sample data are collected on the capacity rates (% of capacity) for two factories.
Are the median operating rates for the two factories the same?
For factory A, the rates are 71, 82, 77, 94, 88
For factory B, the rates are 85, 82, 92, 97
Test for equality of the population medians at the 0.05 significance level
Wilcoxon Rank-Sum Test: Small Sample Example (continued)
Ranked capacity values:

Capacity                   Rank
Factory A    Factory B     Factory A    Factory B
71                         1
77                         2
82                         3.5
             82                         3.5
             85                         5
88                         6
             92                         7
94                         8
             97                         9
Rank Sums:                 20.5         24.5

Tie in 3rd and 4th places: both values of 82 receive the average rank 3.5
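The rank sums for the two factories can be reproduced with a short script. This is a minimal sketch: the helper name `average_ranks` is hypothetical, and the critical values T1L and T1U would still have to be looked up in a table, so no decision is made here.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2  # average of the 1-based positions in the block
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

factory_a = [71, 82, 77, 94, 88]
factory_b = [85, 82, 92, 97]

ranks = average_ranks(factory_a + factory_b)  # rank the combined sample
rank_sum_a = sum(ranks[:len(factory_a)])
rank_sum_b = sum(ranks[len(factory_a):])

# T1 is the rank sum of the smaller sample (factory B, with 4 values)
t1 = rank_sum_b if len(factory_b) < len(factory_a) else rank_sum_a
print(rank_sum_a, rank_sum_b, t1)  # 20.5 24.5 24.5
```

As a quick check, the two rank sums add up to n(n+1)/2 = 9(10)/2 = 45.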
Wilcoxon Signed Ranks Test
A nonparametric test for two related populations
Steps:
1. For each of n sample items, compute the difference, Di, between two measurements
2. Ignore + and – signs and find the absolute values, |Di|
3. Omit zero differences, so sample size is n’
4. Assign ranks Ri from 1 to n’ (give average rank to ties)
5. Reassign + and – signs to the ranks Ri
6. Compute the Wilcoxon test statistic W as the sum of the positive ranks
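The six steps translate almost line for line into code. The sketch below is illustrative only: the function name, the choice of "second measurement minus first" as the direction of Di, and the paired data are all assumptions.

```python
def wilcoxon_signed_ranks(first, second):
    """Return (W, n_prime) for paired measurements, following steps 1 to 6."""
    diffs = [b - a for a, b in zip(first, second)]   # step 1: differences D_i
    nonzero = [d for d in diffs if d != 0]           # step 3: omit zero differences
    abs_sorted = sorted(abs(d) for d in nonzero)     # step 2: absolute values |D_i|
    # step 4: rank 1..n', with the average rank for ties
    rank_of = {}
    for value in set(abs_sorted):
        first_pos = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        rank_of[value] = first_pos + (count - 1) / 2
    # steps 5 and 6: restore the signs and sum the positive ranks
    w = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    return w, len(nonzero)

# Hypothetical paired measurements on six items
first = [10, 12, 9, 15, 11, 14]
second = [12, 12, 8, 18, 15, 13]
w, n_prime = wilcoxon_signed_ranks(first, second)
print(w, n_prime)  # 12.0 5
```

Here the differences are 2, 0, -1, 3, 4, -1. The zero is dropped (n’ = 5), the two differences of size 1 share rank 1.5, and the positive differences carry ranks 3, 4 and 5, giving W = 12.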
Wilcoxon Signed Ranks Test Statistic
The Wilcoxon signed ranks test statistic is the sum of the positive ranks:
$$W = \sum_{i=1}^{n'} R_i^{(+)}$$
For small samples (n’ < 20), use Table E.9 for the critical value of W
Wilcoxon Signed Ranks Test Statistic (continued)
For samples of n’ > 20, W is approximately normally distributed with
$$\mu_W = \frac{n'(n'+1)}{4}$$
$$\sigma_W = \sqrt{\frac{n'(n'+1)(2n'+1)}{24}}$$
Wilcoxon Signed Ranks Test
The large sample Wilcoxon signed ranks Z test statistic is
$$Z_{STAT} = \frac{W - \dfrac{n'(n'+1)}{4}}{\sqrt{\dfrac{n'(n'+1)(2n'+1)}{24}}}$$
To test for no median difference in the paired values:
H0: MD = 0
H1: MD ≠ 0
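For a large sample, the statistic standardizes W using its mean n’(n’+1)/4 and its standard deviation, the square root of n’(n’+1)(2n’+1)/24. A minimal sketch follows; the function name and the values W = 200 and n’ = 24 are made up so that the arithmetic is easy to follow.

```python
from math import sqrt

def signed_ranks_z(w, n_prime):
    """Large-sample Z statistic for the Wilcoxon signed ranks test."""
    mu_w = n_prime * (n_prime + 1) / 4
    sigma_w = sqrt(n_prime * (n_prime + 1) * (2 * n_prime + 1) / 24)
    return (w - mu_w) / sigma_w

# Hypothetical values: mu_W = 24(25)/4 = 150, sigma_W = sqrt(1225) = 35
z_stat = signed_ranks_z(w=200, n_prime=24)
print(round(z_stat, 4))  # 1.4286
```

For a two-tail test at the 0.05 level the critical values of Z are ±1.96, so with these illustrative numbers H0: MD = 0 would not be rejected.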
Kruskal-Wallis Rank Test
Tests the equality of more than 2 population medians
Use when the normality assumption for one-way ANOVA is violated
Assumptions:
The samples are random and independent
Variables have a continuous distribution
The data can be ranked
Populations have the same variability
Populations have the same shape
Kruskal-Wallis Test Procedure
Obtain rankings for each value
In event of tie, each of the tied values gets the average rank
Sum the rankings for data from each of the c groups
Compute the H test statistic
Kruskal-Wallis Test Procedure (continued)
The Kruskal-Wallis H-test statistic (with c - 1 degrees of freedom):
$$H = \left[ \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} \right] - 3(n+1)$$
where:
n = sum of sample sizes in all groups
c = number of groups
T_j = sum of ranks in the jth group
n_j = number of values in the jth group (j = 1, 2, … , c)
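The H statistic can be computed directly from its definition: 12 / (n(n+1)) times the sum of Tj² / nj over the c groups, minus 3(n+1). The sketch below ranks all values together, giving tied values their average rank. The function name and the three small groups are hypothetical, chosen only so that the rank sums are easy to verify by hand.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples, one list per group."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    def rank(v):
        # Average rank of v within the pooled, sorted data
        first_pos = pooled.index(v) + 1
        return first_pos + (pooled.count(v) - 1) / 2

    # Sum over groups of T_j^2 / n_j, where T_j is the group's rank sum
    total = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1

# Hypothetical data: rank sums are T = 6, 15 and 24
groups = [[27, 31, 29], [35, 38, 33], [41, 44, 47]]
h, df = kruskal_wallis_h(groups)
print(round(h, 2), df)  # 7.2 2
```

H is compared with a χ² critical value with c - 1 degrees of freedom (5.991 for 2 degrees of freedom at the 0.05 level). The groups here are far too small for that approximation to be trusted and serve only to show the arithmetic.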
Correlation vs. Regression
A scatter plot can be used to show the relationship between two variables
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
Correlation is only concerned with strength of the relationship
No causal effect is implied with correlation
Scatter plots were first presented in Ch. 2
Correlation was first presented in Ch. 3
Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
Simple Linear Regression Model
Only one independent variable, X
Relationship between X and Y is described by a linear function
Changes in Y are assumed to be related to changes in X
Types of Relationships
[Figure: scatter plots of Y against X showing linear relationships and curvilinear relationships]
Types of Relationships (continued)
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]
Types of Relationships (continued)
[Figure: scatter plots of Y against X showing no relationship]
Simple Linear Regression Model (continued)
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
Y_i = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
X_i = independent variable
ε_i = random error term
β0 + β1 X_i is the linear component; ε_i is the random error component
Simple Linear Regression Model (continued)
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
[Figure: the population regression line plotted as Y against X, with intercept = β0 and slope = β1; at a given X_i, the observed value of Y differs from the predicted value of Y on the line by the random error ε_i for this X_i value]
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
Ŷ_i = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
X_i = value of X for observation i
Measures of Variation (continued)
SST = total sum of squares (Total Variation)
Measures the variation of the Yi values around their mean Ȳ
SSR = regression sum of squares (Explained Variation)
Variation attributable to the relationship between X and Y
SSE = error sum of squares (Unexplained Variation)
Variation in Y attributable to factors other than X
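Measures of Variation (continued)
[Figure: Y against X showing the prediction line and the mean Ȳ; at X_i, the distances between Y_i, Ŷ_i and Ȳ illustrate the three sums of squares]
$$SST = \sum (Y_i - \bar{Y})^2$$
$$SSE = \sum (Y_i - \hat{Y}_i)^2$$
$$SSR = \sum (\hat{Y}_i - \bar{Y})^2$$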
Coefficient of Determination, r²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
The coefficient of determination is also called r-squared and is denoted as r²
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
note: $0 \le r^2 \le 1$
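The sums of squares and r² can be illustrated on a small data set. The sketch below assumes that b0 and b1 are obtained by ordinary least squares, which this part of the slides does not state; the function names and the five (X, Y) pairs are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the prediction line Y-hat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def measures_of_variation(x, y):
    """Return (SST, SSR, SSE, r squared) for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
    return sst, ssr, sse, ssr / sst

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
sst, ssr, sse, r2 = measures_of_variation(x, y)
print(round(b0, 4), round(b1, 4))                 # 2.2 0.6
print(round(sst, 4), round(ssr, 4), round(sse, 4))  # 6.0 3.6 2.4
print(round(r2, 4))                               # 0.6
```

In this example SST = SSR + SSE (6.0 = 3.6 + 2.4), and 60% of the variation in Y is explained by variation in X.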
Examples of Approximate r² Values
r² = 1
[Figure: scatter plots of Y against X in which every point lies on a straight line]
Perfect linear relationship between X and Y:
100% of the variation in Y is explained by variation in X
Examples of Approximate r² Values
0 < r² < 1
[Figure: scatter plots of Y against X in which the points are scattered around a straight line]
Weaker linear relationships between X and Y:
Some but not all of the variation in Y is explained by variation in X
Examples of Approximate r² Values
r² = 0
[Figure: scatter plot of Y against X in which the points show no linear pattern]
No linear relationship between X and Y:
The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)