Chapter 9
Power
Decisions
A null hypothesis significance test tells us the probability of obtaining our results when the null hypothesis is true: p(Results | H0 is true)
If that probability is small, smaller than our significance level (α), our results would be improbable if H0 were true, so we reject H0
Errors in Hypothesis Testing
Sometimes we make the correct decision regarding H0
Sometimes we make mistakes when conducting hypothesis tests
– Remember: we are talking about probability theory
– Less than a .05 chance doesn’t mean “no chance at all”
Errors in Hypothesis Testing
Type 1 Errors
The null hypothesis is correct (in reality), but we have rejected it in favor of the alternative hypothesis
The probability of making a Type 1 error is equal to α, the significance level we have selected
– α: the probability of rejecting a null hypothesis when it is true
Type 2 Errors
The null hypothesis is incorrect, but we have failed to reject it in favor of the alternative hypothesis
The probability of a Type 2 error is signified by β, and the “power” of a statistical test is 1 - β
– Power (1 - β): the probability of rejecting a null hypothesis when it is false
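The two error rates can be checked by simulation. The sketch below is only an illustration: the upper-tailed z-test, σ = 1, n = 25, the true mean of 0.5 under HA, and the function name `rejection_rate` are all assumptions chosen for the example, not values from the slides.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, n=25, alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated samples for which H0: mu = 0 is rejected.

    Upper-tailed z-test with a known population sigma of 1
    (all of these settings are illustrative assumptions).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value set by alpha
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / n ** 0.5)   # (M - 0) / standard error
        if z > z_crit:
            rejections += 1
    return rejections / trials

type1_rate = rejection_rate(true_mean=0.0)  # H0 true: every rejection is a Type 1 error
power = rejection_rate(true_mean=0.5)       # H0 false: every rejection is correct
type2_rate = 1 - power                      # H0 false: every miss is a Type 2 error
print(f"Type 1 rate ~ {type1_rate:.3f}, Type 2 rate ~ {type2_rate:.3f}, power ~ {power:.3f}")
```

With H0 true, the rejection rate should land near α; with H0 false, the rejection rate is an estimate of power.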
More on α and β
Relation between α and β
Although they are related, the relation is complex
– If α = .05, the probability of making a correct decision when the null hypothesis is true is 1 - α = .95
What if the null hypothesis is not true?
– The probability of rejecting the null when it is not true is 1 - β
Relation between α and β
In general, we do not set β directly, but it is a direct outcome of our experiment and can be estimated (we can keep β small by designing our experiment properly)
β is generally greater than α
One way to decrease β is by increasing α
But we don’t want to do that. Why, you ask?
α and β reconsidered
Minimize the chances of finding an innocent man guilty vs. finding a guilty man innocent
Likewise, we must weigh reducing the likelihood of finding an effect when there isn’t one (making a Type 1 error: rejecting H0 when H0 is true) against reducing the likelihood of missing an effect when there is one (making a Type 2 error: not rejecting H0 when H0 is false)
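The trade-off between the two error rates can be made concrete with a small calculation. This is a sketch under stated assumptions: an upper-tailed z-test, and a hypothetical distance `delta` of 2.0 standard errors between the H0 and HA means. Neither number comes from the slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def beta_one_tailed(alpha, delta):
    """Type 2 error rate for an upper-tailed z-test.

    delta is the distance between the H0 and HA means in
    standard-error units (an assumed, illustrative quantity).
    """
    z_crit = Z.inv_cdf(1 - alpha)   # critical value fixed by alpha
    return Z.cdf(z_crit - delta)    # area of the HA curve below the critical value

for alpha in (0.01, 0.05, 0.10):
    b = beta_one_tailed(alpha, delta=2.0)
    print(f"alpha={alpha:.2f}  beta={b:.3f}  power={1 - b:.3f}")
```

Raising α moves the critical value toward the H0 mean, which shrinks β, but only at the price of more Type 1 errors.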
Power?
The probability of rejecting a false null hypothesis
The probability of making a correct decision (one type of correct decision)
Addresses the Type 2 error: “Not finding any evidence of an effect when one is there”
More (on) Power
While most researchers focus on Type 1 errors, you can’t be naïve (anymore) about Type 2 errors
Thus, power analyses are becoming the norm in psychological statistics (or they should be)
Hypothesis testing & Power
[Figure: sampling distribution of the sample mean when H0 is true, centered on the μ specified in H0 (H0: μ = 0), with our sample mean M marked]
The probability of obtaining our sample mean (or less), given that the null hypothesis is true
We reject the null that our sample came from the distribution specified by H0 because, if it were true, our sample mean would be highly improbable
Improbable means “not likely” but not “impossible”, so the probability that we made an error and rejected H0 when it was true is this area (OOPS!)
This area is our “p-value”, and as long as it is less than α, we reject H0
As a reminder and a little “visual” help, α defines the critical value and the rejection region
[Figure: distribution under H0: μ = 0, with the critical value and the rejection region marked]
For any sample mean that falls within the rejection region (beyond the critical value(s)), we will reject H0
Let’s say, though, that our sample mean is really from a different distribution than the one specified by H0, one that’s consistent with HA
We assume that this second sampling distribution, consistent with HA, is normally distributed around our sample mean
[Figure: second distribution centered on our M, overlapping the rejection region]
If H0 is false, the probability of rejecting it, then, is the area under the second distribution that’s part of the rejection region
And, as we all know, the probability of rejecting a false H0 is POWER
[Figure: both distributions, with the areas labeled POWER (1 - β), β, 1 - α, and α]
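The area described above can be computed directly. The following is a minimal sketch, assuming an upper-tailed z-test with a known σ; the numbers (μ0 = 0, an HA mean of 5, σ = 10, n = 25) and the function name are hypothetical, picked only to make the areas easy to follow.

```python
from statistics import NormalDist

def power_upper_tailed(mu0, mu_a, sigma, n, alpha=0.05):
    """Return (critical value, beta, power) for an upper-tailed z-test."""
    se = sigma / n ** 0.5                  # standard error of the mean
    null_dist = NormalDist(mu0, se)        # sampling distribution if H0 is true
    alt_dist = NormalDist(mu_a, se)        # sampling distribution consistent with HA
    critical_value = null_dist.inv_cdf(1 - alpha)  # alpha defines the rejection region
    beta = alt_dist.cdf(critical_value)    # HA area outside the rejection region
    return critical_value, beta, 1 - beta  # power is the HA area inside it

cv, beta, power = power_upper_tailed(mu0=0, mu_a=5, sigma=10, n=25)
print(f"critical value={cv:.2f}  beta={beta:.3f}  power={power:.3f}")
```

The critical value depends only on the H0 distribution and α; power depends on how much of the second distribution lies beyond it.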
Factors that influence power: α
[Figure: rejection region and POWER area]
Factors that influence power: variability
[Figure: rejection region and POWER area]
Factors that influence power: sample size
[Figure: rejection region and POWER area]
Factors that influence power: effect size
[Figure: POWER area when the difference between the two distributions is increased]
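Each of the four factors named in these slides can be varied one at a time to see which way power moves. This is a rough sketch, assuming an upper-tailed z-test; the baseline values (a raw effect of 4, σ = 10, n = 25, α = .05) are invented for illustration.

```python
from statistics import NormalDist

def power(effect, sigma, n, alpha):
    """Power of an upper-tailed z-test; effect is the HA mean minus the H0 mean."""
    delta = effect / (sigma / n ** 0.5)          # shift in standard-error units
    z_crit = NormalDist().inv_cdf(1 - alpha)     # critical value set by alpha
    return 1 - NormalDist().cdf(z_crit - delta)  # HA area beyond the critical value

baseline = power(effect=4, sigma=10, n=25, alpha=0.05)
larger_alpha = power(effect=4, sigma=10, n=25, alpha=0.10)
less_variability = power(effect=4, sigma=5, n=25, alpha=0.05)
larger_sample = power(effect=4, sigma=10, n=100, alpha=0.05)
larger_effect = power(effect=8, sigma=10, n=25, alpha=0.05)

for name, value in [("baseline", baseline), ("larger alpha", larger_alpha),
                    ("less variability", less_variability),
                    ("larger sample", larger_sample), ("larger effect", larger_effect)]:
    print(f"{name:>17}: power = {value:.3f}")
```

In this setup, halving σ, quadrupling n, and doubling the effect all produce the same power, because each one doubles the shift measured in standard errors.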
Factors that Influence Power
α: the significance level (the probability of making a Type 1 error)
Parametric Statistical Tests
Parametric statistical tests, those that test hypotheses about specific population parameters, are generally more powerful than corresponding non-parametric tests
Therefore, parametric tests are preferred to non-parametric tests, when possible
Variability
Measure more accurately
Design a better experiment
Standardize procedures for acquiring data
Use a dependent-samples design
Directional Alternative Hypothesis
A directional HA specifies which tail of the distribution is of interest (e.g., HA is specified as < or > some value rather than “different than” or ≠)
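To see why a directional HA helps power, compare the two kinds of test for the same true effect. This is an illustrative sketch using the normal approximation; the shift of 2.5 standard errors is an assumed value, and the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()

def power_directional(delta, alpha=0.05):
    """All of alpha sits in the predicted (upper) tail."""
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - delta)

def power_nondirectional(delta, alpha=0.05):
    """alpha is split between the two tails, so each critical value is farther out."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    upper = 1 - Z.cdf(z_crit - delta)   # rejections in the direction of the effect
    lower = Z.cdf(-z_crit - delta)      # rejections in the opposite tail (tiny)
    return upper + lower

one_tailed = power_directional(2.5)
two_tailed = power_nondirectional(2.5)
print(f"directional: {one_tailed:.3f}   non-directional: {two_tailed:.3f}")
```

The advantage holds only when the true effect lies in the predicted direction; an effect in the other tail cannot be detected by the directional test.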
Increasing Sample Size (n)
σM, the standard error of the mean, decreases with increases in sample size
Increasing Sample Size
n = 25, σM = 2.0
n = 100, σM = 1.0
n = 400, σM = 0.5
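The three values above follow from σM = σ/√n. The slides do not state σ; the sketch below assumes σ = 10, which is the value that reproduces all three numbers.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

SIGMA = 10  # assumed population standard deviation (inferred, not given in the slides)
for n in (25, 100, 400):
    print(f"n = {n:>3}  sigma_M = {standard_error(SIGMA, n):.1f}")
```

Note that quadrupling n only halves the standard error, so gains in precision get more expensive as samples grow.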
Effect Size
Effect size is directly related to power
Effect Size
Effect size: a measure of the magnitude of the effect of the intervention being studied
Effect size is related to the magnitude of the difference between a hypothesized mean (what we might think it is given the intervention) and the population mean (μ)
Cohen’s d
.2 = small effect
.5 = moderate effect
.8 = large effect
For each statistical test, a separate formula is needed to determine d, but once you do this, results are directly comparable regardless of the test used
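For the simplest case, comparing one mean against another in standard deviation units, d can be sketched as below. The means and standard deviation are hypothetical numbers, and other tests need their own formula for d, as noted above.

```python
def cohens_d(mean_with_intervention, mean_without, sd):
    """Mean difference expressed in standard deviation units."""
    return (mean_with_intervention - mean_without) / sd

# Hypothetical example: the intervention raises a mean from 100 to 105, with sd = 10
d = cohens_d(105, 100, 10)
print(f"d = {d:.2f}")  # compare against the .2 / .5 / .8 benchmarks
```

Because d is unit-free, a d from one study can be compared with a d from another even when the raw measurement scales differ.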
Implications of Effect Size
A study was conducted by Dr. Johnson on productivity in the workplace
He compared Method A with Method B
Using an n = 80, Johnson found that A was better than B at p < .05
(he rejected the null that A and B were identical, and accepted the directional alternative that A was better)
Implications (cont.)
Dr. Sockloff, who invented Method B, disputed these claims and repeated the study
Using an n = 20, Sockloff found no difference between A and B at p > .30
(he did not reject the null that A and B were equal)
How can this be?
In both cases the effect size was determined to be .5 (the effectiveness of Method A was identical in both studies)
However, Johnson could detect an effect because he had the POWER
Sockloff had very low power, and did not detect an effect (he had a low probability of rejecting an incorrect null)
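The gap between the two studies can be approximated numerically. The slides do not describe the design, so this sketch makes several assumptions: two independent groups of equal size (n split evenly), a directional test at α = .05, and the normal approximation with δ = d√(n per group / 2). The function name is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()

def two_group_power(d, total_n, alpha=0.05):
    """Approximate power of a directional two-independent-groups test."""
    n_per_group = total_n / 2                 # assumed equal split
    delta = d * (n_per_group / 2) ** 0.5      # shift in standard-error units
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - delta)

johnson = two_group_power(d=0.5, total_n=80)
sockloff = two_group_power(d=0.5, total_n=20)
print(f"Johnson (n = 80): power ~ {johnson:.2f}")
print(f"Sockloff (n = 20): power ~ {sockloff:.2f}")
```

Under these assumptions the same effect of .5 is detected far more often with n = 80 than with n = 20, which is the point of the example.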
Power and Effect Size
A desirable level of power is .80 (Cohen, 1965)
Thus, β = .20
And, by setting an effect size (the magnitude of the smallest discrepancy that, if it exists, we would be reasonably sure of detecting), we can find an appropriate n (sample size)
Method for Determining Sample Size (n)
A priori, or before the study
Directional or non-directional?
Set the significance level, α
What level of power do we want?
Use Table B to look up δ (“delta”)
Determine the effect size and use:
n = (δ/d)²
Example of Power Analysis
α = .05
1 - β = .80
Look up in Table B: δ = 2.5
d = .5 (moderate effect)
n = (δ/d)² = (2.5/.5)² = 25
So, in order to detect a moderate effect (.5) with power of .80 and α of .05, we need 25 subjects in our study
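The table lookup can be approximated in code. This sketch assumes δ is the sum of two standard normal quantiles, one for α and one for the desired power. That is a common normal approximation and an assumption here, not necessarily how Table B was built, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def delta_for(power, alpha, directional=True):
    """Approximate delta: z for the rejection cutoff plus z for the desired power."""
    tail = alpha if directional else alpha / 2
    return Z.inv_cdf(1 - tail) + Z.inv_cdf(power)

def required_n(d, power=0.80, alpha=0.05, directional=True):
    """n = (delta / d)^2, rounded up to a whole subject."""
    return math.ceil((delta_for(power, alpha, directional) / d) ** 2)

print(f"delta ~ {delta_for(0.80, 0.05):.2f}")             # close to the tabled 2.5
print(f"n for d = .5: {required_n(0.5)}")
print(f"n for d = .5, non-directional: {required_n(0.5, directional=False)}")
```

For a directional test this reproduces the worked example; a non-directional test at the same α and power needs a larger n.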
***Main Point*** (impress your Research Methods prof)
Good experimental design always utilizes power and effect size analyses prior to conducting the study
Inductive Leap
The test statistic that determines p (the probability of obtaining a particular result assuming the null is true) is equal to a measure of effect size times a measure of the size of the sample
test statistic = effect size × size of study
Therefore, p is influenced by both the size of the effect and the size of the study
Remember, if we want to reject the null, we want a small p (less than alpha)
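One concrete reading of “effect size × size of study”, for a one-sample z-test, is z = d × √n, with p then read from the normal curve. The sketch below assumes that form and a directional test; the values of d and n are illustrative only.

```python
from statistics import NormalDist

def one_tailed_p(d, n):
    """p for a directional one-sample z-test, from effect size d and sample size n."""
    z = d * n ** 0.5                 # test statistic = effect size x size of study
    return 1 - NormalDist().cdf(z)   # probability of a z this large if H0 is true

for d, n in [(0.5, 25), (0.5, 4), (0.2, 25), (0.2, 156)]:
    print(f"d = {d}  n = {n:>3}  p = {one_tailed_p(d, n):.4f}")
```

The same effect size gives a smaller p in a larger study, and a small effect can still reach p < α if n is large enough.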