
Chapter 9

Power

Decisions

A null hypothesis significance test tells us the probability of obtaining our results when the null hypothesis is true:

p(results | H0 is true)

If that probability is small, smaller than our significance level (α), we conclude that H0 is probably not true and we reject it.
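To make that concrete: a minimal sketch, assuming a two-tailed one-sample z test with σ known; every number below (μ0, σ, n, and the sample mean m) is invented for illustration.

```python
# Sketch: p(results | H0 is true) for a one-sample z test.
# mu0, sigma, n, and m are hypothetical illustration values.
import math
from scipy.stats import norm

mu0, sigma, n = 0.0, 10.0, 25           # H0: mu = 0, sigma known
m = 4.2                                 # our observed sample mean
z = (m - mu0) / (sigma / math.sqrt(n))  # standardized sample mean
p = 2 * norm.sf(abs(z))                 # two-tailed p(results | H0 true)
print(z, p)                             # z = 2.1, p ≈ .036 < .05 -> reject H0
```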

Errors in Hypothesis Testing

Sometimes we make the correct decision regarding H0. Sometimes we make mistakes when conducting hypothesis tests.
– Remember: we are talking about probability theory.
– A less-than-.05 chance doesn’t mean “no chance at all.”

Errors in Hypothesis Testing

The four possible outcomes, in table form:

                      H0 is true                 H0 is false
Reject H0             Type 1 error (α)           Correct decision (1 − β)
Fail to reject H0     Correct decision (1 − α)   Type 2 error (β)

Type 1 Errors

The null hypothesis is correct (in reality), but we have rejected it in favor of the alternative hypothesis.

The probability of making a Type 1 error is equal to α, the significance level we have selected.
– α: the probability of rejecting a null hypothesis when it is true.

Type 2 Errors

The null hypothesis is incorrect, but we have failed to reject it in favor of the alternative hypothesis.

The probability of a Type 2 error is signified by β, and the “power” of a statistical test is 1 − β.
– Power (1 − β): the probability of rejecting a null hypothesis when it is false.
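Both error rates can be seen directly by simulation. A hedged sketch, again assuming a two-tailed one-sample z test; μ0, σ, n, and the “true” mean are invented values.

```python
# Sketch: estimating alpha (Type 1 error rate) and beta (Type 2 error
# rate) by simulation. mu0, sigma, n, and the "true" mean are invented.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu0, sigma, n, alpha = 0.0, 10.0, 25, 0.05
z_crit = norm.ppf(1 - alpha / 2)                # two-tailed critical value

def rejection_rate(true_mu, reps=100_000):
    means = rng.normal(true_mu, sigma, size=(reps, n)).mean(axis=1)
    z = (means - mu0) / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) > z_crit)          # proportion of rejections

print(rejection_rate(mu0))      # H0 true: rate ≈ alpha = .05 (Type 1 errors)
print(1 - rejection_rate(5.0))  # H0 false: rate ≈ beta ≈ .29 (Type 2 errors)
```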

More on α and β

Relation between α and β

Although they are related, the relation is complex.
– If α = .05, the probability of making a correct decision when the null hypothesis is true is 1 − α = .95.

What if the null hypothesis is not true?
– The probability of rejecting the null when it is not true is 1 − β.

Relation between α and β

In general, we do not set β; it is an outcome of our experiment. But it can be determined, and we can control β by designing our experiment properly.

β is generally greater than α. One way to decrease β is to increase α. But we don’t want to do that. Why, you ask?

α and β reconsidered

The courts would rather let a guilty man go free than find an innocent man guilty. Likewise, we care more about reducing the likelihood of finding an effect when there isn’t one (making a Type 1 error: rejecting H0 when H0 is true) than about reducing the likelihood of missing an effect when there is one (making a Type 2 error: not rejecting H0 when H0 is false).

Power?

Power is the probability of rejecting a false null hypothesis; that is, the probability of making a correct decision (one type of correct decision). It addresses the Type 2 error: “not finding any evidence of an effect when one is there.”

More (on) Power

While most researchers focus on Type 1 errors, you can’t be naïve (anymore) about Type 2 errors either. Thus, power analyses are becoming the norm in psychological statistics (or they should be).

Hypothesis Testing & Power

[Figure: the sampling distribution of the sample mean when H0 is true, centered at the μ specified in H0 (here H0: μ = 0), with our sample mean M marked in the tail.]

The shaded tail area is the probability of obtaining our sample mean (or less) given that the null hypothesis is true.

We reject the null hypothesis that our sample came from the distribution specified by H0 because, if it were true, our sample mean would be highly improbable.

“Improbable” means “not likely,” not “impossible,” so the probability that we made an error and rejected H0 when it was actually true is exactly this tail area. (OOPS!)

This area is our p-value, and as long as it is less than α, we reject H0.

As a reminder, and a little “visual” help: α defines the critical value and the rejection region.

[Figure: the H0 distribution (H0: μ = 0) with the critical value marked and the rejection region shaded beyond it.]

Any sample mean that falls within the rejection region (beyond the critical value(s)) leads us to reject H0.
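Numerically (assuming a z test), the critical value is just the quantile of the H0 distribution that cuts off a tail area of α:

```python
# Sketch: alpha alone fixes the critical value(s) bounding the
# rejection region of a z test.
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha))      # one-tailed critical z ≈ 1.645
print(norm.ppf(1 - alpha / 2))  # two-tailed critical z ≈ 1.96
```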

Let’s say, though, that our sample mean really comes from a different distribution than the one specified by H0, one that’s consistent with HA. We assume that this second sampling distribution, the one consistent with HA, is normally distributed around our sample mean.

[Figure: two overlapping sampling distributions, the H0 distribution on the left and the HA distribution centered at our M, with the rejection region shaded.]

If H0 is false, the probability of rejecting it is the area under this second distribution that falls within the rejection region, namely, the shaded area. And, as we all know, the probability of rejecting a false H0 is POWER.

[Figure: under the HA distribution, the area inside the rejection region is labeled POWER (1 − β) and the remainder β; under the H0 distribution, the rejection region has area α and the remainder 1 − α.]
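The picture translates directly into a calculation. A minimal sketch for a one-tailed z test, where the separation between the two distributions, in standard-error units, is an assumed illustrative value:

```python
# Sketch: power = the area of the HA distribution lying inside H0's
# rejection region. One-tailed z test; the separation between the two
# distributions (delta, in standard-error units) is an assumed value.
from scipy.stats import norm

alpha = 0.05
delta = 2.5                       # assumed distance between the two means
z_crit = norm.ppf(1 - alpha)      # critical value under H0
power = norm.sf(z_crit - delta)   # HA-distribution area past the critical value
print(power, 1 - power)           # power (1 - beta) ≈ .80, beta ≈ .20
```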

Factors that influence power: α
[Figure: a larger α widens the rejection region, so more of the HA distribution falls inside it and power increases.]

Factors that influence power: variability
[Figure: less variability narrows both distributions; they overlap less and power increases.]

Factors that influence power: sample size
[Figure: a larger n shrinks the standard error of the mean, which narrows both distributions and increases power.]

Factors that influence power: effect size
[Figure: when the difference between the H0 and HA means is increased, power increases.]
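The four figures can be summarized in one hedged sketch, built on the same one-tailed z-test areas; the baseline numbers are invented, and each change raises power:

```python
# Sketch: each factor's effect on power, using one-tailed z-test areas.
# The baseline numbers are invented for illustration.
import math
from scipy.stats import norm

def power(diff=5.0, sigma=10.0, n=25, alpha=0.05):
    delta = diff / (sigma / math.sqrt(n))   # mean difference in sigma_M units
    return norm.sf(norm.ppf(1 - alpha) - delta)

print(power())               # baseline ≈ .80
print(power(alpha=0.10))     # larger alpha      -> more power
print(power(sigma=5.0))      # less variability  -> more power
print(power(n=100))          # larger sample     -> more power
print(power(diff=10.0))      # larger effect     -> more power
```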

Factors that Influence Power

– α, the significance level (the probability of making a Type 1 error)
– the type of test (parametric vs. non-parametric)
– variability
– a directional vs. non-directional alternative hypothesis
– sample size (n)
– effect size

Parametric Statistical Tests

Parametric statistical tests, those that test hypotheses about specific population parameters, are generally more powerful than the corresponding non-parametric tests. Therefore, parametric tests are preferred to non-parametric tests, when possible.

Variability

– Measure more accurately.
– Design a better experiment.
– Standardize procedures for acquiring data.
– Use a dependent-samples design.

Directional Alternative Hypothesis

A directional HA specifies which tail of the distribution is of interest (e.g., HA is specified as < or > some value, rather than “different from” or ≠).
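Why this helps: all of α sits in the predicted tail. A small sketch comparing one- and two-tailed power, with an assumed standardized separation of 2.0:

```python
# Sketch: one-tailed vs. two-tailed power at the same alpha, for an
# assumed standardized separation of 2.0 (effect in the predicted tail).
from scipy.stats import norm

alpha, delta = 0.05, 2.0
one_tailed = norm.sf(norm.ppf(1 - alpha) - delta)
two_tailed = norm.sf(norm.ppf(1 - alpha / 2) - delta)  # far tail ignored
print(one_tailed, two_tailed)   # ≈ .64 vs. ≈ .52
```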

Increasing Sample Size (n)

σM, the standard error of the mean, decreases as sample size increases: σM = σ/√n.

[Figure: three sampling distributions, narrowing as n grows (for σ = 10, the value implied by the numbers shown): n = 25 gives σM = 2.0; n = 100 gives σM = 1.0; n = 400 gives σM = 0.5.]
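As a quick check, the three values shown are consistent with this formula under an assumed population σ of 10 (the value they jointly imply):

```python
# Sketch: sigma_M = sigma / sqrt(n), with sigma = 10 assumed (the value
# implied by the three figures above).
import math

sigma = 10.0
for n in (25, 100, 400):
    print(n, sigma / math.sqrt(n))   # 2.0, 1.0, 0.5
```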

Effect Size

Effect size is directly related to power. An effect size is a measure of the magnitude of the effect of the intervention being studied. It is related to the magnitude of the difference between a hypothesized mean (what we might think the mean is, given the intervention) and the population mean (μ).

Cohen’s d

– .2 = small effect
– .5 = moderate effect
– .8 = large effect

For each statistical test, a separate formula is needed to determine d. But when you do this, results are directly comparable regardless of the test used.
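As one concrete, and entirely hypothetical, instance: Cohen’s d for two independent samples computed with the pooled standard deviation, on randomly generated data:

```python
# Sketch: Cohen's d for two independent samples, using the pooled
# standard deviation. The data are randomly generated, not real.
import numpy as np

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) +
                  (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
a = rng.normal(55, 10, 80)   # hypothetical Method A scores
b = rng.normal(50, 10, 80)   # hypothetical Method B scores
print(cohens_d(a, b))        # ≈ .5, a moderate effect
```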

Implications of Effect Size

A study was conducted by Dr. Johnson on productivity in the workplace. He compared Method A with Method B. Using an n = 80, Johnson found that A was better than B at p < .05 (he rejected the null that A and B were identical, and accepted the directional alternative that A was better).

Implications (cont.)

Dr. Sockloff, who invented Method B, disputed these claims and repeated the study. Using an n = 20, Sockloff found no difference between A and B at p > .30 (he did not reject the null that A and B were equal).

How can this be?

In both cases the effect size was determined to be .5; the effectiveness of Method A was identical in both studies. However, Johnson could detect the effect because he had the POWER. Sockloff had very low power and did not detect the effect (he had a low probability of rejecting an incorrect null).
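Rough numbers make the point. This sketch assumes a directional two-group z approximation with n split evenly between the methods (the slides don’t state the exact design), which is enough to show the gap:

```python
# Sketch: power for the two studies, assuming a directional two-group
# z approximation with n split evenly between methods (the exact design
# isn't stated in the slides; this is an illustration, not a reanalysis).
import math
from scipy.stats import norm

def power_two_group(d, n_per_group, alpha=0.05):
    delta = d * math.sqrt(n_per_group / 2)   # delta for two-group designs
    return norm.sf(norm.ppf(1 - alpha) - delta)

print(power_two_group(0.5, 40))   # Johnson, n = 80 total:  power ≈ .72
print(power_two_group(0.5, 10))   # Sockloff, n = 20 total: power ≈ .30
```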

Power and Effect Size

A desirable level of power is .80 (Cohen, 1965); thus β = .20. By also setting an effect size (the magnitude of the smallest discrepancy that, if it exists, we would be reasonably sure of detecting), we can find an appropriate sample size (n).

Method for Determining Sample Size (n)

A priori, i.e., before the study:
– Directional or non-directional test?
– Set the significance level, α.
– Decide what level of power we want.
– Use Table B to look up δ (“delta”).
– Determine the effect size d, and use:

n = (δ/d)²

Example of Power Analysis

With α = .05 and 1 − β = .80, Table B gives δ = 2.5; take d = .5 (a moderate effect). Then:

n = (δ/d)² = (2.5/.5)² = 25

So, in order to detect a moderate effect (.5) with power of .80 and α of .05, we need 25 subjects in our study.
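The Table B value and the resulting n can be reproduced from the normal distribution, assuming (as the directional/non-directional step above suggests) a one-tailed test, since z.05 + z.80 ≈ 1.645 + 0.84 ≈ 2.5:

```python
# Sketch: reproducing the slide's numbers. For a directional test,
# delta = z_alpha + z_power (which is what Table B appears to tabulate
# on this reading), and then n = (delta / d)**2.
from scipy.stats import norm

alpha, desired_power, d = 0.05, 0.80, 0.5
delta = norm.ppf(1 - alpha) + norm.ppf(desired_power)  # ≈ 1.645 + 0.842 ≈ 2.5
n = (delta / d) ** 2
print(delta, round(n))                                 # ≈ 2.5, n ≈ 25
```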

***Main Point*** (impress your Research Methods prof)

Good experimental design always uses power and effect-size analyses prior to conducting the study.

Inductive Leap

The test statistic that determines p, the probability of obtaining a particular result assuming the null is true, is the product of a measure of effect size and a measure of the size of the study:

test statistic = effect size × size of study

Therefore, p (the probability of a Type 1 error) is influenced by both the size of the effect and the size of the study. Remember: if we want to reject the null, we want a small p (less than α).
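For a concrete instance of this product: a one-sample t statistic factors as t = d·√n, so the same effect size gives very different p-values at different study sizes, echoing Johnson and Sockloff. The numbers below are illustrative only:

```python
# Sketch: for a one-sample t test the statistic factors as t = d * sqrt(n),
# effect size times a function of study size; same d, different n,
# very different p. The numbers echo the Johnson/Sockloff example.
import math
from scipy.stats import t as t_dist

d = 0.5
for n in (20, 80):
    t = d * math.sqrt(n)
    p = t_dist.sf(t, df=n - 1)    # one-tailed p
    print(n, t, p)                # n=20: p ≈ .019; n=80: p ≈ .00001
```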