RMTD 404 Lecture 8


Page 1

RMTD 404

Lecture 8

Page 2

Power

Recall what you learned about statistical errors in Chapter 4:

• Type I Error: Finding a difference when there is no true difference in the populations (i.e., incorrectly rejecting a true null hypothesis), designated by α.

• Type II Error: Not finding a difference when there is a true difference in the populations (i.e., incorrectly retaining a false null hypothesis), designated by β.

Power is the probability of finding a difference when there is a true difference in the populations (i.e., correctly rejecting a false null hypothesis), designated 1-β.

Page 3

Power

Page 4

Factors affecting power

There are four key factors that influence the power of a statistical test:

1. The alpha (α) that a researcher chooses;

2. The magnitude of the true population difference (the effect size);

3. The sample size; and

4. The statistical test used.

Let’s try some of these in R (http://homepages.luc.edu/~rwill5/code.html)
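As a rough illustration of the first three factors, here is a minimal sketch using base R's power.t.test() (this is not the code posted at the course link above). Setting sd = 1 makes the delta argument act as the standardized effect size d.

```r
# Minimal sketch: how alpha, effect size, and sample size affect power
# for a two-sided one-sample t-test. With sd = 1, `delta` is Cohen's d.

# 1. Alpha: a larger alpha gives more power.
power.t.test(n = 50, delta = 0.4, sd = 1, sig.level = 0.05,
             type = "one.sample")$power   # roughly .79
power.t.test(n = 50, delta = 0.4, sd = 1, sig.level = 0.10,
             type = "one.sample")$power   # roughly .87

# 2. Effect size: a larger d gives more power.
power.t.test(n = 50, delta = 0.2, sd = 1, type = "one.sample")$power  # roughly .28
power.t.test(n = 50, delta = 0.8, sd = 1, type = "one.sample")$power  # near 1.00

# 3. Sample size: a larger n gives more power.
power.t.test(n = 20,  delta = 0.4, sd = 1, type = "one.sample")$power # roughly .40
power.t.test(n = 100, delta = 0.4, sd = 1, type = "one.sample")$power # roughly .98
```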


Page 5

Alpha’s influence on power

A small alpha (α) makes the critical value more extreme so that less of the alternative distribution is allocated to the rejection region. Hence, we have less power with smaller alphas.

A larger α makes the critical value less extreme so that more of the alternative distribution is allocated to the rejection region. Hence, we have more power with larger alphas.

[Figure: null and alternative distributions with the rejection region marked for α = .05 and α = .10]

Page 6

Effect size’s influence on power

A small effect size makes the critical value more extreme on the alternative distribution so that less of that distribution’s area is allocated to the rejection region. Hence, we have less power with smaller effect sizes.

A larger effect size makes the critical value less extreme on the alternative distribution so that more of that distribution’s area is allocated to the rejection region. Hence, we have more power with larger effect sizes.

Page 7

Sample size's influence on power

A small sample size makes the critical value more extreme on the alternative distribution so that less of that distribution's area is allocated to the rejection region. Hence, we have less power with smaller sample sizes.

A larger sample size makes the critical value less extreme on the alternative distribution so that more of that distribution’s area is allocated to the rejection region. Hence, we have more power with larger sample sizes.


Page 8

Influence of sample size & variance on power

Recall that the central limit theorem defines the standard error of the mean as

σ_X̄ = σ_X / √N

Hence, as the sample size increases, the standard error of the mean decreases. As the sample size decreases, the standard error of the mean increases.

Similarly, as σ_X decreases, the standard error of the mean would also decrease, indicating that effects are easier to detect with more homogeneous populations.
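A minimal sketch of this relationship in R, using a hypothetical σ_X of 15:

```r
# Standard error of the mean: sigma_X / sqrt(N).
# It shrinks as N grows (and as sigma_X shrinks).
sigma_x <- 15                      # hypothetical population SD
for (N in c(25, 100, 400)) {
  cat("N =", N, "  SE =", sigma_x / sqrt(N), "\n")   # 3, 1.5, 0.75
}
```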

Page 9

Influence of statistical test on power

One last note: different statistical tests provide different levels of power, all other things being equal. The extra power of some tests comes from the assumptions they make about the data being analyzed. Of course, if these assumptions are invalid, then the decision that you make based on the hypothesis test may also be invalid.

Page 10

Estimating Sample Size

A good reason to perform power analysis is that these computations allow you to estimate the sample size that you would need to detect what you believe is a meaningful effect size.

An important component of the power analysis computation is the effect size indicator. You must specify the size of the effect you wish to detect in order to determine the sample size that you must use.

However, the previous equations suggest that you must have your data in hand in order to compute the effect size indicator. Because the data have not yet been collected at the planning stage, you need an estimate of the effect size in order to perform power and sample size computations.

So how many people do I need? We will use my favorite table thus far: the Power/Delta table.


Page 11

Sources of effect size estimation

There are three ways to estimate the size of the effect that you’ll want to detect:

• Prior Research: You can estimate the effect size from prior studies that give the necessary statistics. This will allow you to detect effect sizes similar to those found by other researchers in similar studies.

• Professional Judgment: Based on your own experiences, you may be able to identify an effect size that is substantively interesting. This will allow you to detect effect sizes that have real-world meaning, based on your experiences.

• Convention: You can also use Cohen's rule of thumb (e.g., small = .20, medium = .50, large = .80). This approach is probably only advisable when you don't have enough information to perform the estimation using either of the previous approaches.

Page 12

An effect size indicator for t-tests

One measure of the magnitude of an effect, an effect size indicator, depicts the magnitude of the effect scaled in population standard deviation units:

d = (μ₁ − μ₀) / σ_X   (parameter version)

If we want to estimate d from observed data, then we can transform the equation to:

d = (X̄ − μ₀) / s_X   (statistic version)

A rule of thumb for interpreting d is that:

• d = .20 is a small effect size
• d = .50 is a medium effect size
• d = .80 is a large effect size
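As a minimal sketch with made-up data, the statistic version of d can be computed directly in R:

```r
# Estimate d = (xbar - mu0) / s_x from observed data (hypothetical numbers).
set.seed(1)
x   <- rnorm(40, mean = 52, sd = 10)   # made-up sample
mu0 <- 50                              # hypothesized population mean
d   <- (mean(x) - mu0) / sd(x)
d    # compare against the .20 / .50 / .80 rule of thumb
```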

Page 13

A visual depiction of d

[Figure: pairs of overlapping distributions illustrating d]

• d = .20: 85% overlap
• d = .50: 66% overlap
• d = .80: 53% overlap
• d = 1.10: 41% overlap

Page 14

Another Effect Size Indicator for t-tests

A similar measure of the magnitude of an effect is the squared point-biserial correlation, which is similar to the measures of association that we discussed in the context of the chi-square test. Rather than depicting the magnitude of the effect on the population standard deviation scale (as is the case for d), the squared point-biserial correlation indicates the proportion of shared variance between the independent and dependent variable.

The squared point-biserial correlation can be obtained from the observed t statistic and its degrees of freedom:

r²_pb = t²_observed / (t²_observed + df)

A rule of thumb for interpreting r²_pb is that:

• r²_pb = .01 is a small effect size
• r²_pb = .06 is a medium effect size
• r²_pb = .14 is a large effect size
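A minimal sketch of this conversion in R; the t = 2.23, df = 15 values are taken from the reporting example on the next slide:

```r
# Squared point-biserial correlation from an observed t statistic:
# r_pb^2 = t^2 / (t^2 + df)
r2_pb <- function(t_obs, df) t_obs^2 / (t_obs^2 + df)
r2_pb(2.23, 15)   # about .25
```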

Page 15

Reporting effect size indicators

To provide a more informative substantive interpretation, we would report and interpret the effect size indicators. So, we might say something like the following.

The difference in means for students in Program A (31.21) and Program B (37.86) is too large to be accounted for by sampling error, t(15) = 2.23, p = .02. In addition, this effect size is quite large (r²_pb = .25), indicating that the observed difference is not an artifact of a large sample size.

Scores of females (M = 3.56) were higher than the scores of males (M = 2.21), and this difference was statistically significant and the effect size was moderate, t(20) = 4.41, p < .0001, d = .55.

There was a statistically significant difference between the mean ratings of husbands and wives (t = 2.31, p = .01), but the effect size indicated that this difference is probably trivial (r²_pb = .003).

Page 16

Effect size calculations: One-sample t-test

For the one-sample t-test, d is estimated as:

d = (X̄ − μ₀) / s_X

We can interpret the observed value using Cohen's rule-of-thumb criteria, with values of .8 indicating large effect sizes.

The d index is very important in planning a study, because you need to specify a meaningful effect size that you'd like to detect in order to determine the sample size required to detect that difference.

As you have seen earlier, the effect size is related to the sample size. We use the statistic δ (delta) = d[f(n)] to represent this combination, where the particular function of n is defined differently for each individual test.

Page 17

For the one-sample t-test, the function of n is √n. Specifically, δ = d√n. Given δ as defined here, we can determine the power of the one-sample t-test from the table of power on p. 678.

Back to the example we had for the one-sample t-test: The mean GRE score of 300 students in the School of Education at LUC is 565, and the standard deviation equals 75. We know the mean of the GRE test-taker population is 500. Thus, with X̄ = 565, μ₀ = 500, and s_X = 75,

d = (X̄ − μ₀) / s_X = (565 − 500) / 75 = 0.87

Then

δ = d√n = 0.87 × √300 = 15.07

From the Appendix Power table, for δ = 15.07 with α = 0.05, the power is beyond 0.99. This means that, if our students' true mean really does differ from 500 by this much, we have better than a 99% chance of rejecting the null hypothesis; the chance of making a Type II error is less than 1%.
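A quick check of this example with base R's power.t.test(), a sketch in which sd = 1 so that delta plays the role of d = 0.87:

```r
power.t.test(n = 300, delta = 0.87, sd = 1, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")$power
# essentially 1, consistent with "power beyond .99" from the table
```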

Page 18

Sometimes the researcher is interested in knowing how large a sample she should have in her study in order to obtain a certain level of power.

For example, a researcher wants to set power at .80 when she thinks (based on previous experience or literature) that the effect size of her study is around d = 0.20. According to the Appendix Power table, for power = .80 and α = 0.05, δ must equal 2.80.

Since δ = d√n, we can simply solve for n:

n = (δ / d)² = (2.80 / 0.20)² = 196

Therefore, if the researcher wants an 80% chance of rejecting the null hypothesis when the effect size is 0.2, she will need a random sample of 196 participants.
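Roughly the same answer can be obtained by letting power.t.test() solve for n; a sketch:

```r
power.t.test(power = 0.80, delta = 0.20, sd = 1, sig.level = 0.05,
             type = "one.sample")$n
# about 198 -- slightly larger than the table-based 196 because the
# function uses the noncentral t rather than the normal approximation
```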

Page 19

[Example]

The literature shows that the mean influence score of peer pressure is 520 with a standard deviation of 80. An investigator would like to show that a minor change in conditions will produce scores with a mean of only 500. He plans to run a t-test to compare his sample mean with a population mean of 520.

Effect size:

d = (500 − 520) / 80 = −0.25; because power depends only on the magnitude of the effect, we work with |d| = 0.25.

If the sample size is 100, then δ is:

δ = d√n = 0.25 × √100 = 2.5

Checking the Appendix Power table, the power = .71.
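A sketch of the same calculation in R:

```r
power.t.test(n = 100, delta = 0.25, sd = 1, sig.level = 0.05,
             type = "one.sample")$power
# about .70, close to the table value of .71
```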

Page 20

What sample sizes would be needed to raise power to .70, .80, and .90?

(1) To have power = .70 with α = .05, δ is close to 2.50:

δ = 2.40 gives power = 0.67
δ = 2.50 gives power = 0.71

You can use interpolation to find the δ that gives power = .70:

(2.50 − δ) / (2.50 − 2.40) = (0.71 − 0.70) / (0.71 − 0.67),   so δ = 2.475

To still detect d = 0.25 with δ = 2.475:

n = (δ / d)² = (2.475 / 0.25)² = 98.01, which rounds up to 99

(2) To have power = .80 with α = .05, δ is close to 2.80:

n = (δ / d)² = (2.80 / 0.25)² = 125.44, which rounds up to 126

Page 21

What sample sizes would be needed to raise power to .70, .80, and .90?

(3) To have power = .90 with α = .05, δ is between 3.20 and 3.30:

δ = 3.20 gives power = 0.89
δ = 3.30 gives power = 0.91

Using interpolation,

(3.30 − δ) / (3.30 − 3.20) = (0.91 − 0.90) / (0.91 − 0.89),   so δ = 3.25

To still detect d = 0.25 with δ = 3.25:

n = (δ / d)² = (3.25 / 0.25)² = 169
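For comparison, a sketch that lets power.t.test() solve for n at all three power levels:

```r
sapply(c(0.70, 0.80, 0.90), function(p)
  power.t.test(power = p, delta = 0.25, sd = 1, sig.level = 0.05,
               type = "one.sample")$n)
# roughly 101, 128, and 170 -- a little larger than the table-based
# 99, 126, and 169 because of the noncentral-t calculation
```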

Page 22

Effect Sizes: Two Independent-Samples

The effect size index for the two independent-samples t-test is defined as follows:

d = (X̄₁ − X̄₂) / s_pooled

s_pooled is defined as the common standard deviation (recall that we typically assume that the variances are equal):

s_pooled = √[(s_X₁² + s_X₂²) / 2]   for equal-sized samples.

s_X₁ and s_X₂ can be known from the population, estimated based on prior research, or estimated from the data.

Page 23

In the case of unequal sample sizes, we pool the variance as we do when computing the t-test. Recall what the pooled variance does: it estimates the population variance, weighting each sample variance by its sample size. Hence, the pooled variance is an estimate of the population variance that weights each case in the study equally.

So, we can rewrite d for the unequal sample size case as follows:

d = (X̄₁ − X̄₂) / s_pooled,   where   s_pooled = √[((n₁ − 1)s_X₁² + (n₂ − 1)s_X₂²) / (n₁ + n₂ − 2)]

And we need to calculate δ to find the power. The δ for the two-sample case is defined as

δ = d√(n / 2)

where n is the per-group sample size, so we also need a single value of n when it's not the same for the two groups.
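A minimal sketch of these formulas as R helper functions; the summary statistics in the example call are made up:

```r
# Pooled SD and Cohen's d for two independent groups with unequal n.
pooled_sd <- function(s1, s2, n1, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}
cohens_d <- function(m1, m2, s1, s2, n1, n2) {
  (m1 - m2) / pooled_sd(s1, s2, n1, n2)
}
cohens_d(m1 = 105, m2 = 100, s1 = 14, s2 = 16, n1 = 30, n2 = 45)  # hypothetical values
```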

Page 24

When we deal with power for the t-test with unequal sample sizes, we need a single value of n to work with the power tables, so we need to combine the sample sizes from the two groups. The formula for the effective sample size is based on the harmonic mean:

n_h = 2n₁n₂ / (n₁ + n₂)

Note that when we have unequal sample sizes, we need more participants to achieve the same level of power as a study in which sample sizes are equal (a balanced study). Consider the following two ways of dividing 100 participants into two groups:

n_h = 2(40)(60) / (40 + 60) = 48,   compared to n = 50 per group in the balanced study.

In this case, 100 people in the unbalanced design have power equivalent to a balanced study with only 96 people.

What's the point? Balance your samples when possible.
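A one-line sketch of the harmonic-mean calculation:

```r
n_h <- function(n1, n2) 2 * n1 * n2 / (n1 + n2)
n_h(40, 60)   # 48, versus 50 per group in the balanced design
```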

Page 25

Let’s calculate the power of the two independent samples t-test that was shown on p.13 in the t-test slide set.

IV: Teacher's happiness (0 = low happiness; 1 = high happiness)
DV: Student's achievement

Group Statistics (Follow-up Reading std score by COMPOSITE SEX):
MALE:   N = 117, Mean = 50.4083, Std. Deviation = 10.37854, Std. Error Mean = .95950
FEMALE: N = 138, Mean = 51.3812, Std. Deviation = 9.03615, Std. Error Mean = .76921

Effect size: d = (X̄₁ − X̄₂) / s_pooled = (51.3812 − 50.4083) / ((10.3785 + 9.0362) / 2) ≈ .10

With the unequal sample sizes, n_h = 2n₁n₂ / (n₁ + n₂) = 2(117)(138) / (117 + 138) ≈ 126

δ = d√(n_h / 2) = .10 × √(126 / 2) ≈ 0.79
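A sketch that reproduces these numbers and then checks the implied power with power.t.test(), treating n_h as the per-group sample size (an approximation) and setting sd = 1 so that delta is the standardized effect size:

```r
m <- c(50.4083, 51.3812)     # group means from the table above
s <- c(10.37854, 9.03615)    # group SDs
n <- c(117, 138)             # group sizes

d     <- (m[2] - m[1]) / ((s[1] + s[2]) / 2)   # about .10 (slide's approach)
n_h   <- 2 * n[1] * n[2] / (n[1] + n[2])       # about 126.6
delta <- d * sqrt(n_h / 2)                     # about 0.79

power.t.test(n = n_h, delta = d, sd = 1, type = "two.sample")$power
# roughly .12 -- very little power to detect an effect this small
```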

Page 26

Summary:

One-sample t-test (effect size, δ for the power estimate, and sample size):

d = (X̄ − μ₀) / s_X,   δ = d√n,   n = (δ / d)²

Two independent-samples t-test (effect size, δ for the power estimate, and per-group sample size):

d = (X̄₁ − X̄₂) / s_pooled,   s_pooled = √[(s_X₁² + s_X₂²) / 2],   δ = d√(n / 2),   n = 2(δ / d)²

Page 27

Effect Sizes: Matched-Samples

The d index for the matched-samples t-test is defined as:

d = (X̄₁ − X̄₂) / s_(X₁−X₂)

where s_(X₁−X₂) is the standard deviation of the difference scores (X₁ − X₂).

A problem arises: to calculate s_(X₁−X₂), we need to know the correlation between X₁ and X₂.

According to the variance sum law:

σ²_(X₁−X₂) = σ²_X₁ + σ²_X₂ − 2ρσ_X₁σ_X₂

To solve the problem, we make the general assumption of homogeneity of variance:

σ²_X₁ = σ²_X₂ = σ²

Page 28

So the variance sum law can be revised:

σ²_(X₁−X₂) = σ² + σ² − 2ρσσ = 2σ²(1 − ρ),   so   σ_(X₁−X₂) = σ√(2(1 − ρ))

The statistic form:

s_(X₁−X₂) = s√(2(1 − r))

Then we have to come up with the best guess of the correlation between X₁ and X₂ to calculate s_(X₁−X₂). And δ is defined as δ = d√n.
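A minimal sketch of these matched-samples quantities in R, with made-up values for the common SD, the guessed correlation, the mean difference to detect, and the number of pairs:

```r
s         <- 12     # common within-condition SD (hypothetical)
r         <- 0.6    # guessed correlation between X1 and X2 (hypothetical)
mean_diff <- 4      # mean difference we want to detect (hypothetical)
n         <- 30     # number of pairs (hypothetical)

s_diff <- s * sqrt(2 * (1 - r))   # s_(X1 - X2) under homogeneity of variance
d      <- mean_diff / s_diff      # effect size for the matched design
delta  <- d * sqrt(n)             # look this up in the Appendix Power table
c(s_diff = s_diff, d = d, delta = delta)
```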