Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

32
Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell

Transcript of Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Page 1: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Power and Sample Size

Boulder 2004Benjamin NealeShaun Purcell

Page 2: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

To Be Accomplished

• Introduce Concept of Power via Correlation Coefficient (ρ) Example

• Identify Relevant Factors Contributing to Power

• Practical:• Power Analysis for Univariate Twin Model• How to use Mx for Power

Page 3: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Simple example

Investigate the linear relationship (between two random variables X and Y: =0 vs. 0 (correlation coefficient).

• draw a sample, measure X,Y

• calculate the measure of association (Pearson product moment corr. coeff.)• test whether 0.

Page 4: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

How to Test 0• assumed the data are normally

distributed

• defined a null-hypothesis ( = 0)

• chosen level (usually .05)

• utilized the (null) distribution of the test statistic associated with =0

• t= [(N-2)/(1-2)]

Page 5: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

How to Test 0• Sample N=40

• r=.303, t=1.867, df=38, p=.06 α=.05

• As p > α, we fail to reject = 0

have we drawn the correct conclusion?

Page 6: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

= type I error rate probability of deciding 0

(while in truth =0)

is often chosen to equal .05...why?

DOGMA

Page 7: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

N=40, r=0, nrep=1000 – central t(38), =0.05 (critical value 2.04)

-5 -4 -3 -2 -1 0 1 2 3 4 50

200

400

600

800

-5 -4 -3 -2 -1 0 1 2 3 4 50

0.1

0.2

0.3

0.4

NREP=1000

2.5% 2.5%

central t(38)

4.6% sign.

Page 8: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Observed non-null distribution (=.2) and null distribution

-5 -4 -3 -2 -1 0 1 2 3 4 50

20

40

60

80

100

-5 -4 -3 -2 -1 0 1 2 3 4 50

0.1

0.2

0.3

0.4

0.5

rho=.20N=40Nrep=1000

abs(t)>2.04 in23%

null distribution t(38)

Page 9: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

In 23% of tests of =0, |t|>2.024 (=0.05), and thus draw the correct conclusion that of rejecting 0.

The probability of rejecting the null-hypothesis (=0) correctly is 1-, or the power, when a true effect exists

Page 10: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Hypothesis Testing

• Correlation Coefficient hypotheses:– ho (null hypothesis) is ρ=0

– ha (alternative hypothesis) is ρ ≠ 0• Two-sided test, where ρ > 0 or ρ < 0 are one-sided

• Null hypothesis usually assumes no effect

• Alternative hypothesis is the idea being tested

Page 11: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Summary of Possible Results

H-0 true H-0 false

accept H-0 1-reject H-0 1-

=type 1 error rate

=type 2 error rate

1-=statistical power

Page 12: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Rejection of H0 Non-rejection of H0

H0 true

HA true

Nonsignificant result(1- )

Type II error at rate

Significant result(1-)

Type I error at rate

Page 13: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Power• The probability of rejection of a

false null-hypothesis depends on: –the significance criterion ()–the sample size (N) –the effect size (NCP)

“The probability of detecting a given effect size in a population from a sample of size N, using significance criterion ”

Page 14: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

P(T)

T

alpha 0.05

Sampling distribution if HA were true

Sampling distribution if H0 were true

POWER = 1 -

Standard Case

Effect Size (NCP)

Page 15: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

P(T)

T

alpha 0.1

Sampling distribution if HA were true

Sampling distribution if H0 were true

POWER = 1 -

Impact of Less Cons. alpha

Page 16: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

P(T)

T

alpha 0.01

Sampling distribution if HA were true

Sampling distribution if H0 were true

POWER = 1 -

Impact of More Cons. alpha

Page 17: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

P(T)

T

alpha 0.05

Increased Sample Size

Sampling distribution if HA were true

Sampling distribution if H0 were true

POWER = 1 -

Page 18: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

P(T)

T

alpha 0.05

Sampling distribution if HA were true

Sampling distribution if H0 were true

POWER = 1 -

Increase in Effect Size

Effect Size (NCP)↑

Page 19: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Effects on Power Recap

• Larger Effect Size

• Larger Sample Size

• Alpha Level shifts <Beware the False Positive!!!>

• Type of Data:– Binary, Ordinal, Continuous

Page 20: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

When To Do Power Calcs?

• Generally study planning stages of study

• Occasionally with negative result

• No need if significance is achieved

• Computed to determine chances of success

Page 21: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Power Calculations Empirical

• Attempt to Grasp the NCP from Null

• Simulate Data under theorized model

• Calculate Statistics and Perform Test

• Given α, how many tests p < α

• Power = (#hits)/(#tests)

Page 22: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Practical: Empirical Power 1

• We will Simulate Data under a model online

• We will run an ACE model, and test for C

• We will then submit our data and Shaun will collate it for us

• While he’s collating, we’ll talk about theoretical power calculations

Page 23: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Practical: Empirical Power 2

• First get F:\ben\2004\ace.mx and put it into your directory

• We will paste our simulated data into this script, so open it now in preparation, and note both places where we must paste in the data

• Note that you will have to fit the ACE model and then fit the AE submodel

Page 24: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Practical: Empirical Power 3

• Simulation Conditions– 30% A2 20% C2 50% E2

– Input:– A 0.5477 C of 0.4472 E of 0.7071– 350 MZ 350 DZ– Simulate and Space Delimited at– http://statgen.iop.kcl.ac.uk/workshop/unisim.html or

click here in slide show mode– Click submit after filling in the fields and you will get a

page of data

Page 25: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Practical: Empirical Power 4

• With the data page, use control-a to select the data, control-c to copy, and in Mx control-v to paste in both the MZ and DZ groups.

• Run the ace.mx script with the data pasted in and modify it to run the AE model.

• Report the A, C, and E estimates of the first model, and the A and E estimates of the second model as well as both the-2log-likelihoods on the webpage http://statgen.iop.kcl.ac.uk/workshop/ or click here in slide show mode

Page 26: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Practical: Empirical Power 5

• Once all of you have submitted your results we will take a look at the theoretical power calculation, using Mx.

• Once we have finished with the theory Shaun will show us the empirical distribution that we generated today

Page 27: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Theoretical Power Calculations

• Based on Stats, rather than Simulations• Can be calculated by hand sometimes, but

Mx does it for us• Note that sample size and alpha-level are

the only things we can change, but can assume different effect sizes

• Mx gives us the relative power levels at the alpha specified for different sample sizes

Page 28: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Theoretical Power Calculations

• We will use the power.mx script to look at the sample size necessary for different power levels

• In Mx, power calculations can be computed in 2 ways:– Using Covariance Matrices (We Do This One)– Requiring an initial dataset to generate a

likelihood so that we can use a chi-square test

Page 29: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Power.mx 1! Simulate the data! 30% additive genetic! 20% common environment! 50% nonshared environment

#NGroups 3

G1: model parametersCalculationBegin Matrices;

X lower 1 1 fixedY lower 1 1 fixedZ lower 1 1 fixed

End Matrices;

Matrix X 0.5477Matrix Y 0.4472Matrix Z 0.7071

Begin Algebra;A = X*X' ; C = Y*Y' ;E = Z*Z' ;

End Algebra;End

Page 30: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Power.mx 2G2: MZ twin pairsCalculationMatrices = Group 1Covariances A+C+E | A+C _

A+C | A+C+E /Options MX%E=mzsim.covEnd

G3: DZ twin pairsCalculationMatrices = Group 1 H Full 1 1 Covariances A+C+E | H@A+C _

H@A+C | A+C+E /Matrix H 0.5Options MX%E=dzsim.covEnd

Page 31: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Power.mx 3! Second part of script! Fit the wrong model to the simulated data ! to calculate power

#NGroups 3G1 : model parametersCalculationBegin Matrices;

X lower 1 1 freeY lower 1 1 fixedZ lower 1 1 free

End Matrices;

Begin Algebra;A = X*X' ;C = Y*Y' ; E = Z*Z' ;

End Algebra;End

Page 32: Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.

Power.mx 4G2 : MZ twinsData NInput_vars=2 NObservations=350CMatrix Full File=mzsim.covMatrices= Group 1Covariances A+C+E | A+C _

A+C | A+C+E /Option RSidualsEnd

G3 : DZ twinsData NInput_vars=2 NObservations=350CMatrix Full File=dzsim.covMatrices= Group 1 H Full 1 1Covariances A+C+E | H@A+C _

H@A+C | A+C+E /Matix H 0.5Option RSiduals

! Power for alpha = 0.05 and 1 dfOption Power= 0.05,1 End