Power and Sample Size Adapted from: Boulder 2004 Benjamin Neale Shaun Purcell I HAVE THE POWER!!!

Overview
• Introduce the concept of power via a correlation coefficient (ρ) example
• Discuss factors contributing to power
• Practical:
  • Simulating data as a means of computing power
  • Using Mx for power calculations
Simple example
Investigate the linear relationship between two random variables X and Y (ρ = 0 vs. ρ ≠ 0) using the Pearson correlation coefficient.
• Sample subjects at random from the population
• Measure X and Y
• Calculate the measure of association ρ
• Test whether ρ ≠ 0
How to Test ρ ≠ 0
• Assume data are normally distributed
• Define a null hypothesis (ρ = 0)
• Choose an α level (usually .05)
• Use the (null) distribution of the test statistic associated with ρ = 0
• $t = \rho \sqrt{(N-2)/(1-\rho^2)}$
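As a minimal sketch of the test statistic above, the function below evaluates t from a correlation and a sample size. The function name and the example values (r = 0.5, N = 27) are illustrative assumptions, not taken from the slides.

```python
import math

def correlation_t(r, n):
    """t statistic for testing rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Illustrative values only (assumed, not from the slides).
r, n = 0.5, 27
print(f"t = {correlation_t(r, n):.3f} on {n - 2} df")
```

The resulting t is compared with the critical value of the central t distribution with N − 2 degrees of freedom at the chosen α.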
How to Test ρ ≠ 0
• Sample N = 40
• r = .303, t = 1.867, df = 38, p = .06, α = .05
• Because observed p > α, we fail to reject ρ = 0
Have we drawn the correct conclusion that ρ is genuinely zero?
α = type I error rate
• Probability of deciding ρ ≠ 0 (while in truth ρ = 0)
• α is often chosen to equal .05... why?
DOGMA
[Figure: N = 40, r = 0, nrep = 1000, central t(38), α = 0.05 (critical value 2.024)]
[Figure: Observed non-null distribution (ρ = .2) and null distribution]
In 23% of tests of ρ = 0, |t| > 2.024 (α = 0.05), and we thus correctly conclude that ρ ≠ 0.
The probability of correctly rejecting the null hypothesis (ρ = 0) is 1 − β, known as the power.
Hypothesis Testing
• Correlation coefficient hypotheses:
  • H0 (null hypothesis) is ρ = 0
  • HA (alternative hypothesis) is ρ ≠ 0
  • Two-sided test, where ρ > 0 or ρ < 0 are one-sided
• Null hypothesis usually assumes no effect
• Alternative hypothesis is the idea being tested
Summary of Possible Results
              H0 true    H0 false
accept H0     1 − α      β
reject H0     α          1 − β
α = type I error rate
β = type II error rate
1 − β = statistical power
STATISTICS (columns) vs. REALITY (rows)
              Rejection of H0               Non-rejection of H0
H0 true       Type I error at rate α        Nonsignificant result (1 − α)
HA true       Significant result (1 − β)    Type II error at rate β
Power
The probability of rejecting the null hypothesis depends on:
• the significance criterion (α)
• the sample size (N)
• the effect size (NCP)
“The probability of detecting a given effect size in a population from a sample of size N, using significance criterion α”
[Figure: Standard case. P(T) plotted against T for the sampling distribution if H0 were true and the sampling distribution if HA were true, separated by the effect size (NCP); α = 0.05; areas β and α marked; POWER = 1 − β]
[Figure: Impact of less conservative α (α = 0.1); POWER = 1 − β ↑]
[Figure: Impact of more conservative α (α = 0.01); POWER = 1 − β ↓]
[Figure: Impact of increased sample size (α = 0.05); reduced variance of the sampling distribution if HA is true; POWER = 1 − β ↑]
[Figure: Impact of increase in effect size (α = 0.05); effect size (NCP) ↑; POWER = 1 − β ↑]
Summary: Factors affecting power
• Effect size
• Sample size
• Alpha level <Beware the False Positive!!!>
• Type of data: binary, ordinal, continuous
• Research design
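To make the first three factors concrete, here is a rough sketch that approximates power for the correlation test and varies α, N and ρ one at a time. It uses Fisher's z approximation, which is a swapped-in shortcut and not the method of the slides; the function name and the alternative settings are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def approx_power(rho, n, alpha=0.05):
    """Approximate two-sided power for testing rho = 0 (Fisher z approximation)."""
    z = NormalDist()
    shift = math.atanh(rho) * math.sqrt(n - 3)  # expected test statistic under HA
    crit = z.inv_cdf(1 - alpha / 2)             # two-sided critical value
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

# Baseline is the rho = .2, N = 40 example; the other rows change one factor each.
print(f"baseline (rho=.2, N=40, alpha=.05): {approx_power(0.2, 40):.3f}")
print(f"less conservative alpha = .10     : {approx_power(0.2, 40, alpha=0.10):.3f}")
print(f"more conservative alpha = .01     : {approx_power(0.2, 40, alpha=0.01):.3f}")
print(f"larger sample, N = 100            : {approx_power(0.2, 100):.3f}")
print(f"larger effect, rho = .3           : {approx_power(0.3, 40):.3f}")
```

The baseline should land close to the 23% quoted for the ρ = .2 example, and each change moves power in the direction shown in the figures.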
Uses of power calculations
• Planning a study
• Possibly to reflect on a non-significant (ns) trend result
• No need if significance is achieved
• To determine chances of study success
Power Calculations via Simulation
• Simulate data under the theorized model
• Calculate statistics and perform the test
• Given α, count how many tests have p < α
• Power = (#hits)/(#tests)
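The four steps above can be sketched for the correlation example as follows. This is one possible implementation under stated assumptions: the helper names, the seed and the number of replicates are illustrative, and the critical value 2.024 for t(38) is taken from the earlier slide so that no t-distribution routine is needed.

```python
import math
import random

def sample_r(rho, n, rng):
    """Pearson r from n pairs drawn from a bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def empirical_power(rho, n, t_crit, nrep=2000, seed=1):
    """Power = (#hits)/(#tests), where a hit is |t| > t_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(nrep):
        r = sample_r(rho, n, rng)
        t = r * math.sqrt((n - 2) / (1 - r * r))
        if abs(t) > t_crit:
            hits += 1
    return hits / nrep

print("power at rho = .2 :", empirical_power(0.2, 40, t_crit=2.024))
print("type I rate, rho=0:", empirical_power(0.0, 40, t_crit=2.024))
```

Running the same function with ρ = 0 estimates the type I error rate, which should sit near α.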
Practical: Empirical Power 1
• Simulate data under a model online
• Fit an ACE model, and test for C
• Collate fit statistics on board
Practical: Empirical Power 2
• First, get http://www.vipbg.vcu.edu/neale/gen619/power/powerraw.mx and put it into your directory
• Second, open this script in Mx, and note both places where we must paste in the data
• Third, simulate data (see next slide)
• Fourth, fit the ACE model and then fit the AE submodel
Practical: Empirical Power 3
• Simulation conditions: 30% A², 20% C², 50% E²
• Input: A of 0.5477, C of 0.4472, E of 0.7071
• 350 MZ, 350 DZ
• Simulate and use the “Space Delimited” option at http://statgen.iop.kcl.ac.uk/workshop/unisim.html or click here in slide show mode
• Click submit after filling in the fields and you will get a page of data
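The input values are the square roots of the variance proportions. The sketch below shows that conversion, the expected MZ and DZ covariance matrices, and one way the twin pairs could be simulated offline. It is not the code behind the online simulator; all function names and the seed are assumptions.

```python
import math
import random

def ace_expected(a2, c2, e2):
    """Path coefficients and expected MZ/DZ covariance matrices for an ACE model."""
    paths = tuple(round(math.sqrt(v), 4) for v in (a2, c2, e2))
    vp = a2 + c2 + e2
    mz = [[vp, a2 + c2], [a2 + c2, vp]]
    dz = [[vp, 0.5 * a2 + c2], [0.5 * a2 + c2, vp]]
    return paths, mz, dz

def simulate_pairs(n_pairs, a, c, e, r_a, rng):
    """Simulate twin pairs; r_a is the genetic correlation (1 for MZ, 0.5 for DZ)."""
    pairs = []
    for _ in range(n_pairs):
        shared_a, common = rng.gauss(0, 1), rng.gauss(0, 1)
        twins = []
        for _ in range(2):
            g = math.sqrt(r_a) * shared_a + math.sqrt(1 - r_a) * rng.gauss(0, 1)
            twins.append(a * g + c * common + e * rng.gauss(0, 1))
        pairs.append(tuple(twins))
    return pairs

(a, c, e), mz_cov, dz_cov = ace_expected(0.30, 0.20, 0.50)
print("paths:", a, c, e)
print("expected MZ:", mz_cov, "expected DZ:", dz_cov)

rng = random.Random(2004)  # arbitrary seed
mz_data = simulate_pairs(350, a, c, e, r_a=1.0, rng=rng)
dz_data = simulate_pairs(350, a, c, e, r_a=0.5, rng=rng)
```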
Practical: Empirical Practical: Empirical Power 4Power 4
With the data page, use ctrla to select the With the data page, use ctrla to select the data, controlc to copy, switch to Mx (e.g. data, controlc to copy, switch to Mx (e.g. with alttab) and in Mx controlv to paste with alttab) and in Mx controlv to paste in both the MZ and DZ groups.in both the MZ and DZ groups.
Run the ace.mx script with the data Run the ace.mx script with the data pasted in and modify it to run the AE pasted in and modify it to run the AE model.model.
Report the 2loglikelihoods on the Report the 2loglikelihoods on the whiteboardwhiteboard
Optionally, keep a record of A, C, and E Optionally, keep a record of A, C, and E estimates of the first model, and the A and estimates of the first model, and the A and E estimates of the second model E estimates of the second model
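Once the −2 log-likelihoods are collected, empirical power is the share of datasets in which dropping C is significant. The sketch below assumes a 1-df chi-square test on the difference; the whiteboard values are hypothetical placeholders, not real results.

```python
import math

def lrt_p_value_1df(minus2ll_full, minus2ll_sub):
    """p-value of a 1-df likelihood-ratio test (upper tail of chi-square with 1 df)."""
    chi2 = max(minus2ll_sub - minus2ll_full, 0.0)
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical whiteboard entries: (-2lnL of ACE, -2lnL of AE), one pair per dataset.
results = [(4055.9, 4063.2), (4101.4, 4103.0), (3998.7, 4004.1), (4040.2, 4040.9)]
alpha = 0.05
hits = sum(lrt_p_value_1df(full, sub) < alpha for full, sub in results)
print(f"empirical power = {hits}/{len(results)}")
```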
Simulation of other types of data
• Use SAS/R/Matlab/Mathematica
• Any decent random number generator will do
• See http://www.vipbg.vcu.edu/~neale/gen619/power/sim1.sas
R
• R is in your future
• Can do it manually with rnorm
• Easier to use mvrnorm
• runmx at Matt Keller’s site: http://www.matthewckeller.com/html/mxr.html
library(MASS)
mvrnorm(n=100, c(1,1), matrix(c(1,.5,.5,1),2,2), empirical=FALSE)
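For readers without R, the "manual" route can be sketched in Python: draw independent standard normals and mix them with the Cholesky factor of the covariance matrix. The function name is an assumption and only the 2 × 2 case is handled; the means and covariance match the mvrnorm call above.

```python
import math
import random

def mvn2(n, mu, sigma, rng):
    """Draw n pairs from a bivariate normal using a hand-written 2x2 Cholesky factor."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 * l21)
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent standard normals
        draws.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return draws

sample = mvn2(100, mu=(1, 1), sigma=[[1, 0.5], [0.5, 1]], rng=random.Random(42))
print(sample[:3])
```

As with empirical=FALSE in mvrnorm, the sample moments only approximate the requested means and covariances.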
Mathematica Example
In[32]:=(mu={1,2,3,4}; sigma={{1,1/2,1/3,1/4},{1/2,1/3,1/4,1/5},{1/3,1/4,1/5,1/6},{1/4,1/5,1/6, 1/7}}; Timing[Table[Random[MultinormalDistribution[mu,sigma]],{1000}]][[1]])
Out[32]=1.1 Second
In[33]:=Timing[RandomArray[MultinormalDistribution[mu,sigma],1000]][[1]]
Out[33]=0.04 Second
In[37]:=TableForm[RandomArray[MultinormalDistribution[mu,sigma],10]]
Obtain mathematica from VCU http://www.ts.vcu.edu/faq/stats/mathematica.html
Theoretical Power Calculations
• Based on statistics, rather than simulations
• Can be calculated by hand sometimes, but Mx does it for us
• Note that sample size and alpha level are the only things we can change, but we can assume different effect sizes
• Mx gives us the relative power levels at the alpha specified for different sample sizes
Theoretical Power Calculations
• We will use the power.mx script to look at the sample size necessary for different power levels
• In Mx, power calculations can be computed in 2 ways:
  • Using covariance matrices (we do this one)
  • Requiring an initial dataset to generate a likelihood so that we can use a chi-square test
Power.mx 1
! Simulate the data
! 30% additive genetic
! 20% common environment
! 50% nonshared environment
#NGroups 3
G1: model parameters
Calculation
Begin Matrices;
X lower 1 1 fixed
Y lower 1 1 fixed
Z lower 1 1 fixed
End Matrices;
Matrix X 0.5477
Matrix Y 0.4472
Matrix Z 0.7071
Begin Algebra;
A = X*X' ;
C = Y*Y' ;
E = Z*Z' ;
End Algebra;
End
Power.mx 2
G2: MZ twin pairs
Calculation
Matrices = Group 1
Covariances A+C+E | A+C _
A+C | A+C+E /
Options MX%E=mzsim.cov
End
G3: DZ twin pairs
Calculation
Matrices = Group 1
H Full 1 1
Covariances A+C+E | H@A+C _
H@A+C | A+C+E /
Matrix H 0.5
Options MX%E=dzsim.cov
End
Power.mx 3
! Second part of script
! Fit the wrong model to the simulated data
! to calculate power
#NGroups 3
G1 : model parameters
Calculation
Begin Matrices;
X lower 1 1 free
Y lower 1 1 fixed
Z lower 1 1 free
End Matrices;
Begin Algebra;
A = X*X' ;
C = Y*Y' ;
E = Z*Z' ;
End Algebra;
End
Power.mx 4
G2 : MZ twins
Data NInput_vars=2 NObservations=350
CMatrix Full File=mzsim.cov
Matrices= Group 1
Covariances A+C+E | A+C _
A+C | A+C+E /
Option RSiduals
End
G3 : DZ twins
Data NInput_vars=2 NObservations=350
CMatrix Full File=dzsim.cov
Matrices= Group 1
H Full 1 1
Covariances A+C+E | H@A+C _
H@A+C | A+C+E /
Matrix H 0.5
Option RSiduals
! Power for alpha = 0.05 and 1 df
Option Power= 0.05,1
End
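The idea behind this kind of theoretical calculation can be sketched outside Mx. The sketch assumes that the chi-square from fitting the wrong (AE) model to the exact covariance matrices is treated as a noncentrality parameter (NCP), and that the NCP grows in proportion to sample size. The function names and the example chi-square of 4.0 are illustrative assumptions, not output from power.mx.

```python
import math
from statistics import NormalDist

_Z = NormalDist()

def power_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with noncentrality parameter ncp."""
    crit = _Z.inv_cdf(1 - alpha / 2)  # square root of the chi-square critical value
    root = math.sqrt(ncp)
    return _Z.cdf(root - crit) + _Z.cdf(-root - crit)

def required_n(chi2_obs, n_obs, target=0.80, alpha=0.05):
    """Sample size at which the NCP, scaled linearly in N, gives the target power."""
    lo, hi = 0.0, 100.0
    for _ in range(60):               # bisection for the NCP that yields the target
        mid = (lo + hi) / 2
        if power_1df(mid, alpha) < target:
            lo = mid
        else:
            hi = mid
    return math.ceil(n_obs * hi / chi2_obs)

# Hypothetical: chi-square of 4.0 from 700 pairs in total (350 MZ + 350 DZ).
print("power at N = 700      :", round(power_1df(4.0), 3))
print("pairs needed for 80%  :", required_n(4.0, 700))
```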
Model Identification
• Necessary conditions
• Sufficient conditions
• Algebraic tests
• Empirical tests
Necessary Conditions
• Number of parameters ≤ number of statistics
• Structural equation models usually count variances & covariances to identify variance components
• What is the number of statistics/parameters in a univariate ACE model? Bivariate?
Sufficient Conditions
• No general sufficient conditions for SEM
• Special case: ACE model
  • Distinct statistics (i.e. have different predicted values)
  • $V_P = a^2 + c^2 + e^2$
  • $C_{MZ} = a^2 + c^2$
  • $C_{DZ} = .5 a^2 + c^2$
Sufficient Conditions 2
Arrange in matrix form:
$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ .5 & 1 & 0 \end{pmatrix} \begin{pmatrix} a^2 \\ c^2 \\ e^2 \end{pmatrix} = \begin{pmatrix} V_P \\ C_{MZ} \\ C_{DZ} \end{pmatrix}$$
$A x = b$
If A can be inverted, then we can find $x = A^{-1} b$
Sufficient Conditions 3
Solve ACE model
Calc ng=1
Begin Matrices;
 A full 3 3
 b full 3 1
End Matrices;
Matrix A
1 1 1
1 1 0
.5 1 0
Labels Col A A C E
Labels Row A VP CMZ CDZ
Matrix b ! Data, essentially
1
.8
.5
Labels Col B Statistic
Labels Row B VP CMZ CDZ
Begin Algebra;
 C = A~;
 x = A~*b;
End Algebra;
Labels Row x A C E
End
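The same algebraic check can be sketched in Python with the statistics from the script (VP = 1, CMZ = .8, CDZ = .5). The solver is a small hand-written Gauss-Jordan routine, used here only to avoid external libraries; its name is an assumption.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("A is singular: parameters not identified by these statistics")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

A = [[1, 1, 1], [1, 1, 0], [0.5, 1, 0]]  # rows: VP, CMZ, CDZ; columns: a2, c2, e2
b = [1.0, 0.8, 0.5]                      # the statistics used in the Mx script
a2, c2, e2 = solve3(A, b)
print(f"a2 = {a2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

A singular A, for example when only MZ twins are available, raises an error instead of returning a solution.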
Sufficient Conditions 4
• What if not soluble by inversion?
• Empirical:
  1 Pick set of parameter values T1
  2 Simulate data
  3 Fit model to data starting at T2 (not T1)
  4 Repeat and look for solutions to step 3 that are perfect but have estimates not equal to T1
• If equally good solution but different values, reject identified model hypothesis
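A simplified illustration of the last point: the sketch below does not fit a model, it only compares predicted statistics for two parameter sets. With MZ twins alone, T1 and T2 predict identical statistics, so the data cannot tell them apart; adding DZ twins separates them. The parameter sets and function names are assumptions chosen for illustration.

```python
def predicted_stats(a2, c2, e2, groups=("MZ", "DZ")):
    """Predicted phenotypic variance and twin covariances for the given groups."""
    stats = {"VP": a2 + c2 + e2}
    if "MZ" in groups:
        stats["CMZ"] = a2 + c2
    if "DZ" in groups:
        stats["CDZ"] = 0.5 * a2 + c2
    return stats

def same_fit(s1, s2, tol=1e-9):
    """True when two parameter sets predict the same statistics."""
    return all(abs(s1[k] - s2[k]) < tol for k in s1)

t1 = (0.3, 0.2, 0.5)  # generating values
t2 = (0.1, 0.4, 0.5)  # different values with the same a2 + c2

mz_only = same_fit(predicted_stats(*t1, groups=("MZ",)), predicted_stats(*t2, groups=("MZ",)))
mz_dz = same_fit(predicted_stats(*t1), predicted_stats(*t2))
print("MZ only, equally good fit:", mz_only)  # different values, same fit: not identified
print("MZ + DZ, equally good fit:", mz_dz)
```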
Conclusion
• Power calculations relatively simple to do
• Curse of dimensionality
• Different for raw vs summary statistics
• Simulation can be done many ways
• No substitute for research design