Biostatistics Case Studies 2014 Youngju Pak, PhD. Biostatistician ypak@labiomed.org Session 2: Sample Size & Power for Inequality and Equivalence Studies II


Page 1: Biostatistics Case Studies 2014 Youngju Pak, PhD. Biostatistician ypak@labiomed.org Session 2: Sample Size & Power for Inequality and Equivalence Studies.

Biostatistics Case Studies 2014

Youngju Pak, PhD.

Biostatistician

ypak@labiomed.org

Session 2:

Sample Size & Power for Inequality and Equivalence Studies II

Page 2:

What did we learn in Session 1? Info Needed for Study Size: Comparing Means (Inequality Test)

1. Effect (clinically meaningful difference)

2. Subject variability

3. Type I error (1.96 for α=0.05; 2.58 for α=0.01)

4. Power (0.842 for 80% power; 1.645 for 95% power)

Free sample size calculations:

www.stat.uiowa.edu/~rlenth/Power

$$N = \frac{2\,(1.96 + 0.842)^2\,SD^2}{\Delta^2} \quad \text{(per group)}$$
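The formula above can be sketched in code. This is a minimal Python sketch of the normal-approximation formula only, not the method used by the software linked above. The function name `n_per_group_inequality` is hypothetical, and Δ = 1 with SD = 3.5 are the values the IOP example uses later in this deck. Because the sketch uses z-values rather than the t-distribution, its answer can be slightly smaller than what t-based software reports.

```python
import math
from statistics import NormalDist


def n_per_group_inequality(delta, sd, alpha=0.05, power=0.80):
    """Per-group N for a two-sided comparison of two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.842 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)


# Delta = 1 and SD = 3.5 are taken from the IOP example in this deck.
n = n_per_group_inequality(delta=1.0, sd=3.5)
print(n)  # 193 under this approximation
```

Halving Δ roughly quadruples N, since Δ enters the formula squared.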

Page 3:

Case Study

Ophthalmology 2006; 113:70-76.

Page 4:

Abstract

Page 5:

Primary Outcome and Study Size

Study Size - Page 72 bottom of column 1:

Primary Outcome - Page 72 middle of column 1:

Needs Consensus

PI’s Gamble

Page 6:

Testing inequality vs. equivalence.

• Hypotheses for testing inequality:

Ha: | mean(treatment) – mean(control) | ≠ 0

H0: | mean(treatment) – mean(control) | = 0

• Hypotheses for testing equivalence:

Ha: δ1 < mean(trt1) – mean(trt2) < δ2

H0: mean(trt1) – mean(trt2) ≤ δ1 or mean(trt1) – mean(trt2) ≥ δ2

Non-inferiority: only one of the two H0 conditions is tested.

Page 7:

Graphical presentation of equivalence test

• With our regular t-tests, to conclude there is a substantial difference, you must observe a difference large enough to conclude that it is not due to sampling error.

• To conclude there is not a substantial difference, you must observe a difference small enough to rule out that the closeness is due to sampling error from distributions centered on large effects.

Page 8:

Non-Inferiority Study

• Usually a new treatment or regimen is compared with an accepted treatment or regimen or standard of care.

• The new treatment is assumed inferior to the standard, and the study is designed to show overwhelming evidence that it is at least nearly as good, i.e., non-inferior. It usually has other advantages, e.g., oral vs. injection.

• A negative inferiority study fails to detect inferiority, but does not necessarily give evidence for non-inferiority.

• The accepted treatment is usually known to be efficacious already, but an added placebo group may also be used.

Page 9:

How to determine Sample Size?

• For the IOP study, we have

– Ha: mean IOP change (unfixed) – mean IOP change (fixed) < 1.5

– H0: mean IOP change (unfixed) – mean IOP change (fixed) ≥ 1.5

Thus, we are only interested in the upper limit of the difference: a non-inferiority, one-sided t-test.

• Thus, we reject H0 if Signal/Noise < some critical value.

• But N for a non-inferiority test requires more complicated parameters, such as the non-centrality parameter of the t-distribution (the two one-sided t-tests procedure is usually used for the equivalence test).

Page 10:

Let’s run the software from www.stat.uiowa.edu/~rlenth/Power

• Information you will need:

– Equivalence margin: non-inferiority margin (NIM) = 1.5 for the IOP study

– Assumed mean difference in change of IOP between the two groups: usually assumed to be zero, but assumed to be 0.5 for the IOP study

– SD of changes in IOP = 3.5

– α (usually set to 2.5%), since the confidence level of the confidence interval is (100 – 2 × α)%
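The inputs listed above are enough for a rough hand calculation. The following is a minimal Python sketch using a normal approximation, which is an assumption on my part: the linked software works with the t-distribution, so its answer may differ by a subject or two per group. The function name `n_per_group_noninferiority` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_noninferiority(margin, assumed_diff, sd, alpha=0.025, power=0.80):
    """Per-group N for a one-sided non-inferiority test of two means.

    Normal approximation: the distance that must be resolved is the gap
    between the non-inferiority margin and the assumed true difference.
    """
    gap = margin - assumed_diff
    if gap <= 0:
        raise ValueError("assumed difference must be smaller than the margin")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided alpha
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / gap ** 2)


# IOP study inputs from the slide: NIM = 1.5, assumed difference = 0.5, SD = 3.5.
n_iop = n_per_group_noninferiority(margin=1.5, assumed_diff=0.5, sd=3.5)
# The more common planning assumption of zero true difference needs fewer subjects.
n_zero = n_per_group_noninferiority(margin=1.5, assumed_diff=0.0, sd=3.5)
print(n_iop, n_zero)
```

Assuming a nonzero true difference shrinks the gap to the margin, which is why the IOP study needs more subjects than it would under the usual zero-difference assumption.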

Page 11:

Sample size for IOP study

Page 12:

Three-dimensional power curve for a non-inferiority test

Page 13:

How do we determine if the fixed method is noninferior to the unfixed method?

Regardless of study aim (to prove treatments equivalent or to prove them different), inference can be based on the 95% CI for D (= Duf – Df), i.e., the range of “true (population) values for D”.

Primary Outcome: IOP reduction D = Duf – Df, where Df = mean IOP reduction with fixed therapy and Duf = mean IOP reduction with unfixed therapy.

Typical superiority/inferiority study: compare the CI to 0.

Non-inferiority study: compare the CI to δ2, a pre-specified margin of equivalence (1.5 here).

Page 14:

Typical Analysis: Inferiority or Superiority

H0: Duf – Df = 0

H1: Duf – Df ≠ 0

Aim: H1 → therapies differ

α = 0.05 & N=2•194

Power = 80% when Δ = 1, SD = 3.5

[Not used in this paper]

[Figure: 95% CIs for D = Duf – Df (“true (population) values for D”) plotted against 0. Outcomes shown: fixed is inferior; fixed is superior; no difference detected.]

Page 15:

Typical Analysis: Inferiority Only

H0: Duf – Df ≤ 0

H1: Duf – Df > 0

Aim: H1 → fixed is inferior

α = 0.025 & N = 2•194 (α = 0.05 → N = 2•153)

Power = 80% when Δ = 1, SD = 3.5

[Not used in this paper]

[Figure: 95% CIs for Duf – Df (“true (population) values for D”) plotted against 0. Outcomes shown: fixed is inferior; inferiority not detected.]

Page 16:

Non-Inferiority

H0: Duf – Df ≥ 1.5

H1: Duf – Df < 1.5

Aim: H1 → fixed is non-inferior

α = 0.025 & N = 2•194

Power = 80% when Δ = 0.5, NIM = 1.5

[As in this paper]

[Figure: 95% CIs for Duf – Df (“true (population) values for D”) plotted against 0 and the margin 1.5. Outcomes shown: fixed is non-inferior; non-inferiority not detected; fixed is inferior.]

Page 17:

Inferiority and Non-Inferiority

[Figure: 95% CIs for Duf – Df (“true (population) values for D”) plotted against 0 and the margin 1.5. Outcomes shown: fixed is non-inferior; neither is detected; fixed is inferior; fixed is “non-clinically” inferior.]

Observed Results: D̂uf = 9.0, D̂f = 8.7, D̂ = 0.3, 95% CI = -0.1 to 0.7 → Fixed is non-inferior.
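The confidence-interval reasoning on this slide can be written as a small decision rule. The sketch below is one reading of the figure, offered as an assumption rather than the paper's analysis code: inferiority is declared when the whole CI for Duf – Df lies above 0, and non-inferiority when the whole CI lies below the margin. The function name `classify_ci` is hypothetical.

```python
def classify_ci(lower, upper, margin=1.5):
    """Label a 95% CI for Duf - Df against 0 and the non-inferiority margin."""
    inferior = lower > 0           # whole CI above 0: unfixed reduces IOP more
    non_inferior = upper < margin  # whole CI below the margin
    if inferior and non_inferior:
        return 'fixed is "non-clinically" inferior'
    if non_inferior:
        return "fixed is non-inferior"
    if inferior:
        return "fixed is inferior"
    return "neither is detected"


# Observed 95% CI reported on the slide: -0.1 to 0.7.
print(classify_ci(-0.1, 0.7))  # fixed is non-inferior
```

The observed interval contains 0 and stays below 1.5, so this rule gives the same label as the slide.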

Page 18:

Conclusions: General

• “Negligibly inferior” would be a better term than non-inferior.

• All inference can be based on confidence intervals.

• Pre-specify the comparisons to be made. You cannot test for both non-inferiority and superiority.

• Power the study for only one comparison or for multiple comparisons, e.g., non-inferiority and inferiority. Power can be different for different comparisons.

• Very careful consideration must be given to the choice of the margin of equivalence (1.5 here). You can take a risk and gamble on what the expected differences will be (0.5 here), but the study is worthless if others in the field would find your margin too large.

Page 19:

• FDA Guidelines :

http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm202140.pdf

where M1 = full effect of the active control compared with placebo, and M2 = NI margin.

Page 20:

Self-Quiz

1. Give an example in your specialty area for a superiority /inferiority study. Now modify it to an equivalence study. Now modify it to a non-inferiority study.

2. T or F: The main point about non-inferiority studies is that we are asking whether a treatment is as good or better vs. worse than another treatment, so it uses a one-sided test.

3. Power for a typical superiority test is the likelihood that you will declare treatment differences (p<0.05) if treatments really differ by some magnitude Δ. Explain what power means for a non-inferiority study.

4. T or F: Last-value-carried-forward is a good way to handle drop-outs in a non-inferiority study. Explain.


Page 21:

Self-Quiz

5. T or F: In a non-inferiority study, you should first test for non-inferiority with a confidence interval, and then use a t-test to test for superiority, but only if non-inferiority was established at the first step.

6. What is the meaning of the equivalence margin, and how do you determine it?


Page 22:

Self-Quiz

7. Suppose the primary outcome for a study is a serum inflammatory marker. If its assay is poor (low reproducibility), then it is more difficult to find treatment differences in a typical superiority/inferiority study than for a better assay, due to this noise. Would it be easier or more difficult to find non-inferiority with this assay, compared to a better assay?

8. Does the assumed treatment difference (0.5 here) for power calculations have the same meaning as the difference used for power calculations in a typical superiority/inferiority study?

Page 23:

Self-Quiz

1. Give an example in your specialty area for a superiority /inferiority study. Now modify it to an equivalence study. Now modify it to a non-inferiority study.

Page 24:

Self-Quiz

1. Answer

Vaccine Testing:

Superiority: New candidate vaccine vs. placebo

Equivalence: Antigen potency between two manufacturing plants or lots.

Non-Inferiority: New candidate vaccine vs. old one.

Page 25:

Self-Quiz

2. T or F: The main point about non-inferiority studies is that we are asking whether a treatment is as good or better vs. worse than another treatment, so it uses a one-sided test.

Page 26:

Self-Quiz

2. Answer

False. That is a feature of these studies, but not their distinguishing feature. Non-inferiority and equivalence studies are used to try to prove sameness, as opposed to typical studies that try to prove differences.

Page 27:

Self-Quiz

3. Power for a typical superiority test is the likelihood that you will declare treatment differences (p<0.05) if treatments really differ by some magnitude Δ. Explain what power means for a non-inferiority study.

Page 28:

Self-Quiz

3. Answer

Power for a non-inferiority study is the likelihood that you will declare treatment A to be no worse than B by more than a pre-specified margin δ (p<0.05) if the treatments really differ by some Δ. Of course, Δ is less than δ, and is often 0.
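This definition of power can be illustrated numerically. The sketch below uses a normal approximation with a one-sided α of 0.025, which is an assumption made for simplicity (dedicated software would typically use the non-central t-distribution). The function name `power_noninferiority` is hypothetical, and the inputs are the IOP study values quoted earlier in this deck.

```python
from statistics import NormalDist


def power_noninferiority(n_per_group, margin, true_diff, sd, alpha=0.025):
    """Approximate chance of declaring non-inferiority when the true difference is true_diff."""
    se = sd * (2 / n_per_group) ** 0.5         # standard error of the difference in means
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    return NormalDist().cdf((margin - true_diff) / se - z_alpha)


# IOP study values from the deck: N = 194 per group, margin 1.5, SD 3.5.
power_iop = power_noninferiority(194, margin=1.5, true_diff=0.5, sd=3.5)
power_zero = power_noninferiority(194, margin=1.5, true_diff=0.0, sd=3.5)
print(round(power_iop, 2), round(power_zero, 2))
```

Power rises as the true difference Δ moves away from the margin δ, which is why assuming Δ = 0 is the optimistic planning choice.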

Page 29:

Self-Quiz

4. T or F: Last-value-carried-forward is a good way to handle drop-outs in a non-inferiority study. Explain.

Page 30:

Self-Quiz

4. Answer

False. LVCF biases results toward less of a difference. This makes typical superiority studies conservative, but in a non-inferiority study it increases the chance of falsely “proving” the aim.

Page 31:

Self-Quiz

5. T or F: In a non-inferiority study, you should first test for non-inferiority with a confidence interval, and then use a t-test to test for superiority, but only if non-inferiority was established at the first step.

Page 32:

Self-Quiz

5. Answer

False. You must specify the superiority test a priori in order to have a legitimate claim of proving it (beyond a reasonable (5%) doubt). The stated sequential strategy will only allow you to claim an observed result, without a statement about its certainty.

Page 33:

Self-Quiz

6. What is the meaning of the equivalence margin, and how do you determine it?

Page 34:

Self-Quiz

6. Answer

The equivalence margin is the maximum difference between treatments that is considered to be negligible or unimportant.

It must be pre-specified in order to prove equivalence or non-inferiority to that degree, rather than just noting it as an observation. Thus, it is ideally determined by peer-agreement or FDA concurrence prior to starting the study.

Page 35:

Self-Quiz

7. Suppose the primary outcome for a study is a serum inflammatory marker. If its assay is poor (low reproducibility), then it is more difficult to find treatment differences in a typical superiority/inferiority study than for a better assay, due to this noise. Would it be easier or more difficult to find non-inferiority with this assay, compared to a better assay?

Page 36:

Self-Quiz

7. Answer

It would still be more difficult to show the aim (non-inferiority here), since CIs will be wider, but the noise will not introduce bias toward either treatment.

Generally, poorer study conduct is penalized in superiority studies and rewarded in non-inferiority studies, but that is not true for this type of measurement error.
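The effect of a noisier assay on the confidence interval can be shown with a short calculation. The sketch below uses a normal-approximation CI and purely illustrative inputs: the observed difference of 0.3, N = 194 per group, and the margin of 1.5 are borrowed from the IOP example, while the doubled SD of 7.0 is a hypothetical value standing in for a poor assay. The function name `ci_for_difference` is hypothetical.

```python
from statistics import NormalDist


def ci_for_difference(observed_diff, sd, n_per_group, conf_level=0.95):
    """Normal-approximation CI for a difference between two group means."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * sd * (2 / n_per_group) ** 0.5
    return observed_diff - half_width, observed_diff + half_width


MARGIN = 1.5  # non-inferiority margin from the IOP example

# SD = 3.5 is the planning value from the deck; SD = 7.0 is a hypothetical noisier assay.
for sd in (3.5, 7.0):
    lower, upper = ci_for_difference(observed_diff=0.3, sd=sd, n_per_group=194)
    verdict = "non-inferior" if upper < MARGIN else "non-inferiority not shown"
    print(f"SD={sd}: CI=({lower:.2f}, {upper:.2f}) -> {verdict}")
```

The interval stays centered on the same observed difference in both cases, so the noise adds no bias; it only widens the interval until the upper limit crosses the margin.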

Page 37:

Self-Quiz

8. Does the assumed treatment difference (0.5 for the IOP study) for power calculations have the same meaning as the difference used for power calculations in a typical superiority/inferiority study?

Page 38:

Self-Quiz

8. Answer

No. Here, it is our best estimate of true treatment differences.

For superiority studies, the difference is ideally the minimal difference that is “clinically relevant”, not the expected difference, closer in meaning to the equivalence margin here. In practice, it is the smallest difference that logistics, money, time, and effort will allow us to detect with specified certainty.