September 22, 2022

Acknowledgements and disclaimer

  • Thanks to Yujie Zhao, Yilong Zhang, Nan Xiao and others for software contributions
  • Disclaimers
    • All opinions reflect those of the author
    • All errors attributable to author

Abstract

We consider several industry group sequential trials and associated issues over the last 30 years. Generally, group sequential design has provided a great deal of flexibility to overcome many challenges in a relatively straightforward way compared to more complex adaptive designs. Among the issues considered are timing of and boundaries for interim and final analyses, dealing with multiple hypotheses created by dose groups, populations and endpoints, and statistical information. Tools for design and execution will also be discussed.

Things I like

  • Interim analyses (group sequential design)
    • Early futility analysis (safety, proof-of-concept)
    • Efficacy interim analysis (efficacy may be good; e.g., Gandhi et al. (2018), Powles et al. (2020))
  • Multiple hypothesis testing
    • Testing multiple arms (doses)
    • Testing multiple populations (e.g., biomarker+ and overall)
  • Software that gives me what I want
  • Operationally seamless Phase 2/3
    • Get to see all data at critical decision point

Things I (generally) don’t like

  • Adaptive design
    • but I do like blinded sample size re-estimation
  • Optimal design
    • but it can be good to check how close your design is to optimal
  • Adaptive Phase 2/3
    • Too restrictive
    • Operationally usually not possible
    • I don’t believe in homogeneous trials
  • Trying to answer too many questions in Phase 2

EPIC

EPIC population and endpoints

EPIC Investigators (1994)

  • Population: patients undergoing angioplasty who are at high risk for recurrent events
  • Adjudicated composite primary endpoint: 1) death, 2) nonfatal myocardial infarction, 3) urgent repeat intervention
  • Safety concerns
    • Major bleeding
    • Intracranial hemorrhage

EPIC Design

  • 3 arms, double blind (c7E3 Fab = abciximab = ReoPro = anti-platelet agent)
    • c7E3 Fab bolus + infusion
    • c7E3 Fab bolus + placebo infusion (FDA insisted on this!)
    • placebo bolus + placebo infusion (control)
  • Group sequential design
    • 2100 patients to detect a reduction in event rate from 15% to 10% with 80% power and 2-sided \(\alpha=0.05\)
    • Interim analyses after 1/3 and 2/3 of patients
    • Final analysis: nominal 2-sided p = 0.036
  • Multiplicity control
    • Global null hypothesis trend test: control (0), bolus (1), bolus + infusion (2)
    • Pairwise comparisons of c7E3 Fab dose groups vs. control

Adjudication

  • Adjudication considered necessary for accurate assessment of primary endpoint
    • Lots of data collection and cleaning
    • Set up of adjudication logistics and personnel
    • Too slow for prompt DMC availability?

Independent Data Monitoring Committee (IDMC)

  • First interim (~700 patients in interim)
    • Operational challenges were big
    • Both high quality and fast delivery needed
  • Lesson learned: clean adjudicated data needed fast for success

EPIC results at final analysis

Endpoint                   Placebo       c7E3 bolus    c7E3 bolus + infusion
N                          696           695           708
Primary efficacy           89 (12.4%)    79 (11.4%)    59 (8.3%)
Major bleeding             46 (6.6%)     76 (10.9%)    99 (13.9%)
Intracranial hemorrhage    2 (0.3%)      1 (0.1%)      3 (0.4%)

  • Positive efficacy finding, but discouraging safety results limited sales
    • p=0.009 for trend test, p=0.008 for bolus + infusion vs control (35% reduction)
    • 3rd arm (bolus) essential to find minimally effective dose
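
The trend test over the ordered arms (scores 0, 1, 2) can be sketched as a Cochran-Armitage test on the crude event counts in the results table. This is an illustrative assumption: the slides do not state which trend statistic was used in EPIC, so the sketch is not expected to reproduce the reported p-value exactly.

```python
from math import erfc, sqrt


def trend_test(events, n, scores):
    """Cochran-Armitage trend test; returns (z, two-sided p-value)."""
    total_n = sum(n)
    p_bar = sum(events) / total_n  # pooled event rate under the global null
    # Score-weighted excess of observed events over their null expectation
    t_stat = sum(s * (e - m * p_bar) for s, e, m in zip(scores, events, n))
    s1 = sum(m * s for s, m in zip(scores, n))
    s2 = sum(m * s * s for s, m in zip(scores, n))
    variance = p_bar * (1 - p_bar) * (s2 - s1 ** 2 / total_n)
    z = t_stat / sqrt(variance)
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability
    return z, erfc(abs(z) / sqrt(2))


# Primary efficacy counts: placebo, bolus, bolus + infusion
z, p = trend_test(events=[89, 79, 59], n=[696, 695, 708], scores=[0, 1, 2])
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A negative z indicates fewer events as the dose score increases.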

EPIC lessons learned

  • Importance of logistics and execution for interim analysis
  • Interim analysis important for both efficacy and safety
  • More than 1 experimental arm can be essential in Phase 3

CAPTURE

CAPTURE population and endpoints

CAPTURE Investigators et al. (1997)

  • Population: patients with unstable angina undergoing angioplasty who are at high risk for recurrent events
  • Treatment: medical therapy starting 18-24 hours before planned angioplasty and continuing through 1 hour after angioplasty
  • Primary efficacy endpoint: same as EPIC

CAPTURE Design

  • 2 arms, double blind (c7E3 Fab = abciximab = ReoPro = anti-platelet agent)
    • c7E3 Fab bolus + c7E3 Fab infusion
    • placebo bolus + placebo infusion (control)
  • Group sequential design
    • 1400 patients to detect a reduction in event rate from 15% to 10% with 80% power and 2-sided \(\alpha=0.05\)
    • Interim analyses after 25% and 50% of patients
    • Final analysis: 1400 patients

Custom efficacy bounds

  • Discussion with Jan Tijssen, statistician, Amsterdam AMC
  • Interim bounds with custom spending function resulting in
    • p=0.0001, p=0.001 nominal p-value bounds at 25% and 50% of patients

CAPTURE fixed design sample size [1]

Design    N       Bound       alpha    Power
RD        1372    1.959964    0.025    0.8

[1] Risk difference (RD) power without continuity correction using method of Farrington and Manning.
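
The fixed-design sample size can be approximated with the usual normal-approximation formula for a risk difference. The sketch below assumes 1:1 randomization and a null risk difference of zero, in which case the null variance is taken at the average of the two event rates; the function name and arguments are illustrative and are not the gsDesign2 interface.

```python
from math import ceil, sqrt
from statistics import NormalDist


def fixed_design_n(p_control, p_experimental, alpha=0.025, power=0.8):
    """Total N (both arms) for a one-sided test of risk difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    delta = p_control - p_experimental
    p_bar = (p_control + p_experimental) / 2
    # Per-patient-pair standard deviations under the null and the alternative
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p_control * (1 - p_control)
                  + p_experimental * (1 - p_experimental))
    n_per_arm = ((z_alpha * sd_null + z_beta * sd_alt) / delta) ** 2
    return 2 * ceil(n_per_arm)  # round each arm up to a whole patient


n_total = fixed_design_n(0.15, 0.10)
print(n_total)
```

Under these assumptions the result agrees with the N of 1372 in the table; the protocol rounded this up to 1400.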

CAPTURE custom efficacy bounds

Think about bounds at time of design

O’Brien-Fleming-like bounds

CAPTURE sample size for group sequential design, N = 1400
Efficacy testing bound only; O'Brien-Fleming-like bound

                                                         Cumulative boundary crossing probability
Bound      Nominal p [1]   ~Risk difference at bound     Alternate hypothesis    Null hypothesis
Analysis: 1   N: 350    Risk difference: 0.05   IF: 0.25
Efficacy   0.0000          0.1527                        0.0017                  0.000007
Analysis: 2   N: 700    Risk difference: 0.05   IF: 0.5
Efficacy   0.0015          0.0739                        0.1692                  0.001525
Analysis: 3   N: 1050   Risk difference: 0.05   IF: 0.75
Efficacy   0.0092          0.0480                        0.5425                  0.009649
Analysis: 4   N: 1400   Risk difference: 0.05   IF: 1
Efficacy   0.0220          0.0355                        0.8020                  0.025000

[1] One-sided p-value for experimental vs control treatment. Values < 0.5 favor experimental, > 0.5 favor control.
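
The null-hypothesis column above is the cumulative alpha spent at each analysis. A minimal sketch, assuming the bound uses the Lan-DeMets spending function that approximates O'Brien-Fleming, \(\alpha(t) = 2 - 2\Phi(z_{\alpha/2}/\sqrt{t})\), with one-sided \(\alpha = 0.025\) (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist


def obf_like_spending(t, alpha=0.025):
    """Cumulative one-sided alpha spent at information fraction t (0 < t <= 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(t)))


for t in (0.25, 0.5, 0.75, 1.0):
    print(f"IF = {t:.2f}: cumulative alpha spent = {obf_like_spending(t):.6f}")
```

Very little alpha is spent at an information fraction of 0.25, which is why the first interim bound is so extreme.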

Custom spending function for CAPTURE design

t-distribution spending (K. M. Anderson and Clark (2010))

CAPTURE: actual bounds

Using t-distribution spending (K. M. Anderson and Clark (2010))

One-sided CAPTURE bounds as specified in protocol
Custom spending function to set desired interim nominal p-values and N = 1400

                                                         Cumulative boundary crossing probability
Bound      Nominal p [1]   ~Risk difference at bound     Alternate hypothesis    Null hypothesis
Analysis: 1   N: 350    Risk difference: 0.05   IF: 0.25
Efficacy   0.0001          0.1311                        0.0104                  0.0001
Analysis: 2   N: 700    Risk difference: 0.05   IF: 0.5
Efficacy   0.0010          0.0773                        0.1376                  0.0010
Analysis: 3   N: 1050   Risk difference: 0.05   IF: 0.75
Efficacy   0.0072          0.0498                        0.5065                  0.0075
Analysis: 4   N: 1400   Risk difference: 0.05   IF: 1
Efficacy   0.0231          0.0351                        0.8047                  0.0250

[1] One-sided p-value for experimental vs control treatment. Values < 0.5 favor experimental, > 0.5 favor control.

CAPTURE differences sufficient to cross efficacy bounds

Judgement is required to decide whether the bounds

  1. represent clinically and statistically significant results,
  2. ethically allow continuation when not crossed,
  3. establish a balance of safety and efficacy, and
  4. provide enough data to demonstrate that the new treatment is fit for use

CAPTURE differences sufficient to cross efficacy bounds

                     Hypothetical result                        O'Brien-Fleming spending    t-distribution spending
Analysis  N per arm  Control      Experimental  Nominal p       OBF bound      Reject OBF   t-dist. bound  Reject t
1         175        25 (14.3%)   3 (1.7%)      7.30 × 10^-6    7.37 × 10^-6   TRUE         1.00 × 10^-4   TRUE
1         175        25 (14.3%)   5 (2.9%)      6.70 × 10^-5    7.37 × 10^-6   FALSE        1.00 × 10^-4   TRUE
1         175        30 (17.1%)   5 (2.9%)      4.21 × 10^-6    7.37 × 10^-6   TRUE         1.00 × 10^-4   TRUE
1         175        30 (17.1%)   8 (4.6%)      7.84 × 10^-5    7.37 × 10^-6   FALSE        1.00 × 10^-4   TRUE
1         175        35 (20%)     8 (4.6%)      5.50 × 10^-6    7.37 × 10^-6   TRUE         1.00 × 10^-4   TRUE
1         175        35 (20%)     11 (6.3%)     7.33 × 10^-5    7.37 × 10^-6   FALSE        1.00 × 10^-4   TRUE
2         350        50 (14.3%)   24 (6.9%)     6.97 × 10^-4    1.52 × 10^-3   TRUE         9.59 × 10^-4   TRUE
2         350        50 (14.3%)   25 (7.1%)     1.13 × 10^-3    1.52 × 10^-3   TRUE         9.59 × 10^-4   FALSE
2         350        60 (17.1%)   32 (9.1%)     8.67 × 10^-4    1.52 × 10^-3   TRUE         9.59 × 10^-4   TRUE
2         350        60 (17.1%)   33 (9.4%)     1.32 × 10^-3    1.52 × 10^-3   TRUE         9.59 × 10^-4   FALSE
2         350        70 (20%)     40 (11.4%)    9.18 × 10^-4    1.52 × 10^-3   TRUE         9.59 × 10^-4   TRUE
2         350        70 (20%)     41 (11.7%)    1.35 × 10^-3    1.52 × 10^-3   TRUE         9.59 × 10^-4   FALSE
3         525        75 (14.3%)   49 (9.3%)     6.45 × 10^-3    9.16 × 10^-3   TRUE         7.19 × 10^-3   TRUE
3         525        75 (14.3%)   50 (9.5%)     8.60 × 10^-3    9.16 × 10^-3   TRUE         7.19 × 10^-3   FALSE
3         525        90 (17.1%)   62 (11.8%)    7.03 × 10^-3    9.16 × 10^-3   TRUE         7.19 × 10^-3   TRUE
3         525        90 (17.1%)   63 (12%)      9.10 × 10^-3    9.16 × 10^-3   TRUE         7.19 × 10^-3   FALSE
3         525        105 (20%)    75 (14.3%)    7.01 × 10^-3    9.16 × 10^-3   TRUE         7.19 × 10^-3   TRUE
3         525        105 (20%)    76 (14.5%)    8.91 × 10^-3    9.16 × 10^-3   TRUE         7.19 × 10^-3   FALSE
4         700        100 (14.3%)  75 (10.7%)    2.17 × 10^-2    2.20 × 10^-2   TRUE         2.31 × 10^-2   TRUE
4         700        120 (17.1%)  92 (13.1%)    1.84 × 10^-2    2.20 × 10^-2   TRUE         2.31 × 10^-2   TRUE
4         700        120 (17.1%)  93 (13.3%)    2.23 × 10^-2    2.20 × 10^-2   FALSE        2.31 × 10^-2   TRUE
4         700        140 (20%)    111 (15.9%)   2.17 × 10^-2    2.20 × 10^-2   TRUE         2.31 × 10^-2   TRUE
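
The nominal p-values in this table can be approximated from the event counts. The sketch below assumes a pooled-variance z-test for the difference in proportions without continuity correction and equal arm sizes; the slides do not name the exact test, so treat this as an approximation rather than the method behind the table.

```python
from math import erfc, sqrt


def one_sided_p(events_control, events_experimental, n_per_arm):
    """One-sided p-value for a lower event rate on experimental treatment."""
    p_c = events_control / n_per_arm
    p_e = events_experimental / n_per_arm
    p_pool = (events_control + events_experimental) / (2 * n_per_arm)
    se = sqrt(2 * p_pool * (1 - p_pool) / n_per_arm)
    z = (p_c - p_e) / se
    # Upper normal tail via erfc, which stays accurate for very small p-values
    return 0.5 * erfc(z / sqrt(2))


OBF_BOUND_IA1 = 7.37e-6    # O'Brien-Fleming-like bound at analysis 1
T_DIST_BOUND_IA1 = 1.00e-4  # t-distribution spending bound at analysis 1

for control, experimental in [(25, 3), (25, 5)]:
    p = one_sided_p(control, experimental, n_per_arm=175)
    print(f"{control} vs {experimental}: p = {p:.2e}, "
          f"reject OBF = {p < OBF_BOUND_IA1}, reject t = {p < T_DIST_BOUND_IA1}")
```

With 25 vs 5 events at the first interim, the custom bound is crossed and the O'Brien-Fleming-like bound is not, which is the kind of difference the table is meant to highlight.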

CAPTURE design with requirement of positive trend at IA1, IA2

CAPTURE design with simple futility bound
Futility bound specified with fixed Z-values

                                                         Cumulative boundary crossing probability
Bound      Nominal p [1]   ~Risk difference at bound     Alternate hypothesis    Null hypothesis
Analysis: 1   N: 350    Risk difference: 0.05   IF: 0.25
Futility   0.5000          0.0000                        0.0781                  0.5000
Efficacy   0.0001          0.1311                        0.0104                  0.0001
Analysis: 2   N: 700    Risk difference: 0.05   IF: 0.5
Futility   0.5000          0.0000                        0.0862                  0.6250
Efficacy   0.0010          0.0773                        0.1376                  0.0010
Analysis: 3   N: 1050   Risk difference: 0.05   IF: 0.75
Efficacy   0.0072          0.0498                        0.4984                  0.0073
Analysis: 4   N: 1400   Risk difference: 0.05   IF: 1
Efficacy   0.0231          0.0351                        0.7675                  0.0228

[1] One-sided p-value for experimental vs control treatment. Values < 0.5 favor experimental, > 0.5 favor control.

Increase to > 1500 patients to regain lost power
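
The futility crossing probabilities for a "positive trend required" rule (stop if Z < 0) can be checked by hand. The sketch below makes two simplifying assumptions: interim Z-statistics have correlation \(\sqrt{t_1/t_2}\), and the negligible chance of an efficacy stop at the first interim is ignored. The first-interim probability under the alternative uses an unpooled normal approximation, so it only approximates the table.

```python
from math import asin, pi, sqrt
from statistics import NormalDist


def futility_null(t1, t2):
    """Cumulative P(stop for futility) under the null at two interims, bound Z < 0."""
    rho = sqrt(t1 / t2)  # correlation between Z1 and Z2
    # Orthant probability for a standard bivariate normal: P(Z1 > 0, Z2 > 0)
    p_both_positive = 0.25 + asin(rho) / (2 * pi)
    return 0.5, 1 - p_both_positive


def futility_alt_first(p_control, p_experimental, n_per_arm):
    """P(Z1 < 0) at the first interim under the design alternative."""
    se = sqrt((p_control * (1 - p_control)
               + p_experimental * (1 - p_experimental)) / n_per_arm)
    return NormalDist().cdf(-(p_control - p_experimental) / se)


null_ia1, null_ia2 = futility_null(0.25, 0.5)
alt_ia1 = futility_alt_first(0.15, 0.10, n_per_arm=175)
print(null_ia1, round(null_ia2, 4), round(alt_ia1, 4))
```

Under the null, the rule stops a trial of an ineffective treatment at one of the first two interims about 62.5% of the time; the price under the alternative is the lost power that the larger sample size recovers.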

CAPTURE interim analysis story

  • First analysis of 350 patients
    • Sites somewhat limited
  • Second analysis of 700 patients
    • Many more sites enrolling
    • Observed differences in treatment effect from first IA
      • Random or differences in medical practice?
  • DMC added 3rd interim analysis at 1050 patients
    • Is this OK? (FDA said yes in this case; good to ask!)
    • Crossed efficacy bound and stopped trial
    • With two previous positive trials, this was acceptable for approval
  • Overflow data (N=1265) reported in manuscript

CAPTURE lessons learned

  • Enrolling in different sites over time can make a trial non-homogeneous
    • Homogeneity required for most adaptive trials
  • While O’Brien-Fleming-like spending is a good default, it may be worth considering a bespoke (custom) spending function
  • Futility requiring just a positive trend can save investment and provide at least a minimal proof-of-concept
    • With a surrogate endpoint, may wish to try 2-in-1 for POC (Chen et al. (2018))

Testing multiple hypotheses

Graphical Multiplicity Control in Group Sequential Design

Complex example of Maurer and Bretz (2013)

KEYNOTE-048 multiplicity, Burtness et al. (2019)

Challenges in trials with multiple hypotheses

  • Subgroup assumption challenges
    • Prevalence hard to predict
    • Differential rate of endpoint accrual
  • Endpoint challenges
    • Information fraction alignment
  • Multiplicity challenges
    • See K. M. Anderson et al. (2022) for implementation and accounting for correlations to relax bounds
    • Implementation of Maurer and Bretz (2013) now in gsDesign vignette (K. Anderson (2020))
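
For orientation, the alpha-propagation step that graphical multiplicity methods build on can be sketched for a single analysis. This is the sequentially rejective graphical procedure with Bonferroni-based tests, not the group sequential extension of Maurer and Bretz (2013), and it is not the gsDesign implementation; the weights, transition matrix and p-values are hypothetical.

```python
def graphical_test(p_values, weights, transitions, alpha=0.025):
    """Return sorted indices of hypotheses rejected by the graphical procedure."""
    m = len(p_values)
    w = list(weights)
    g = [row[:] for row in transitions]
    active = set(range(m))
    rejected = []
    while True:
        # Any active hypothesis whose p-value meets its current local level
        candidates = [j for j in sorted(active) if p_values[j] <= w[j] * alpha]
        if not candidates:
            return sorted(rejected)
        j = candidates[0]
        active.remove(j)
        rejected.append(j)
        new_w = w[:]
        new_g = [row[:] for row in g]
        for l in active:
            # Pass the weight of the rejected hypothesis along its outgoing edges
            new_w[l] = w[l] + w[j] * g[j][l]
            for k in active:
                denom = 1 - g[l][j] * g[j][l]
                if l == k or denom <= 0:
                    new_g[l][k] = 0.0
                else:
                    new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom
        # Remove the rejected hypothesis from the graph
        new_w[j] = 0.0
        for k in range(m):
            new_g[j][k] = 0.0
            new_g[k][j] = 0.0
        w, g = new_w, new_g


# Hypothetical example: two hypotheses, equal weights, full reallocation
result = graphical_test([0.01, 0.04], [0.5, 0.5], [[0, 1], [1, 0]], alpha=0.05)
print(result)
```

In the example, the first hypothesis is rejected at level 0.025, its alpha passes to the second, and the second is then tested at the full 0.05.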

Summary

  • Group sequential design can:
    • right size a trial
    • balance safety and efficacy over time
    • answer difficult dose, population and endpoint questions
  • Logistics are key
  • Spending functions worth a lot of thought
  • Software used for this presentation primarily gsDesign2
    • Release expected in Q4
    • Additional flexibility compared to gsDesign
      • Allows non-proportional hazards
      • Stratified populations (binary and TTE endpoints)

Thank you

References

Anderson, Keaven. 2020. gsDesign: Group Sequential Design. https://CRAN.R-project.org/package=gsDesign.

Anderson, Keaven M, and Jason B Clark. 2010. “Fitting Spending Functions.” Statistics in Medicine 29 (3): 321–27.

Anderson, Keaven M, Zifang Guo, Jing Zhao, and Linda Z Sun. 2022. “A Unified Framework for Weighted Parametric Group Sequential Design.” Biometrical Journal.

Burtness, Barbara, Kevin J Harrington, Richard Greil, Denis Soulières, Makoto Tahara, Gilberto de Castro Jr, Amanda Psyrri, et al. 2019. “Pembrolizumab Alone or with Chemotherapy Versus Cetuximab with Chemotherapy for Recurrent or Metastatic Squamous Cell Carcinoma of the Head and Neck (KEYNOTE-048): A Randomised, Open-Label, Phase 3 Study.” The Lancet 394 (10212): 1915–28.

CAPTURE Investigators et al. 1997. “Randomized Placebo-Controlled Trial of Abciximab Before and During Coronary Intervention in Refractory Angina: The CAPTURE Study.” Lancet 349: 1429–35.

Chen, Cong, Keaven Anderson, Devan V Mehrotra, Eric H Rubin, and Archie Tse. 2018. “A 2-in-1 Adaptive Phase 2/3 Design for Expedited Oncology Drug Development.” Contemporary Clinical Trials 64: 238–42.

EPIC Investigators. 1994. “Use of a Monoclonal Antibody Directed Against the Platelet Glycoprotein IIb/IIIa Receptor in High-Risk Coronary Angioplasty.” New England Journal of Medicine 330 (14): 956–61.

Gandhi, Leena, Delvys Rodríguez-Abreu, Shirish Gadgeel, Emilio Esteban, Enriqueta Felip, Flávia De Angelis, Manuel Domine, et al. 2018. “Pembrolizumab Plus Chemotherapy in Metastatic Non–Small-Cell Lung Cancer.” New England Journal of Medicine 378 (22): 2078–92.

Maurer, Willi, and Frank Bretz. 2013. “Multiple Testing in Group Sequential Trials Using Graphical Approaches.” Statistics in Biopharmaceutical Research 5 (4): 311–20.

Powles, Thomas, Elizabeth R Plimack, Denis Soulières, Tom Waddell, Viktor Stus, Rustem Gafanov, Dmitry Nosov, et al. 2020. “Pembrolizumab Plus Axitinib Versus Sunitinib Monotherapy as First-Line Treatment of Advanced Renal Cell Carcinoma (KEYNOTE-426): Extended Follow-up from a Randomised, Open-Label, Phase 3 Trial.” The Lancet Oncology 21 (12): 1563–73.