- Thanks to Yujie Zhao, Yilong Zhang, Nan Xiao and others for software contributions
- Disclaimers
- All opinions reflect those of the author
- All errors attributable to author

September 22, 2022

We consider several industry group sequential trials and associated issues over the last 30 years. Generally, group sequential design has provided a great deal of flexibility to overcome many challenges in a relatively straightforward way compared to more complex adaptive designs. Among the issues considered are timing of and boundaries for interim and final analyses, dealing with multiple hypotheses created by dose groups, populations and endpoints, and statistical information. Tools for design and execution will also be discussed.

- Interim analyses (group sequential design)
- *Early* futility analysis (safety, proof-of-concept)
- Efficacy interim analysis (efficacy may be good; e.g., Gandhi et al. (2018), Powles et al. (2020))

- Multiple hypothesis testing
- Testing multiple arms (doses)
- Testing multiple populations (e.g., biomarker+ and overall)

- Software that gives me what I want
- Operationally seamless Phase 2/3
- Get to see all data at critical decision point

- Adaptive design
- but I do like blinded sample size re-estimation

- Optimal design
- but it can be good to check how close your design is to optimal

- Adaptive Phase 2/3
- Too restrictive
- Operationally usually not possible
- I don’t believe in homogeneous trials

- Trying to answer too many questions in Phase 2

EPIC Investigators (1994)

- Population: patients undergoing angioplasty who are at high risk for recurrent events
- Adjudicated composite primary endpoint: 1) death, 2) nonfatal myocardial infarction, 3) urgent repeat intervention
- Safety concerns
- Major bleeding
- Intracranial hemorrhage

- 3 arms, double blind (c7E3 Fab = abciximab = ReoPro = anti-platelet agent)
- c7E3 Fab bolus + infusion
- c7E3 Fab bolus + placebo infusion (FDA insisted on this!)
- placebo bolus + placebo infusion (control)

- Group sequential design
- 2100 patients to detect a reduction from 15% to 10% with 80% power and 2-sided \(\alpha=0.05\)
- Interim analyses after 1/3 and 2/3 of patients
- Final analysis: nominal 2-sided p = 0.036

- Multiplicity control
- Global null hypothesis trend test: control (0), bolus (1), bolus + infusion (2)
- Pairwise comparisons of c7E3 Fab dose groups vs. control

- Adjudication considered necessary for accurate assessment of primary endpoint
- Lots of data collection and cleaning
- Set up of adjudication logistics and personnel
- Too slow for prompt DMC availability?

- First interim (~700 patients in interim)
- Operational challenges were big
- Both high quality and fast delivery needed

- Lesson learned: clean adjudicated data needed fast for success

Endpoint | Placebo | c7E3 bolus | c7E3 bolus + infusion |
---|---|---|---|
N | 696 | 695 | 708 |
Primary efficacy | 89 (12.8%) | 79 (11.4%) | 59 (8.3%) |
Major bleeding | 46 (6.6%) | 76 (10.9%) | 99 (14.0%) |
Intracranial hemorrhage | 2 (0.3%) | 1 (0.1%) | 3 (0.4%) |

- Positive efficacy finding, but discouraging safety results limited sales
- p=0.009 for trend test, p=0.008 for bolus + infusion vs control (35% reduction)
- 3rd arm (bolus) essential to find minimally effective dose

- Importance of logistics and execution for interim analysis
- Interim analysis important for both efficacy and safety
- More than 1 experimental arm can be essential in Phase 3

CAPTURE Investigators et al. (1997)

- Population: patients with unstable angina undergoing angioplasty who are at high risk for recurrent events
- Treatment: medical therapy starting 18-24 hours prior to planned angioplasty through 1 hour post-angioplasty
- Primary efficacy endpoint: same as EPIC

- 2 arms, double blind (c7E3 Fab = abciximab = ReoPro = anti-platelet agent)
- c7E3 Fab bolus + c7E3 Fab infusion
- placebo bolus + placebo infusion (control)

- Group sequential design
- 1400 patients to detect a reduction from 15% to 10% with 80% power and 2-sided \(\alpha=0.05\)
- Interim analyses after 25% and 50% of patients
- Final analysis: 1400 patients

- Discussion with Jan Tijssen, statistician, Amsterdam AMC
- Interim bounds from a custom spending function, resulting in nominal p-value bounds of p=0.0001 and p=0.001 at 25% and 50% of patients

CAPTURE fixed design sample size^{1}

Design | N | Bound | alpha | Power |
---|---|---|---|---|
RD | 1372 | 1.959964 | 0.025 | 0.8 |

^{1} Risk difference power without continuity correction using method of Farrington and Manning.
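The fixed-design sample size above can be approximated with a short calculation. This is a minimal sketch, not the software used for the slides: it assumes 1:1 randomization, one-sided \(\alpha=0.025\), and that for a superiority test (null risk difference of 0) the null variance is evaluated at the pooled event rate. The function name `fixed_n_risk_difference` is hypothetical.

```python
import math
from statistics import NormalDist

def fixed_n_risk_difference(p_control, p_experimental, alpha=0.025, power=0.8):
    """Per-arm sample size for a one-sided test of a risk difference (no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    # Assumption: null variance uses the pooled rate of the two arms
    p_bar = (p_control + p_experimental) / 2
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))
    # Alternate-hypothesis variance uses each arm's own rate
    sd_alt = math.sqrt(p_control * (1 - p_control) + p_experimental * (1 - p_experimental))
    return ((z_alpha * sd_null + z_beta * sd_alt) / (p_control - p_experimental)) ** 2

n_per_arm = fixed_n_risk_difference(0.15, 0.10)
total_n = 2 * math.ceil(n_per_arm)  # round up to a whole number of patients per arm
print(f"per arm: {n_per_arm:.1f}, total: {total_n}")
```

Under these assumptions the total rounds up to 1372, in line with the table; the protocol's N = 1400 is a modest rounding up from there.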

O’Brien-Fleming-like bounds

CAPTURE sample size for group sequential design, N = 1400. Efficacy testing bound only; O'Brien-Fleming-like bound. Risk difference: 0.05 at every analysis.

Analysis | N | Information fraction | Bound | Nominal p^{1} | ~Risk difference at bound | Cumulative boundary crossing probability (alternate hypothesis) | Cumulative boundary crossing probability (null hypothesis) |
---|---|---|---|---|---|---|---|
1 | 350 | 0.25 | Efficacy | 0.0000 | 0.1527 | 0.0017 | 0.000007 |
2 | 700 | 0.5 | Efficacy | 0.0015 | 0.0739 | 0.1692 | 0.001525 |
3 | 1050 | 0.75 | Efficacy | 0.0092 | 0.0480 | 0.5425 | 0.009649 |
4 | 1400 | 1 | Efficacy | 0.0220 | 0.0355 | 0.8020 | 0.025000 |

^{1} One-sided p-value for experimental vs control treatment. Values < 0.5 favor experimental, > 0.5 favor control.
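The cumulative crossing probabilities under the null hypothesis in the table above are the cumulative \(\alpha\) spent at each analysis. As a sketch, assuming the O'Brien-Fleming-like bound is the Lan-DeMets spending function \(\alpha(t) = 2 - 2\Phi\left(z_{\alpha/2}/\sqrt{t}\right)\) with one-sided \(\alpha = 0.025\), the spend at each information fraction \(t\) can be computed directly. The function name `obf_like_spending` is hypothetical.

```python
import math
from statistics import NormalDist

def obf_like_spending(info_frac, alpha=0.025):
    """Cumulative one-sided alpha spent at information fraction info_frac
    under a Lan-DeMets O'Brien-Fleming-like spending function (assumed form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # 2 * P(Z > z / sqrt(t)), written with the lower tail for symmetry
    return 2 * NormalDist().cdf(-z / math.sqrt(info_frac))

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"IF = {t:<4}  cumulative alpha spent = {obf_like_spending(t):.6f}")
```

Very little \(\alpha\) is spent at the first two looks, which is why the early nominal p-value bounds are so extreme.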

t-distribution spending (K. M. Anderson and Clark (2010))

One-sided CAPTURE bounds as specified in protocol. Custom spending function to set desired interim nominal p-values and N=1400. Risk difference: 0.05 at every analysis.

Analysis | N | Information fraction | Bound | Nominal p^{1} | ~Risk difference at bound | Cumulative boundary crossing probability (alternate hypothesis) | Cumulative boundary crossing probability (null hypothesis) |
---|---|---|---|---|---|---|---|
1 | 350 | 0.25 | Efficacy | 0.0001 | 0.1311 | 0.0104 | 0.0001 |
2 | 700 | 0.5 | Efficacy | 0.0010 | 0.0773 | 0.1376 | 0.0010 |
3 | 1050 | 0.75 | Efficacy | 0.0072 | 0.0498 | 0.5065 | 0.0075 |
4 | 1400 | 1 | Efficacy | 0.0231 | 0.0351 | 0.8047 | 0.0250 |

^{1} One-sided p-value for experimental vs control treatment. Values < 0.5 favor experimental, > 0.5 favor control.
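The "~Risk difference at bound" column can be roughly recovered from the nominal p-value bound. The sketch below is an approximation under stated assumptions, not the method used to build the table: it converts the one-sided nominal p to a Z-value and multiplies by a standard error computed from the design rates (15% control, 10% experimental) with equal allocation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_risk_difference_at_bound(nominal_p, n_total, p_control=0.15, p_experimental=0.10):
    """Approximate observed risk difference needed to just reach a one-sided nominal p bound."""
    z = NormalDist().inv_cdf(1 - nominal_p)
    n_per_arm = n_total / 2  # assumes 1:1 randomization
    # Assumption: standard error evaluated at the alternate-hypothesis event rates
    se = math.sqrt((p_control * (1 - p_control)
                    + p_experimental * (1 - p_experimental)) / n_per_arm)
    return z * se

for n_total, nominal_p in ((350, 0.0001), (1400, 0.0231)):
    rd = approx_risk_difference_at_bound(nominal_p, n_total)
    print(f"N = {n_total}: risk difference at bound is about {rd:.4f}")
```

Read this way, stopping at the first interim analysis requires an observed difference of roughly 13 percentage points, well above the 5 points the trial was designed to detect.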

Judgement is required to decide whether the bounds

- represent clinically and statistically significant results,
- ethically allow continuation when not crossed,
- establish a balance of safety and efficacy, and
- provide enough data to demonstrate that the new treatment is fit for use

Hypothetical results (control and experimental event counts) compared with O'Brien-Fleming spending and t-distribution spending bounds

Analysis | N per arm | Control | Experimental | Nominal p | OBF bound | Reject OBF | t-Dist. bound | Reject t |
---|---|---|---|---|---|---|---|---|
1 | 175 | 25 (14.3%) | 3 (1.7%) | \(7.30 \times 10^{-6}\) | \(7.37 \times 10^{-6}\) | TRUE | \(1.00 \times 10^{-4}\) | TRUE |
1 | 175 | 25 (14.3%) | 5 (2.9%) | \(6.70 \times 10^{-5}\) | \(7.37 \times 10^{-6}\) | FALSE | \(1.00 \times 10^{-4}\) | TRUE |
1 | 175 | 30 (17.1%) | 5 (2.9%) | \(4.21 \times 10^{-6}\) | \(7.37 \times 10^{-6}\) | TRUE | \(1.00 \times 10^{-4}\) | TRUE |
1 | 175 | 30 (17.1%) | 8 (4.6%) | \(7.84 \times 10^{-5}\) | \(7.37 \times 10^{-6}\) | FALSE | \(1.00 \times 10^{-4}\) | TRUE |
1 | 175 | 35 (20%) | 8 (4.6%) | \(5.50 \times 10^{-6}\) | \(7.37 \times 10^{-6}\) | TRUE | \(1.00 \times 10^{-4}\) | TRUE |
1 | 175 | 35 (20%) | 11 (6.3%) | \(7.33 \times 10^{-5}\) | \(7.37 \times 10^{-6}\) | FALSE | \(1.00 \times 10^{-4}\) | TRUE |
2 | 350 | 50 (14.3%) | 24 (6.9%) | \(6.97 \times 10^{-4}\) | \(1.52 \times 10^{-3}\) | TRUE | \(9.59 \times 10^{-4}\) | TRUE |
2 | 350 | 50 (14.3%) | 25 (7.1%) | \(1.13 \times 10^{-3}\) | \(1.52 \times 10^{-3}\) | TRUE | \(9.59 \times 10^{-4}\) | FALSE |
2 | 350 | 60 (17.1%) | 32 (9.1%) | \(8.67 \times 10^{-4}\) | \(1.52 \times 10^{-3}\) | TRUE | \(9.59 \times 10^{-4}\) | TRUE |
2 | 350 | 60 (17.1%) | 33 (9.4%) | \(1.32 \times 10^{-3}\) | \(1.52 \times 10^{-3}\) | TRUE | \(9.59 \times 10^{-4}\) | FALSE |
2 | 350 | 70 (20%) | 40 (11.4%) | \(9.18 \times 10^{-4}\) | \(1.52 \times 10^{-3}\) | TRUE | \(9.59 \times 10^{-4}\) | TRUE |
2 | 350 | 70 (20%) | 41 (11.7%) | \(1.35 \times 10^{-3}\) | \(1.52 \times 10^{-3}\) | TRUE | \(9.59 \times 10^{-4}\) | FALSE |
3 | 525 | 75 (14.3%) | 49 (9.3%) | \(6.45 \times 10^{-3}\) | \(9.16 \times 10^{-3}\) | TRUE | \(7.19 \times 10^{-3}\) | TRUE |
3 | 525 | 75 (14.3%) | 50 (9.5%) | \(8.60 \times 10^{-3}\) | \(9.16 \times 10^{-3}\) | TRUE | \(7.19 \times 10^{-3}\) | FALSE |
3 | 525 | 90 (17.1%) | 62 (11.8%) | \(7.03 \times 10^{-3}\) | \(9.16 \times 10^{-3}\) | TRUE | \(7.19 \times 10^{-3}\) | TRUE |
3 | 525 | 90 (17.1%) | 63 (12%) | \(9.10 \times 10^{-3}\) | \(9.16 \times 10^{-3}\) | TRUE | \(7.19 \times 10^{-3}\) | FALSE |
3 | 525 | 105 (20%) | 75 (14.3%) | \(7.01 \times 10^{-3}\) | \(9.16 \times 10^{-3}\) | TRUE | \(7.19 \times 10^{-3}\) | TRUE |
3 | 525 | 105 (20%) | 76 (14.5%) | \(8.91 \times 10^{-3}\) | \(9.16 \times 10^{-3}\) | TRUE | \(7.19 \times 10^{-3}\) | FALSE |
4 | 700 | 100 (14.3%) | 75 (10.7%) | \(2.17 \times 10^{-2}\) | \(2.20 \times 10^{-2}\) | TRUE | \(2.31 \times 10^{-2}\) | TRUE |
4 | 700 | 120 (17.1%) | 92 (13.1%) | \(1.84 \times 10^{-2}\) | \(2.20 \times 10^{-2}\) | TRUE | \(2.31 \times 10^{-2}\) | TRUE |
4 | 700 | 120 (17.1%) | 93 (13.3%) | \(2.23 \times 10^{-2}\) | \(2.20 \times 10^{-2}\) | FALSE | \(2.31 \times 10^{-2}\) | TRUE |
4 | 700 | 140 (20%) | 111 (15.9%) | \(2.17 \times 10^{-2}\) | \(2.20 \times 10^{-2}\) | TRUE | \(2.31 \times 10^{-2}\) | TRUE |
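The nominal p-values for the hypothetical results above appear consistent with a one-sided two-proportion Z-test using a pooled variance and no continuity correction. That is an inference from the numbers, not something the slides state, so the sketch below should be read as an assumption-laden check rather than the original computation. The function name `one_sided_p` is hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_p(events_control, events_experimental, n_per_arm):
    """One-sided p-value for experimental vs control event rates (pooled Z-test, equal arm sizes)."""
    rate_control = events_control / n_per_arm
    rate_experimental = events_experimental / n_per_arm
    pooled = (events_control + events_experimental) / (2 * n_per_arm)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = (rate_control - rate_experimental) / se  # positive z favors experimental
    return NormalDist().cdf(-z)

# First interim analysis, 175 per arm: 25 control events vs 5 experimental events
p = one_sided_p(25, 5, 175)
print(f"nominal p = {p:.2e}")
print("reject with O'Brien-Fleming-like bound:", p <= 7.37e-6)
print("reject with t-distribution spending bound:", p <= 1.00e-4)
```

This result misses the O'Brien-Fleming-like bound but crosses the custom bound, which is the kind of case where judgement about the early bounds matters.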

CAPTURE design with simple futility bound. Futility bound specified with fixed Z-values. Risk difference: 0.05 at every analysis.

Analysis | N | Information fraction | Bound | Nominal p^{1} | ~Risk difference at bound | Cumulative boundary crossing probability (alternate hypothesis) | Cumulative boundary crossing probability (null hypothesis) |
---|---|---|---|---|---|---|---|
1 | 350 | 0.25 | Futility | 0.5000 | 0.0000 | 0.0781 | 0.5000 |
1 | 350 | 0.25 | Efficacy | 0.0001 | 0.1311 | 0.0104 | 0.0001 |
2 | 700 | 0.5 | Futility | 0.5000 | 0.0000 | 0.0862 | 0.6250 |
2 | 700 | 0.5 | Efficacy | 0.0010 | 0.0773 | 0.1376 | 0.0010 |
3 | 1050 | 0.75 | Efficacy | 0.0072 | 0.0498 | 0.4984 | 0.0073 |
4 | 1400 | 1 | Efficacy | 0.0231 | 0.0351 | 0.7675 | 0.0228 |

^{1} One-sided p-value for experimental vs control treatment. Values < 0.5 favor experimental, > 0.5 favor control.

Increase to > 1500 patients to regain lost power
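Two of the futility entries in the table above can be checked by hand. The sketch below assumes the usual group sequential model in which interim Z-statistics are jointly normal with correlation \(\sqrt{t_1/t_2}\), ignores the very small probability of stopping for efficacy at the first look, and uses the design rates (15% vs 10%) for the standard error. Both function names are hypothetical.

```python
import math
from statistics import NormalDist

def null_futility_by_second_look(t1=0.25, t2=0.5):
    """P(Z1 < 0 or Z2 < 0) under the null, ignoring early efficacy stopping."""
    rho = math.sqrt(t1 / t2)  # correlation of the two interim Z-statistics
    # Orthant probability for a standard bivariate normal: P(Z1 > 0, Z2 > 0)
    both_positive = 0.25 + math.asin(rho) / (2 * math.pi)
    return 1 - both_positive

def alt_futility_at_first_look(n_per_arm=175, p_control=0.15, p_experimental=0.10):
    """P(Z1 < 0) when the true risk difference equals the design alternative."""
    se = math.sqrt((p_control * (1 - p_control)
                    + p_experimental * (1 - p_experimental)) / n_per_arm)
    drift = (p_control - p_experimental) / se  # expected Z at the first look
    return NormalDist().cdf(-drift)

print(f"null, cumulative through analysis 2: {null_futility_by_second_look():.4f}")
print(f"alternate, analysis 1: {alt_futility_at_first_look():.4f}")
```

A bound that only asks for a positive trend stops a truly ineffective treatment most of the time by the second look, at the cost of some power when the treatment works.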

- First analysis of 350 patients
- Sites somewhat limited

- Second analysis of 700 patients
- Many more sites enrolling
- Observed differences in treatment effect from first IA
- Random or differences in medical practice?

- DMC added 3rd interim analysis at 1050 patients
- Is this OK? (FDA said yes in this case; good to ask!)
- Crossed efficacy bound and stopped trial
- With two previous positive trials, this was acceptable for approval

- Overflow data (N=1265) reported in manuscript

- Enrolling in different sites over time can make a trial non-homogeneous
- Homogeneity required for most adaptive trials

- While O’Brien-Fleming-like spending is a good default, it may be worth considering a bespoke (custom) spending function
- Futility requiring just a positive trend can save investment and provide at least a minimal proof-of-concept
- With a surrogate endpoint, may wish to try 2-in-1 for POC (Chen et al. (2018))

Complex example of Maurer and Bretz (2013)

- Subgroup assumption challenges
- Prevalence hard to predict
- Differential rate of endpoint accrual

- Endpoint challenges
- Information fraction alignment

- Multiplicity challenges
- See K. M. Anderson et al. (2022) for implementation and accounting for correlations to relax bounds
- Implementation of Maurer and Bretz (2013) now in **gsDesign** vignette (K. Anderson (2020))

- Group sequential design can:
- right size a trial
- balance safety and efficacy over time
- answer difficult dose, population and endpoint questions

- Logistics are key
- Spending functions worth a lot of thought
- Software used for this presentation: primarily **gsDesign2**
- Release expected in Q4
- Additional flexibility compared to **gsDesign**
- Allows non-proportional hazards
- Stratified populations (binary and TTE endpoints)

*Email:* Keaven_Anderson@merck.com

Anderson, Keaven. 2020. *gsDesign: Group Sequential Design*. https://CRAN.R-project.org/package=gsDesign.

Anderson, Keaven M, and Jason B Clark. 2010. “Fitting Spending Functions.” *Statistics in Medicine* 29 (3): 321–27.

Anderson, Keaven M, Zifang Guo, Jing Zhao, and Linda Z Sun. 2022. “A Unified Framework for Weighted Parametric Group Sequential Design.” *Biometrical Journal*.

Burtness, Barbara, Kevin J Harrington, Richard Greil, Denis Soulières, Makoto Tahara, Gilberto de Castro Jr, Amanda Psyrri, et al. 2019. “Pembrolizumab Alone or with Chemotherapy Versus Cetuximab with Chemotherapy for Recurrent or Metastatic Squamous Cell Carcinoma of the Head and Neck (KEYNOTE-048): A Randomised, Open-Label, Phase 3 Study.” *The Lancet* 394 (10212): 1915–28.

CAPTURE Investigators et al. 1997. “Randomized Placebo-Controlled Trial of Abciximab Before and During Coronary Intervention in Refractory Angina: The CAPTURE Study.” *Lancet* 349: 1429–35.

Chen, Cong, Keaven Anderson, Devan V Mehrotra, Eric H Rubin, and Archie Tse. 2018. “A 2-in-1 Adaptive Phase 2/3 Design for Expedited Oncology Drug Development.” *Contemporary Clinical Trials* 64: 238–42.

EPIC Investigators. 1994. “Use of a Monoclonal Antibody Directed Against the Platelet Glycoprotein IIb/IIIa Receptor in High-Risk Coronary Angioplasty.” *New England Journal of Medicine* 330 (14): 956–61.

Gandhi, Leena, Delvys Rodríguez-Abreu, Shirish Gadgeel, Emilio Esteban, Enriqueta Felip, Flávia De Angelis, Manuel Domine, et al. 2018. “Pembrolizumab Plus Chemotherapy in Metastatic Non–Small-Cell Lung Cancer.” *New England Journal of Medicine* 378 (22): 2078–92.

Maurer, Willi, and Frank Bretz. 2013. “Multiple Testing in Group Sequential Trials Using Graphical Approaches.” *Statistics in Biopharmaceutical Research* 5 (4): 311–20.

Powles, Thomas, Elizabeth R Plimack, Denis Soulières, Tom Waddell, Viktor Stus, Rustem Gafanov, Dmitry Nosov, et al. 2020. “Pembrolizumab Plus Axitinib Versus Sunitinib Monotherapy as First-Line Treatment of Advanced Renal Cell Carcinoma (KEYNOTE-426): Extended Follow-up from a Randomised, Open-Label, Phase 3 Trial.” *The Lancet Oncology* 21 (12): 1563–73.