August 7, 2024

Abstract

Off-the-shelf software may not support innovation in clinical trial design. This talk focuses on group sequential design and its variations: testing multiple hypotheses, designing for possibly delayed treatment effects, and stratified designs whose outcome distributions and treatment effects differ across strata. We present examples of trials with biomarker/histology subgroups and with multiple experimental treatment groups. We discuss software specification, testing, issue management, and release strategies. The trials presented were designed with gsDesign (group sequential design), gsDesign2 (group sequential design with more options, including non-proportional hazards), and gMCPLite and wpgsd (multiple testing in group sequential design). We also discuss using Shiny to enable design without programming in R.

Overview of design challenges considered

  • Group sequential design using spending bounds (gsDesign)
  • Testing multiple hypotheses in group sequential design
    • gMCPLite: part of gMCP plus ggplot2 graphics functionality
    • wpgsd package for correlated outcomes
      • MAMS (multi-arm-multi-stage)
      • Related populations (e.g., biomarker+, overall)
  • Non-proportional hazards with group sequential design (gsDesign2)
  • Fast simulation for time-to-event endpoints: simtrial

Hex sticker wall

Our design packages

| Package | Topic | Shiny interface | GitHub / Documentation | Unit test coverage |
|---|---|---|---|---|
| gsDesign | Design | https://rinpharma.shinyapps.io/gsdesign/ | https://github.com/keaven/gsDesign, https://keaven.github.io/gsDesign/ | 75% |
| gsDesign2 | NPH | Under construction | https://github.com/Merck/gsDesign2, https://merck.github.io/gsDesign2/ | 77% |
| simtrial | TTE Simulation | ? | https://github.com/Merck/simtrial, https://merck.github.io/simtrial/ | 83% |
| gMCPLite | Graphical Multiplicity | https://rinpharma.shinyapps.io/gmcp/ | https://github.com/Merck/gMCPLite, https://merck.github.io/gMCPLite/ | 76% |
| wpgsd | Bounds for Correlated Testing | Not currently planned | https://github.com/Merck/wpgsd, https://merck.github.io/wpgsd/ | 79% |

Graphical multiplicity

  • Graph using ggplot2 initially built with https://rinpharma.shinyapps.io/gmcp/ which generated code for gMCPLite
  • Dividing \(\alpha\) equally between biomarker positive (B+) and overall population
  • Group sequential design and \(\alpha\)-reallocation available using Maurer and Bretz (2013)
  • Accounting for correlated tests, we can relax Maurer-Bretz bounds using Anderson et al. (2022) and the wpgsd package
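
The equal split and reallocation described above can be sketched in base R. This is only an illustration of the Bonferroni-based graphical update for two hypotheses, not the gMCPLite API: the total one-sided α of 0.025, the full transfer of α between hypotheses, and the helper `reallocate()` are all assumptions made for the example.

```r
# Assumed total one-sided alpha (not stated on the slide)
alpha_total <- 0.025

# Equal initial weights: H1 = biomarker positive (B+), H2 = overall population
w <- c(H1 = 0.5, H2 = 0.5)

# Transition matrix: a rejected hypothesis passes all of its weight to the other
G <- matrix(c(0, 1,
              1, 0),
            nrow = 2, byrow = TRUE, dimnames = list(names(w), names(w)))

# Hypothetical helper: update weights after rejecting one hypothesis
reallocate <- function(w, G, rejected) {
  w_new <- w + w[[rejected]] * G[rejected, ]
  w_new[rejected] <- 0
  w_new
}

alpha_initial <- alpha_total * w                          # alpha per hypothesis at the start
alpha_after_h1 <- alpha_total * reallocate(w, G, "H1")    # alpha available once H1 is rejected
alpha_after_h1
```

With group sequential testing, the α available to each hypothesis is then spent across analyses by that hypothesis's spending function, as in Maurer and Bretz (2013).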

Designs

gsDesign and its Shiny app

  • The gsDesign package supports group sequential clinical trial design, largely as presented by Jennison and Turnbull (1999). An easy-to-use web interface enables use without coding and generates code to reproduce a design; it is continually being enhanced to support more features.
  • Initial OS design for B+ group assumes
    • \(\alpha= 0.01\), 90% power
    • Median control OS = 12 months
    • Hazard ratio (HR) for experimental treatment = 0.6
    • 12-month enrollment with 6-month ramp-up
    • Analyses planned at 16, 26, and 36 months
    • O’Brien-Fleming-like spending bound
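
As a rough sanity check on these assumptions, the event count for a fixed (single-analysis) design can be approximated with Schoenfeld's formula under 1:1 randomization. This is a sketch only: gsDesign uses its own sample size method, and a group sequential design needs somewhat more events than the fixed design.

```r
# Design assumptions from the list above
alpha <- 0.01           # one-sided Type I error
power <- 0.90
hr <- 0.6               # hazard ratio, experimental vs. control
median_control <- 12    # months

# Exponential hazard rate implied by the control median
lambda_control <- log(2) / median_control

# Schoenfeld approximation for required events, 1:1 randomization, fixed design
d_fixed <- 4 * (qnorm(1 - alpha) + qnorm(power))^2 / log(hr)^2
ceiling(d_fixed)
```

The result is close to, and slightly below, the final event count of the group sequential design, which is the expected direction given the interim analyses.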

Design Using Calendar Timing
gsDesign::gsSurvCalendar()

| Analysis | Value | Efficacy |
|---|---|---|
| IA 1: 45% | p (1-sided) | 0.0001 |
| N: 284 | ~HR at bound | 0.4648 |
| Events: 91 | P(Cross) if HR=1 | 0.0001 |
| Month: 16 | P(Cross) if HR=0.6 | 0.1134 |
| IA 2: 79% | p (1-sided) | 0.0037 |
| N: 284 | ~HR at bound | 0.6542 |
| Events: 159 | P(Cross) if HR=1 | 0.0038 |
| Month: 26 | P(Cross) if HR=0.6 | 0.7118 |
| Final | p (1-sided) | 0.0088 |
| N: 284 | ~HR at bound | 0.7156 |
| Events: 201 | P(Cross) if HR=1 | 0.0100 |
| Month: 36 | P(Cross) if HR=0.6 | 0.9003 |

Information- or calendar-based spending supported
Table often incorporated directly into protocol
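
Two columns of this table can be approximated by hand in base R. The sketch assumes the Lan-DeMets O'Brien-Fleming-like spending function with information-based spending (information fraction = events / final events), and uses the usual approximation exp(-2z/√d) for the hazard ratio at a bound with d events under 1:1 randomization. The function name `sf_ldof()` is hypothetical and is not the gsDesign function.

```r
alpha <- 0.01
events <- c(91, 159, 201)            # planned events at IA 1, IA 2, final
info_frac <- events / max(events)    # information fraction at each analysis

# Lan-DeMets O'Brien-Fleming-like spending: cumulative alpha spent at fraction t
sf_ldof <- function(t, alpha) {
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}
cum_alpha <- sf_ldof(info_frac, alpha)
round(cum_alpha, 4)

# Approximate HR at the final bound from its nominal one-sided p-value
p_final <- 0.0088
hr_bound_final <- exp(-2 * qnorm(1 - p_final) / sqrt(events[3]))
round(hr_bound_final, 4)
```

The cumulative spend should land close to the "P(Cross) if HR=1" rows, and the approximate HR close to the 0.7156 shown at the final analysis; small differences come from rounding of the displayed p-value.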

gsDesign2

  • Introduction: gsDesign2 enables fixed or group sequential design under non-proportional hazards, including hazard ratios that change over time and/or differ between strata. It adds substantial flexibility beyond what gsDesign provides (Zhao, Zhang, et al. (2023)).
  • gs_design_ahr() reproduces the gsDesign design above using a different sample size method.
  • Results are close to, but slightly different from, those of gsDesign.
    • Does this mean one is wrong?
    • We will check using simulation!

Design for B+ Population Using gs_design_ahr()
AHR approximations of ~HR at bound

| Bound | Z | Nominal p [1] | ~HR at bound [2] | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis |
|---|---|---|---|---|---|
| Analysis: 1, Time: 15.9, N: 278, Events: 89, AHR: 0.6, Information fraction: 0.45 | | | | | |
| Efficacy | 3.72 | 0.0001 | 0.4469 | 0.0995 | 0.0001 |
| Analysis: 2, Time: 25.8, N: 278, Events: 155, AHR: 0.6, Information fraction: 0.78 | | | | | |
| Efficacy | 2.71 | 0.0034 | 0.6436 | 0.6798 | 0.0034 |
| Analysis: 3, Time: 36.2, N: 278, Events: 198, AHR: 0.6, Information fraction: 1 | | | | | |
| Efficacy | 2.37 | 0.0090 | 0.7126 | 0.9013 | 0.0100 |

[1] One-sided p-value for experimental vs. control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
[2] Approximate hazard ratio to cross bound.

gsDesign vs. gsDesign2

| Feature | gsDesign | gsDesign2 |
|---|---|---|
| Nonproportional hazards | | ✔️ |
| Allow skipping bound at an analysis | | ✔️ |
| Integer-based sample size/event count | ✔️ | ✔️ |
| Alternates to logrank for survival analysis | | ✔️ |
| Calendar-based timing/spending | ✔️ | ✔️ |
| HR bounds for futility | | ✔️ |
| Stratified design for binomial | | ✔️ |
| Shiny interface | ✔️ | Under construction |
| Maturity | ✔️ | |

Simulations

simtrial

  • simtrial: fast, extensible clinical trial simulation framework for time-to-event endpoints.
    • Backend based on data.table
  • For each simulation
    • Generate data: sim_pw_surv()
    • For each analysis
      • Cut the data for analysis: create_cutting(); function factory allowing complex rules
      • Create Z-value test for analysis: various tests available; logrank shown here
  • Across simulations: summarize trial outcomes
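
The generate, cut, test, and summarize steps above can be sketched in base R. This does not use the simtrial API, and every helper name is hypothetical. The sketch simplifies the design: uniform enrollment over 12 months (no ramp-up), no dropout, alternating 1:1 allocation, and only the final analysis at 198 events, tested against the final efficacy bound Z = 2.37 from the gs_design_ahr() table.

```r
set.seed(2024)

# Generate data: enrollment times and exponential event times for both arms
simulate_trial <- function(n = 278, enroll_months = 12, median_control = 12, hr = 0.6) {
  trt <- rep(c(0, 1), length.out = n)                  # 0 = control, 1 = experimental
  enroll <- runif(n, 0, enroll_months)
  rate <- log(2) / median_control * ifelse(trt == 1, hr, 1)
  tte <- rexp(n, rate)
  data.frame(trt = trt, enroll = enroll, tte = tte, cal_event = enroll + tte)
}

# Cut the data at the calendar time of the target event count
cut_by_events <- function(dat, target_events) {
  cut_date <- sort(dat$cal_event)[target_events]
  dat <- dat[dat$enroll < cut_date, ]
  data.frame(trt = dat$trt,
             time = pmin(dat$tte, cut_date - dat$enroll),
             event = as.numeric(dat$cal_event <= cut_date))
}

# Logrank Z statistic (no tied times); positive values favor experimental
logrank_z <- function(time, event, trt) {
  o <- order(time)
  event <- event[o]
  trt <- trt[o]
  at_risk <- rev(seq_along(trt))            # subjects at risk at each ordered time
  at_risk_trt <- rev(cumsum(rev(trt)))      # experimental subjects at risk
  p <- at_risk_trt / at_risk                # expected share of events, experimental arm
  -sum(event * (trt - p)) / sqrt(sum(event * p * (1 - p)))
}

# Across simulations: summarize how often the final bound is crossed
n_sim <- 500
z_final <- replicate(n_sim, {
  dat <- cut_by_events(simulate_trial(), target_events = 198)
  logrank_z(dat$time, dat$event, dat$trt)
})
power_hat <- mean(z_final > 2.37)
power_hat
```

Because interim analyses are ignored here, the estimate is only a rough stand-in for the group sequential power; simtrial handles the cutting rules and tests at each analysis.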

10k simulations for design power
Requiring event count and minimum follow-up helpful

| Cut criteria | Power (95% CI) |
|---|---|
| Event count | 89.5%, 95% CI: (88.9%, 90.1%) |
| Event count and minimum follow-up | 94.8%, 95% CI: (94.4%, 95.2%) |
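
The confidence intervals in this table are consistent with a normal-approximation (Wald) interval for a proportion estimated from 10,000 simulations. The slides do not state the interval method, so treating it as Wald is an assumption; the sketch below shows the arithmetic.

```r
n_sim <- 10000
power_hat <- c(event_count = 0.895, event_and_follow_up = 0.948)

# Standard error of a simulated proportion and Wald 95% limits
se <- sqrt(power_hat * (1 - power_hat) / n_sim)
ci <- cbind(lower = power_hat - qnorm(0.975) * se,
            upper = power_hat + qnorm(0.975) * se)
round(100 * ci, 1)
```

A half-width of roughly 0.6 percentage points near 90% power is what 10,000 replicates buy; halving it would take about four times as many simulations.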

Non-proportional hazards

  • For the biomarker-negative (B-) population, assume a delayed effect:
    • First 3 months: HR = 1
    • Thereafter: HR = 0.7
    • B- is 30% of the overall population
  • Sample size determined by B+ population already derived.
  • Spending for both B+ and overall determined by B+ information fraction.
  • Power computed here using gs_power_ahr().
  • Design assumes stratified analysis (B+, B-).
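
To make the delayed-effect assumption concrete, the sketch below computes piecewise exponential survival for the B- stratum in base R. The control median of 12 months in B- is an assumption for illustration (the slides give it only for B+), and `surv_piecewise()` is a hypothetical helper. The ratio of cumulative hazards shown is a simple summary and is not the AHR computed by gsDesign2, which weights by expected events.

```r
# Survival under a hazard ratio of hr_early until `delay` months, hr_late afterwards
surv_piecewise <- function(t, rate_control, hr_early = 1, hr_late = 0.7, delay = 3) {
  cum_hazard <- rate_control *
    (hr_early * pmin(t, delay) + hr_late * pmax(t - delay, 0))
  exp(-cum_hazard)
}

rate_control <- log(2) / 12      # assumed control median of 12 months in B-
t <- c(3, 12, 36)                # months

s_control <- exp(-rate_control * t)
s_experimental <- surv_piecewise(t, rate_control)

# Ratio of cumulative hazards: starts at 1 and drifts toward 0.7 over time
cum_hr <- log(s_experimental) / log(s_control)
round(rbind(s_control, s_experimental, cum_hr), 3)
```

The curves coincide through month 3 and separate afterwards, which is why the treatment effect seen in the overall population strengthens at later analyses.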

Overall Population Design
AHR approximations of ~HR at bound

| Bound | Z | Nominal p [1] | ~HR at bound [2] | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis [3] |
|---|---|---|---|---|---|
| Analysis: 1, Time: 15.9, N: 398, Events: 130, AHR: 0.65, Information fraction: 0.46 | | | | | |
| Efficacy | 3.72 | 0.0001 | 0.5145 | 0.1129 | 0.0001 |
| Analysis: 2, Time: 25.9, N: 398, Events: 225, AHR: 0.63, Information fraction: 0.79 | | | | | |
| Efficacy | 2.71 | 0.0034 | 0.6938 | 0.7880 | 0.0034 |
| Analysis: 3, Time: 36, N: 398, Events: 284, AHR: 0.62, Information fraction: 1 | | | | | |
| Efficacy | 2.36 | 0.0090 | 0.7538 | 0.9613 | 0.0100 |

[1] One-sided p-value for experimental vs. control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
[2] Approximate hazard ratio to cross bound.
[3] alpha-spending determined by B+ information fraction.

wpgsd

Weighted parametric group sequential design

  • WPGSD; Anderson et al. (2022)
  • Takes advantage of the known correlation structure in constructing efficacy bounds
  • Controls family-wise Type I error (FWER) for a group sequential design.
  • Correlation may be due to:
    • common observations in nested populations
    • overlapping populations
    • common control arm.

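Counting events that occur in intersection hypotheses

| H1 | H2 | Analysis | Event |
|---|---|---|---|
| 1 | 1 | 1 | 89 |
| 1 | 1 | 2 | 155 |
| 1 | 1 | 3 | 198 |
| 1 | 2 | 1 | 89 |
| 1 | 2 | 2 | 155 |
| 1 | 2 | 3 | 198 |
| 2 | 2 | 1 | 130 |
| 2 | 2 | 2 | 225 |
| 2 | 2 | 3 | 284 |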

Correlation Matrix and Bounds

Correlation matrix

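Correlations Between Tests

| | H1_A1 | H2_A1 | H1_A2 | H2_A2 | H1_A3 | H2_A3 |
|---|---|---|---|---|---|---|
| H1_A1 | 1.00 | 0.83 | 0.76 | 0.63 | 0.67 | 0.56 |
| H2_A1 | 0.83 | 1.00 | 0.63 | 0.76 | 0.55 | 0.68 |
| H1_A2 | 0.76 | 0.63 | 1.00 | 0.83 | 0.88 | 0.74 |
| H2_A2 | 0.63 | 0.76 | 0.83 | 1.00 | 0.73 | 0.89 |
| H1_A3 | 0.67 | 0.55 | 0.88 | 0.73 | 1.00 | 0.83 |
| H2_A3 | 0.56 | 0.68 | 0.74 | 0.89 | 0.83 | 1.00 |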
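
These correlations follow from the event counts in this section: under the usual asymptotic approximation, the correlation between two logrank tests is the number of events common to both, divided by the square root of the product of their event counts. The base R sketch below recomputes the matrix that way; it is an illustration rather than the wpgsd implementation, and `n_events()` is a hypothetical helper.

```r
# Event counts: rows with H1 == H2 are events for one hypothesis;
# rows with H1 != H2 are events common to both hypotheses.
events <- data.frame(
  H1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  H2 = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  Analysis = rep(1:3, times = 3),
  Event = c(89, 155, 198, 89, 155, 198, 130, 225, 284)
)

# Look up events shared by two hypotheses at a given analysis
n_events <- function(h_a, h_b, analysis) {
  lo <- min(h_a, h_b)
  hi <- max(h_a, h_b)
  events$Event[events$H1 == lo & events$H2 == hi & events$Analysis == analysis]
}

# One test per hypothesis and analysis, ordered H1_A1, H2_A1, H1_A2, ...
tests <- expand.grid(H = 1:2, A = 1:3)
test_labels <- paste0("H", tests$H, "_A", tests$A)
corr <- matrix(NA_real_, nrow(tests), nrow(tests),
               dimnames = list(test_labels, test_labels))

for (i in seq_len(nrow(tests))) {
  for (j in seq_len(nrow(tests))) {
    # Common events are counted at the earlier of the two analyses
    common <- n_events(tests$H[i], tests$H[j], min(tests$A[i], tests$A[j]))
    d_i <- n_events(tests$H[i], tests$H[i], tests$A[i])
    d_j <- n_events(tests$H[j], tests$H[j], tests$A[j])
    corr[i, j] <- common / sqrt(d_i * d_j)
  }
}
round(corr, 2)
```

For example, H1 and H2 at analysis 1 share 89 events out of 89 and 130, giving 89 / √(89 × 130) ≈ 0.83.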

Bounds

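| Analysis | Hypotheses | H1 | H2 |
|---|---|---|---|
| 1 | H1 | 0.00083 | NA |
| 1 | H1, H2 | 0.00048 | 0.00048 |
| 1 | H2 | NA | 0.00083 |
| 2 | H1 | 0.011 | NA |
| 2 | H1, H2 | 0.0069 | 0.0069 |
| 2 | H2 | NA | 0.011 |
| 3 | H1 | 0.022 | NA |
| 3 | H1, H2 | 0.014 | 0.014 |
| 3 | H2 | NA | 0.022 |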

Group Sequential Bound Comparison: Bonferroni vs. Parametric

Usual group sequential calculation

Bonferroni Bounds
Adjusted only for correlations between analyses

| Analysis | Hypotheses | H1 | H2 |
|---|---|---|---|
| 1 | H1 | 0.00083 | NA |
| 1 | H1, H2 | 0.00019 | 0.00019 |
| 1 | H2 | NA | 0.00083 |
| 2 | H1 | 0.011 | NA |
| 2 | H1, H2 | 0.0047 | 0.0047 |
| 2 | H2 | NA | 0.011 |
| 3 | H1 | 0.022 | NA |
| 3 | H1, H2 | 0.011 | 0.011 |
| 3 | H2 | NA | 0.022 |

Bounds expressed as nominal p-values

WPGSD

Weighted Parametric GSD
Adjusted for correlations between analyses and hypotheses

| Analysis | Hypotheses | H1 | H2 |
|---|---|---|---|
| 1 | H1 | 0.00083 | NA |
| 1 | H1, H2 | 0.00048 | 0.00048 |
| 1 | H2 | NA | 0.00083 |
| 2 | H1 | 0.011 | NA |
| 2 | H1, H2 | 0.0069 | 0.0069 |
| 2 | H2 | NA | 0.011 |
| 3 | H1 | 0.022 | NA |
| 3 | H1, H2 | 0.014 | 0.014 |
| 3 | H2 | NA | 0.022 |

Bounds expressed as nominal p-values
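
Why the parametric bounds are more liberal can be seen in a simplified single-analysis version of the problem, sketched here in base R. The assumptions are: one analysis only (no group sequential spending), total one-sided α = 0.025, equal weights for H1 and H2, and a correlation of √(198/284) between the two test statistics. The results are therefore not comparable digit-for-digit with the tables above, and this is not the wpgsd implementation.

```r
alpha <- 0.025                 # assumed total one-sided alpha
rho <- sqrt(198 / 284)         # correlation between H1 and H2 tests at the final analysis

# P(Z1 <= crit, Z2 <= crit) for a standard bivariate normal with correlation rho
p_both_below <- function(crit, rho) {
  integrate(
    function(z) dnorm(z) * pnorm((crit - rho * z) / sqrt(1 - rho^2)),
    lower = -Inf, upper = crit, rel.tol = 1e-10
  )$value
}

# Family-wise error rate when both tests use the same critical value
fwer <- function(crit) 1 - p_both_below(crit, rho)

# Bonferroni: each test at alpha / 2, ignoring the correlation
crit_bonf <- qnorm(1 - alpha / 2)
p_bonf <- alpha / 2

# Parametric: common critical value whose FWER is exactly alpha
crit_param <- uniroot(function(crit) fwer(crit) - alpha,
                      interval = c(1.5, crit_bonf), tol = 1e-10)$root
p_param <- 1 - pnorm(crit_param)

c(bonferroni = p_bonf, parametric = p_param, fwer_bonferroni = fwer(crit_bonf))
```

The Bonferroni bound leaves part of α unused when the tests are positively correlated; the parametric bound recovers it, so its nominal p-value sits between α/2 and α.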

Simplified submission and reproducibility

Structure design documents for reproducibility

The design toolchain in R is constantly evolving. In a collaborative setting, it is critical to document and use the exact dependency versions when creating and revising designs.

Design documentation using R packages can be organized in R Markdown documents, and further placed in a proprietary R package for submission.

Standard approaches for package environment reproducibility:

  • sessioninfo::session_info() documents dependency version and source.
  • renv snapshots dependency name/version/source and restores them.

For organizations, also consider deploying company-wide reproducible environment strategies such as shared baseline and package validation.

pkglite enables eCTD submission for proprietary R packages

Current Hardware

Summary

  • Validated R-based packages for complex group sequential design
  • Substantial documentation and examples available
  • Key features include:
    • Non-proportional hazards
    • Specialized spending and other bounds
    • Liberalizing bounds by incorporating known correlations
    • Summary output formatted for formal documents

Thank you

Session Information

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.1 (2023-06-16)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version  date (UTC) lib source
##  bslib         0.7.0    2024-03-29 [3] RSPM (R 4.3.1)
##  cachem        1.0.8    2023-05-01 [3] RSPM (R 4.3.1)
##  cli           3.6.2    2023-12-11 [3] RSPM (R 4.3.1)
##  colorspace    2.1-0    2023-01-23 [3] RSPM (R 4.3.1)
##  data.table    1.15.4   2024-03-30 [3] RSPM (R 4.3.1)
##  digest        0.6.35   2024-03-11 [3] RSPM (R 4.3.1)
##  dplyr       * 1.1.4    2023-11-17 [3] RSPM (R 4.3.1)
##  evaluate      0.23     2023-11-01 [3] RSPM (R 4.3.1)
##  fansi         1.0.6    2023-12-08 [3] RSPM (R 4.3.1)
##  farver        2.1.1    2022-07-06 [3] RSPM (R 4.3.1)
##  fastmap       1.1.1    2023-02-24 [3] RSPM (R 4.3.1)
##  generics      0.1.3    2022-07-05 [3] RSPM (R 4.3.1)
##  ggplot2     * 3.5.0    2024-02-23 [3] RSPM (R 4.3.1)
##  glue          1.7.0    2024-01-09 [3] RSPM (R 4.3.1)
##  gMCPLite    * 0.1.5    2024-01-11 [3] RSPM (R 4.3.1)
##  gsDesign    * 3.6.4    2024-07-26 [1] RSPM (R 4.3.1)
##  gsDesign2   * 1.1.2.22 2024-09-03 [1] local
##  gt          * 0.10.1   2024-01-17 [3] RSPM (R 4.3.1)
##  gtable        0.3.4    2023-08-21 [3] RSPM (R 4.3.1)
##  highr         0.10     2022-12-22 [3] RSPM (R 4.3.1)
##  htmltools     0.5.8.1  2024-04-04 [3] RSPM (R 4.3.1)
##  jquerylib     0.1.4    2021-04-26 [3] RSPM (R 4.3.1)
##  jsonlite      1.8.8    2023-12-04 [3] RSPM (R 4.3.1)
##  knitr         1.46     2024-04-06 [3] RSPM (R 4.3.1)
##  labeling      0.4.3    2023-08-29 [3] RSPM (R 4.3.1)
##  lattice       0.21-8   2023-04-05 [2] CRAN (R 4.3.1)
##  lifecycle     1.0.4    2023-11-07 [3] RSPM (R 4.3.1)
##  magrittr      2.0.3    2022-03-30 [3] RSPM (R 4.3.1)
##  MASS          7.3-60   2023-05-04 [2] CRAN (R 4.3.1)
##  Matrix        1.5-4.1  2023-05-18 [2] CRAN (R 4.3.1)
##  metalite    * 0.1.3    2023-08-10 [3] RSPM (R 4.3.1)
##  metalite.ae * 0.1.2    2024-04-16 [3] RSPM (R 4.3.1)
##  mkdocs        0.4.0    2024-07-11 [3] RSPM (R 4.3.1)
##  munsell       0.5.1    2024-04-01 [3] RSPM (R 4.3.1)
##  mvtnorm       1.2-5    2024-05-21 [1] RSPM (R 4.3.1)
##  pillar        1.9.0    2023-03-22 [3] RSPM (R 4.3.1)
##  pkgconfig     2.0.3    2019-09-22 [3] RSPM (R 4.3.1)
##  purrr         1.0.2    2023-08-10 [3] RSPM (R 4.3.1)
##  r2rtf         1.1.1    2023-10-25 [3] RSPM (R 4.3.1)
##  R6            2.5.1    2021-08-19 [3] RSPM (R 4.3.1)
##  Rcpp          1.0.12   2024-01-09 [3] RSPM (R 4.3.1)
##  rlang         1.1.3    2024-01-10 [3] RSPM (R 4.3.1)
##  rmarkdown     2.26     2024-03-05 [3] RSPM (R 4.3.1)
##  rstudioapi    0.16.0   2024-03-24 [3] RSPM (R 4.3.1)
##  sass          0.4.9    2024-03-15 [3] RSPM (R 4.3.1)
##  scales        1.3.0    2023-11-28 [3] RSPM (R 4.3.1)
##  sessioninfo   1.2.2    2021-12-06 [3] RSPM (R 4.3.1)
##  stringi       1.8.3    2023-12-11 [3] RSPM (R 4.3.1)
##  stringr       1.5.1    2023-11-14 [3] RSPM (R 4.3.1)
##  survival      3.5-5    2023-03-12 [2] CRAN (R 4.3.1)
##  tibble        3.2.1    2023-03-20 [3] RSPM (R 4.3.1)
##  tidyr         1.3.1    2024-01-24 [3] RSPM (R 4.3.1)
##  tidyselect    1.2.1    2024-03-11 [3] RSPM (R 4.3.1)
##  utf8          1.2.4    2023-10-22 [3] RSPM (R 4.3.1)
##  vctrs         0.6.5    2023-12-01 [3] RSPM (R 4.3.1)
##  withr         3.0.0    2024-01-16 [3] RSPM (R 4.3.1)
##  wpgsd       * 0.1.0    2024-07-24 [1] Github (Merck/wpgsd@caecf4d)
##  xfun          0.43     2024-03-25 [3] RSPM (R 4.3.1)
##  xml2          1.3.6    2023-12-04 [3] RSPM (R 4.3.1)
##  xtable        1.8-4    2019-04-21 [3] RSPM (R 4.3.1)
##  yaml          2.3.8    2023-12-11 [3] RSPM (R 4.3.1)
## 
## ──────────────────────────────────────────────────────────────────────────────

References

Anderson, Keaven M, Zifang Guo, Jing Zhao, and Linda Z Sun. 2022. “A Unified Framework for Weighted Parametric Group Sequential Design.” Biometrical Journal 64 (7): 1219–39.

Jennison, Christopher, and Bruce W Turnbull. 1999. Group Sequential Methods with Applications to Clinical Trials. CRC Press.

Maurer, Willi, and Frank Bretz. 2013. “Multiple Testing in Group Sequential Trials Using Graphical Approaches.” Statistics in Biopharmaceutical Research 5 (4): 311–20.

Zhao, Yujie, Nan Xiao, Keaven Anderson, and Yilong Zhang. 2023. “Electronic Common Technical Document Submission with Analysis Using R.” Clinical Trials 20 (1): 89–92. https://doi.org/10.1177/17407745221123244.

Zhao, Yujie, Yilong Zhang, Larry Leon, and Keaven M Anderson. 2023. “Group Sequential Design Under Non-Proportional Hazards.” arXiv Preprint arXiv:2312.01723.