“It is more fun to talk with someone who doesn’t use long, difficult words but rather short, easy words like ‘What about lunch?’” — A. A. Milne, Winnie-the-Pooh
September 1, 2022, Bremen
These materials do not represent corporate thoughts of Merck & Co., Inc., Rahway, NJ, USA and its affiliates, or Meta Platforms, Inc.
Keaven Anderson takes responsibility for any errors.
Aim is to support specific design innovations coming into common use
There are also many non-Merck packages
    # New grammar and capabilities
    library(gsdmvn) # To be combined with gsDesign2
    library(gsDesign2)
    # Standalone time-to-event simulation
    library(simtrial)
    # Supported since 2007
    library(gsDesign)
    # tidyverse packages
    library(tibble)
    library(gt)
    library(dplyr)
Enrollment rates (piecewise constant, fixed total duration for NPH approach)
Stratum | duration | rate |
---|---|---|
All | 18 | 20 |
Failure and dropout rates (piecewise constant, piecewise hazard ratio)
Stratum | duration | failRate | hr | dropoutRate |
---|---|---|---|---|
All | 4 | 0.05776227 | 1.0 | 0.001 |
All | 100 | 0.05776227 | 0.6 | 0.001 |
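The two tables above can be entered as tibbles. This is a minimal sketch under two assumptions that the slide does not state: that the column names shown in the tables are the ones the design functions expect, and that the failure rate 0.05776227 is log(2) / 12, which corresponds to an exponential median of 12 months.

```r
library(tibble)

# Enrollment: 20 per month for 18 months (values from the enrollment table)
enrollRates <- tibble(Stratum = "All", duration = 18, rate = 20)

# Failure and dropout rates: hazard ratio 1 for the first 4 months, 0.6 afterwards
failRates <- tibble(
  Stratum = "All",
  duration = c(4, 100),
  failRate = log(2) / 12, # assumption: 0.05776227 is log(2) / 12, rounded
  hr = c(1, 0.6),
  dropoutRate = 0.001
)
```

The second duration of 100 months simply extends the last period well beyond the 36-month study.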
    # Study duration in months
    studyDuration <- 36
    # Experimental / Control randomization ratio
    ratio <- 1
    # 1-sided Type I error
    alpha <- 0.025
    # Type II error (power may be a bad argument choice)
    beta <- .1
Easy to describe expected effect over time
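    # ggplot2 is loaded explicitly because the plot below calls ggplot()
    library(ggplot2)

    # Average hazard ratio over a grid of analysis times, plotted against time
    AHR(
      enrollRates = enrollRates,
      failRates = failRates,
      totalDuration = c(.01, seq(4, 4.5, .1), 5:36),
      ratio = 1
    ) %>%
      ggplot(aes(x = Time, y = AHR)) +
      geom_line() +
      ggtitle("Geometric mean for hazard ratio by Cox model") +
      scale_x_continuous(breaks = seq(0, 36, 12))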
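The plot title describes the average hazard ratio as a geometric mean. One common way to write this is as an event-weighted geometric mean of the period-specific hazard ratios. The sketch below illustrates that arithmetic only; the expected event counts are hypothetical, and this is not presented as the package's internal calculation.

```r
# Hazard ratios by period, from the failure rate table
hr <- c(1.0, 0.6)

# Hypothetical expected event counts in each period at some analysis time;
# illustrative values only, not taken from the slides
expected_events <- c(100, 225)

# Event-weighted geometric mean of the hazard ratios
ahr <- exp(sum(expected_events * log(hr)) / sum(expected_events))
ahr
```

As more events accrue after month 4, the weight on the 0.6 period grows and the average moves from 1 toward 0.6.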
Method: AHR = average hazard ratio for NPH (Mukhopadhyay et al. 2020)
    x <- fixed_design(
      x = "AHR",
      alpha = alpha,
      power = 1 - beta,
      ratio = 1,
      enrollRates = enrollRates,
      failRates = failRates,
      studyDuration = studyDuration
    )
    x %>%
      summary() %>%
      as_gt()
Fixed Design under AHR Method [1]

Design | N | Events | Time | Bound | alpha | Power |
---|---|---|---|---|---|---|
AHR | 463.078 | 324.7077 | 36 | 1.959964 | 0.025 | 0.9 |

[1] Power computed with average hazard ratio method.
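The Bound column is the one-sided normal critical value for alpha = 0.025. The sketch below reproduces it and, as a rough cross-check only, applies Schoenfeld's approximation for the number of events under 1:1 randomization. Schoenfeld's formula is swapped in here for illustration and is not the calculation that `fixed_design()` performs. The average hazard ratio of 0.7 is a hypothetical input.

```r
alpha <- 0.025
beta <- 0.1

# One-sided critical value; agrees with the Bound column (1.959964)
bound <- qnorm(1 - alpha)

# Schoenfeld approximation with 1:1 randomization:
# events = 4 * (z_{1 - alpha} + z_{1 - beta})^2 / log(HR)^2
hypothetical_ahr <- 0.7 # illustrative value, not from the slides
events_approx <- 4 * (qnorm(1 - alpha) + qnorm(1 - beta))^2 /
  log(hypothetical_ahr)^2

round(bound, 6)
ceiling(events_approx)
```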
Other methods available: Lachin and Foulkes (1986), Fleming-Harrington (Harrington and Fleming 1982), MaxCombo (Karrison et al. 2016; Roychoudhury et al. 2021), modestly weighted logrank (Magirr and Burman 2019), milestone difference, and RMST. Many of these are implemented in the npsurvSS package (Yung and Liu 2019).
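The weighted logrank tests named above differ in how they weight each event time. This is a minimal sketch of the two weight functions as they are usually written: Fleming-Harrington FH(rho, gamma) uses S(t-)^rho (1 - S(t-))^gamma, and the modestly weighted logrank test of Magirr and Burman (2019) uses 1 / max(S(t-), S(tau)), where S is the pooled Kaplan-Meier estimate. The function names `fh_weight` and `mwlr_weight` and the survival values are hypothetical and do not come from any package listed here.

```r
# Fleming-Harrington weight at pooled left-continuous survival s_minus
fh_weight <- function(s_minus, rho, gamma) {
  s_minus^rho * (1 - s_minus)^gamma
}

# Modestly weighted logrank weight; s_tau is pooled survival at the cutoff tau
mwlr_weight <- function(s_minus, s_tau) {
  1 / pmax(s_minus, s_tau)
}

# Hypothetical pooled survival just before four successive event times
s_minus <- c(1, 0.9, 0.7, 0.5)

fh_weight(s_minus, rho = 0, gamma = 0)   # constant weights: the logrank test
fh_weight(s_minus, rho = 0, gamma = 0.5) # zero at the start, growing with time
mwlr_weight(s_minus, s_tau = 0.7)        # grows from 1, capped at 1 / s_tau
```

FH(0, 0.5) gives early events almost no weight, while the modestly weighted test keeps every weight at 1 or above, so early events still count.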
Comparison of test power under a 4-month delayed-effect scenario

Design | N | Events | Time | Bound | alpha | Power | Simulated power [1] | Simulated alpha |
---|---|---|---|---|---|---|---|---|
Average hazard ratio | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9000 | 0.8960 | 0.0253 |
Lachin and Foulkes | 463.1 | 328.9 | 36 | 1.959964 | 0.0250 | 0.9060 | NA | NA |
Fleming-Harrington FH(0, 0) (logrank) | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9029 | 0.8971 | 0.0226 |
Fleming-Harrington FH(0, 0.5) | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9584 | 0.9533 | 0.0260 |
MaxCombo: logrank, FH(0, 0.5) | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9565 | 0.9415 | 0.0255 |
MaxCombo: logrank, FH(0, 0.5), FH(0.5, 0.5) | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9585 | 0.9455 | 0.0276 |
Modestly weighted LR: tau = 4 | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9198 | 0.9180 | 0.0233 |
Modestly weighted LR: tau = 12 | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9449 | 0.9383 | 0.0215 |
Modestly weighted LR: tau = 18 | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.9486 | 0.9404 | 0.0234 |
RMST: tau = 36 | 463.1 | 324.7 | 36 | 1.959964 | 0.0250 | 0.8760 | 0.8883 | 0.0277 |

[1] Simulated power and alpha are based on 10,000 simulations.
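With 10,000 simulations, each simulated proportion in the table carries Monte Carlo error, which matters when comparing simulated values with the analytic ones. The sketch below uses the binomial standard error sqrt(p (1 - p) / n); the helper name `mc_se` is hypothetical.

```r
n_sim <- 10000

# Binomial standard error of a simulated rejection proportion
mc_se <- function(p, n) sqrt(p * (1 - p) / n)

se_alpha <- mc_se(0.025, n_sim) # at the nominal one-sided alpha
se_power <- mc_se(0.90, n_sim)  # at the targeted power

round(se_alpha, 4) # 0.0016
round(se_power, 4) # 0.003
```

On this scale, a simulated alpha of 0.0277 sits about 1.7 standard errors above 0.025.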
The strong null scenario addresses concerns raised by Magirr and Burman (2019) and Freidlin and Korn (2019)
Test | alpha |
---|---|
Strong null | |
Logrank | 0.0029 |
Fleming-Harrington FH(0, 0.5) | 0.0163 |
MaxCombo: logrank, FH(0, 0.5) | 0.0163 |
MaxCombo: logrank, FH(0, 0.5), FH(0.5, 0.5) | 0.0166 |
MaxCombo: logrank, FH(0, 1) | 0.0344 |
MaxCombo: logrank, FH(0, 1), FH(1, 1) | 0.0366 |
Modestly weighted LR: tau = 4 | 0.0043 |
Modestly weighted LR: tau = 12 | 0.0098 |
Modestly weighted LR: tau = 18 | 0.0132 |
RMST: tau = 36 | 0.0022 |
Milestone: tau = 24 | 0.0135 |
Milestone: tau = 30 | 0.0203 |
Email: Keaven_Anderson@merck.com
Anderson, Keaven M, Zifang Guo, Jing Zhao, and Linda Z Sun. 2022. “A Unified Framework for Weighted Parametric Group Sequential Design.” Biometrical Journal.
Freidlin, Boris, and Edward L Korn. 2019. “Methods for Accommodating Nonproportional Hazards in Clinical Trials: Ready for the Primary Analysis?” Journal of Clinical Oncology 37 (35): 3455.
Harrington, David P, and Thomas R Fleming. 1982. “A Class of Rank Test Procedures for Censored Survival Data.” Biometrika 69 (3): 553–66.
Karrison, Theodore G et al. 2016. “Versatile Tests for Comparing Survival Curves Based on Weighted Log-Rank Statistics.” Stata Journal 16 (3): 678–90.
Lachin, John M, and Mary A Foulkes. 1986. “Evaluation of Sample Size and Power for Analyses of Survival with Allowance for Nonuniform Patient Entry, Losses to Follow-up, Noncompliance, and Stratification.” Biometrics 42: 507–19.
Magirr, Dominic, and Carl-Fredrik Burman. 2019. “Modestly Weighted Logrank Tests.” Statistics in Medicine 38 (20): 3782–90.
Magirr, Dominic, and José L Jiménez. 2022. “Design and Analysis of Group-Sequential Clinical Trials Based on a Modestly Weighted Log-Rank Test in Anticipation of a Delayed Separation of Survival Curves: A Practical Guidance.” Clinical Trials 19 (2): 201–10.
Mehrotra, Devan V, and Radha Railkar. 2000. “Minimum Risk Weights for Comparing Treatments in Stratified Binomial Trials.” Statistics in Medicine 19 (6): 811–25.
Mukhopadhyay, Pralay, Wenmei Huang, Paul Metcalfe, Fredrik Öhrn, Mary Jenner, and Andrew Stone. 2020. “Statistical and Practical Considerations in Designing of Immuno-Oncology Trials.” Journal of Biopharmaceutical Statistics 30 (6): 1130–46.
Mukhopadhyay, Pralay, Jiabu Ye, Keaven M Anderson, Satrajit Roychoudhury, Eric H Rubin, Susan Halabi, and Richard J Chappell. 2022. “Log-Rank Test Vs MaxCombo and Difference in Restricted Mean Survival Time Tests for Comparing Survival Under Nonproportional Hazards in Immuno-Oncology Trials: A Systematic Review and Meta-Analysis.” JAMA Oncology.
Roychoudhury, Satrajit, Keaven M Anderson, Jiabu Ye, and Pralay Mukhopadhyay. 2021. “Robust Design and Analysis of Clinical Trials with Nonproportional Hazards: A Straw Man Guidance from a Cross-Pharma Working Group.” Statistics in Biopharmaceutical Research, 1–15. https://doi.org/10.1080/19466315.2021.1874507.
Yung, Godwin, and Yi Liu. 2019. “Sample Size and Power for the Weighted Log-Rank Test and Kaplan-Meier Based Tests with Allowance for Nonproportional Hazards.” Biometrics.