December 6, 2021
3:30: Introduction and background theory (30 minutes)
4:00: Proportional hazards applications with Shiny app (25 minutes)
4:25: Intro to non-proportional hazards (NPH; 5 minutes)
4:30: Software and piecewise model (15 min)
4:45: Average hazard ratio (AHR; 20 minutes)
5:05: Break (10 minutes)
5:15: NPH design with logrank test (25 minutes)
5:40: Weighted logrank and combination tests (40 minutes)
6:20: Summary and questions (10 minutes)
All opinions expressed are those of the presenters and not Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA.
Some slides need to be scrolled down to see the full content.
For minimum effort and maximum benefit, you will probably at least want to use the Shiny interface for the gsDesign R package at https://rinpharma.shinyapps.io/gsdesign/. This is also available at https://gsdesign.shinyapps.io/prod/, but this site is only licensed for a small number of simultaneous users.
data/: contains design files for examples; also simulation results
vignettes/: reports produced by the Shiny app to summarize designs
simulation/: R code and simulation data for the last part of the course
Alternative hypothesis: Type II error \(\beta = 1 - \hbox{power}\)
\[-\infty\le a_k<b_k, k=1,\ldots,K-1,\] \[a_K\le b_K\] If \(a_K = b_K\) then total Type II error is \[\beta = \sum_{k=1}^{K} l_k = \sum_{k=1}^{K} \text{Pr}(\{Z_k < a_k\} \cap_{j=1}^{k-1} \{a_j \le Z_j \le b_j\}\mid H_1)\]
Test each treatment for superiority vs the other
Usually not of interest in pharmaceutical industry
Give up if the experimental arm is not trending in favor relative to control?
Give up if the experimental arm is trending worse than control
Approaches to calculate decision boundary:
The error spending approach: specify boundary crossing probabilities at each analysis. This is most commonly done with the error spending function approach (Lan and DeMets 1983).
The boundary family approach: specify how large boundary values should be relative to each other and adjust these relative values by a constant multiple to control overall error rates. Commonly applied boundary families include:
Main idea:
Modified Haybittle-Peto procedure 1:
Bonferroni adjustment:
Advantages:
Definition:
For 2-sided testing, Wang and Tsiatis (1987) defined the boundary function for the \(k\)-th look as \[ \Gamma(\alpha, K, \Delta) k^{\Delta - 0.5}, \] where \(\Gamma(\alpha, K, \Delta)\) is a constant chosen so that the level of significance is equal to \(\alpha\).
Two special cases:
For 2-sided testing, the Pocock procedure rejects at the \(k\)-th of \(K\) equally spaced looks if \[|Z_k| > c_P(K),\] where \(c_P(K)\) is a fixed constant, given \(K\), such that \(\text{Pr}\left(\cup_{k=1}^{K} \{|Z_k| > c_P(K)\}\right) = \alpha\).
Total number of looks \(K\) | \(\alpha = 0.01\) | \(\alpha = 0.05\) | \(\alpha = 0.1\) |
---|---|---|---|
1 | 2.576 | 1.960 | 1.645 |
2 | 2.772 | 2.178 | 1.875 |
4 | 2.939 | 2.361 | 2.067 |
8 | 3.078 | 2.512 | 2.225 |
\(\infty\) | \(\infty\) | \(\infty\) | \(\infty\) |
We reject \(H_0\) at look \(k\) if \(|Z(k/4)| > 2.361\), \(k = 1,2,3,4\) (\(k=4\) being the final analysis).
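As a check, this constant can be recovered numerically from the joint normal distribution of the interim Z-statistics. A minimal sketch using the mvtnorm package, assuming equally spaced looks so that \(\hbox{Corr}(Z_j, Z_k) = \sqrt{j/k}\):

library(mvtnorm)

# Solve Pr(max_k |Z_k| > c_P(K)) = alpha for c_P(K) under K equally spaced looks,
# using the canonical correlation Corr(Z_j, Z_k) = sqrt(j / k)
pocock_constant <- function(K, alpha) {
  t <- (1:K) / K
  corr <- outer(t, t, function(a, b) sqrt(pmin(a, b) / pmax(a, b)))
  crossing_prob <- function(c) 1 - pmvnorm(lower = rep(-c, K), upper = rep(c, K), corr = corr)[1]
  uniroot(function(c) crossing_prob(c) - alpha, interval = c(1, 5))$root
}

pocock_constant(K = 4, alpha = 0.05) # approximately 2.361, matching the table above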
Weakness:
Overly aggressive interim bounds
High price paid at the final analysis.
\(c_P(K) \to +\infty\) as \(K \to + \infty\).
Requires equally spaced looks.
Total number of looks \(K\) | \(\alpha = 0.01\) | \(\alpha = 0.05\) | \(\alpha = 0.1\) |
---|---|---|---|
1 | 2.576 | 1.960 | 1.645 |
2 | 2.580 | 1.977 | 1.678 |
4 | 2.609 | 2.024 | 1.733 |
8 | 2.648 | 2.072 | 1.786 |
16 | 2.684 | 2.114 | 1.830 |
\(\infty\) | 2.807 | 2.241 | 1.960 |
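These classical bounds can also be reproduced with the gsDesign package. A quick sketch, assuming the classical-bound options for sfu ("OF" and "Pocock") with test.type = 2 for symmetric two-sided testing, where one-sided alpha = 0.025 corresponds to the two-sided 0.05 column:

library(gsDesign)

# Symmetric two-sided designs with 4 equally spaced looks
gsDesign(k = 4, test.type = 2, alpha = 0.025, sfu = "OF")$upper$bound     # O'Brien-Fleming
gsDesign(k = 4, test.type = 2, alpha = 0.025, sfu = "Pocock")$upper$bound # Pocock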
Example:
Procedure name | Boundary | Advantages | Disadvantages |
---|---|---|---|
Haybittle-Peto | a large fixed critical value (e.g., Z = 3) at each of the first K−1 (interim) analyses and 1.96 at the final analysis | simple to implement | |
Pocock | a constant decision boundary for the Z-score | | (1) requires the same level of evidence for early and late looks at the data, so it pays a larger price at the final analysis; (2) requires equally spaced looks |
O’Brien-Fleming | constant B-value boundaries, steep decrease in Z-boundaries | pays a smaller price at the final analysis | too conservative in the early stages? |
Key aspects of the design as documented in the protocol accompanying Gandhi et al (2018).
Poisson mixture cure model we consider:
\[S(t)= \exp(-\theta (1 - \exp(-\lambda t))).\]
Note that:
More details in book.
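A minimal sketch of this survival function with hypothetical parameter values (not taken from any protocol); the cure fraction is \(S(\infty) = \exp(-\theta)\):

# Poisson mixture cure model: S(t) = exp(-theta * (1 - exp(-lambda * t)))
poisson_cure_surv <- function(t, theta, lambda) exp(-theta * (1 - exp(-lambda * t)))

theta <- 1.5          # hypothetical
lambda <- log(2) / 12 # hypothetical: 12-month median for the latent event times
poisson_cure_surv(c(6, 12, 24, Inf), theta, lambda) # S(Inf) = exp(-theta) is the cure fraction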
See the following link for the Moderna COVID-19 design replication: https://medium.com/@yipeng_39244/reverse-engineering-the-statistical-analyses-in-the-moderna-protocol-2c9fd7544326
Can you reproduce this using the Shiny interface?
Cholesterol lowering and mortality: Miettinen et al (1997)
simtrial, gsDesign2, and gsdmvn are under development and hosted at GitHub/Merck.
Below are the GitHub repository links to the source code:
simtrial: https://github.com/Merck/simtrial
gsDesign2: https://github.com/Merck/gsDesign2
gsdmvn: https://github.com/Merck/gsdmvn
How to report issues?
simfix(): Simulation of fixed sample size design for time-to-event endpoint
simfix2simPWSurv(): Conversion of enrollment and failure rates from simfix() to simPWSurv() format
simPWSurv(): Simulate a stratified time-to-event outcome randomized trial
cutData(): Cut a dataset for analysis at a specified date
cutDataAtCount(): Cut a dataset for analysis at a specified event count
getCutDateForCount(): Get date at which an event count is reached
tenFH(): Fleming-Harrington weighted logrank tests
tenFHcorr(): Fleming-Harrington weighted logrank tests plus correlations
tensurv(): Process survival data into counting process format
pMaxCombo(): MaxCombo p-value
pwexpfit(): Piecewise exponential survival estimation
wMB(): Magirr and Burman modestly weighted logrank tests
fixedBlockRand(): Permuted fixed block randomization
rpwenroll(): Generate piecewise exponential enrollment
rpwexp(): The piecewise exponential distribution
From NPH Working Group
Ex1delayedEffect: Time-to-event data example 1 for non-proportional hazards working group
Ex2delayedEffect: Time-to-event data example 2 for non-proportional hazards working group
Ex3curewithph: Time-to-event data example 3 for non-proportional hazards working group
Ex4belly: Time-to-event data example 4 for non-proportional hazards working group
Ex5widening: Time-to-event data example 5 for non-proportional hazards working group
Ex6crossing: Time-to-event data example 6 for non-proportional hazards working group
MBdelayed: Simulated survival dataset with delayed treatment effect
AHR(): Average hazard ratio under non-proportional hazards (test version)
eAccrual(): Piecewise constant expected accrual
eEvents_df(): Expected events observed under piecewise exponential model
ppwe(): Estimate piecewise exponential cumulative distribution function
s2pwe(): Approximate survival distribution with piecewise exponential distribution
tEvents(): Predict time at which a targeted event count is achieved
We will focus on AHR(), ppwe(), and s2pwe() in this training.
Power and design functions extending the Jennison and Turnbull (2000) computational model to non-constant treatment effects. Partial list of functions:
gs_power_ahr(): Power computation
gs_design_ahr(): Design computations
gs_b(): Direct input of bounds
gs_spending_bound(): Spending function bounds
Set up piecewise exponential enrollment rates
enrollRates <- tibble::tribble(
  ~duration, ~rate,
  # 5/month for 6 months
  6, 5,
  # 20/month until enrollment complete
  6, 20
)
Get enrollment times for 150 observations
set.seed(123)
Month <- simtrial::rpwenroll(
  n = 150,
  enrollRates = enrollRates
)
Note: under exponential distribution, median (\(m\)) and failure rate (\(\lambda\)) related:
\[ \begin{align} m=&\log(2)/\lambda \\ \lambda=&\log(2)/m \\ \end{align} \]
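For example, the 15-month control median used below implies:

# Monthly failure rate implied by a 15-month exponential median
log(2) / 15 # approximately 0.046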
Specify failure rates and dropout rates in same table
# Control: exponential with 15 month median
# HR: 1 for 4 months, 0.6 thereafter
failRates <- tibble::tribble(
  ~Stratum, ~duration, ~failRate,   ~hr, ~dropoutRate,
  "All",    4,         log(2) / 15, 1,   .001,
  "All",    100,       log(2) / 15, 0.6, .001
)
# Log-logistic survival function (despite the d prefix, this returns survival probabilities)
dloglogis <- function(x, alpha = 1, beta = 4) {
  1 / (1 + (x / alpha)^beta)
}
times10 <- c(seq(1 / 3, 1, 1 / 3), 2, 3)
# Use s2pwe() to generate piecewise exponential approximation
gsDesign2::s2pwe(
  times = times10,
  survival = dloglogis(times10, alpha = .5, beta = 4)
) %>%
  gt() %>%
  fmt_number(columns = 1:2, decimals = 3)
duration | rate |
---|---|
0.333 | 0.541 |
0.333 | 3.736 |
0.333 | 4.223 |
1.000 | 2.716 |
1.000 | 1.619 |
Interval | HR | -ln(HR) | Expected Events |
---|---|---|---|
0-4 | 1.0 | 0.00 | d1 |
>4 | 0.6 | 0.51 | d2 |
\[\hbox{AHR} = \exp\left( \frac{d_1 \log(1) + d_2 \log(0.6)}{d_1 + d_2}\right)\]
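A small numeric illustration of this formula; the expected event counts below are hypothetical:

d1 <- 100 # hypothetical expected events in months 0-4 (HR = 1)
d2 <- 200 # hypothetical expected events after month 4 (HR = 0.6)
exp((d1 * log(1) + d2 * log(0.6)) / (d1 + d2)) # AHR of about 0.71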
\[\hat{\beta}\sim \hbox{Normal}(\beta,\mathcal{I}^{-1})\]
where
\[\mathcal{I}=\sum_{m=1}^M \left(1/E(D_{0m}) + 1/E(D_{1m})\right)^{-1}\]
Under the null hypothesis, this is like the Schoenfeld (1981) approximation
\[\mathcal{I}=\xi\sum_{m=1}^M E(D_{0m})\]
where \(\xi=1/2\) for 1:1 randomization.
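A minimal sketch of these information formulas with hypothetical expected event counts by arm and interval:

D0 <- c(60, 110) # hypothetical expected control-arm events by interval
D1 <- c(55, 95)  # hypothetical expected experimental-arm events by interval
info <- sum(1 / (1 / D0 + 1 / D1)) # sum_m (1 / E(D0m) + 1 / E(D1m))^(-1)
info0 <- 0.5 * sum(D0)             # Schoenfeld null approximation, xi = 1/2 for 1:1 randomization
c(info = info, info0 = info0)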
gsDesign::nEvents()
Events <- 332
# If beta is NULL and n = number of events,
# power is computed instead of events required
Power <- gsDesign::nEvents(n = Events, beta = NULL, hr = c(.6, .7, .8))
Assume 332 events
Used the following code to get AHR and information at specified times:
analysisTimes <- c(12, 20, 28, 36)
sampleSize <- 500
enrollRates <- tibble(Stratum = "All", duration = 12, rate = sampleSize / 12)
failRates <- tibble(
  Stratum = "All",
  duration = c(4, 100),
  failRate = log(2) / 15,
  hr = c(1, .6),
  dropoutRate = 0.001
)
ahr <- gsDesign2::AHR(
  enrollRates = enrollRates,
  failRates = failRates,
  totalDuration = analysisTimes,
  ratio = 1
)
# Requires simtrial, dplyr, and survival
# Transform to simPWSurv() format
x <- simfix2simPWSurv(failRates)
nsim <- 10000 # number of simulations
# Set up matrix for simulation results
results <- matrix(0, nrow = nsim * 4, ncol = 6)
colnames(results) <- c("Sim", "Analysis", "Events", "beta", "var", "logrank")
ii <- 0 # index for results row
for (sim in 1:nsim) {
  # Simulate a trial
  ds <- simPWSurv(
    n = sampleSize,
    enrollRates = enrollRates,
    failRates = x$failRates,
    dropoutRates = x$dropoutRates
  )
  for (j in seq_along(analysisTimes)) {
    # Cut data at the specified analysis time
    # (cutDataAtCount() would cut at an event count instead)
    dsc <- ds %>% cutData(analysisTimes[j])
    ii <- ii + 1
    results[ii, 1] <- sim
    results[ii, 2] <- j
    results[ii, 3] <- sum(dsc$event)
    # Cox model for log hazard ratio estimate and its variance
    cox <- coxph(Surv(tte, event) ~ Treatment, data = dsc)
    results[ii, 4] <- as.numeric(cox$coefficients)
    results[ii, 5] <- as.numeric(cox$var)
    # Logrank test
    Z <- dsc %>%
      tensurv(txval = "Experimental") %>%
      tenFH(rg = tibble::tibble(rho = 0, gamma = 0))
    results[ii, 6] <- as.numeric(Z$Z)
  }
}
Simulation summary based on 10k replications
results <- tibble::as_tibble(results)
simsum <- results %>%
  group_by(Analysis) %>%
  summarize(
    AHR = exp(mean(beta)),
    Events = mean(Events),
    info = 1 / mean(var(beta)),
    info0 = Events / 4
  )
Asymptotic Approximation
Time | AHR | Events | info | info0 |
---|---|---|---|---|
12.00 | 0.84 | 107.39 | 26.37 | 26.85 |
20.00 | 0.74 | 207.90 | 50.67 | 51.97 |
28.00 | 0.70 | 279.10 | 68.23 | 69.78 |
36.00 | 0.68 | 331.29 | 81.38 | 82.82 |
10k simulations
AHR | Events | info | info0 |
---|---|---|---|
0.84 | 107.16 | 25.48 | 26.79 |
0.74 | 207.77 | 49.74 | 51.94 |
0.70 | 278.93 | 67.29 | 69.73 |
0.68 | 331.13 | 80.11 | 82.78 |
info0 seems a better approximation of the simulation results than info.
Using both info0 and info in the design will make the sample size a little more conservative than using info alone.
ggplot(results, aes(x = factor(Analysis), y = beta)) +
  geom_violin() +
  ggtitle("Distribution of Cox Coefficient by Analysis") +
  xlab("Analysis") +
  ylab("Cox coefficient")
9 simulations
Question: Do you really want to adapt sample size based on an early interim estimate of treatment effect?
Statistical information at analysis: \(\mathcal{I}_k\), \(1\le k\le K\)
Proportion of final information at analysis \(k\): \(t_k =\mathcal{I}_k / \mathcal{I}_K\)
\[Z_k\sim \hbox{Normal}(\sqrt{\mathcal{I}_k} \theta(t_k),1)\] Multivariate normal with correlations for \(1\le j\le k\le K\):
\[\hbox{Corr}(Z_j,Z_k)=\sqrt{t_j/t_k}\]
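A minimal sketch of this correlation structure with hypothetical information fractions:

t_frac <- c(0.32, 0.63, 0.84, 1.00) # hypothetical information fractions
outer(t_frac, t_frac, function(a, b) sqrt(pmin(a, b) / pmax(a, b))) # Corr(Z_j, Z_k)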
analysisTimes <- c(12, 20, 28, 36)
sampleSize <- 500
enrollRates
Stratum | duration | rate |
---|---|---|
All | 12 | 41.7 |
failRates
Stratum | duration | failRate | hr | dropoutRate |
---|---|---|---|---|
All | 4 | 0.046 | 1.0 | 0.001 |
All | 100 | 0.046 | 0.6 | 0.001 |
ahr
Time | AHR | Events | info | info0 |
---|---|---|---|---|
12 | 0.84 | 107.4 | 26.4 | 26.8 |
20 | 0.74 | 207.9 | 50.7 | 52.0 |
28 | 0.70 | 279.1 | 68.2 | 69.8 |
36 | 0.68 | 331.3 | 81.4 | 82.8 |
Information fraction for interim analyses
## [1] 0.3241690 0.6275343 0.8424726
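The timing vector used in the gsSurv() call below is not shown; a sketch of one way to obtain the fractions printed above, assuming expected events are used as the information scale:

# Assumption: interim timing taken as expected events relative to the final analysis
timing <- ahr$Events[1:3] / ahr$Events[4]
timing # reproduces the information fractions printed above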
PH1sided <- gsDesign::gsSurv( # Derive group sequential design
  k = 4,                        # Number of analyses (interim + final)
  test.type = 1,                # Use this for 1-sided testing
  alpha = 0.025,                # 1-sided Type I error
  beta = 0.1,                   # Type II error (1 - power)
  timing = timing,              # Information fraction for interims
  sfu = sfLDOF,                 # O'Brien-Fleming spending approximation
  lambdaC = failRates$failRate, # Piecewise control failure rates
  hr = ahr$AHR[4],              # Final analysis AHR used as the effect size
  eta = failRates$dropoutRate,  # Piecewise exponential dropout rates
  gamma = enrollRates$rate,     # Relative enrollment rates
  R = enrollRates$duration,     # Duration of piecewise enrollment rates
  S = failRates$duration[1],    # Duration of piecewise failure rate periods (first K - 1)
  T = max(analysisTimes),       # Study duration
  minfup = max(analysisTimes) - sum(enrollRates$duration), # Minimum follow-up
  ratio = 1                     # Experimental:control randomization ratio
)
Analysis | Value | Efficacy |
---|---|---|
IA 1: 32% | Z | 3.7670 |
N: 444 | p (1-sided) | 0.0001 |
Events: 97 | ~HR at bound | 0.4636 |
Month: 13 | P(Cross) if HR=1 | 0.0001 |
P(Cross) if HR=0.68 | 0.0289 | |
IA 2: 63% | Z | 2.6020 |
N: 444 | p (1-sided) | 0.0046 |
Events: 186 | ~HR at bound | 0.6828 |
Month: 21 | P(Cross) if HR=1 | 0.0047 |
P(Cross) if HR=0.68 | 0.4999 | |
IA 3: 84% | Z | 2.2209 |
N: 444 | p (1-sided) | 0.0132 |
Events: 250 | ~HR at bound | 0.7549 |
Month: 28 | P(Cross) if HR=1 | 0.0146 |
P(Cross) if HR=0.68 | 0.7916 | |
Final | Z | 2.0453 |
N: 444 | p (1-sided) | 0.0204 |
Events: 297 | ~HR at bound | 0.7885 |
Month: 36 | P(Cross) if HR=1 | 0.0250 |
P(Cross) if HR=0.68 | 0.9000 |
cat(summary(PH1sided))
One-sided group sequential design with 4 analyses, time-to-event outcome with sample size 444 and 297 events required, 90 percent power, 2.5 percent (1-sided) Type I error to detect a hazard ratio of 0.68. Enrollment and total study durations are assumed to be 12 and 36 months, respectively. Efficacy bounds derived using a Lan-DeMets O’Brien-Fleming approximation spending function with none = 1.
library(gsdmvn)
# Spending function setup
upar <- list(sf = gsDesign::sfLDOF, total_spend = 0.025)
NPH1sided <- gs_design_ahr(
  enrollRates = enrollRates,
  failRates = failRates,
  ratio = 1,
  alpha = .025,
  beta = 0.1,
  # Information fraction not required (but available!)
  analysisTimes = analysisTimes,
  # Function to enable spending bound
  upper = gs_spending_bound,
  # Spending function and parameters used
  upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
  # Lower bound fixed at -infinity
  lower = gs_b, # allows input of fixed bounds
  # With gs_b, just enter values for bounds
  lpar = rep(-Inf, 4)
)
Analysis | Bound | Time | N | Events | Z | Probability | AHR | theta | info | info0 |
---|---|---|---|---|---|---|---|---|---|---|
1 | Upper | 12 | 464.3 | 99.7 | 3.7670 | 0.0019 | 0.840 | 0.175 | 24.5 | 24.9 |
2 | Upper | 20 | 464.3 | 193.0 | 2.6020 | 0.3024 | 0.738 | 0.304 | 47.0 | 48.3 |
3 | Upper | 28 | 464.3 | 259.2 | 2.2209 | 0.7329 | 0.700 | 0.357 | 63.3 | 64.8 |
4 | Upper | 36 | 464.3 | 307.6 | 2.0453 | 0.9000 | 0.683 | 0.381 | 75.6 | 76.9 |
Analysis | Value | Efficacy | Futility |
---|---|---|---|
IA 1: 32% | Z | 3.7670 | -3.7670 |
N: 444 | p (1-sided) | 0.0001 | 0.0001 |
Events: 97 | ~HR at bound | 0.4636 | 2.1569 |
Month: 13 | P(Cross) if HR=1 | 0.0001 | 0.0001 |
P(Cross) if HR=0.68 | 0.0289 | 0.0000 | |
IA 2: 63% | Z | 2.6020 | -2.6020 |
N: 444 | p (1-sided) | 0.0046 | 0.0046 |
Events: 186 | ~HR at bound | 0.6828 | 1.4646 |
Month: 21 | P(Cross) if HR=1 | 0.0047 | 0.0047 |
P(Cross) if HR=0.68 | 0.4999 | 0.0000 | |
IA 3: 84% | Z | 2.2209 | -2.2209 |
N: 444 | p (1-sided) | 0.0132 | 0.0132 |
Events: 250 | ~HR at bound | 0.7549 | 1.3246 |
Month: 28 | P(Cross) if HR=1 | 0.0146 | 0.0146 |
P(Cross) if HR=0.68 | 0.7916 | 0.0000 | |
Final | Z | 2.0453 | -2.0453 |
N: 444 | p (1-sided) | 0.0204 | 0.0204 |
Events: 297 | ~HR at bound | 0.7885 | 1.2682 |
Month: 36 | P(Cross) if HR=1 | 0.0250 | 0.0250 |
P(Cross) if HR=0.68 | 0.9000 | 0.0000 |
library(gsdmvn)
# Spending function and parameters for both bounds
par <- list(sf = gsDesign::sfLDOF, total_spend = 0.025)
NPHsymmetric <- gs_design_ahr(
  enrollRates = enrollRates,
  failRates = failRates,
  ratio = 1,
  alpha = .025,
  beta = 0.1,
  # Information fraction not required (but available!)
  analysisTimes = analysisTimes,
  # Functions to enable spending bounds
  upper = gs_spending_bound,
  lower = gs_spending_bound,
  # Spending function and parameters used
  upar = par,
  lpar = par,
  binding = TRUE, # Set lower bound to binding
  h1_spending = FALSE
)
Analysis | Bound | Time | N | Events | Z | Probability | AHR | theta | info | info0 |
---|---|---|---|---|---|---|---|---|---|---|
1 | Upper | 12 | 464.3 | 99.7 | 3.7670 | 0.0019 | 0.840 | 0.175 | 24.5 | 24.9 |
2 | Upper | 20 | 464.3 | 193.0 | 2.6020 | 0.3024 | 0.738 | 0.304 | 47.0 | 48.3 |
3 | Upper | 28 | 464.3 | 259.2 | 2.2209 | 0.7329 | 0.700 | 0.357 | 63.3 | 64.8 |
4 | Upper | 36 | 464.3 | 307.6 | 2.0453 | 0.9000 | 0.683 | 0.381 | 75.6 | 76.9 |
1 | Lower | 12 | 464.3 | 99.7 | −3.7670 | 0.0000 | 0.840 | 0.175 | 24.5 | 24.9 |
2 | Lower | 20 | 464.3 | 193.0 | −2.6020 | 0.0000 | 0.738 | 0.304 | 47.0 | 48.3 |
3 | Lower | 28 | 464.3 | 259.2 | −2.2209 | 0.0000 | 0.700 | 0.357 | 63.3 | 64.8 |
4 | Lower | 36 | 464.3 | 307.6 | −2.0453 | 0.0000 | 0.683 | 0.381 | 75.6 | 76.9 |
\[
\begin{align}
f_1(s_k,\alpha)-f_1(s_{k-1},\alpha) &= P_0\left(\{Z_{k}\geq b_{k}(\alpha)\}\cap_{j=1}^{k-1}\{Z_{j}< b_{j}(\alpha)\}\right)\\
f_2(s_k,\gamma)-f_2(s_{k-1},\gamma) &= P_\theta\left(\{Z_{k}< a_{k}(\gamma)\}\cap_{j=1}^{k-1}\{a_{j}(\gamma)\le Z_{j}< b_{j}(\alpha)\}\right)
\end{align}
\]
Generally, the sample size is set so that \(a_K=b_K\).
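For reference, the cumulative spending values \(f_1(s_k,\alpha)\) can be evaluated directly from a spending function. A small sketch using the Lan-DeMets O'Brien-Fleming spending function from gsDesign with hypothetical information fractions:

library(gsDesign)

t_frac <- c(0.32, 0.63, 0.84, 1) # hypothetical information fractions
sfLDOF(alpha = 0.025, t = t_frac)$spend # cumulative alpha spent at each analysis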
Analysis | Value | Efficacy | Futility |
---|---|---|---|
IA 1: 32% | Z | 3.7670 | -0.2503 |
N: 476 | p (1-sided) | 0.0001 | 0.5988 |
Events: 104 | ~HR at bound | 0.4767 | 1.0505 |
Month: 13 | P(Cross) if HR=1 | 0.0001 | 0.4012 |
P(Cross) if HR=0.68 | 0.0338 | 0.0143 | |
IA 2: 63% | Z | 2.6020 | 0.8440 |
N: 476 | p (1-sided) | 0.0046 | 0.1993 |
Events: 201 | ~HR at bound | 0.6922 | 0.8875 |
Month: 21 | P(Cross) if HR=1 | 0.0047 | 0.8103 |
P(Cross) if HR=0.68 | 0.5385 | 0.0393 | |
IA 3: 84% | Z | 2.2209 | 1.5151 |
N: 476 | p (1-sided) | 0.0132 | 0.0649 |
Events: 269 | ~HR at bound | 0.7626 | 0.8312 |
Month: 28 | P(Cross) if HR=1 | 0.0144 | 0.9414 |
P(Cross) if HR=0.68 | 0.8185 | 0.0687 | |
Final | Z | 2.0453 | 2.0453 |
N: 476 | p (1-sided) | 0.0204 | 0.0204 |
Events: 319 | ~HR at bound | 0.7953 | 0.7953 |
Month: 36 | P(Cross) if HR=1 | 0.0225 | 0.9775 |
P(Cross) if HR=0.68 | 0.9000 | 0.1000 |
library(gsdmvn)
# Spending function setup
upar <- list(sf = gsDesign::sfLDOF, total_spend = 0.025)
lpar <- list(sf = gsDesign::sfHSD, total_spend = .1, param = -2)
NPHasymmetric <- gs_design_ahr(
  enrollRates = enrollRates,
  failRates = failRates,
  ratio = 1,
  alpha = .025,
  beta = 0.1,
  # Information fraction not required (but available!)
  analysisTimes = analysisTimes,
  # Functions to enable spending bounds
  upper = gs_spending_bound,
  lower = gs_spending_bound,
  # Spending functions and parameters used
  upar = upar,
  lpar = lpar
)
Analysis | Bound | Time | N | Events | Z | Probability | AHR | theta | info | info0 |
---|---|---|---|---|---|---|---|---|---|---|
1 | Upper | 12 | 501.8 | 107.8 | 3.7670 | 0.0021 | 0.840 | 0.175 | 26.5 | 26.9 |
2 | Upper | 20 | 501.8 | 208.6 | 2.6020 | 0.3318 | 0.738 | 0.304 | 50.9 | 52.2 |
3 | Upper | 28 | 501.8 | 280.1 | 2.2209 | 0.7660 | 0.700 | 0.357 | 68.5 | 70.0 |
4 | Upper | 36 | 501.8 | 332.5 | 2.0453 | 0.9000 | 0.683 | 0.381 | 81.7 | 83.1 |
1 | Lower | 12 | 501.8 | 107.8 | −1.2899 | 0.0143 | 0.840 | 0.175 | 26.5 | 26.9 |
2 | Lower | 20 | 501.8 | 208.6 | 0.3054 | 0.0387 | 0.738 | 0.304 | 50.9 | 52.2 |
3 | Lower | 28 | 501.8 | 280.1 | 1.3340 | 0.0681 | 0.700 | 0.357 | 68.5 | 70.0 |
4 | Lower | 36 | 501.8 | 332.5 | 2.0453 | 0.1000 | 0.683 | 0.381 | 81.7 | 83.1 |
gsDesign::gsSurv()
# Spending function setup
upar <- list(sf = gsDesign::sfLDOF, total_spend = 0.025)
lpar <- c(qnorm(.05), rep(-Inf, 3))
NPHskip <- gs_design_ahr(
  enrollRates = enrollRates,
  failRates = failRates,
  ratio = 1,
  alpha = .025,
  beta = 0.1,
  # Information fraction not required (but available!)
  analysisTimes = analysisTimes,
  # Upper spending bound
  upper = gs_spending_bound,
  upar = upar,
  # Skip first efficacy analysis
  test_upper = c(FALSE, TRUE, TRUE, TRUE),
  # Fixed lower bound: futility bound only at the first analysis
  lower = gs_b,
  lpar = lpar
)
Analysis | Bound | Time | N | Events | Z | Probability | AHR | theta | info | info0 |
---|---|---|---|---|---|---|---|---|---|---|
1 | Lower | 12 | 467.6 | 100.4 | −1.6449 | 0.0060 | 0.840 | 0.175 | 24.7 | 25.1 |
2 | Upper | 20 | 467.6 | 194.4 | 2.5999 | 0.3057 | 0.738 | 0.304 | 47.4 | 48.6 |
3 | Upper | 28 | 467.6 | 261.0 | 2.2207 | 0.7359 | 0.700 | 0.357 | 63.8 | 65.3 |
4 | Upper | 36 | 467.6 | 309.8 | 2.0452 | 0.9000 | 0.683 | 0.381 | 76.1 | 77.5 |
All probabilities are under null hypothesis
We consider other alternative tests for group sequential design. Relevant R packages: npsurvSS and gsdmvn.
For simplicity, we made a few key assumptions.
The fixed design part largely follows the concept described in Yung and Liu (2019).
We considered a 1-sided test with type I error at \(\alpha=0.025\) and \(1-\beta=80\%\) power.
z_alpha <- abs(qnorm(0.025))
z_alpha
## [1] 1.959964
z_beta <- abs(qnorm(0.2))
z_beta
## [1] 0.8416212
By assuming a local alternative, we have
\[\sigma_0^2 \approx \sigma_1^2 = \sigma^2\] In this simplified case, the sample size can be calculated as
\[ n = \frac{4 (z_{\alpha}+z_{\beta})^{2}}{\theta^2} \]
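A minimal sketch of this calculation, interpreting \(\theta\) as the log hazard ratio so that \(n\) corresponds to the required number of events (a Schoenfeld-type approximation); the hazard ratio of 0.7 is a hypothetical effect size:

z_alpha <- abs(qnorm(0.025))
z_beta <- abs(qnorm(0.2))
theta <- -log(0.7) # hypothetical effect size on the log hazard ratio scale
4 * (z_alpha + z_beta)^2 / theta^2 # about 247 events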
Similar to the fixed design, we can define the test statistic for the weighted logrank test at each analysis using the counting process formulation
\[ Z_k=\sqrt{\frac{n_{0}+n_{1}}{n_{0}n_{1}}}\int_{0}^{t_k}w(t)\frac{\overline{Y}_{0}(t)\overline{Y}_{1}(t)}{\overline{Y}_{0}(t)+\overline{Y}_{1}(t)}\left\{ \frac{d\overline{N}_{1}(t)}{\overline{Y}_{1}(t)}-\frac{d\overline{N}_{0}(t)}{\overline{Y}_{0}(t)}\right\} \]
Note that the only difference from the fixed-design statistic is that the integral runs up to \(t_k\), the data cutoff for the \(k\)-th interim analysis.
Email: keaven_anderson@merck.com
[1] Scharfstein, D. O., Tsiatis, A. A. and Robins, J. M. (1997). Semiparametric efficiency and its implication on the design and analysis of group-sequential studies. Journal of the American Statistical Association 92 1342–50.
[2] Jennison, C. and Turnbull, B. W. (2000). Group sequential methods with applications to clinical trials. Chapman; Hall/CRC, Boca Raton, FL.
[3] Lachin, J. M. and Foulkes, M. A. (1986). Evaluation of sample size and power for analyses of survival with allowance for nonuniform patient entry, losses to follow-up, noncompliance, and stratification. Biometrics 42 507–19.
[4] Schoenfeld, D. (1981). The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68 316–9.
[5] Mukhopadhyay, P., Huang, W., Metcalfe, P., Öhrn, F., Jenner, M. and Stone, A. (2020). Statistical and practical considerations in designing of immuno-oncology trials. Journal of Biopharmaceutical Statistics 1–7.
[6] Yung, G. and Liu, Y. (2020). Sample size and power for the weighted log-rank test and Kaplan-Meier based tests with allowance for nonproportional hazards. Biometrics 76 939–50.
[7] Karrison, T. G. and others. (2016). Versatile tests for comparing survival curves based on weighted log-rank statistics. Stata Journal 16 678–90.
[8] Kim, K. and Tsiatis, A. A. (1990). Study duration for clinical trials with survival response and early stopping rule. Biometrics 81–92.
[9] Lan, K. K. G. and DeMets, D. L. (1983). Discrete sequential boundaries for clinical trials. Biometrika 70 659–63.
[10] Haybittle, J. (1971). Repeated assessment of results in clinical trials of cancer treatment. The British Journal of Radiology 44 793–7.
[11] Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D., Howard, S., Mantel, N., McPherson, K., Peto, J. and Smith, P. (1977). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples. British Journal of Cancer 35 1–39.
[12] Wang, S. K. and Tsiatis, A. A. (1987). Approximately optimal one-parameter boundaries for group sequential trials. Biometrics 193–9.
[13] Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika 64 191–9.
[14] O’Brien, P. C. and Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics 549–56.
[15] Slud, E. and Wei, L. (1982). Two-sample repeated significance tests based on the modified Wilcoxon statistic. Journal of the American Statistical Association 77 862–8.
[16] Fleming, T. R., Harrington, D. P. and O’Brien, P. C. (1984). Designs for group sequential tests. Controlled Clinical Trials 5 348–61.
[17] Lan, K. K. G. and DeMets, D. L. (1989). Group sequential procedures: Calendar versus information time. Statistics in Medicine 8 1191–8.
[18] Gandhi, L., Rodrı́guez-Abreu, D., Gadgeel, S., Esteban, E., Felip, E., De Angelis, F., Domine, M., Clingan, P., Hochmair, M. J., Powell, S. F. and others. (2018). Pembrolizumab plus chemotherapy in metastatic non–small-cell lung cancer. New England Journal of Medicine 378 2078–92.
[19] Maurer, W. and Bretz, F. (2013). Multiple testing in group sequential trials using graphical approaches. Statistics in Biopharmaceutical Research 5 311–20.
[20] Downs, J. R., Beere, P. A., Whitney, E., Clearfield, M., Weis, S., Rochen, J., Stein, E. A., Shapiro, D. R., Langendorfer, A. and Gotto Jr, A. M. (1997). Design & rationale of the Air Force/Texas Coronary Atherosclerosis Prevention Study (AFCAPS/TexCAPS). The American Journal of Cardiology 80 287–93.
[21] Downs, J. R., Clearfield, M., Weis, S., Whitney, E., Shapiro, D. R., Beere, P. A., Langendorfer, A., Stein, E. A., Kruyer, W., Gotto Jr, A. M. and others. (1998). Primary prevention of acute coronary events with lovastatin in men and women with average cholesterol levels: Results of AFCAPS/TexCAPS. Journal of the American Medical Association 279 1615–22.
[22] Hwang, I. K., Shih, W. J. and De Cani, J. S. (1990). Group sequential designs using a family of Type I error probability spending functions. Statistics in Medicine 9 1439–45.
[23] White, W. B., Cannon, C. P., Heller, S. R., Nissen, S. E., Bergenstal, R. M., Bakris, G. L., Perez, A. T., Fleck, P. R., Mehta, C. R., Kupfer, S. and others. (2013). Alogliptin after acute coronary syndrome in patients with type 2 diabetes. New England Journal of Medicine 369 1327–35.
[24] White, W. B., Bakris, G. L., Bergenstal, R. M., Cannon, C. P., Cushman, W. C., Fleck, P., Heller, S., Mehta, C., Nissen, S. E., Perez, A. and others. (2011). EXamination of cArdiovascular outcoMes with alogliptIN versus standard of carE in patients with type 2 diabetes mellitus and acute coronary syndrome (EXAMINE): A cardiovascular safety study of the dipeptidyl peptidase 4 inhibitor alogliptin in patients with type 2 diabetes with acute coronary syndrome. American Heart Journal 162 620–6.
[25] Cohen, E. E., Soulières, D., Le Tourneau, C., Dinis, J., Licitra, L., Ahn, M.-J., Soria, A., Machiels, J.-P., Mach, N., Mehra, R. and others. (2019). Pembrolizumab versus methotrexate, docetaxel, or cetuximab for recurrent or metastatic head-and-neck squamous cell carcinoma (KEYNOTE-040): A randomised, open-label, phase 3 study. The Lancet 393 156–67.
[26] Miettinen, T. A., Pyörälä, K., Olsson, A. G., Musliner, T. A., Cook, T. J., Faergeman, O., Berg, K., Pedersen, T., Kjekshus, J. and Group, for the S. S. S. (1997). Cholesterol-lowering therapy in women and elderly patients with myocardial infarction or angina pectoris: Findings from the scandinavian simvastatin survival study (4S). Circulation 96 4211–8.
[27] Shitara, K., Özgüroğlu, M., Bang, Y.-J., Di Bartolomeo, M., Mandalà, M., Ryu, M.-H., Fornaro, L., Olesiński, T., Caglevic, C., Chung, H. C. and others. (2018). Pembrolizumab versus paclitaxel for previously treated, advanced gastric or gastro-oesophageal junction cancer (KEYNOTE-061): A randomised, open-label, controlled, phase 3 trial. The Lancet 392 123–33.
[28] Hernán, M. A. (2010). The hazards of hazard ratios. Epidemiology 21 13.
[29] Tsiatis, A. A. (1982). Repeated significance testing for a general class of statistics use in censored survival analysis. Journal of the American Statistical Association 77 855–61.
[30] Yung, G. and Liu, Y. (2019). Sample size and power for the weighted log-rank test and Kaplan-Meier based tests with allowance for nonproportional hazards. Biometrics.