5  Time-to-event sample size derivation

We extend the Lachin and Foulkes (Lachin and Foulkes 1986) method to cases where the null hypothesis does not reflect equality. This includes non-inferiority scenarios and, for vaccines or other prevention studies, super-superiority. Denote the null hypothesis failure rates for the control and experimental treatment groups as \(\lambda_{00}\) and \(\lambda_{01}\), respectively, and the alternate hypothesis rates as \(\lambda_{10}\) and \(\lambda_{11}\). Further, denote the alternate hypothesis hazard ratio \(h_1=\lambda_{11}/\lambda_{10}\) and the null hypothesis hazard ratio \(h_0=\lambda_{01}/\lambda_{00}\). We let censoring rates be specific to the control (\(\eta_0\)) and experimental (\(\eta_1\)) groups, and let \(\xi\) denote the proportion of subjects randomized to the experimental treatment group. Finally, we let \(\eta\) represent an exponential dropout rate, with the time to dropout assumed independent of the time until an event. Lachin and Foulkes assumed a null hypothesis with no difference between the control and experimental failure rates and tested for superiority; that is, \(\lambda_{00}=\lambda_{01}\) (\(h_0=1\)) and \(\lambda_{11}<\lambda_{10}\) (\(h_1<1\)). They set event rates under the null hypothesis so that the weighted average event rate is the same under the null and alternate hypotheses:

\[ \lambda_{00}=\lambda_{01}=\bar\lambda=(1-\xi)\lambda_{10}+\xi\lambda_{11}. \tag{5.1}\]

The apparent intent of this is to equalize the variance of the log hazard ratio under the null and alternative hypotheses, although this is not exactly achieved. We let \(\delta\) denote an indicator that an uncensored event is observed for a patient in a specified treatment group, given enrollment, event rate, and dropout rate assumptions. In our notation, the Lachin and Foulkes power equation for proportional hazards is:

\[ \begin{split} \sqrt{N}\ln(h_1) &= Z_\alpha \sqrt{E\left\{\delta|\bar{\lambda},\eta\right\}^{-1}(\xi^{-1}+(1-\xi)^{-1})} \\ &+Z_\beta\sqrt{ E\left\{\delta|\lambda_{11},\eta\right\}^{-1}\xi^{-1}+ E\left\{\delta|\lambda_{10},\eta\right\}^{-1}(1-\xi)^{-1}} \end{split} \tag{5.2}\]
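Equation 5.2 can be evaluated directly in base R. The sketch below assumes the design settings from the nSurv() default output shown later in this section (uniform accrual over the first 12 of 18 total months, a control median of 6 months, no dropout, equal randomization); pr_event() is a hypothetical helper implementing the expected event probability for an exponential event and dropout rate under uniform accrual.

```r
# Expected probability that an enrolled subject has an uncensored event:
# assumes exponential event rate lambda, exponential dropout rate eta,
# uniform accrual over R months, and total study duration Tt months.
pr_event <- function(lambda, eta, R, Tt) {
  a <- lambda + eta
  (lambda / a) * (1 - (exp(-a * (Tt - R)) - exp(-a * Tt)) / (a * R))
}

lambda10 <- log(2) / 6   # control event rate: median of 6 months
h1 <- 0.6                # alternate hypothesis hazard ratio
lambda11 <- h1 * lambda10
xi <- 0.5                # equal randomization
lambda_bar <- (1 - xi) * lambda10 + xi * lambda11   # Equation 5.1
z_alpha <- qnorm(0.975)  # 2.5% one-sided Type I error
z_beta <- qnorm(0.9)     # 90% power

# Right-hand side of Equation 5.2, then solve for N
rhs <- z_alpha * sqrt((1 / xi + 1 / (1 - xi)) / pr_event(lambda_bar, 0, 12, 18)) +
  z_beta * sqrt(1 / (xi * pr_event(lambda11, 0, 12, 18)) +
                1 / ((1 - xi) * pr_event(lambda10, 0, 12, 18)))
N <- (rhs / log(h1))^2
N  # approximately 250.45, matching the nSurv() default output below
```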

Lachin and Foulkes did not cover cases other than equality under the null hypothesis; i.e., they assumed \(h_0 = 1\). When \(h_0 \neq 1\) (i.e., \(\lambda_{00} \neq \lambda_{01}\)), Equation 5.2 generalizes to

\[ \begin{split} \sqrt{N}\ln\left(\frac{h_1}{h_0}\right) &= Z_\alpha \sqrt{E\left\{\delta|\lambda_{01},\eta_1\right\}^{-1}\xi^{-1}+ E\left\{\delta|\lambda_{00},\eta_0\right\}^{-1}(1-\xi)^{-1}} \\ &+Z_\beta\sqrt{ E\left\{\delta|\lambda_{11},\eta_1\right\}^{-1}\xi^{-1}+ E\left\{\delta|\lambda_{10},\eta_0\right\}^{-1}(1-\xi)^{-1}} \end{split} \tag{5.3}\]
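Equation 5.3 can be sketched as a small base-R function. The name n_lf() and its argument defaults are hypothetical, and the expected event probability again assumes exponential rates with uniform accrual:

```r
# Expected event probability under exponential event rate lambda, dropout
# rate eta, uniform accrual over R months, total duration Tt months.
pr_event <- function(lambda, eta, R, Tt) {
  a <- lambda + eta
  (lambda / a) * (1 - (exp(-a * (Tt - R)) - exp(-a * Tt)) / (a * R))
}

# Hypothetical sketch of Equation 5.3, solving for N
n_lf <- function(h0, h1, lambda00, lambda10, xi = 0.5,
                 alpha = 0.025, beta = 0.1,
                 eta0 = 0, eta1 = 0, R = 12, Tt = 18) {
  lambda01 <- h0 * lambda00  # null hypothesis experimental rate
  lambda11 <- h1 * lambda10  # alternate hypothesis experimental rate
  a_term <- sqrt(1 / (xi * pr_event(lambda01, eta1, R, Tt)) +
                 1 / ((1 - xi) * pr_event(lambda00, eta0, R, Tt)))
  b_term <- sqrt(1 / (xi * pr_event(lambda11, eta1, R, Tt)) +
                 1 / ((1 - xi) * pr_event(lambda10, eta0, R, Tt)))
  ((qnorm(1 - alpha) * a_term + qnorm(1 - beta) * b_term) / log(h1 / h0))^2
}

# With h0 = 1 and null rates set to the Equation 5.1 weighted average,
# this reduces to the superiority case of Equation 5.2 (N near 250.45):
lambda10 <- log(2) / 6
n_lf(h0 = 1, h1 = 0.6, lambda00 = 0.8 * lambda10, lambda10 = lambda10)
```

A non-inferiority margin can then be explored by setting, say, h0 = 1.3 with h1 = 1 and equal failure rates under the alternative.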

While we have defined the null hypothesis rates \(\lambda_{00}\) and \(\lambda_{01}\) for an exponential distribution, the gsDesign functions nSurv() and gsSurv() extend the approach above in an analogous fashion to piecewise exponential failure and dropout rates with a common hazard ratio across piecewise intervals.
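The piecewise case can be illustrated numerically. The function pr_event_pw() below is a hypothetical sketch, not the gsDesign implementation: it approximates the expected event probability under piecewise exponential event and dropout rates with uniform accrual, using simple Riemann sums on a fine time grid.

```r
# Hypothetical numeric sketch: expected event probability under piecewise
# exponential event (lambda) and dropout (eta) rates, where breaks gives
# the start time of each piece, with uniform accrual over [0, R] and
# total study duration Tt.
pr_event_pw <- function(breaks, lambda, eta, R, Tt, n = 2000) {
  lam_at <- function(t) lambda[findInterval(t, breaks)]
  eta_at <- function(t) eta[findInterval(t, breaks)]
  # P(uncensored event within follow-up time tau), via left Riemann sums
  p_by <- function(tau) {
    t <- seq(0, tau, length.out = n)
    dt <- t[2] - t[1]
    haz <- lam_at(t) + eta_at(t)
    S <- exp(-(cumsum(haz) - haz) * dt)  # survival from all causes
    sum(lam_at(t) * S * dt)
  }
  # average over uniformly distributed enrollment times on [0, R]
  mean(vapply(seq(0, R, length.out = 200),
              function(u) p_by(Tt - u), numeric(1)))
}

# A single piece recovers the exponential model used above: a control
# rate of log(2)/6 with no dropout gives roughly 0.73
pr_event_pw(breaks = 0, lambda = log(2) / 6, eta = 0, R = 12, Tt = 18)
```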

For a fixed sample size with default arguments, we have:

```r
library(gsDesign)

fixed_design <- nSurv()
fixed_design
#> Fixed design, two-arm trial with time-to-event
#> outcome (Lachin and Foulkes, 1986).
#> Solving for:  Accrual rate 
#> Hazard ratio                  H1/H0=0.6/1
#> Study duration:                   T=18
#> Accrual duration:                   12
#> Min. end-of-study follow-up: minfup=6
#> Expected events (total, H1):        160.4832
#> Expected sample size (total):       250.4492
#> Accrual rates:
#>      Stratum 1
#> 0-12   20.8708
#> Control event rates (H1):
#>       Stratum 1
#> 0-Inf    0.1155
#> Censoring rates:
#>       Stratum 1
#> 0-Inf         0
#> Power:                 100*(1-beta)=90%
#> Type I error (1-sided):   100*alpha=2.5%
#> Equal randomization:          ratio=1
```

nSurv() intentionally does not round up, so the user needs to round the number of events and the sample size up. For gsSurv(), this rounding can be done automatically with toInteger():

```r
gs_design <- gsSurv() |> toInteger()
gs_design |> gsBoundSummary()
#>     Analysis              Value Efficacy Futility
#>    IA 1: 33%                  Z   3.0139  -0.2458
#>       N: 190        p (1-sided)   0.0013   0.5971
#>   Events: 57       ~HR at bound   0.4500   1.0673
#>     Month: 8   P(Cross) if HR=1   0.0013   0.4029
#>              P(Cross) if HR=0.6   0.1396   0.0147
#>    IA 2: 66%                  Z   2.5528   0.9301
#>       N: 268        p (1-sided)   0.0053   0.1762
#>  Events: 114       ~HR at bound   0.6199   0.8401
#>    Month: 13   P(Cross) if HR=1   0.0061   0.8320
#>              P(Cross) if HR=0.6   0.5769   0.0433
#>        Final                  Z   1.9988   1.9988
#>       N: 268        p (1-sided)   0.0228   0.0228
#>  Events: 172       ~HR at bound   0.7373   0.7373
#>    Month: 18   P(Cross) if HR=1   0.0233   0.9767
#>              P(Cross) if HR=0.6   0.9005   0.0995
```

A textual summary is also available:

```r
summary(gs_design)
#> Asymmetric two-sided group sequential design with
#> non-binding futility bound, 3 analyses, time-to-event
#> outcome with sample size 268 and 172 events required, 90
#> percent power, 2.5 percent (1-sided) Type I error to detect
#> a hazard ratio of 0.6. Enrollment and total study durations
#> are assumed to be 12 and 18 months, respectively. Efficacy
#> bounds derived using a Hwang-Shih-DeCani spending function
#> with gamma = -4. Futility bounds derived using a
#> Hwang-Shih-DeCani spending function with gamma = -2.
```

All of the assumptions laid out above can be changed as documented in the help files for nSurv() and gsSurv().