Skip to contents

Overview

This vignette provides a thorough analysis of the differences between the SAS SEQDESIGN procedure and the R package gsDesign for designing group sequential clinical trials with time-to-event endpoints.

Starting point: SAS PROC SEQDESIGN Example 109.14

Consider the example described in the SAS Documentation: Computing Sample Size for Survival Data with Uniform Accrual. The SAS code is as follows:

proc seqdesign;
    ErrorSpend: design nstages=4 method=errfuncobf;
    samplesize model=twosamplesurvival(
        nullhazard = 0.03466
        hazard     = 0.01733
        accrual    = uniform
        accrate    = 15
        acctime    = 18
        ceiling    = time);
run;

SAS assumptions summary:

  • 4-stage group sequential design
  • Lan-DeMets O’Brien-Fleming error spending function
  • Two-sided symmetric test, total alpha = 0.05
  • 90% power (beta = 0.10)
  • Equally spaced information fractions: 0.25, 0.50, 0.75, 1.00
  • Schoenfeld formula for required number of events
  • Sample size formula: \(N = \text{ACCRATE} \times \text{ACCTIME} = 15 \times 18 = 270\) (fixed)
  • Study duration \(T\) solved to achieve required events (CEILING=TIME)
  • Hazard ratio \(HR = 0.01733 / 0.03466 = 0.5\)

Based on these assumptions, a user naively attempting to replicate this in R with gsDesign may attempt code like the following:

des <- gsSurv(
  k         = 4,
  test.type = 2,
  alpha     = 0.05,
  beta      = 0.10,
  sfu       = sfLDOF, # Lan-DeMets O'Brien--Fleming
  timing    = c(.25, .50, .75, 1), # Equal information fractions
  lambdaC   = 0.03466,
  hr        = 0.5,
  eta       = 0, # No dropout
  gamma     = 15, # Uniform accrual rate of 15/month
  R         = 18, # Accrual duration (months)
  T         = 25,
  ratio     = 1 # 1:1 randomization
)

des
#> Group sequential design (method=LachinFoulkes; k=4 analyses; Two-sided symmetric)
#> HR=0.500 vs HR0=1.000 | alpha=0.050 (sided=2) | power=90.0%
#> N=235.2 subjects | D=74.9 events | T=25.0 study duration | accrual=19.0 Accrual duration | minfup=6.0 minimum follow-up | ratio=1 randomization ratio (experimental/control)
#> 
#> Spending functions:
#>   Bounds derived using a  Lan-DeMets O'Brien-Fleming approximation spending function (no parameters).
#> 
#> Analysis summary:
#> Method: LachinFoulkes 
#>    Analysis              Value Efficacy Futility
#>   IA 1: 25%                  Z   3.7496  -3.7496
#>      N: 142        p (1-sided)   0.0001   0.0001
#>  Events: 19       ~HR at bound   0.1768   5.6562
#>   Month: 11   P(Cross) if HR=1   0.0001   0.0001
#>             P(Cross) if HR=0.5   0.0117   0.0000
#>   IA 2: 50%                  Z   2.5399  -2.5399
#>      N: 204        p (1-sided)   0.0055   0.0055
#>  Events: 38       ~HR at bound   0.4361   2.2933
#>   Month: 16   P(Cross) if HR=1   0.0056   0.0056
#>             P(Cross) if HR=0.5   0.3292   0.0000
#>   IA 3: 75%                  Z   2.0161  -2.0161
#>      N: 236        p (1-sided)   0.0219   0.0219
#>  Events: 57       ~HR at bound   0.5840   1.7124
#>   Month: 21   P(Cross) if HR=1   0.0236   0.0236
#>             P(Cross) if HR=0.5   0.7148   0.0000
#>       Final                  Z   1.7202  -1.7202
#>      N: 236        p (1-sided)   0.0427   0.0427
#>  Events: 75       ~HR at bound   0.6720   1.4881
#>   Month: 25   P(Cross) if HR=1   0.0500   0.0500
#>             P(Cross) if HR=0.5   0.9000   0.0000
#> 
#> Key inputs (names preserved):
#>                                desc    item  value input
#>                     Accrual rate(s)   gamma 12.378    15
#>            Accrual rate duration(s)       R     19    18
#>              Control hazard rate(s) lambdaC  0.035 0.035
#>             Control dropout rate(s)     eta      0     0
#>        Experimental dropout rate(s)    etaE      0  etaE
#>  Event and dropout rate duration(s)       S   NULL     S

Note that the output differs in multiple ways, including that the gsDesign output indicates that fewer events will be required to achieve the desired level of power compared to the SAS output. Such differences do not cast doubt on the fidelity of either software, but rather highlight the underlying differences in assumptions and defaults settings in the design of each software tool.

Side-by-side comparison table

events_sas <- c(22.27, 44.54, 66.81, 89.08)
events_a2 <- round(des$n.I, 2)

z_sas <- c(4.333, 2.963, 2.359, 2.014)
z_a2 <- round(des$upper$bound, 4)

N2a <- sum(des$eNC[4, ]) + sum(des$eNE[4, ])

knitr::kable(
  data.frame(
    Analysis = 1:4,
    Events_SAS = events_sas,
    Events_gsDesign = events_a2,
    Z_SAS = z_sas,
    Z_gsDesign = z_a2
  ),
  caption = "Naive side-by-side comparison of events and Z-boundaries at each look."
)
Naive side-by-side comparison of events and Z-boundaries at each look.
Analysis Events_SAS Events_gsDesign Z_SAS Z_gsDesign
1 22.27 18.73 4.333 3.7496
2 44.54 37.46 2.963 2.5399
3 66.81 56.19 2.359 2.0161
4 89.08 74.92 2.014 1.7202

The rest of this vignette works to resolve the apparent discrepancies in results, using the above example as a reference case. We identify the key assumptions in each system, explain why naive parameter translation produces different results, and show how to obtain matching output. Those who would like to know more details about the design of the time-to-event functionality in gsDesign should consult vignette("SurvivalOverview").

Key differences: SAS SEQDESIGN vs. R gsDesign

There are three fundamental differences that affect the output from the example above.

1. Events formula

  • SAS: Schoenfeld (1981). Uses only the null-hypothesis variance.
  • gsDesign: Lachin and Foulkes (1986) by default. Uses both null and alternative hypothesis variances. L-F is slightly more conservative, giving ~1-2% more events than Schoenfeld for the same parameters.

2. Alpha handling in gsSurv()

O’Brien-Fleming spending makes things a bit ambiguous due to the one-sided versus two-sided spending.

  • SAS: Uses a symmetric two-sided test with alpha = 0.05. Per-tail, \(z_{\alpha/2} = z_{0.025} = 1.96\). Boundaries and event calculations both use 0.025 per tail.
  • gsSurv(): The test.type = 2 with alpha = 0.05 creates the same symmetric boundaries, but computes a different group sequential design inflation factor because it counts power from both tails. SAS computes the inflation factor using one-sided power only.

A possible strategy for aligning gsSurv() and SAS calculations in this case is to use test.type = 1 with alpha = 0.025 to match SAS per-tail logic, as we will see momentarily.

3. What varies to achieve power

  • SAS: Fixes the accrual rate and accrual time, solves for study duration.
  • gsSurv(): Fixes study duration (T) and varies the accrual rate (gamma) and accrual time (R).

Aligning the two approaches

gsSurv() with aligned parameters

Let’s start our work with gsDesign by defining parameters to match SAS Example 109.14:

k <- 4
alpha <- 0.05 # Two-sided total alpha (SAS convention)
beta <- 0.10 # 1 - power = 0.10 -> 90% power
lambdaC <- 0.03466 # Control hazard rate
lambdaE <- 0.01733 # Experimental hazard rate
HR <- lambdaE / lambdaC # = 0.5
timing <- c(0.25, 0.50, 0.75, 1.00) # Equally spaced information fractions
accrate <- 15 # Uniform accrual rate (subjects per time unit)
R <- 18 # Accrual duration (time units)
N <- accrate * R

Instead of starting with a call to gsDesign::gsDesign(), we begin with gsDesign::gsSurv(). The gsSurv() function combines nSurv() with gsDesign() (group sequential boundaries) in one call. It is the standard gsDesign function for designing time-to-event trials.

Two key differences from SAS remain even with the alpha parameter aligned with SAS:

  • gsSurv() defaults to Lachin-Foulkes sample size; Lachin-Foulkes gives slightly more events than Schoenfeld.
  • gsSurv() with \(T\) specified varies the accrual rate to achieve power, rather than fixing \(N\) and solving for \(T\) as SAS does. To address this, we set the T parameter equal to the study duration time calculated by SAS.
des_2 <- gsSurv(
  k = 4,
  test.type = 1, # One-sided (matches SAS per-tail)
  alpha = 0.025, # alpha/2 per tail
  beta = 0.10,
  sfu = sfLDOF,
  timing = c(.25, .50, .75, 1),
  lambdaC = 0.03466,
  hr = 0.5,
  eta = 0, # Assume no dropout
  gamma = 15, # Input rate; gsSurv will **adjust** this
  R = 18,
  T = 25.13323, # This is the study duration computed by SAS
  ratio = 1,
  method = "Schoenfeld" # Use Schoenfeld bounds to match SAS
)
des_2
#> Group sequential design (method=Schoenfeld; k=4 analyses; One-sided (efficacy only))
#> HR=0.500 vs HR0=1.000 | alpha=0.025 (sided=1) | power=90.0%
#> N=278.7 subjects | D=89.1 events | T=25.1 study duration | accrual=19.1 Accrual duration | minfup=6.0 minimum follow-up | ratio=1 randomization ratio (experimental/control)
#> 
#> Spending functions:
#>   Efficacy bounds derived using a Lan-DeMets O'Brien-Fleming approximation spending function (no parameters).
#> 
#> Analysis summary:
#> Method: Schoenfeld 
#>    Analysis              Value Efficacy
#>   IA 1: 25%                  Z   4.3326
#>      N: 168        p (1-sided)   0.0000
#>  Events: 23       ~HR at bound   0.1594
#>   Month: 11   P(Cross) if HR=1   0.0000
#>             P(Cross) if HR=0.5   0.0035
#>   IA 2: 50%                  Z   2.9631
#>      N: 242        p (1-sided)   0.0015
#>  Events: 45       ~HR at bound   0.4115
#>   Month: 17   P(Cross) if HR=1   0.0015
#>             P(Cross) if HR=0.5   0.2579
#>   IA 3: 75%                  Z   2.3590
#>      N: 280        p (1-sided)   0.0092
#>  Events: 67       ~HR at bound   0.5615
#>   Month: 21   P(Cross) if HR=1   0.0096
#>             P(Cross) if HR=0.5   0.6853
#>       Final                  Z   2.0141
#>      N: 280        p (1-sided)   0.0220
#>  Events: 90       ~HR at bound   0.6526
#>   Month: 25   P(Cross) if HR=1   0.0250
#>             P(Cross) if HR=0.5   0.9000
#> 
#> Key inputs (names preserved):
#>                                desc    item  value input
#>                     Accrual rate(s)   gamma 14.568    15
#>            Accrual rate duration(s)       R 19.133    18
#>              Control hazard rate(s) lambdaC  0.035 0.035
#>             Control dropout rate(s)     eta      0     0
#>        Experimental dropout rate(s)    etaE      0  etaE
#>  Event and dropout rate duration(s)       S   NULL     S
gsBoundSummary(des_2)
#> Method: Schoenfeld 
#>    Analysis              Value Efficacy
#>   IA 1: 25%                  Z   4.3326
#>      N: 168        p (1-sided)   0.0000
#>  Events: 23       ~HR at bound   0.1594
#>   Month: 11   P(Cross) if HR=1   0.0000
#>             P(Cross) if HR=0.5   0.0035
#>   IA 2: 50%                  Z   2.9631
#>      N: 242        p (1-sided)   0.0015
#>  Events: 45       ~HR at bound   0.4115
#>   Month: 17   P(Cross) if HR=1   0.0015
#>             P(Cross) if HR=0.5   0.2579
#>   IA 3: 75%                  Z   2.3590
#>      N: 280        p (1-sided)   0.0092
#>  Events: 67       ~HR at bound   0.5615
#>   Month: 21   P(Cross) if HR=1   0.0096
#>             P(Cross) if HR=0.5   0.6853
#>       Final                  Z   2.0141
#>      N: 280        p (1-sided)   0.0220
#>  Events: 90       ~HR at bound   0.6526
#>   Month: 25   P(Cross) if HR=1   0.0250
#>             P(Cross) if HR=0.5   0.9000

Observations:

Using gsSurv() with parameters that align with SAS settings, we notice that

  • \(Z\)-boundaries: 4.333, 2.963, 2.359, 2.014 -> matches SAS
  • Events: 23, 45, 67, 90 -> matches SAS; gsDesign is just taking the ceiling() of the SAS output.
  • Total N: 280 (slightly more than SAS’s fixed 270). This is a result of the fact that gsSurv() is varying R and gamma (corresponding conceptually to the parameters acctime and accrate in PROC SEQDESIGN) in order to maintain power.

Bringing it full circle with SAS PROC SEQDESIGN

Observe the final values for R and gamma calculated by gsSurv() in the “Key inputs” table output (see above) by des_2: gamma = 14.568, and R = 19.133 (slightly different than our input 15 and 18). If we substitute in these values for the accrate and acctime parameters in SAS, we will see that the SAS calculations produce a final sample size of N = 278.7, aligning with the gsSurv() sample size up.

proc seqdesign;
    ErrorSpend: design nstages=4 method=errfuncobf;
    samplesize model=twosamplesurvival(
        nullhazard = 0.03466
        hazard     = 0.01733
        accrual    = uniform
        accrate    = 14.568
        acctime    = 19.133
        ceiling    = time);
run;

References

Lachin, John M., and Mary A. Foulkes. 1986. “Evaluation of Sample Size and Power for Analyses of Survival with Allowance for Nonuniform Patient Entry, Losses to Follow-up, Noncompliance, and Stratification.” Biometrics 42: 507–19.
Schoenfeld, David. 1981. “The Asymptotic Properties of Nonparametric Tests for Comparing Survival Distributions.” Biometrika 68 (1): 316–19.