
Reproducing PROC SEQDESIGN survival designs in gsDesign
Source:vignettes/SeqDesignSurvival.Rmd
SeqDesignSurvival.RmdOverview
This vignette provides a thorough analysis of the differences between the SAS SEQDESIGN procedure and the R package gsDesign for designing group sequential clinical trials with time-to-event endpoints.
Starting point: SAS PROC SEQDESIGN Example 109.14
Consider the example described in the SAS Documentation: Computing Sample Size for Survival Data with Uniform Accrual. The SAS code is as follows:
proc seqdesign;
ErrorSpend: design nstages=4 method=errfuncobf;
samplesize model=twosamplesurvival(
nullhazard = 0.03466
hazard = 0.01733
accrual = uniform
accrate = 15
acctime = 18
ceiling = time);
run;
SAS assumptions summary:
- 4-stage group sequential design
- Lan-DeMets O’Brien-Fleming error spending function
- Two-sided symmetric test, total alpha = 0.05
- 90% power (beta = 0.10)
- Equally spaced information fractions: 0.25, 0.50, 0.75, 1.00
- Schoenfeld formula for required number of events
- Sample size formula: \(N = \text{ACCRATE} \times \text{ACCTIME} = 15 \times 18 = 270\) (fixed)
- Study duration \(T\) solved to achieve required events (CEILING=TIME)
- Hazard ratio \(HR = 0.01733 / 0.03466 = 0.5\)
Based on these assumptions, a user naively attempting to replicate this in R with gsDesign may attempt code like the following:
des <- gsSurv(
k = 4,
test.type = 2,
alpha = 0.05,
beta = 0.10,
sfu = sfLDOF, # Lan-DeMets O'Brien--Fleming
timing = c(.25, .50, .75, 1), # Equal information fractions
lambdaC = 0.03466,
hr = 0.5,
eta = 0, # No dropout
gamma = 15, # Uniform accrual rate of 15/month
R = 18, # Accrual duration (months)
T = 25,
ratio = 1 # 1:1 randomization
)
des
#> Group sequential design (method=LachinFoulkes; k=4 analyses; Two-sided symmetric)
#> HR=0.500 vs HR0=1.000 | alpha=0.050 (sided=2) | power=90.0%
#> N=235.2 subjects | D=74.9 events | T=25.0 study duration | accrual=19.0 Accrual duration | minfup=6.0 minimum follow-up | ratio=1 randomization ratio (experimental/control)
#>
#> Spending functions:
#> Bounds derived using a Lan-DeMets O'Brien-Fleming approximation spending function (no parameters).
#>
#> Analysis summary:
#> Method: LachinFoulkes
#> Analysis Value Efficacy Futility
#> IA 1: 25% Z 3.7496 -3.7496
#> N: 142 p (1-sided) 0.0001 0.0001
#> Events: 19 ~HR at bound 0.1768 5.6562
#> Month: 11 P(Cross) if HR=1 0.0001 0.0001
#> P(Cross) if HR=0.5 0.0117 0.0000
#> IA 2: 50% Z 2.5399 -2.5399
#> N: 204 p (1-sided) 0.0055 0.0055
#> Events: 38 ~HR at bound 0.4361 2.2933
#> Month: 16 P(Cross) if HR=1 0.0056 0.0056
#> P(Cross) if HR=0.5 0.3292 0.0000
#> IA 3: 75% Z 2.0161 -2.0161
#> N: 236 p (1-sided) 0.0219 0.0219
#> Events: 57 ~HR at bound 0.5840 1.7124
#> Month: 21 P(Cross) if HR=1 0.0236 0.0236
#> P(Cross) if HR=0.5 0.7148 0.0000
#> Final Z 1.7202 -1.7202
#> N: 236 p (1-sided) 0.0427 0.0427
#> Events: 75 ~HR at bound 0.6720 1.4881
#> Month: 25 P(Cross) if HR=1 0.0500 0.0500
#> P(Cross) if HR=0.5 0.9000 0.0000
#>
#> Key inputs (names preserved):
#> desc item value input
#> Accrual rate(s) gamma 12.378 15
#> Accrual rate duration(s) R 19 18
#> Control hazard rate(s) lambdaC 0.035 0.035
#> Control dropout rate(s) eta 0 0
#> Experimental dropout rate(s) etaE 0 etaE
#> Event and dropout rate duration(s) S NULL SNote that the output differs in multiple ways, including that the gsDesign output indicates that fewer events will be required to achieve the desired level of power compared to the SAS output. Such differences do not cast doubt on the fidelity of either software, but rather highlight the underlying differences in assumptions and defaults settings in the design of each software tool.
Side-by-side comparison table
events_sas <- c(22.27, 44.54, 66.81, 89.08)
events_a2 <- round(des$n.I, 2)
z_sas <- c(4.333, 2.963, 2.359, 2.014)
z_a2 <- round(des$upper$bound, 4)
N2a <- sum(des$eNC[4, ]) + sum(des$eNE[4, ])
knitr::kable(
data.frame(
Analysis = 1:4,
Events_SAS = events_sas,
Events_gsDesign = events_a2,
Z_SAS = z_sas,
Z_gsDesign = z_a2
),
caption = "Naive side-by-side comparison of events and Z-boundaries at each look."
)| Analysis | Events_SAS | Events_gsDesign | Z_SAS | Z_gsDesign |
|---|---|---|---|---|
| 1 | 22.27 | 18.73 | 4.333 | 3.7496 |
| 2 | 44.54 | 37.46 | 2.963 | 2.5399 |
| 3 | 66.81 | 56.19 | 2.359 | 2.0161 |
| 4 | 89.08 | 74.92 | 2.014 | 1.7202 |
The rest of this vignette works to resolve the apparent discrepancies
in results, using the above example as a reference case. We identify the
key assumptions in each system, explain why naive parameter translation
produces different results, and show how to obtain matching output.
Those who would like to know more details about the design of the
time-to-event functionality in gsDesign should consult
vignette("SurvivalOverview").
Key differences: SAS SEQDESIGN vs. R gsDesign
There are three fundamental differences that affect the output from the example above.
1. Events formula
- SAS: Schoenfeld (1981). Uses only the null-hypothesis variance.
- gsDesign: Lachin and Foulkes (1986) by default. Uses both null and alternative hypothesis variances. L-F is slightly more conservative, giving ~1-2% more events than Schoenfeld for the same parameters.
2. Alpha handling in gsSurv()
O’Brien-Fleming spending makes things a bit ambiguous due to the one-sided versus two-sided spending.
- SAS: Uses a symmetric two-sided test with alpha = 0.05. Per-tail, \(z_{\alpha/2} = z_{0.025} = 1.96\). Boundaries and event calculations both use 0.025 per tail.
-
gsSurv(): The
test.type = 2withalpha = 0.05creates the same symmetric boundaries, but computes a different group sequential design inflation factor because it counts power from both tails. SAS computes the inflation factor using one-sided power only.
A possible strategy for aligning gsSurv() and SAS
calculations in this case is to use test.type = 1 with
alpha = 0.025 to match SAS per-tail logic, as we will see
momentarily.
Aligning the two approaches
gsSurv() with aligned parameters
Let’s start our work with gsDesign by defining parameters to match SAS Example 109.14:
k <- 4
alpha <- 0.05 # Two-sided total alpha (SAS convention)
beta <- 0.10 # 1 - power = 0.10 -> 90% power
lambdaC <- 0.03466 # Control hazard rate
lambdaE <- 0.01733 # Experimental hazard rate
HR <- lambdaE / lambdaC # = 0.5
timing <- c(0.25, 0.50, 0.75, 1.00) # Equally spaced information fractions
accrate <- 15 # Uniform accrual rate (subjects per time unit)
R <- 18 # Accrual duration (time units)
N <- accrate * RInstead of starting with a call to gsDesign::gsDesign(),
we begin with gsDesign::gsSurv(). The gsSurv()
function combines nSurv() with gsDesign()
(group sequential boundaries) in one call. It is the standard gsDesign
function for designing time-to-event trials.
Two key differences from SAS remain even with the alpha parameter aligned with SAS:
-
gsSurv()defaults to Lachin-Foulkes sample size; Lachin-Foulkes gives slightly more events than Schoenfeld. -
gsSurv()with \(T\) specified varies the accrual rate to achieve power, rather than fixing \(N\) and solving for \(T\) as SAS does. To address this, we set theTparameter equal to the study duration time calculated by SAS.
des_2 <- gsSurv(
k = 4,
test.type = 1, # One-sided (matches SAS per-tail)
alpha = 0.025, # alpha/2 per tail
beta = 0.10,
sfu = sfLDOF,
timing = c(.25, .50, .75, 1),
lambdaC = 0.03466,
hr = 0.5,
eta = 0, # Assume no dropout
gamma = 15, # Input rate; gsSurv will **adjust** this
R = 18,
T = 25.13323, # This is the study duration computed by SAS
ratio = 1,
method = "Schoenfeld" # Use Schoenfeld bounds to match SAS
)
des_2
#> Group sequential design (method=Schoenfeld; k=4 analyses; One-sided (efficacy only))
#> HR=0.500 vs HR0=1.000 | alpha=0.025 (sided=1) | power=90.0%
#> N=278.7 subjects | D=89.1 events | T=25.1 study duration | accrual=19.1 Accrual duration | minfup=6.0 minimum follow-up | ratio=1 randomization ratio (experimental/control)
#>
#> Spending functions:
#> Efficacy bounds derived using a Lan-DeMets O'Brien-Fleming approximation spending function (no parameters).
#>
#> Analysis summary:
#> Method: Schoenfeld
#> Analysis Value Efficacy
#> IA 1: 25% Z 4.3326
#> N: 168 p (1-sided) 0.0000
#> Events: 23 ~HR at bound 0.1594
#> Month: 11 P(Cross) if HR=1 0.0000
#> P(Cross) if HR=0.5 0.0035
#> IA 2: 50% Z 2.9631
#> N: 242 p (1-sided) 0.0015
#> Events: 45 ~HR at bound 0.4115
#> Month: 17 P(Cross) if HR=1 0.0015
#> P(Cross) if HR=0.5 0.2579
#> IA 3: 75% Z 2.3590
#> N: 280 p (1-sided) 0.0092
#> Events: 67 ~HR at bound 0.5615
#> Month: 21 P(Cross) if HR=1 0.0096
#> P(Cross) if HR=0.5 0.6853
#> Final Z 2.0141
#> N: 280 p (1-sided) 0.0220
#> Events: 90 ~HR at bound 0.6526
#> Month: 25 P(Cross) if HR=1 0.0250
#> P(Cross) if HR=0.5 0.9000
#>
#> Key inputs (names preserved):
#> desc item value input
#> Accrual rate(s) gamma 14.568 15
#> Accrual rate duration(s) R 19.133 18
#> Control hazard rate(s) lambdaC 0.035 0.035
#> Control dropout rate(s) eta 0 0
#> Experimental dropout rate(s) etaE 0 etaE
#> Event and dropout rate duration(s) S NULL S
gsBoundSummary(des_2)
#> Method: Schoenfeld
#> Analysis Value Efficacy
#> IA 1: 25% Z 4.3326
#> N: 168 p (1-sided) 0.0000
#> Events: 23 ~HR at bound 0.1594
#> Month: 11 P(Cross) if HR=1 0.0000
#> P(Cross) if HR=0.5 0.0035
#> IA 2: 50% Z 2.9631
#> N: 242 p (1-sided) 0.0015
#> Events: 45 ~HR at bound 0.4115
#> Month: 17 P(Cross) if HR=1 0.0015
#> P(Cross) if HR=0.5 0.2579
#> IA 3: 75% Z 2.3590
#> N: 280 p (1-sided) 0.0092
#> Events: 67 ~HR at bound 0.5615
#> Month: 21 P(Cross) if HR=1 0.0096
#> P(Cross) if HR=0.5 0.6853
#> Final Z 2.0141
#> N: 280 p (1-sided) 0.0220
#> Events: 90 ~HR at bound 0.6526
#> Month: 25 P(Cross) if HR=1 0.0250
#> P(Cross) if HR=0.5 0.9000Observations:
Using gsSurv() with parameters that align with SAS
settings, we notice that
- \(Z\)-boundaries: 4.333, 2.963, 2.359, 2.014 -> matches SAS
- Events: 23, 45, 67, 90 -> matches SAS; gsDesign is just taking
the
ceiling()of the SAS output. - Total N: 280 (slightly more than SAS’s fixed 270). This is a result
of the fact that
gsSurv()is varyingRandgamma(corresponding conceptually to the parametersacctimeandaccratein PROC SEQDESIGN) in order to maintain power.
Bringing it full circle with SAS PROC SEQDESIGN
Observe the final values for R and gamma
calculated by gsSurv() in the “Key inputs” table output
(see above) by des_2: gamma = 14.568, and
R = 19.133 (slightly different than our input 15 and 18).
If we substitute in these values for the accrate and
acctime parameters in SAS, we will see that the SAS
calculations produce a final sample size of N = 278.7,
aligning with the gsSurv() sample size up.
proc seqdesign;
ErrorSpend: design nstages=4 method=errfuncobf;
samplesize model=twosamplesurvival(
nullhazard = 0.03466
hazard = 0.01733
accrual = uniform
accrate = 14.568
acctime = 19.133
ceiling = time);
run;