
Selective bound testing at interim analyses
Keaven Anderson
Source:vignettes/SelectiveBoundTesting.Rmd
SelectiveBoundTesting.RmdIntroduction
In many clinical trial designs, it is desirable to test only certain boundaries at specific interim analyses. For example:
- Futility testing at early interims only: In some trial designs, a futility assessment is only meaningful at the first interim analysis. Later analyses focus solely on efficacy.
- No efficacy testing at the first interim: Some regulatory or operational considerations may preclude testing for efficacy at the very first interim.
-
Selective harm monitoring: For designs with harm
bounds (
test.type = 7or8), harm monitoring may be needed only at certain analyses.
The testUpper, testLower, and
testHarm parameters in gsDesign(),
gsSurv(), and gsSurvCalendar() allow
fine-grained control over which bounds are active at each analysis. When
a bound is inactive at a given analysis, it is set to an extreme value
(\(\pm 20\) on the Z-scale) so that it
cannot be crossed, and is displayed as NA in summaries.
Parameters
Each of testUpper, testLower, and
testHarm accepts either a single logical value (recycled to
all analyses) or a logical vector of length k (the number
of analyses):
| Parameter | Description | Default | Constraints |
|---|---|---|---|
testUpper |
Test the upper (efficacy) bound | TRUE |
Must be TRUE at the final analysis. For
test.type 1 and 2, overridden to all
TRUE. |
testLower |
Test the lower (futility) bound | TRUE |
Ignored for test.type = 1 (one-sided). Overridden to
all TRUE for test.type = 2 (symmetric). For
test.type >= 3, at least one analysis must have
testLower = TRUE. |
testHarm |
Test the harm bound | TRUE |
Only applies to test.type = 7 or 8. At
least one analysis must have testHarm = TRUE. |
Validation: At every analysis, at least one of the
active bounds must be TRUE. If all three are
FALSE at any analysis, an error is raised.
Example 1: Futility testing only at the first interim
A common scenario is to test for futility only at the first interim analysis, with efficacy testing at all analyses. This is useful when the trial’s data monitoring committee wants an early “go/no-go” decision, but not ongoing futility monitoring.
# 3-analysis design with non-binding futility (test.type = 4)
# Futility testing only at IA1
x1 <- gsDesign(
k = 3,
test.type = 4,
alpha = 0.025,
beta = 0.1,
sfu = sfHSD,
sfupar = -4,
sfl = sfHSD,
sflpar = -2,
testLower = c(TRUE, FALSE, FALSE)
)The lower bound is active only at IA1. At IA2 and the final analysis,
the futility bound shows as NA:
gsBoundSummary(x1)
#> Analysis Value Efficacy Futility
#> IA 1: 33% Z 3.0107 -0.2387
#> N/Fixed design N: 0.36 p (1-sided) 0.0013 0.5943
#> ~delta at bound 1.5553 -0.1233
#> P(Cross) if delta=0 0.0013 0.4057
#> P(Cross) if delta=1 0.1412 0.0148
#> IA 2: 67% Z 2.5465 NA
#> N/Fixed design N: 0.71 p (1-sided) 0.0054 NA
#> ~delta at bound 0.9302 NA
#> P(Cross) if delta=0 0.0062 NA
#> P(Cross) if delta=1 0.5815 NA
#> Final Z 1.9992 NA
#> N/Fixed design N: 1.07 p (1-sided) 0.0228 NA
#> ~delta at bound 0.5963 NA
#> P(Cross) if delta=0 0.0244 NA
#> P(Cross) if delta=1 0.9077 NAThe probabilities under the null and alternative are recomputed accounting for the inactive bounds. Notice the cumulative futility crossing probability does not increase after IA1 since no further futility testing occurs.
We can also see the bounds in the print() output:
x1
#> Asymmetric two-sided group sequential design with
#> 90 % power and 2.5 % Type I Error.
#> Upper bound spending computations assume
#> trial continues if lower bound is crossed.
#>
#> Sample
#> Size ----Lower bounds---- ----Upper bounds-----
#> Analysis Ratio* Z Nominal p Spend+ Z Nominal p Spend++
#> 1 0.357 -0.24 0.4057 0.0148 3.01 0.0013 0.0013
#> 2 0.713 NA NA NA 2.55 0.0054 0.0049
#> 3 1.070 NA NA NA 2.00 0.0228 0.0188
#> Total 0.0148 0.0250
#> + lower bound beta spending (under H1):
#> Hwang-Shih-DeCani spending function with gamma = -2.
#> ++ alpha spending:
#> Hwang-Shih-DeCani spending function with gamma = -4.
#> * Sample size ratio compared to fixed design with no interim
#>
#> Boundary crossing probabilities and expected sample size
#> assume any cross stops the trial
#>
#> Upper boundary (power or Type I Error)
#> Analysis
#> Theta 1 2 3 Total E{N}
#> 0.0000 0.0013 0.0049 0.0181 0.0244 0.7779
#> 3.2415 0.1412 0.4403 0.3262 0.9077 0.8016
#>
#> Lower boundary (futility or Type II Error)
#> Analysis
#> Theta 1 2 3 Total
#> 0.0000 0.4057 0 0 0.4057
#> 3.2415 0.0148 0 0 0.0148Plotting
The standard plot shows the active bounds, with inactive bounds omitted:
plot(x1, plottype = 1)Power plot with futility only at IA1
Example 2: No efficacy testing at the first interim
In some settings, particularly early-phase or adaptive designs, efficacy testing may be deferred until sufficient data have accrued. Here we skip the efficacy bound at the first interim:
# 3-analysis design with binding futility (test.type = 3)
# No efficacy testing at IA1
x2 <- gsDesign(
k = 3,
test.type = 3,
alpha = 0.025,
beta = 0.1,
sfu = sfHSD,
sfupar = -4,
sfl = sfHSD,
sflpar = -2,
testUpper = c(FALSE, TRUE, TRUE)
)
gsBoundSummary(x2)
#> Analysis Value Efficacy Futility
#> IA 1: 33% Z NA -0.2579
#> N/Fixed design N: 0.35 p (1-sided) NA 0.6018
#> ~delta at bound NA -0.1346
#> P(Cross) if delta=0 NA 0.3982
#> P(Cross) if delta=1 NA 0.0148
#> IA 2: 67% Z 2.4976 0.9138
#> N/Fixed design N: 0.7 p (1-sided) 0.0063 0.1804
#> ~delta at bound 0.9215 0.3371
#> P(Cross) if delta=0 0.0062 0.8279
#> P(Cross) if delta=1 0.5841 0.0437
#> Final Z 1.9593 1.9593
#> N/Fixed design N: 1.05 p (1-sided) 0.0250 0.0250
#> ~delta at bound 0.5902 0.5902
#> P(Cross) if delta=0 0.0250 0.9750
#> P(Cross) if delta=1 0.9006 0.0994At IA1, only the futility bound is active. The efficacy bound appears
as NA for that analysis.
Example 3: Survival design with selective bounds via gsSurv
The testUpper, testLower, and
testHarm parameters pass through to gsSurv()
and gsSurvCalendar():
# Survival design with futility only at IA1
xs <- gsSurv(
k = 3,
test.type = 4,
alpha = 0.025,
beta = 0.1,
hr = 0.7,
timing = c(0.5, 0.75),
sfu = sfHSD,
sfupar = -4,
sfl = sfHSD,
sflpar = -2,
lambdaC = log(2) / 12,
eta = 0.01,
gamma = 10,
R = 12,
T = 36,
minfup = 24,
testLower = c(TRUE, FALSE, FALSE)
)
gsBoundSummary(xs)
#> Method: LachinFoulkes
#> Analysis Value Efficacy Futility
#> IA 1: 50% Z 2.7500 0.4555
#> N: 524 p (1-sided) 0.0030 0.3244
#> Events: 179 ~HR at bound 0.6623 0.9340
#> Month: 15 P(Cross) if HR=1 0.0030 0.6756
#> P(Cross) if HR=0.7 0.3572 0.0269
#> IA 2: 75% Z 2.4318 NA
#> N: 524 p (1-sided) 0.0075 NA
#> Events: 268 ~HR at bound 0.7427 NA
#> Month: 23 P(Cross) if HR=1 0.0089 NA
#> P(Cross) if HR=0.7 0.6959 NA
#> Final Z 2.0116 NA
#> N: 524 p (1-sided) 0.0221 NA
#> Events: 357 ~HR at bound 0.8081 NA
#> Month: 36 P(Cross) if HR=1 0.0239 NA
#> P(Cross) if HR=0.7 0.9067 NAExample 4: Selective harm monitoring (test.type 7/8)
For designs with a separate harm bound, the testHarm
parameter controls which analyses include harm monitoring. This can be
useful when harm monitoring is most critical during early enrollment,
before longer-term safety data are available.
# Harm bound design with harm monitoring only at IA1 and IA2
xh <- gsDesign(
k = 3,
test.type = 8,
alpha = 0.025,
beta = 0.1,
astar = 0.05,
sfu = sfHSD,
sfupar = -4,
sfl = sfHSD,
sflpar = -2,
sfharm = sfHSD,
sfharmparam = 1,
testHarm = c(TRUE, TRUE, FALSE)
)
gsBoundSummary(xh)
#> Analysis Value Harm Futility
#> IA 1: 33% Z -2.0061 -0.2387
#> N/Fixed design N: 0.36 p (1-sided) 0.9776 0.5943
#> ~delta at bound -1.0363 -0.1233
#> P(Cross) if delta=0 0.0224 0.4057
#> P(Cross) if delta=1 0.0000 0.0148
#> IA 2: 67% Z -1.9827 0.9411
#> N/Fixed design N: 0.71 p (1-sided) 0.9763 0.1733
#> ~delta at bound -0.7242 0.3438
#> P(Cross) if delta=0 0.0385 0.8347
#> P(Cross) if delta=1 0.0000 0.0437
#> Final Z NA 1.9992
#> N/Fixed design N: 1.07 p (1-sided) NA 0.0228
#> ~delta at bound NA 0.5963
#> P(Cross) if delta=0 NA 0.9767
#> P(Cross) if delta=1 NA 0.1000
#> Efficacy
#> 3.0107
#> 0.0013
#> 1.5553
#> 0.0013
#> 0.1412
#> 2.5465
#> 0.0054
#> 0.9302
#> 0.0062
#> 0.5815
#> 1.9992
#> 0.0228
#> 0.5963
#> 0.0233
#> 0.9000The harm bound is NA at the final analysis.
Example 5: Combining selective efficacy and futility
Both testUpper and testLower can be
specified simultaneously. For example, a design with futility-only at
IA1 and efficacy-only at IA2:
# Futility only at IA1, efficacy only at IA2, both at Final
x5 <- gsDesign(
k = 3,
test.type = 4,
alpha = 0.025,
beta = 0.1,
sfu = sfHSD,
sfupar = -4,
sfl = sfHSD,
sflpar = -2,
testUpper = c(FALSE, TRUE, TRUE),
testLower = c(TRUE, FALSE, FALSE)
)
gsBoundSummary(x5)
#> Analysis Value Efficacy Futility
#> IA 1: 33% Z NA -0.2387
#> N/Fixed design N: 0.36 p (1-sided) NA 0.5943
#> ~delta at bound NA -0.1233
#> P(Cross) if delta=0 NA 0.4057
#> P(Cross) if delta=1 NA 0.0148
#> IA 2: 67% Z 2.4979 NA
#> N/Fixed design N: 0.71 p (1-sided) 0.0062 NA
#> ~delta at bound 0.9124 NA
#> P(Cross) if delta=0 0.0062 NA
#> P(Cross) if delta=1 0.5945 NA
#> Final Z 1.9947 NA
#> N/Fixed design N: 1.07 p (1-sided) 0.0230 NA
#> ~delta at bound 0.5949 NA
#> P(Cross) if delta=0 0.0244 NA
#> P(Cross) if delta=1 0.9083 NANote the NA values: efficacy is NA at IA1,
and futility is NA at IA2 and the final analysis.
Validation rules
The following rules are enforced:
-
testUppermust beTRUEat the final analysis (the trial must always be able to reject \(H_0\)). -
At least one bound must be active at every
analysis. For example, setting
testUpper = c(FALSE, TRUE, TRUE)andtestLower = c(FALSE, TRUE, TRUE)would fail because no bound is active at IA1. -
test.type = 1: Only one-sided efficacy testing.testLoweris ignored (set toFALSEinternally). -
test.type = 2: Symmetric two-sided testing. BothtestUpperandtestLowerare overridden to allTRUE. -
test.type3–8:testLowermust beTRUEfor at least one analysis. -
test.type7 and 8:testHarmmust beTRUEfor at least one analysis.
# This fails: testUpper must be TRUE at the final analysis
try(gsDesign(k = 3, test.type = 3, testUpper = c(TRUE, TRUE, FALSE)))
#> Error in gsTestBoundsCheck(x$k, x$test.type, testUpper, testLower, testHarm) :
#> testUpper must be TRUE at the final analysis
# This fails: no bound active at analysis 1
try(gsDesign(k = 3, test.type = 4,
testUpper = c(FALSE, TRUE, TRUE),
testLower = c(FALSE, TRUE, TRUE)
))
#> Error in gsTestBoundsCheck(x$k, x$test.type, testUpper, testLower, testHarm) :
#> At analysis 1 at least one of testUpper, testLower, or testHarm must be TRUEAccessing stored flags
The testUpper, testLower, and
testHarm logical vectors are stored on the returned
gsDesign object:
x1$testUpper
#> [1] TRUE TRUE TRUE
x1$testLower
#> [1] TRUE FALSE FALSE
x1$testHarm
#> [1] FALSE FALSE FALSEThese can be inspected programmatically for downstream analyses or reporting.
Type I Error Preservation
A key property of the selective bounds implementation is that Type I error is preserved at the nominal level regardless of which analyses are skipped.
How it works
When bounds are selectively deactivated, the cumulative spending at each performed analysis remains at the spending function’s planned value. At inactive analyses, the cumulative spending is frozen (no incremental spend), causing the C code to produce \(\pm\)EXTREMEZ bounds. At the next active analysis, the incremental spend absorbs the budget from any prior skipped analyses, so the cumulative spend catches up to the planned level. The efficacy bounds at active analyses are then recomputed using the modified spending with the sample size held fixed, ensuring the total alpha spent equals the nominal level.
Non-binding futility (test.type 4 or 6)
For non-binding designs, the efficacy bounds are computed under the assumption that the trial may continue past the futility bound (i.e., futility does not contribute to the upper alpha calculation). Since upper bounds are independent of lower bounds, removing futility has no effect on the upper (efficacy) bounds or the non-binding alpha:
# Baseline non-binding design
x_nb <- gsDesign(k = 3, test.type = 4, alpha = 0.025, beta = 0.1)
# Remove futility at IA2 and final
x_nb_sel <- gsDesign(k = 3, test.type = 4, alpha = 0.025, beta = 0.1,
testLower = c(TRUE, FALSE, FALSE))
# Non-binding alpha (computed ignoring lower bounds)
nb_alpha_base <- sum(gsDesign:::gsprob(0, x_nb$n.I, rep(-20, 3), x_nb$upper$bound, r = x_nb$r)$probhi)
nb_alpha_sel <- sum(gsDesign:::gsprob(0, x_nb_sel$n.I, rep(-20, 3), x_nb_sel$upper$bound, r = x_nb_sel$r)$probhi)
cat("Baseline non-binding alpha: ", nb_alpha_base, "\n")
#> Baseline non-binding alpha: 0.025
cat("Selective non-binding alpha:", nb_alpha_sel , "\n")
#> Selective non-binding alpha: 0.025
cat("Upper bounds identical: ", all.equal(x_nb$upper$bound, x_nb_sel$upper$bound), "\n")
#> Upper bounds identical: TRUEWhen removing early efficacy bounds, the upper bounds at active analyses adjust to absorb the redistributed spending, still totalling exactly \(\alpha = 0.025\):
# Remove efficacy at IA1
x_nb_eff <- gsDesign(k = 3, test.type = 4, alpha = 0.025, beta = 0.1,
testUpper = c(FALSE, TRUE, TRUE))
nb_alpha_eff <- sum(gsDesign:::gsprob(0, x_nb_eff$n.I, rep(-20, 3), x_nb_eff$upper$bound, r = x_nb_eff$r)$probhi)
cat("Non-binding alpha (skip IA1 efficacy):", nb_alpha_eff, "\n")
#> Non-binding alpha (skip IA1 efficacy): 0.025Binding futility (test.type 3 or 5)
For binding designs, the efficacy bounds depend on the futility bounds. When futility bounds are selectively removed, the bounds are recomputed with the modified spending while holding sample size fixed. This ensures the cumulative Type I error remains at the nominal level:
# Baseline binding design
x_b <- gsDesign(k = 3, test.type = 3, alpha = 0.025, beta = 0.1)
cat("Baseline alpha:", sum(x_b$upper$prob[, 1]), "\n")
#> Baseline alpha: 0.02500087
# Remove futility at IA2 and final
x_b_sel <- gsDesign(k = 3, test.type = 3, alpha = 0.025, beta = 0.1,
testLower = c(TRUE, FALSE, FALSE))
cat("Selective alpha:", sum(x_b_sel$upper$prob[, 1]), "\n")
#> Selective alpha: 0.025
# Remove efficacy at IA1
x_b_eff <- gsDesign(k = 3, test.type = 3, alpha = 0.025, beta = 0.1,
testUpper = c(FALSE, TRUE, TRUE))
cat("Skip IA1 efficacy alpha:", sum(x_b_eff$upper$prob[, 1]), "\n")
#> Skip IA1 efficacy alpha: 0.025In all cases, the actual Type I error is exactly \(\alpha = 0.025\) (within numerical tolerance). The sample size remains unchanged from the baseline design, and the bounds at active analyses adjust to properly allocate the spending budget.