
Futility and harm bounds for overall survival monitoring
Keaven Anderson
Introduction
When clinical trials include overall survival (OS) as a secondary or
exploratory endpoint, regulators may recommend not only monitoring for
early evidence of efficacy and futility, but also for potential
harm — that is, evidence that the experimental treatment may be
worsening survival relative to control. This article
demonstrates how the gsDesign package supports group
sequential designs with three boundaries: an efficacy
(upper) bound, a futility (lower) bound, and a
harm bound, using test.type = 8
(non-binding) and test.type = 7 (binding).
Regulatory context: FDA guidance on OS monitoring in oncology
The FDA draft guidance Assessment of Overall Survival Evidence in Support of Accelerated Approval of Oncology Therapeutics (U.S. Food and Drug Administration 2024) describes expectations for monitoring OS in the context of trials that may receive accelerated approval based on surrogate endpoints. The guidance states that sponsors should specify pre-planned boundaries for interim OS monitoring, including criteria for stopping a trial early if there is evidence of a detrimental effect on OS. Key points include:
- Sponsors should include a pre-specified statistical analysis plan for interim OS analyses, including the timing and number of interim looks.
- At a minimum, the guidance expects monitoring for OS harm (i.e., a detrimental trend in overall survival) using pre-specified boundaries.
- Separate from the harm boundary, the sponsor should establish a futility boundary to stop the trial if the experimental treatment is unlikely to demonstrate an OS benefit.
- The statistical plan should describe the spending functions used for each boundary and how the overall Type I error and Type II error are controlled.
This motivates the design framework with test.type = 7
(binding futility and harm bounds) or test.type = 8
(non-binding futility and harm bounds), where three boundaries are
simultaneously specified using spending functions.
Design framework overview
In a standard two-sided asymmetric group sequential design
(test.type = 3 or 4), there are two
boundaries:
- Efficacy (upper) bound: Reject \(H_0\) if the test statistic exceeds this boundary (evidence of treatment benefit).
- Futility (lower) bound: Stop for futility if the test statistic falls below this boundary (insufficient evidence of treatment benefit).
The harm bound extension (test.type = 7 or
8) adds a third boundary:
- Harm bound: Signal that the experimental treatment may be harming patients (evidence of a detrimental effect).
The harm bound lies below the futility bound. At each analysis, there are four possible outcomes:
- Cross the efficacy bound (above): Stop for efficacy.
- Between the efficacy and futility bounds: Continue the trial.
- Cross the futility bound but not the harm bound (between futility and harm): Stop for futility.
- Cross the harm bound (below): Stop for harm.
The harm bound is intended to be crossed when the observed one-sided p-value favoring control is small. That is, it flags evidence that the experimental treatment may be worsening survival: an observed hazard ratio above 1, which corresponds to a negative Z-value in this design.
Design with non-binding bounds (test.type = 8)
We demonstrate a survival design using gsSurvCalendar()
with test.type = 8 (non-binding futility and harm bounds).
The scenario is based on a 1:1 randomized trial monitoring overall
survival with:
- Median control survival: 3 years (36 months), i.e., \(\lambda_C = \log(2)/36\).
- Target hazard ratio: HR = 0.75 (25% reduction in hazard).
- Power: 90% (\(\beta = 0.1\)).
- One-sided \(\alpha\): 0.0125 (e.g., the OS component of a trial with multiplicity adjustment).
- Enrollment: Uniform enrollment over 18 months.
- Study duration: 5 years (60 months) with planned analyses at years 1, 2, 3, 4, and 5 from start of enrollment.
The astar parameter controls the total spending for the
harm bound under \(H_0\). We set
astar = 0.1, meaning the total probability of crossing the
harm bound under \(H_0\) is 10%.
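As a quick check on these inputs, the sketch below converts the control median and the target hazard ratio into hazard rates and survival probabilities. It assumes exponential survival in both arms (consistent with specifying a single lambdaC); the variable names are illustrative and are not part of gsDesign.

```r
# Assumption: exponential survival in both arms
median_control <- 36                      # months
lambda_c <- log(2) / median_control       # control hazard rate per month
hr <- 0.75                                # target hazard ratio under H1
lambda_e <- hr * lambda_c                 # experimental hazard rate under H1

# Median survival in the experimental arm under H1 (36 / 0.75 = 48 months)
median_experimental <- log(2) / lambda_e

# Survival probabilities at the end of the study (month 60)
surv_control_60 <- exp(-lambda_c * 60)
surv_experimental_60 <- exp(-lambda_e * 60)

round(c(median_experimental = median_experimental,
        surv_control_60 = surv_control_60,
        surv_experimental_60 = surv_experimental_60), 3)
```

Under these assumptions, the design alternative corresponds to a 12-month improvement in median survival.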
Spending function specification
We specify:
- Efficacy bound: Lan-DeMets O’Brien-Fleming (sfLDOF) spending function (conservative, spending little \(\alpha\) at early analyses).
- Futility bound: Hwang-Shih-DeCani (HSD) spending function with \(\gamma = -2\) (moderate \(\beta\)-spending under \(H_1\)).
- Harm bound: Lan-DeMets Pocock (sfLDPocock) spending function (spending under \(H_0\) for detecting harm).
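To see how differently these three functions allocate their totals over time, the following base R sketch evaluates the standard published forms of each spending function. The function names (ld_of_spend, ld_pocock_spend, hsd_spend) are hypothetical helpers written for this illustration, not gsDesign's internals, and the information fractions are assumed values for a five-look design.

```r
# Standard closed forms of the three spending functions (illustrative helpers)
ld_of_spend <- function(alpha, t) {
  # Lan-DeMets approximation to O'Brien-Fleming
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}
ld_pocock_spend <- function(alpha, t) {
  # Lan-DeMets approximation to Pocock
  alpha * log(1 + (exp(1) - 1) * t)
}
hsd_spend <- function(alpha, t, gamma) {
  # Hwang-Shih-DeCani (gamma must be non-zero in this form)
  alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
}

# Assumed information fractions, for illustration only
t_frac <- c(0.11, 0.38, 0.63, 0.83, 1)

spend <- data.frame(
  t = t_frac,
  efficacy = ld_of_spend(0.0125, t_frac),        # alpha = 0.0125
  futility = hsd_spend(0.1, t_frac, gamma = -2), # beta = 0.1
  harm = ld_pocock_spend(0.1, t_frac)            # astar = 0.1
)
round(spend, 4)
```

Each column is cumulative and reaches its total (0.0125, 0.1, and 0.1) at \(t = 1\). The harm column rises fastest early on, which is the behavior the Pocock-type choice is meant to provide.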
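library(gsDesign)
library(knitr) # provides kable(), used for the tables below

x8 <- gsSurvCalendar(
  test.type = 8,                        # non-binding futility and harm bounds
  alpha = 0.0125,                       # one-sided Type I error
  beta = 0.1,                           # Type II error (90% power)
  astar = 0.1,                          # total harm-bound spending under H0
  calendarTime = c(12, 24, 36, 48, 60), # analysis times in months
  sfu = sfLDOF,                         # efficacy spending function
  sfl = sfHSD, sflpar = -2,             # futility spending function
  sfharm = sfLDPocock,                  # harm spending function
  lambdaC = log(2) / 36,                # control hazard rate (median 36 months)
  hr = 0.75,                            # hazard ratio under H1
  R = 18,                               # enrollment duration in months
  minfup = 42                           # minimum follow-up in months
)
Summary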
The summary() method provides a concise description of
the design:
cat(strwrap(summary(x8), width = 65), sep = "\n")
#> Asymmetric two-sided group sequential design with non-binding
#> futility and harm bounds, 5 analyses, time-to-event outcome with
#> sample size 1148 and 657 events required, 90 percent power, 1.25
#> percent (1-sided) Type I error to detect a hazard ratio of 0.75.
#> Enrollment and total study durations are assumed to be 18 and 60
#> months, respectively. Efficacy bounds derived using a Lan-DeMets
#> O'Brien-Fleming approximation spending function (no parameters).
#> Futility bounds derived using a Hwang-Shih-DeCani spending
#> function with gamma = -2. Harm bounds derived using a Lan-DeMets
#> Pocock approximation spending function.
Detailed boundary table
The gsBoundSummary() function produces a tabular summary
with columns for each boundary. By default, B-value,
Spending, CP, CP H1, and
PP are excluded. At the first interim analysis, the efficacy bound
(Z = 7.43) is so extreme that it is effectively impossible to cross.
The harm and futility bounds are more moderate, allowing early stopping
if there is evidence of harm or futility. The futility bound illustrates
why bounds are often non-binding: it is not intended as a strict
stopping rule, but as a signal that the trial may be unlikely to succeed
if it continues. Crossing the harm bound is a stronger indication that
the treatment may be harmful; the trial should at least be paused, with
a recommendation to review the safety and other endpoint data.
gsBoundSummary(x8)
#> Method: LachinFoulkes
#> Analysis Value Harm Futility Efficacy
#> IA 1: 11% Z -2.1121 -1.4408 7.4336
#> N: 766 p (1-sided) 0.9827 0.9252 0.0000
#> Events: 73 ~HR at bound 1.6434 1.4034 0.1740
#> Month: 12 P(Cross) if HR=1 0.0173 0.0748 0.0000
#> P(Cross) if HR=0.75 0.0004 0.0039 0.0000
#> IA 2: 38% Z -1.7667 0.1212 3.8622
#> N: 1148 p (1-sided) 0.9614 0.4518 0.0001
#> Events: 253 ~HR at bound 1.2491 0.9849 0.6149
#> Month: 24 P(Cross) if HR=1 0.0507 0.5554 0.0001
#> P(Cross) if HR=0.75 0.0004 0.0181 0.0574
#> IA 3: 63% Z -1.7256 1.0566 2.9347
#> N: 1148 p (1-sided) 0.9578 0.1454 0.0017
#> Events: 416 ~HR at bound 1.1846 0.9015 0.7497
#> Month: 36 P(Cross) if HR=1 0.0736 0.8641 0.0017
#> P(Cross) if HR=0.75 0.0004 0.0398 0.4990
#> IA 4: 83% Z -1.7170 1.7357 2.5278
#> N: 1148 p (1-sided) 0.9570 0.0413 0.0057
#> Events: 548 ~HR at bound 1.1580 0.8622 0.8057
#> Month: 48 P(Cross) if HR=1 0.0890 0.9631 0.0062
#> P(Cross) if HR=0.75 0.0004 0.0675 0.7996
#> Final Z -1.7149 2.3072 2.3072
#> N: 1148 p (1-sided) 0.9568 0.0105 0.0105
#> Events: 657 ~HR at bound 1.1433 0.8352 0.8352
#> Month: 60 P(Cross) if HR=1 0.1000 0.9888 0.0112
#> P(Cross) if HR=0.75 0.0004 0.1000 0.9000
To include all statistics:
gsBoundSummary(x8, exclude = c())
#> Method: LachinFoulkes
#> Analysis Value Harm Futility Efficacy
#> IA 1: 11% Z -2.1121 -1.4408 7.4336
#> N: 766 p (1-sided) 0.9827 0.9252 0.0000
#> Events: 73 ~HR at bound 1.6434 1.4034 0.1740
#> Month: 12 Spending 0.0173 0.0039 0.0000
#> B-value -0.7011 -0.4782 2.4674
#> CP 0.0000 0.0000 1.0000
#> CP H1 0.4619 0.5942 1.0000
#> PP 0.0011 0.0097 1.0000
#> P(Cross) if HR=1 0.0173 0.0748 0.0000
#> P(Cross) if HR=0.75 0.0004 0.0039 0.0000
#> IA 2: 38% Z -1.7667 0.1212 3.8622
#> N: 1148 p (1-sided) 0.9614 0.4518 0.0001
#> Events: 253 ~HR at bound 1.2491 0.9849 0.6149
#> Month: 24 Spending 0.0334 0.0143 0.0001
#> B-value -1.0954 0.0751 2.3947
#> CP 0.0000 0.0024 1.0000
#> CP H1 0.0097 0.4033 0.9994
#> PP 0.0000 0.0358 0.9994
#> P(Cross) if HR=1 0.0507 0.5554 0.0001
#> P(Cross) if HR=0.75 0.0004 0.0181 0.0574
#> IA 3: 63% Z -1.7256 1.0566 2.9347
#> N: 1148 p (1-sided) 0.9578 0.1454 0.0017
#> Events: 416 ~HR at bound 1.1846 0.9015 0.7497
#> Month: 36 Spending 0.0229 0.0217 0.0016
#> B-value -1.3725 0.8404 2.3343
#> CP 0.0000 0.0396 0.9928
#> CP H1 0.0000 0.3449 0.9928
#> PP 0.0000 0.0776 0.9759
#> P(Cross) if HR=1 0.0736 0.8641 0.0017
#> P(Cross) if HR=0.75 0.0004 0.0398 0.4990
#> IA 4: 83% Z -1.7170 1.7357 2.5278
#> N: 1148 p (1-sided) 0.9570 0.0413 0.0057
#> Events: 548 ~HR at bound 1.1580 0.8622 0.8057
#> Month: 48 Spending 0.0154 0.0277 0.0046
#> B-value -1.5689 1.5860 2.3098
#> CP 0.0000 0.1578 0.8708
#> CP H1 0.0000 0.3906 0.9337
#> PP 0.0000 0.1793 0.8485
#> P(Cross) if HR=1 0.0890 0.9631 0.0062
#> P(Cross) if HR=0.75 0.0004 0.0675 0.7996
#> Final Z -1.7149 2.3072 2.3072
#> N: 1148 p (1-sided) 0.9568 0.0105 0.0105
#> Events: 657 ~HR at bound 1.1433 0.8352 0.8352
#> Month: 60 Spending 0.0110 0.0325 0.0062
#> B-value -1.7149 2.3072 2.3072
#> P(Cross) if HR=1 0.1000 0.9888 0.0112
#> P(Cross) if HR=0.75 0.0004 0.1000 0.9000
Interpreting the boundaries
The design has five analyses at calendar times of 12, 24, 36, 48, and 60 months. At each analysis, the test statistic (Z-value) is compared against three boundaries:
bounds <- data.frame(
Analysis = 1:x8$k,
Month = x8$T,
Events = ceiling(x8$n.I),
Harm = round(x8$harm$bound, 2),
Futility = round(x8$lower$bound, 2),
Efficacy = round(x8$upper$bound, 2)
)
kable(bounds, caption = "Z-value boundaries at each analysis")

| Analysis | Month | Events | Harm | Futility | Efficacy |
|---|---|---|---|---|---|
| 1 | 12 | 73 | -2.11 | -1.44 | 7.43 |
| 2 | 24 | 253 | -1.77 | 0.12 | 3.86 |
| 3 | 36 | 416 | -1.73 | 1.06 | 2.93 |
| 4 | 48 | 548 | -1.72 | 1.74 | 2.53 |
| 5 | 60 | 657 | -1.71 | 2.31 | 2.31 |
Decision rules at each analysis:
- If \(Z >\) efficacy bound: Stop for efficacy (reject \(H_0\)).
- If futility bound \(< Z \leq\) efficacy bound: Continue the trial.
- If harm bound \(< Z \leq\) futility bound: Stop for futility.
- If \(Z \leq\) harm bound: Stop for harm.
Note that the harm bound is always at or below the futility bound. At early analyses the two may coincide: if the harm spending alone would place the harm bound above the futility bound, the harm bound is capped at the futility bound.
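The decision rules above can be sketched as a small helper. The function classify_z and the data frame z_bounds are hypothetical names introduced for illustration; the bound values are the rounded Z-values from the table above, so results very close to a bound should be checked against the unrounded design object.

```r
# Rounded Z-value bounds copied from the table above
z_bounds <- data.frame(
  analysis = 1:5,
  harm     = c(-2.11, -1.77, -1.73, -1.72, -1.71),
  futility = c(-1.44,  0.12,  1.06,  1.74,  2.31),
  efficacy = c( 7.43,  3.86,  2.93,  2.53,  2.31)
)

# Apply the four decision rules to an observed Z at a given analysis
classify_z <- function(z, analysis, b = z_bounds) {
  row <- b[b$analysis == analysis, ]
  if (z > row$efficacy) {
    "efficacy"
  } else if (z > row$futility) {
    "continue"
  } else if (z > row$harm) {
    "futility"
  } else {
    "harm"
  }
}

classify_z(0.5, analysis = 2)   # between futility and efficacy bounds
classify_z(-1.0, analysis = 2)  # between harm and futility bounds
classify_z(-2.0, analysis = 3)  # below the harm bound
```

At the final analysis the futility and efficacy bounds are equal, so "continue" cannot occur there. With non-binding bounds, these labels describe which boundary was crossed rather than a mandatory action.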
Boundary crossing probabilities
We examine the operating characteristics under two scenarios: no treatment effect (HR = 1, i.e., under \(H_0\)) and the design alternative (HR = 0.75).
probs <- data.frame(
Scenario = c(rep("Under H0 (HR=1)", x8$k), rep("Under H1 (HR=0.75)", x8$k)),
Analysis = rep(1:x8$k, 2),
Month = rep(x8$T, 2),
`P(Efficacy)` = c(x8$upper$prob[, 1], x8$upper$prob[, 2]),
`P(Futility)` = c(x8$lower$prob[, 1], x8$lower$prob[, 2]),
`P(Harm)` = c(x8$harm$prob[, 1], x8$harm$prob[, 2]),
check.names = FALSE
)
kable(probs, digits = 4, caption = "Boundary crossing probabilities")

| Scenario | Analysis | Month | P(Efficacy) | P(Futility) | P(Harm) |
|---|---|---|---|---|---|
| Under H0 (HR=1) | 1 | 12 | 0.0000 | 0.0748 | 0.0173 |
| Under H0 (HR=1) | 2 | 24 | 0.0001 | 0.4806 | 0.0334 |
| Under H0 (HR=1) | 3 | 36 | 0.0016 | 0.3087 | 0.0229 |
| Under H0 (HR=1) | 4 | 48 | 0.0045 | 0.0990 | 0.0154 |
| Under H0 (HR=1) | 5 | 60 | 0.0050 | 0.0257 | 0.0110 |
| Under H1 (HR=0.75) | 1 | 12 | 0.0000 | 0.0039 | 0.0004 |
| Under H1 (HR=0.75) | 2 | 24 | 0.0574 | 0.0143 | 0.0000 |
| Under H1 (HR=0.75) | 3 | 36 | 0.4416 | 0.0217 | 0.0000 |
| Under H1 (HR=0.75) | 4 | 48 | 0.3006 | 0.0277 | 0.0000 |
| Under H1 (HR=0.75) | 5 | 60 | 0.1004 | 0.0325 | 0.0000 |
Under \(H_0\), the cumulative probability of crossing the harm bound across all analyses is approximately 0.1, reflecting the spending allocated to the harm boundary. Under \(H_1\) (HR = 0.75), crossing the harm bound is very unlikely (about \(4 \times 10^{-4}\)), since the treatment is beneficial.
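The per-analysis probabilities in the table can be accumulated to recover the design totals. The sketch below copies the rounded table values by hand (so sums are accurate only to about three decimals) rather than reading them from the design object.

```r
# Incremental crossing probabilities copied from the table above (rounded)
harm_h0     <- c(0.0173, 0.0334, 0.0229, 0.0154, 0.0110)
efficacy_h0 <- c(0.0000, 0.0001, 0.0016, 0.0045, 0.0050)
efficacy_h1 <- c(0.0000, 0.0574, 0.4416, 0.3006, 0.1004)
futility_h1 <- c(0.0039, 0.0143, 0.0217, 0.0277, 0.0325)

cumulative <- data.frame(
  analysis    = 1:5,
  harm_h0     = cumsum(harm_h0),      # approaches astar = 0.1
  efficacy_h0 = cumsum(efficacy_h0),  # stays below alpha = 0.0125
  efficacy_h1 = cumsum(efficacy_h1),  # approaches power = 0.9
  futility_h1 = cumsum(futility_h1)   # approaches beta = 0.1
)
cumulative
```

The cumulative efficacy probability under \(H_0\) ends below 0.0125 because these probabilities account for stopping at the lower bounds, whereas the non-binding efficacy bounds are computed as if the trial never stops for futility or harm.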
Visualization
All standard plot() types are supported for
test.type = 7 and 8 designs, with a third line
(or set of lines) shown for the harm bound.
Z-value boundaries
The default plot shows Z-value boundaries at each analysis. Three boundaries are displayed: efficacy (upper), futility (lower), and harm (below futility).
plot(x8)
Figure: Z-value boundaries for non-binding harm bound design
Boundary crossing probabilities
The power plot (plottype = 2) shows cumulative boundary
crossing probabilities as a function of the treatment effect. Three sets
of lines appear: upper bound (cumulative efficacy crossing probability),
1-Futility bound, and 1-Harm bound. The harm lines lie above the
futility lines because the probability of crossing the harm bound is
less than or equal to the probability of crossing the futility bound.
When the underlying treatment effect favors control, the probability of
crossing the harm bound is high, which indicates that the harm bound is
sensitive and serves its intended purpose.
plot(x8, plottype = 2)
Figure: Boundary crossing probabilities for non-binding harm bound design
Approximate treatment effect at boundaries
The effect size plot (plottype = 3) shows the
approximate treatment effect at each boundary. For survival designs,
this is expressed as the approximate hazard ratio at the boundary.
plot(x8, plottype = 3)
Figure: Approximate treatment effect at boundaries
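One common way to translate a Z-value into an approximate hazard ratio is the Schoenfeld approximation for 1:1 randomization, \(\log(\mathrm{HR}) \approx -2Z/\sqrt{d}\), where \(d\) is the number of events. The sketch below applies it to values from the boundary table; the helper name hr_at_bound is hypothetical, and treating this as the computation behind the "~HR at bound" rows is an assumption, although the results agree closely.

```r
# Schoenfeld approximation, assuming 1:1 randomization
# (hypothetical helper, not a gsDesign function)
hr_at_bound <- function(z, events) exp(-2 * z / sqrt(events))

# Final analysis: efficacy/futility bound and harm bound, 657 events
hr_final_efficacy <- hr_at_bound(2.3072, 657)
hr_final_harm     <- hr_at_bound(-1.7149, 657)

# Second interim analysis: futility bound, 253 events
hr_ia2_futility <- hr_at_bound(0.1212, 253)

round(c(hr_final_efficacy, hr_final_harm, hr_ia2_futility), 4)
```

Because the event counts in the table are rounded up to integers, small discrepancies from the reported values are expected, particularly at the first interim analysis where the number of events is small.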
Conditional power at boundaries
Conditional power (plottype = 4) at each interim
analysis is shown for all three boundaries. This is generally not a very
useful plot.
plot(x8, plottype = 4)
Figure: Conditional power at boundaries
Spending function plot
The spending function plot (plottype = 5) shows the
three spending functions: \(\alpha\)
(efficacy), \(\beta\) (futility), and
harm.
plot(x8, plottype = 5)
Figure: Spending functions for non-binding harm bound design
B-values at boundaries
B-values (plottype = 7) are Z-values scaled by \(\sqrt{t}\) where \(t\) is the information fraction. As
discussed by Proschan, Lan, and Wittes
(2006), the expected value of B-values increases linearly with
the information fraction under the assumption of a constant treatment
effect (proportional hazards). This linear relationship makes B-values
useful for visual assessment of treatment effect trends across interim
analyses: departures from linearity may suggest non-proportional hazards
or other changes in treatment effect over time. Three boundary lines are
shown: efficacy, futility, and harm.
plot(x8, plottype = 7)
Figure: B-values at boundaries
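The relationship \(B = Z\sqrt{t}\) can be checked against the full boundary summary shown earlier. The sketch below recovers the information fractions implied by the efficacy Z-values and B-values, then uses them to reproduce the harm B-values; all numbers are copied by hand from that output.

```r
# Efficacy Z-values and B-values from gsBoundSummary(x8, exclude = c())
z_eff <- c(7.4336, 3.8622, 2.9347, 2.5278, 2.3072)
b_eff <- c(2.4674, 2.3947, 2.3343, 2.3098, 2.3072)

# Implied information fraction at each analysis: t = (B / Z)^2
t_implied <- (b_eff / z_eff)^2
round(t_implied, 2)

# Reproduce the harm B-values from the harm Z-values
z_harm <- c(-2.1121, -1.7667, -1.7256, -1.7170, -1.7149)
b_harm <- z_harm * sqrt(t_implied)
round(b_harm, 4)
```

The implied fractions match the percentages printed in the summary (11%, 38%, 63%, 83%, and 100% at the final analysis).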
Design with binding bounds (test.type = 7)
For test.type = 7, both the futility and harm bounds are
binding — meaning the computation of the efficacy bound
assumes the trial will stop if either bound is crossed. This
yields a slightly less conservative efficacy bound (easier to cross),
but at the cost of inflated Type I error if the stopping rule is not
strictly followed.
We first create a binding design with \(\alpha = 0.0125\) to compare with the non-binding design above:
x7 <- gsSurvCalendar(
test.type = 7,
alpha = 0.0125,
beta = 0.1,
astar = 0.1,
calendarTime = c(12, 24, 36, 48, 60),
sfu = sfLDOF,
sfl = sfHSD, sflpar = -2,
sfharm = sfLDPocock,
lambdaC = log(2) / 36,
hr = 0.75,
R = 18,
minfup = 42
)
Comparing binding and non-binding
comparison <- data.frame(
Bound = c("Efficacy", "Futility", "Harm"),
`Binding (type 7)` = c(
paste(round(x7$upper$bound, 3), collapse = ", "),
paste(round(x7$lower$bound, 3), collapse = ", "),
paste(round(x7$harm$bound, 3), collapse = ", ")
),
`Non-binding (type 8)` = c(
paste(round(x8$upper$bound, 3), collapse = ", "),
paste(round(x8$lower$bound, 3), collapse = ", "),
paste(round(x8$harm$bound, 3), collapse = ", ")
),
check.names = FALSE
)
kable(comparison, caption = "Comparison of binding vs. non-binding Z-value boundaries")

| Bound | Binding (type 7) | Non-binding (type 8) |
|---|---|---|
| Efficacy | 7.434, 3.862, 2.934, 2.523, 2.248 | 7.434, 3.862, 2.935, 2.528, 2.307 |
| Futility | -1.458, 0.09, 1.016, 1.689, 2.248 | -1.441, 0.121, 1.057, 1.736, 2.307 |
| Harm | -2.112, -1.767, -1.726, -1.717, -1.715 | -2.112, -1.767, -1.726, -1.717, -1.715 |
Note that the efficacy bounds for test.type = 7
(binding) are slightly lower (easier to cross) than for
test.type = 8 (non-binding). The maximum number of events
for test.type = 7 (639) is also slightly smaller than for
test.type = 8 (657), reflecting the assumption that the
trial will stop at the lower bounds.
gsBoundSummary(x7)
#> Method: LachinFoulkes
#> Analysis Value Harm Futility Efficacy
#> IA 1: 11% Z -2.1121 -1.4578 7.4336
#> N: 746 p (1-sided) 0.9827 0.9275 0.0000
#> Events: 71 ~HR at bound 1.6550 1.4158 0.1698
#> Month: 12 P(Cross) if HR=1 0.0173 0.0725 0.0000
#> P(Cross) if HR=0.75 0.0005 0.0039 0.0000
#> IA 2: 38% Z -1.7667 0.0895 3.8622
#> N: 1118 p (1-sided) 0.9614 0.4643 0.0001
#> Events: 246 ~HR at bound 1.2531 0.9886 0.6107
#> Month: 24 P(Cross) if HR=1 0.0507 0.5430 0.0001
#> P(Cross) if HR=0.75 0.0005 0.0181 0.0539
#> IA 3: 63% Z -1.7256 1.0159 2.9344
#> N: 1118 p (1-sided) 0.9578 0.1548 0.0017
#> Events: 404 ~HR at bound 1.1874 0.9038 0.7467
#> Month: 36 P(Cross) if HR=1 0.0736 0.8551 0.0017
#> P(Cross) if HR=0.75 0.0005 0.0398 0.4829
#> IA 4: 83% Z -1.7170 1.6890 2.5229
#> N: 1118 p (1-sided) 0.9570 0.0456 0.0058
#> Events: 533 ~HR at bound 1.1604 0.8639 0.8037
#> Month: 48 P(Cross) if HR=1 0.0890 0.9592 0.0063
#> P(Cross) if HR=0.75 0.0005 0.0675 0.7881
#> Final Z -1.7149 2.2480 2.2480
#> N: 1118 p (1-sided) 0.9568 0.0123 0.0123
#> Events: 639 ~HR at bound 1.1454 0.8370 0.8370
#> Month: 60 P(Cross) if HR=1 0.1000 0.9875 0.0125
#> P(Cross) if HR=0.75 0.0005 0.1000 0.9000
Efficacy bounds at alternate \(\alpha\) levels
The gsBoundSummary() function accepts an
alpha argument to display efficacy bounds at one or more
alternate \(\alpha\) levels alongside
the original design. Here we show the non-binding design
(x8) with efficacy bounds for both \(\alpha = 0.0125\) (the design level) and
\(\alpha = 0.025\):
gsBoundSummary(x8, alpha = 0.025)
#> Analysis Value α=0.0125 α=0.025 Futility Harm
#> IA 1: 11% Z 7.4336 6.6513 -1.4408 -2.1121
#> N: 766 p (1-sided) 0.0000 0.0000 0.9252 0.9827
#> Events: 73 ~HR at bound 0.1740 0.2092 1.4034 1.6434
#> Month: 12 P(Cross) if HR=1 0.0000 0.0000 0.0748 0.0173
#> P(Cross) if HR=0.75 0.0000 0.0000 0.0039 0.0004
#> IA 2: 38% Z 3.8622 3.4312 0.1212 -1.7667
#> N: 1148 p (1-sided) 0.0001 0.0003 0.4518 0.9614
#> Events: 253 ~HR at bound 0.6149 0.6492 0.9849 1.2491
#> Month: 24 P(Cross) if HR=1 0.0001 0.0003 0.5554 0.0507
#> P(Cross) if HR=0.75 0.0574 0.1259 0.0181 0.0004
#> IA 3: 63% Z 2.9347 2.5948 1.0566 -1.7256
#> N: 1148 p (1-sided) 0.0017 0.0047 0.1454 0.9578
#> Events: 416 ~HR at bound 0.7497 0.7751 0.9015 1.1846
#> Month: 36 P(Cross) if HR=1 0.0017 0.0048 0.8641 0.0736
#> P(Cross) if HR=0.75 0.4990 0.6323 0.0398 0.0004
#> IA 4: 83% Z 2.5278 2.2359 1.7357 -1.7170
#> N: 1148 p (1-sided) 0.0057 0.0127 0.0413 0.9570
#> Events: 548 ~HR at bound 0.8057 0.8261 0.8622 1.1580
#> Month: 48 P(Cross) if HR=1 0.0062 0.0138 0.9631 0.0890
#> P(Cross) if HR=0.75 0.7996 0.8684 0.0675 0.0004
#> Final Z 2.3072 2.0432 2.3072 -1.7149
#> N: 1148 p (1-sided) 0.0105 0.0205 0.0105 0.9568
#> Events: 657 ~HR at bound 0.8352 0.8526 0.8352 1.1433
#> Month: 60 P(Cross) if HR=1 0.0112 0.0201 0.9888 0.1000
#> P(Cross) if HR=0.75 0.9000 0.9218 0.1000 0.0004
Practical considerations
Choice of spending functions
The choice of spending functions for the three boundaries should reflect regulatory and scientific considerations:
- Efficacy: A conservative spending function such as Lan-DeMets O’Brien-Fleming (sfLDOF) is typical, spending very little \(\alpha\) at early interim analyses when limited information is available.
- Futility: Moderate spending (e.g., HSD with \(\gamma = -2\)) allows early stopping for futility when the treatment effect is clearly absent.
- Harm: The Lan-DeMets Pocock (sfLDPocock) spending function provides more aggressive spending at early analyses, which is appropriate for harm monitoring since detecting a detrimental effect early is critical for patient safety.
Interpreting the harm bound
The harm bound is intended to be crossed when the observed one-sided p-value favoring control is small. In terms of the test statistic, a negative Z-value indicates that the hazard rate is higher in the experimental arm than in the control arm, i.e., the experimental treatment appears to be worsening survival. When the Z-value falls below the harm bound, this constitutes a statistical signal that the treatment may be harmful, and the trial should be stopped or at least paused, with a recommendation to review the safety data.
The harm spending is computed under \(H_0\) (no treatment effect), reflecting the probability of observing an apparent harmful effect by chance when there is actually no true effect. This controls the probability of a false harm signal.
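The link between a Z-value and the two one-sided p-values can be sketched in base R. The helper names below are hypothetical; the Z-values are the final-analysis bounds from the non-binding design output, and the "p (1-sided)" rows reported by gsBoundSummary() correspond to the p-value for benefit.

```r
# One-sided p-value for benefit (as in the "p (1-sided)" rows)
p_benefit <- function(z) 1 - pnorm(z)

# One-sided p-value favoring control (evidence of harm)
p_favoring_control <- function(z) pnorm(z)

z_harm_final <- -1.7149  # final harm bound
z_eff_final  <- 2.3072   # final efficacy bound

round(c(
  harm_bound_p_benefit     = p_benefit(z_harm_final),
  harm_bound_p_for_control = p_favoring_control(z_harm_final),
  efficacy_bound_p_benefit = p_benefit(z_eff_final)
), 4)
```

A one-sided p-value for benefit near 1 at the harm bound is therefore the same statement as a small one-sided p-value favoring control.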
Harm bound capping
In the implementation, the harm bound is automatically capped so it never exceeds the futility bound. This ensures the ordering: harm bound \(\leq\) futility bound \(\leq\) efficacy bound at every analysis.
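The effect of capping can be illustrated with an elementwise minimum. This is only a sketch of the idea, not the package's code: the first "uncapped" harm value below is hypothetical and chosen to lie above the futility bound, while the remaining values are the rounded bounds from the design above.

```r
# Rounded futility bounds from the non-binding design
futility <- c(-1.44, 0.12, 1.06, 1.74, 2.31)

# Hypothetical uncapped harm bounds; the first value is invented so that
# it exceeds the futility bound at the first analysis
harm_uncapped <- c(-1.20, -1.77, -1.73, -1.72, -1.71)

# Capping: the harm bound can never sit above the futility bound
harm_capped <- pmin(harm_uncapped, futility)

data.frame(analysis = 1:5, futility, harm_uncapped, harm_capped)
```

Only the first analysis is affected in this example: the capped harm bound equals the futility bound there, so at that look crossing the futility bound also means crossing the harm bound.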
When to use test.type = 7 vs. test.type = 8
- test.type = 8 (non-binding) is most often preferred in practice. Regulators will generally expect non-binding bounds, which preserve Type I error control regardless of whether the stopping rules are strictly followed. Since Data Monitoring Committees (DMCs) typically retain discretion to continue or stop a trial based on the totality of the evidence, the non-binding approach ensures that the statistical validity of the efficacy analysis is maintained even if a futility or harm boundary is crossed but the trial continues.
- test.type = 7 (binding) is appropriate when there is a firm commitment to stop the trial upon crossing any boundary. This provides a small efficiency gain (slightly easier efficacy bounds and fewer required events) but requires strict protocol adherence. If the trial does not stop after crossing a binding boundary, Type I error may be inflated.
In most regulatory settings, test.type = 8 is the safer
and more common choice.
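The Type I error point can be illustrated with a rough Monte Carlo sketch in base R. It simulates the Z-statistic under \(H_0\) as a Brownian motion observed at approximate information fractions, then estimates the chance of ever crossing an efficacy bound when the futility and harm bounds are ignored entirely. The information fractions, rounded bounds, seed, and simulation size are assumptions made for illustration; exact values would come from the design objects.

```r
set.seed(2024)  # arbitrary seed for reproducibility
n_sim <- 200000
t_frac <- c(0.11, 0.38, 0.63, 0.83, 1)  # approximate information fractions

# Rounded efficacy bounds from the comparison table
eff_binding    <- c(7.434, 3.862, 2.934, 2.523, 2.248)  # test.type = 7
eff_nonbinding <- c(7.434, 3.862, 2.935, 2.528, 2.307)  # test.type = 8

# Independent normal increments with variance equal to the time step
dt <- diff(c(0, t_frac))
incr <- sweep(matrix(rnorm(n_sim * length(t_frac)), nrow = n_sim),
              2, sqrt(dt), `*`)

# Accumulate increments into B-values, then convert to Z-values
b <- incr
for (k in 2:length(t_frac)) b[, k] <- b[, k - 1] + incr[, k]
z <- sweep(b, 2, sqrt(t_frac), `/`)

# Probability of crossing any efficacy bound when lower bounds are ignored
p_cross <- function(eff) mean(rowSums(sweep(z, 2, eff, `>`)) > 0)
p_binding <- p_cross(eff_binding)
p_nonbinding <- p_cross(eff_nonbinding)

round(c(binding = p_binding, nonbinding = p_nonbinding), 4)
```

Because the binding efficacy bounds are never higher than the non-binding ones, the estimate for the binding bounds cannot be smaller on the same simulated paths. The non-binding estimate should fall near the nominal 0.0125, while the binding estimate should exceed it, which is the inflation described above.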