9 Introduction to R coding
The gsDesign web interface generates code that you may save and reuse in the R programming language. This allows you to reproduce results from the web interface. There are many other capabilities available in R that are not in the web interface. These allow flexibility in the design and analysis of a trial, as well as sample size adaptation.
9.1 Introduction
This chapter covers many topics briefly. We hope the examples presented will be of some use.
The next section begins with notes on how to install R and the gsDesign package used in this book. This section also shows how to access help files and uses a timetoevent trial design as an example. This includes how to update the design when the event counts at analyses differ from planned.
Much of the rest of the chapter is based on an example with normal outcomes. The next two sections show how to derive and then analyze data, including how to update the design when analyses do not have the planned sample sizes or event counts planned. Then we move on to show how to compute conditional and predictive power to predict what the outcome at the end of a trial will be based on interim results. In Section 9.6 and Section 9.7 we show how to update a design based on an interim outcome. First, we show how to update a design based on conditional power followed by demonstrating how to update sample size based on an informationbased approach. We finish with informationbased sample size adaptation for a trial with a binomial endpoint.
9.2 Some basics
9.2.1 Installing R and gsDesign
In order to use R code to generate the designs in this book, you need to install R and the gsDesign package. R is available at https://cran.rproject.org/. Once R is installed, you should be able to run
install.packages("gsDesign")
which will install not only the gsDesign package, but also other packages that it depends on such as the ggplot2 package.
Although not necessary, you also may wish to install the RStudio IDE (integrated development environment) for R, available at https://posit.co/. The free, open source version of RStudio Desktop can be used by most readers.
9.2.2 Attaching gsDesign
Once you have gone through the above installations, you are ready to run some code generated by the web interface. After you have opened the R console or RStudio IDE, you will need to attach the gsDesign package with the command
This command puts the package in the search path in R, so the functions are directly usable without repeated typing of the namespace. For the rest of the chapter, we assume that you have attached gsDesign in the R session.
9.2.3 Running an initial example
Now go to the “Report” tab in the output panel of the web interface. There you will see 4 blocks of code at the beginning of the file to 1) attach the gsDesign package, 2) enter most parameters for the design, 3) enter failure rates for the design, and 4) derive the design. The latter includes the toInteger()
function to adapt the design to an integerbased sample size and integer targeted events at analyses. Following code blocks show how to get a text summary, a tabular bound summary, a plot of design boundaries and a plot of expected enrollment and endpoint accrual.
The note at the bottom of this code will generate which versions of gsDesign and R are run as well as the time the code was generated. Comments (lines starting with the #
symbol) tell you what each line of code will do. You will need to run the first line starting with “x <
” in order to derive the design before running the plot, tabular output, or design summary code that summarize the trial exactly as you have seen in the web interface.
Code generated for each endpoint type is different. The supported endpoint types are shown in Figure 9.1.
All but the “Timetoevent” selection use the gsDesign()
function to derive the design. We will demonstrate that along with interim analysis and sample size reestimation in the following sections.
If you want to know more about the gsDesign()
, gsSurv()
, nSurv()
, nBinomial()
, or nNormal()
functions from the gsDesign package, help files and vignettes are available with the package and at the documentation website https://keaven.github.io/gsDesign/. Within the RStudio IDE you can get help for gsDesign as seen in Figure 9.2; note that you can scroll down in the interface to see the remainder of the help text. You can also just enter ?gsDesign
in the R console window to view the same help text. This chapter is just a brief introduction with examples for some of these and for other functions. The help files should help you understand the functions in more depth. They include function descriptions, input and output specifications, code examples, references, and other details.
In order to get help on the plot command, you will need to enter ?plot.gsDesign
or enter plot.gsDesign
in the search field in the Help tab of the RStudio IDE shown in Figure 9.3. The plot.gsDesign()
command will plot designs produced by either the gsDesign()
or, for timetoevent endpoints, gsSurv()
commands. When coding, you can use the shorter plot(x)
command as the plot()
knows to invoke plot.gsDesign()
when x
is produced by gsDesign()
or gsSurv()
.
Finally, to print a summary of the design, you can use gsBoundSummary(x)
to print a tabular summary, summary(x)
to get a short textual summary, or just enter x
and return to get the summary you can see in the “Text” tab of the web interface.
9.2.4 Fixed design sample size and power
To get the fixed design sample size for the normal endpoint trial above, we use the function nNormal()
.
# Derive normal fixed design
n < nNormal(
delta1 = 1,
delta0 = 0,
ratio = 1,
sd = 1,
sd2 = NULL,
alpha = 0.025,
beta = 0.1
)
n
#> [1] 42.02969
We compute the power for this sample size when the standard deviation is larger; note the output includes the power and the input sample size.
# Derive normal fixed design
nNormal(
n = n,
delta1 = 1,
delta0 = 0,
ratio = 1,
sd = 1.25,
sd2 = NULL,
alpha = 0.025,
beta = 0.1
)
#> [1] 0.7367143
n
#> [1] 42.02969
For the binomial example powering at 90% with 1sided Type I error to detect a reduction from a 15% event rate to 10%, we get a fixed sample size with:
nBinomial(
# Control event rate
p1 = 0.15,
# Experimental event rate
p2 = 0.10,
# Null hypothesis event rate difference
# (control  experimental)
delta0 = 0,
# 1sided Type I error
alpha = 0.025,
# Type II error (1  Power)
beta = 0.1,
# Experimental/control randomization ratio
ratio = 1
)
#> [1] 1834.641
If we want more information and more scenarios, we can set the argument outtype = 3
and get a data frame as output. For this example we see the impact a smaller treatment effect on sample size.
nBinomial(
# Control event rate
p1 = c(0.15, 0.14),
# Experimental event rate
p2 = 0.10,
# Null hypothesis event rate difference
# (control  experimental)
delta0 = 0,
# 1sided Type I error
alpha = 0.025,
# Type II error (1  Power)
beta = 0.1,
# Experimental/control randomization ratio
ratio = 1,
outtype = 3
) >
gt::gt() >
gt::fmt_number(n_sigfig = 3, columns = 4:14) >
gt::fmt_number(decimals = 0, columns = 1:3)
n  n1  n2  alpha  sided  beta  Power  sigma0  sigma1  p1  p2  delta0  p10  p20 

1,835  917  917  0.0250  1.00  0.100  0.900  0.661  0.660  0.150  0.100  0  0.125  0.125 
2,770  1,385  1,385  0.0250  1.00  0.100  0.900  0.650  0.649  0.140  0.100  0  0.120  0.120 
We can also compute power rather than sample size with nBinomial()
.
nBinomial(
n = 1500,
# Control event rate
p1 = c(0.15, 0.14),
# Experimental event rate
p2 = 0.10,
# Null hypothesis event rate difference
# (control  experimental)
delta0 = 0,
# 1sided Type I error
alpha = 0.025,
# Experimental/control randomization ratio
ratio = 1,
outtype = 3
) >
gt::gt() >
gt::fmt_number(n_sigfig = 3, columns = 4:14) >
gt::fmt_number(decimals = 0, columns = 1:3)
n  n1  n2  alpha  sided  beta  Power  sigma0  sigma1  p1  p2  delta0  p10  p20 

1,500  750  750  0.0250  1.00  0.166  0.834  0.661  0.660  0.150  0.100  0  0.125  0.125 
1,500  750  750  0.0250  1.00  0.336  0.664  0.650  0.649  0.140  0.100  0  0.120  0.120 
We can compute power for sample size for a fixed sample size example with nSurv()
for a timetoevent endpoint.
# Power is computed when beta is not input
nSurv(
# Control group failure rate (exponential, median 6)
lambdaC = log(2) / 6,
# Underlying hazard ratio (experimental/control)
hr = 0.5,
# Dropout rate (exponential, median 40)
eta = log(2) / 40,
# Entry rate per time unit (e.g., month)
gamma = 10,
# Duration of enrollment
R = 24,
# Duration of trial
T = 36,
# Minimum followup
minfup = 12,
# Type II error (`NULL` means it will be computed)
beta = NULL,
# 1sided Type I error
alpha = 0.025
)
#> Fixed design, twoarm trial with timetoevent
#> outcome (Lachin and Foulkes, 1986).
#> Solving for: Power
#> Hazard ratio H1/H0=0.5/1
#> Study duration: T=36
#> Accrual duration: 24
#> Min. endofstudy followup: minfup=12
#> Expected events (total, H1): 172.9132
#> Expected sample size (total): 240
#> Accrual rates:
#> Stratum 1
#> 024 10
#> Control event rates (H1):
#> Stratum 1
#> 0Inf 0.1155
#> Censoring rates:
#> Stratum 1
#> 0Inf 0.0173
#> Power: 100*(1beta)=99.5431%
#> Type I error (1sided): 100*alpha=2.5%
#> Equal randomization: ratio=1
When we use beta = 0.1
, the enrollment rate is adjusted to generate the desired 90% power.
# Sample size is computed when beta is input
nSurv(
lambdaC = log(2) / 6,
hr = 0.5,
eta = log(2) / 40,
gamma = 10, T = 36,
minfup = 12,
beta = 0.1
)
#> Fixed design, twoarm trial with timetoevent
#> outcome (Lachin and Foulkes, 1986).
#> Solving for: Accrual rate
#> Hazard ratio H1/H0=0.5/1
#> Study duration: T=36
#> Accrual duration: 24
#> Min. endofstudy followup: minfup=12
#> Expected events (total, H1): 86.3258
#> Expected sample size (total): 119.8184
#> Accrual rates:
#> Stratum 1
#> 024 4.9924
#> Control event rates (H1):
#> Stratum 1
#> 0Inf 0.1155
#> Censoring rates:
#> Stratum 1
#> 0Inf 0.0173
#> Power: 100*(1beta)=90%
#> Type I error (1sided): 100*alpha=2.5%
#> Equal randomization: ratio=1
9.3 Design and analysis based on normal outcomes
We demonstrate the basic summary and plotting functions from the gsDesign package for normal outcomes. These can easily be seen for all endpoint types by looking at the report code generated for designs. We assume equal sample sizes in the two groups and compute the total sample size with both arms combined:
n < nNormal(delta1 = 2, sd = 4, alpha = 0.025, beta = 0.1)
n
#> [1] 168.1188
We would round this up to 170 if actually running a trial with a fixed design. However, for extension to a group sequential design, we convert the unrounded fixed design sample size to a group sequential design with a single interim analysis halfway through. We use many defaults here. See if you can reproduce this design in the web interface; the command summary(x)
will give you critical information needed for this.
# Convert to integer sample size
x < gsDesign(k = 2, n.fix = n, delta1 = 2) > toInteger()
gsBoundSummary(x)
#> Analysis Value Efficacy Futility
#> IA 1: 50% Z 2.7500 0.4167
#> N: 88 p (1sided) 0.0030 0.3385
#> ~delta at bound 2.3452 0.3553
#> P(Cross) if delta=0 0.0030 0.6615
#> P(Cross) if delta=2 0.3428 0.0269
#> Final Z 1.9811 1.9811
#> N: 176 p (1sided) 0.0238 0.0238
#> ~delta at bound 1.1947 1.1947
#> P(Cross) if delta=0 0.0239 0.9761
#> P(Cross) if delta=2 0.9009 0.0991
Here is the standard textual summary:
summary(x)
#> Asymmetric twosided group sequential design with
#> nonbinding futility bound, 2 analyses, sample size 176, 90
#> percent power, 2.5 percent (1sided) Type I error. Efficacy
#> bounds derived using a HwangShihDeCani spending function
#> with gamma = 4. Futility bounds derived using a
#> HwangShihDeCani spending function with gamma = 2.
In this case, most arguments were set as defaults; if you look at an R Markdown report from the web interface, arguments are spelled out in detail. Once you have reproduced the above design, look at the code tab in the interface when looking at different types of plots to see how you can generate these plots. For instance, to produce a plot of spending functions (Figure 9.4), enter the following code:
plot(
x,
plottype = 5,
xlab = "Proportion of total sample size",
ylab = "Proportion of spending"
)
This code demonstrates how you can change plot types with plottype
, as well as how you can change the default labels for the x and yaxes.
We look at the “standardized effect size” for which the trial is powered and note where this comes from. The variable x$delta
contains the standardized effect size. The variable x$theta
contains the null treatment effect of 0 as well as the standardized effect size:
x$delta
#> [1] 0.25
x$theta
#> [1] 0.00 0.25
For a normal outcome, the standardized effect size can be derived using the treatment difference and withingroup standard deviation. We are looking for a value theta
that is a constant times the difference in means for which the trial is powered. The value is such that the estimate of the standardized treatment effect has variance \(1/n\) when the total sample size (both arms combined) is \(n\).
# Natural parameter: difference in means
delta1 < 2
# Within group standard deviation
sd < 4
# Standardized treatment effect
# `2 * sd` is for equal randomization!
delta1 / (2 * sd)
#> [1] 0.25
9.4 Updating a design with normal outcomes
Often analyses do not have exactly the originallyplanned sample size or number of events performed. If the numbers are approximately as planned, this should not be an issue. However, the spending function approach taken by the gsDesign package allows you to adjust boundaries to appropriately control Type I error based on the work originally presented by Lan and DeMets (1983).
We assume actual interim analysis is performed after 100 patients instead of the planned 88. We leave the final analysis as planned at 176. To update, we make another call to gsDesign()
. Defaults here set Type I and Type II error (alpha
, beta
), spending functions (sfu
, sfl
) and corresponding spending function parameters (sfupar
, sflpar
) to be the same as the original design. If any of these were changed in the call to gsDesign()
that provided the original design, the same parameters should be entered here; this is done in detail when you produce an R Markdown report from the web interface. We enter k
, the updated design number of analyses, n.fix
, the original fixed design sample size, maxn.IPlan
, the originally planned maximum sample size for the group sequential design, and n.I
, the actual sample sizes at analyses. You can see that the bounds have been updated slightly from the original design. We use the argument exclude = NULL
to display all available boundary characteristics available in gsBoundSummary()
.
xa < gsDesign(
k = 2,
n.fix = n,
maxn.IPlan = x$n.I[x$k],
n.I = c(100, 176),
delta1 = x$delta1
)
gsBoundSummary(xa, exclude = NULL)
#> Analysis Value Efficacy Futility
#> IA 1: 57% Z 2.6470 0.6631
#> N: 100 p (1sided) 0.0041 0.2536
#> ~delta at bound 2.1176 0.5305
#> Spending 0.0041 0.0331
#> Bvalue 1.9952 0.4998
#> CP 0.9899 0.0461
#> CP H1 0.9859 0.4673
#> PP 0.9592 0.1025
#> P(Cross) if delta=0 0.0041 0.7464
#> P(Cross) if delta=2 0.4416 0.0331
#> Final Z 1.9860 1.9826
#> N: 176 p (1sided) 0.0235 0.0237
#> ~delta at bound 1.1976 1.1955
#> Spending 0.0209 0.0669
#> Bvalue 1.9860 1.9826
#> P(Cross) if delta=0 0.0237 0.9761
#> P(Cross) if delta=2 0.8995 0.1000
We will recreate many of these summaries and discuss them further later in the chapter. For test statistics at bounds, CP
(conditional assuming observed treatment effect), CP H1
(conditional power assuming alternate hypothesis treatment effect), and PP
(predictive power based on updating a prior distribution for treatment effect) are possibly interesting for interim futility bounds, but otherwise not of great interest. P(Cross) if delta=0
is the cumulative probability under the null hypothesis of crossing a bound at or before a given analysis. In the Efficacy
column, this is related to the Spending
row which is the incremental probability of crossing an efficacy bound. However, you see the Spending
column totals \(0.025\) (\(0.0041 + 0.0209\)) while the cumulative P(Cross) if delta=0
is \(0.0237\). This is because for the default of nonbinding futility bounds (test.type = 4
in gsDesign()
), Spending
includes the possibility of crossing the futility bound at the interim analysis and later crossing the efficacy boundary at the final analysis, whereas the \(0.0237\) does not count the probability. As noted previously, the Spending
represents the preferred way of computing Type I error given that it allows continuation of a trial without inflating Type I error when a futility bound is crossed. All other spending and boundary crossing calculations in the table do not count the probability of crossing one boundary and later crossing another.
The function gsDelta()
translates a \(Z\)statistic to an approximate treatment effect:
gsDelta(
# Zvalue test statistic
xa$upper$bound[1],
# Analysis
i = 1,
# Group sequential design
x = xa
)
#> [1] 2.117585
This matches the value of ~delta at bound
(approximate treatment effect required to cross a bound) for the first efficacy bound in the table above. This is based on design assumptions. If translating an observed \(Z\)value, this is just an approximation and will differ from the computed observed treatment effect. For timetoevent endpoints, see also gsHR()
.
9.4.1 Testing, analysis, and repeated confidence interval estimation
Given the above data, we can compute a \(Z\)statistic testing the null hypothesis that the difference in means is 0 by dividing the observed mean difference by its estimated standard error. The following example assumes the observed difference in treatment means (natural parameter) at IA 1 is \(\hat\delta = 1.4\) and the pooled standard deviation is \(5.2\).
# We use updated design
npergroup < ceiling(xa$n.I[1] / 2)
# Assume pooled standard deviation at IA 1
sdhat1 < 5.2
# Compute standard error for estimated
# difference in means
sedelta1 < sqrt(sdhat1^2 * 2 / npergroup)
# Assume observed difference in treatment means
deltahat1 < 1.4
# Compute interim Zstatistic
z1 < deltahat1 / sedelta1
z1
#> [1] 1.346154
# Compute standardized treatment effect for observed difference
thetahat < deltahat1 / (2 * sdhat1)
thetahat
#> [1] 0.1346154
We see that \(z1 = 1.35\) is between the futility and efficacy bounds at the interim analysis in spite of having a positive trend.
c(xa$lower$bound[1], xa$upper$bound[1])
#> [1] 0.663069 2.646981
Looking at gsDelta()
for the given \(Z\)statistic, we see the approximate treatment effect computed under design assumptions is different from the observed treatment effect of 1.2; this is due to the difference in the design and observed standard deviation.
gsDelta(
# Observed Zstatistic
z = z1,
# Analysis
i = 1,
# Updated design
x = xa
)
#> [1] 1.076923
We use the upper \(Z\)bound from the updated design to compute a repeated confidence interval for the treatment effect. Repeated confidence intervals are defined as a sequence of intervals for each analysis for which a simultaneous coverage probability is maintained; see Jennison and Turnbull (2000). We see that
deltahat1 + sedelta1 * c(1, 1) * xa$upper$bound[1]
#> [1] 1.35286 4.15286
These can be done at any planned interim, including planned interims and the final analysis after a boundary has been crossed. You can see that the confidence interval is quite wide at this point, containing both the null hypothesis that the difference in means is 0 and the alternative hypothesis that the difference is 2.
9.5 Conditional power, predictive power, and probability of success
In this section, we cover topics related to predicting what the outcome of a trial will be, either from the start of the trial (probability of success) or at the time of an interim analysis (conditional power, predictive power, conditional probability of success).
9.5.1 Conditional and predictive power
We begin with computing conditional power given an interim result in a trial. We computed the interim normal test statistic \(z1 = 1.3461538\) for the trial we designed with normal outcomes. Now we wish to compute the conditional power for the trial at the final analysis given this interim outcome. The functions gsCP()
and gsCPz()
perform this calculation.
gsCPz(
# Interim Zvalue
z = z1,
# Analysis number,
i = 1,
# Updated design
x = xa
)
#> [1] 0.3803728
This value conditions on the observed approximate treatment effect computed using gsDelta()
above. If we condition on the H1 treatment effect, we get a higher value:
gsCPz(
# Interim Zvalue
z = z1,
# Analysis number,
i = 1,
# Standardized treatment effect of design
theta = xa$delta,
# Updated design
x = xa
)
#> [1] 0.7584727
If we condition on no treatment effect, we get the conditional error which could be used to adapt the design (Müller and Schäfer 2004):
gsCPz(
# Interim Zvalue a
z = z1,
# Analysis number,
i = 1,
# No treatment effect
theta = 0,
# Updated design
x = xa
)
#> [1] 0.069697
We demonstrate a conditional power plot that may be of some use. We will assume a wide range of potential underlying hazard ratios for future events.
delta1 < seq(0.5, 4, 0.1)
We compute conditional probabilities based on the observed interim 1 \(Z\)value over this range:
cp < gsCP(
# Updated design
x = xa,
# Analysis
i = 1,
# Interim Zvalue
zi = z1,
# Standardized treatment effects considered
theta = xa$delta / xa$delta1 * delta1
)
We plot conditional power as a function of future observation underlying difference in means (Figure 9.5). Note the use of the power plot option for the gsDesign plot function. The offset = 1
argument changes the legend to be labeled with “Future Analysis” 2 and 3. The solid black line shows the conditional probability of crossing any future bound prior to crossing a lower bound. More lines would be shown if there were more than one future analysis. In general, black lines on this plot will show the cumulative conditional probability of crossing an efficacy bound prior to crossing a lower bound by the time of any given future analysis by different underlying treatment effect assumptions. The red lines that would be shown if more than one analysis were remaining and would show 1 minus the cumulative conditional probability of crossing a lower bound prior to crossing an efficacy bound by any given future analysis by different underlying treatment effect (HR) assumptions. In order to visualize the red lines, you can see a variation of this in the interface for power at the time of design by selecting the power plot.
plot(
cp,
xval = delta1,
xlab = "Future Difference in Means",
ylab = "Conditional Power/Error",
main = "Conditional probability of crossing future bound",
offset = 1
)
The gsCP()
function produces another group sequential design, in this case with only one analysis.
gsCP(
# Updated design
x = xa,
# Analysis number
i = 1,
# Interim Zvalue
z = z1
)
#> Lower bounds Upper bounds
#> Analysis N Z Nominal p Z Nominal p
#> 1 76 1.47 0.9296 1.48 0.0697
#>
#> Boundary crossing probabilities and expected sample size assume
#> any cross stops the trial
#>
#> Upper boundary
#> Analysis
#> Theta 1 Total E{N}
#> 0.1346 0.3804 0.3804 76
#> 0.0000 0.0697 0.0697 76
#> 0.2500 0.7585 0.7585 76
#>
#> Lower boundary
#> Analysis
#> Theta 1 Total
#> 0.1346 0.6177 0.6177
#> 0.0000 0.9296 0.9296
#> 0.2500 0.2399 0.2399
The top table tells us that if we computed a \(Z\)statistic only for the post interim analyses and achieved a nominal \(p = 0.0697\), we would have a positive result by the method of Müller and Schäfer (2004). Normally, we would just compute based on the combined and use the bound from the updated design.
The relevant table in the above output table produced by the above gsCP()
call is the upper boundary table which provides the power of a positive trial given the input interim \(Z\)value assuming:
 The interim estimated standardized treatment effect
theta = 0.1346
with a conditional power of0.3804
.  No treatment effect
theta = 0
; in this case, the “conditional power” is actually the conditional error or Type I error of a positive result,0.0697
, since there is no treatment effect.  The originally targeted standardized effect size
theta = 0.25
with a conditional power of0.7585
. The first of these is what people usually mean by conditional power.
From the repeated confidence interval above, we see that there is a great deal of uncertainty about what the actual treatment effect is at the time of the interim analysis. To account for this uncertainty in predicting the outcome of the trial at the end of the trial, we compute predictive power. This requires assuming a prior distribution for the treatment effect at the beginning of the trial, updating this prior based on the interim treatment result, and averaging the conditional power of a positive finding across this updated prior (i.e., posterior) distribution. The calculation can be done with different prior distributions to evaluate the sensitivity of the result to the prior distribution assumed. Here, we assume a prior distribution for the standardized treatment effect that is “slightly negative” in that we assume the mean of the prior is 0, but the standard deviation for the prior is large so that the predictive power is impacted mainly on the data and not the prior.
prior < normalGrid(mu = 0, sigma = 3 * x$delta)
Now we proceed to compute the predictive power based on this prior and the interim result. As previously noted, the prior is updated by the interim result in this computation.
# Prior and interim result
pp < gsPP(
# Updated design
x = xa,
# Observed result
zi = z1,
# First interim
i = 1,
# Prior standardized effect size values
theta = prior$z,
# Weights reflection prior distribution
wgts = prior$wgts
)
pp
#> [1] 0.4028769
We see that the resulting predictive probability is similar to the conditional power based on the observed outcome (0.4) predictive power vs. 0.38 conditional power. This does not change much if the prior is centered around half the design treatment effect yielding a predictive power of 0.41.
9.5.2 Probability of success
Assuming the same prior distribution as above, we can compute the probability of success of the trial before the trial starts as the probability of a positive trial averaged over the prior distribution. A more informative prior may be useful.
gsPOS(x = x, theta = prior$z, wgts = prior$wgts)
#> [1] 0.4204578
Now assume you know no bound was crossed, but are blinded to the interim \(Z\)value. We can compute the predictive power based on this degree of uncertainty as follows:
gsCPOS(x = xa, i = 1, theta = prior$z, wgts = prior$wgts)
#> [1] 0.5593224
We see that even with this minimal information about the interim analysis that we can increase our predicted probability of success.
9.6 Sample size adaptation
We continue our example with normal outcomes to demonstrate two methods of sample size adaptation. First, we show how to adapt the sample size and final statistical bound using the conditional power of the trial based on an interim result. Then we demonstrate informationbased design that updates the sample size based on an interim estimate of the standard deviation “nuisance parameter” assumption that was made.
9.6.1 Sample size adaptation based on conditional power
Based on the results of the previous section, the concept of updating the final sample size for a design based on conditional power at an interim analysis has been widely published. While this strategy remains controversial, the methods have improved with time so that they can be relatively efficient.
We reset stage 2 sample size and cutoff to make conditional power for stage 2 equal to originally planned power under the assumption of the original treatment effect. Doing conditional power sample size reestimation requires specifying a combination test to combine evidence from before and after the interim. The options are:

z2NC
: Default; inverse normal combination test of Lehmacher and Wassmer (1999). 
z2Z
: Sufficient statistic for complete data; weight observations before and after interim the same. This requires the lower end of the argumentcpadj
to be 0.5 or greater to apply the method of Chen, DeMets, and Lan (2004). 
z2Fisher
: Fisher combination test.
xupdate < ssrCP(
# Interim test
z1 = z1,
# Updated design
x = xa,
# Original Type II error
beta = x$beta,
# Inverse normal combination test
z2 = z2Z,
# Use conditional power for H1 effect,
# default is `theta = NULL` for CP based on observed effect.
theta = NULL,
# Adapt if conditional power >= 0.35 and less then 90%,
# default is 0.5 to 1.
cpadj = c(0.35, 1  x$beta),
# Increase overall sample size to at most 1.5x
maxinc = 1.5
)
# Reestimated stage 2 sample size total
ceiling(xupdate$dat$n2)
#> [1] 264
# Zstatistic required for stage 2 data only
as.numeric(xupdate$dat$z2)
#> [1] 1.472272
Finally, we look at how the sample size might have been adapted based on different values of the interim test statistic \(Z_1\).
xu < ssrCP(
# Updated design
x = xa,
# Hypothesized Zvalues at IA1
z1 = seq(0, 3, 0.001),
# Original Type II error
beta = x$beta,
# Inverse normal combination test
z2 = z2Z,
# Use conditional power for H1 effect,
# default is `theta = NULL` for CP based on observed effect.
theta = NULL,
# Adapt if conditional power >= 0.35 and less then 90%,
# default is 0.5 to 1.
cpadj = c(0.35, 1  x$beta),
# Increase overall sample size to at most 1.5x
maxinc = 1.5
)
plot(xu)
In Figure 9.6, you can see horizontal lines at sample size \(N = 100\) where \(Z_1\) is above the upper bound or below the lower bound at the interim analysis; in this case, we have just used the original group sequential design. We limit the sample size adaptation to a “promising” set of \(Z_1\) values. Above the lower horizontal lines are horizontal lines at \(N = 176\) where sample size is not adapted. For the smaller \(Z_1\) values, no adaptation is done because the results are not promising enough. For the larger \(Z_1\) values, the conditional power is at least the originallyplanned 90% with the originallyplanned final sample size. The sloped portion of the line denotes a region where the sample size is adapted to achieve 90% conditional power assuming the treatment effect observed at the interim analysis; a common complaint about this is that the interim treatment effect can be backcalculated if the adaptation algorithm is known in detail. The short horizontal line at \(N = 264\) is a region where the results are promising, but increasing the sample size by more than a factor of 1.5x would be required to reach the targeted conditional power.
Now examine the three xaxes at the bottom of the graph.
There is a direct translation of \(Z_1\) values to conditional power (CP) and, asymptotically, to the observed interim effect size relative to the effect size for which the trial is powered (\(\hat{\theta}/\theta_1\)). Thus, the sample size adaptation can be considered to be based on any one of these values. The Chen, DeMets, and Lan (2004) method does not adapt until conditional power is at least 0.5 which and thus the lower value for cadj
is recommended to be 0.5 when z2 = z2Z
(final test is unweighted).
9.6.2 Informationbased sample size adaptation
We consider sample size adaptation based on estimating the variance of the treatment effect at the interim analysis. Note that statistical information is one over the variance of the treatment effect estimate. An informationbased design examines statistical information instead of sample size at each analysis. We plan the analysis based on the treatment effect that we wish to detect, which was a difference of 2 in the means of the two treatment groups in this case (delta = delta1 = 2
below).
xI < gsDesign(k = 2, delta = 2, delta1 = 2)
gsBoundSummary(
xI,
# Label for "sample size"
Nname = "Information",
# Digits for information on the left
tdigits = 2
)
#> Analysis Value Efficacy Futility
#> IA 1: 50% Z 2.7500 0.4122
#> Information: 1.37 p (1sided) 0.0030 0.3401
#> ~delta at bound 2.3496 0.3522
#> P(Cross) if delta=0 0.0030 0.6599
#> P(Cross) if delta=2 0.3412 0.0269
#> Final Z 1.9811 1.9811
#> Information: 2.74 p (1sided) 0.0238 0.0238
#> ~delta at bound 1.1969 1.1969
#> P(Cross) if delta=0 0.0239 0.9761
#> P(Cross) if delta=2 0.9000 0.1000
Recall that we previously computed the standard error of the interim estimate of the difference of means and stored it in sedelta1
. The inverse of the square of this is, thus, the statistical information at the interim analysis.
# Within group estimated standard deviation
sdhat1
#> [1] 5.2
# Sample size per group
npergroup < xa$n.I[1] / 2
# Compute standard error for estimated difference in means
sedelta1 < sqrt(sdhat1^2 * 2 / npergroup)
info1 < 1 / sedelta1^2
info1
#> [1] 0.9245562
This is less than was planned for this case:
xI$n.I[1]
#> [1] 1.369775
We began using the sample sizes provided in the updated design stored in xa
. Using the interim sample size there and the ratio of information observed at the interim compared to what is required at the final analysis, we can estimate the final sample size required to get final planned information. Note that we are getting an integer number per group and doubling from the original design:
# Proportion of targeted information at IA
t_ia < info1 / xI$n.I[2]
t_ia
#> [1] 0.3374847
Now we increase final sample size per group estimated to get final information fraction of 1.
# Total planned adapted sample size
# Round up within each group
n2 < 2 * ceiling(96 / t_ia / 2)
n2
#> [1] 286
Now we move on to the final analysis and suppose final within group standard deviation observed is 5; based on this we compute the final statistical information observed:
# Sample size per group
npergroup < n2 / 2
# Within group standard deviation at final
sdhat2 < 5
# Compute standard error for estimated difference in means
sedelta2 < sqrt(sdhat2^2 * 2 / npergroup)
info2 < 1 / sedelta2^2
info2
#> [1] 2.86
Based on the interim and final statistical information, we update the planned informationbased design. Note that we could have done this calculation at the interim analysis based on an assumption for the final statistical information; the interim updated bounds would be the same as those computed here:
xIBD < gsDesign(
k = 2,
# Targeted effect size
delta = 2, delta1 = 2,
# Max planned information
maxn.IPlan = xI$n.I[2],
n.I = c(info1, info2),
)
gsBoundSummary(
xIBD,
# Label for "sample size"
Nname = "Information",
# Digits for information on the left
tdigits = 2
)
#> Analysis Value Efficacy Futility
#> IA 1: 34% Z 3.0039 0.2447
#> Information: 0.92 p (1sided) 0.0013 0.5967
#> ~delta at bound 3.1241 0.2545
#> P(Cross) if delta=0 0.0013 0.4033
#> P(Cross) if delta=2 0.1399 0.0151
#> Final Z 1.9727 1.9727
#> Information: 2.86 p (1sided) 0.0243 0.0243
#> ~delta at bound 1.1665 1.1665
#> P(Cross) if delta=0 0.0243 0.9757
#> P(Cross) if delta=2 0.9137 0.0863
Now we compute the final \(Z\)statistic based on assuming an observed treatment difference of 1.5:
z2 < 1.5 * sqrt(info2)
z2
#> [1] 2.53673
Since 2.54 is greater than the adjusted final efficacy bound of 1.97 computed just above (in xIBD$upper$bound
), the null hypothesis can be rejected.
9.7 Informationbased adaptation for a binomial design
9.7.1 Overview
The example we will develop here is a trial with a binomial outcome where the control group event rate is not known, and we wish to power the trial for a particular relative risk regardless of the control rate.
This is, by far, the most complex section of the book. It may be skipped by most, but may provide a useful example for others.
9.7.2 Basic strategy
Informationbased design can be done blinded or unblinded, but there may be some preference from regulators that it be done on a blinded basis to ensure that those adapting, presumably a statistician from the trial sponsor, will not be taking into account treatment effect when updating the sample size. Even if somebody independent of the sponsor does the adaptation, there may be some possibility for backcalculation of the treatment effect observed if unblinded sample size adaptation is used.
The basic design strategy for adapting the sample size is as follows:
 Design the study under an assumed value for the nuisance parameter.
 At each interim analysis, compute an estimate of the nuisance parameter and update the statistical information, study bounds and sample size appropriately.
Following is the specific sample size adaptation strategy at each analysis, slightly modified from Zhou et al. (2014):
 If current sample size is enough, meaning the planned final statistical information has been acquired, stop the trial regardless of whether efficacy or futility bound is crossed.
 Otherwise, if overrun of patients enrolled between the analysis cutoff and performance of the analysis is enough to provide the planned final statistical information, stop enrolling more patients but keep collecting data for enrolled patients.
 Otherwise, if the next planned sample size is enough, stop enrollment when that sample size is achieved.
 Otherwise, if the next planned sample size is not enough but the overrun likely at the next interim would provide more than the planned final statistical information, stop at the estimated overrun. This will normally provide a little more than the planned information to be a bit conservative.
 Otherwise, if the next planned sample size is not enough, but the original maximum sample size is, continue enrollment to the next planned interim.
 Otherwise, if the original maximum sample size is not enough, use the updated maximum sample size but with an upper limit depending on practical aspects of certain trials (e.g., two times the originally planned maximum sample size) and continue enrollment to the next analysis.
9.7.3 The design and planned information
The design is loosely based on the CAPTURE study (The CAPTURE Investigators 1997) where abciximab was studied to reduce important cardiovascular outcomes following balloon angioplasty. The design assumed a control event rate of 15% and was powered at 80% to detect a 1/3 reduction to a 10% event rate in the experimental group. For this example, we assume 2.5%, onesided Type I error and use default HwangShihDeCani bounds with \(\gamma = 8\) for efficacy and \(\gamma = 4\) for futility. Following is the code that is generated by the web interface for this trial:
# Derive binomial fixed design
n < nBinomial(
p1 = 0.15,
p2 = 0.1,
delta0 = 0,
alpha = 0.025,
beta = 0.2,
ratio = 1
)
# Derive group sequential design
x < gsDesign(
k = 4,
test.type = 4,
alpha = 0.025,
beta = 0.2,
timing = c(1),
sfu = sfHSD,
sfupar = c(8),
sfl = sfHSD,
sflpar = c(4),
delta = 0,
delta1 = 0.05,
delta0 = 0,
endpoint = "binomial",
n.fix = n
)
Figure 9.7 shows the resulting design with a planned sample size of 1469 and 3 equallyspaced interim analyses.
plot(x)
As noted, we wish to be able to power the trial for a relative risk of 2/3, e.g., to a 10% experimental group event rate when the control rate is 15% or an 8% experimental group event rate when the control rate is 12%. For relative risk, the logarithm of the relative risk is used as the parameter of interest: \(\delta = \log(2/3)\). We used minus the logarithm so that \(\delta\) is positive. If we used the same design parameters, but different control rates, we would get sample sizes as follows:
# Compute ratio of group sequential to
# fixed design sample size above
Inflate < x$n.I[4] / n
# Compute fixed design sample sizes for different control rates
p1 < seq(0.08, 0.2, 0.01)
nalt < nBinomial(
p1 = p1,
p2 = p1 * 2 / 3,
alpha = 0.025,
beta = 0.2
)
df < data.frame(x = p1, y = nalt * Inflate)
# Plot sample sizes using inflation factor and ggplot2
library(ggplot2)
ggplot(df, aes(x = x, y = y)) +
geom_line() +
xlab("Control event rate") +
ylab("Sample size required")
Thus, you can see in Figure 9.8 that our sample size can vary from 1048 to almost 2940 depending on the value of the nuisance parameter. We will initially plan to do an analysis of the first 368 patients for interim 1 and adapt from there.
In the web interface, we can get the informationbased design by entering \(\delta\) = \(\log(2/3) = 0.4054651\) into the “Treatment effect” control (Figure 9.9).
Deleting a couple of unnecessary parameters from the code generated, we rederive a design with the same bounds, but on the information scale. x$n.I
shows the resulting planned statistical information at each analysis.
# Derive group sequential design
x < gsDesign(
k = 4,
test.type = 4,
alpha = 0.025,
beta = 0.2,
timing = 1,
sfu = sfHSD,
sfupar = 8,
sfl = sfHSD,
sflpar = 2,
delta = 0.4054651,
delta1 = 0.4054651,
delta0 = 0,
endpoint = "info"
)
x$n.I
#> [1] 12.77867 25.55734 38.33600 51.11467
The plot for the bounds is now shown in Figure 9.10.
plot(x)
9.7.4 Sample size adaptation
We will give an example of how the adaptation would have proceeded. We use a large control event rate which leads to a smaller sample size and keeps the example short. Following the basic strategy we have laid out we assume the first two analyses result in continuing the trial and adapting bounds. We assume 368 are patients included in the first interim with an equal number of 184 patients in each arm. On a blinded basis, we assume 67 events (18.2%) at interim 1. We can compute the estimated statistical information based on blinded data using the function varBinomial()
as follows:
1 / varBinomial(
x = 67,
n = 368,
scale = "RR"
)
#> [1] 20.47841
The above information calculation is under the null hypothesis. This is slightly different from under the alternative hypothesis:
1 / varBinomial(
x = 67,
n = 368,
scale = "RR",
delta0 = log(2 / 3)
)
#> [1] 19.48577
Computing under either assumption has been adequate in simulations performed. We will use the alternate hypothesis calculation here which assumes that the observed event rate in the experimental group is 2/3 of that in the control group. The proportion of final planned information we have indicates a final sample size of needed to obtain the planned information is 966, rounding up to an even sample size, based on the following:
# Divide IA1 information by final planned information
relinf < 1 / varBinomial(
x = 67,
n = 368,
scale = "RR",
delta0 = log(2 / 3)
) / x$n.I[4]
368 / relinf
#> [1] 965.3298
This is about halfway between the planned second (\(n = 736\)) and third interims (\(n = 1102\)). Assuming an overrun of about 280 patients would occur by the time the \(n = 736\) patients were enrolled, we plan to enroll to \(n = 1016 = 736 + 280\) patients for the final analysis. This is slightly more than the estimated \(n = 966\) needed, but is used here to provide a “cushion” to ensure an adequate sample size since the information at the final analysis is estimated based on the overall event rate remaining unchanged.
We will come back to the interim and final testing bounds momentarily, but assume that no bound is crossed at the interim. We assume that at the final sample size of 1016 that we observe 183 events for an event rate of 18%. The statistical information at this analysis is now computed as
1 / varBinomial(
x = 183,
n = 1016,
scale = "RR",
delta0 = log(2 / 3)
)
#> [1] 53.10206
In spite of having a little lower event rate than at the interim, we still have more than the planned information (\(53.10 > 51.11\)).
We now update the code for the informationbased design above based on the observed information at the two analyses above:
In the above derivation of the updated design in y
, we have just copied arguments for test.type
, alpha
, beta
, delta
, sfu
, sfupar
, sfl
, and sflpar
from what was provided by the web interface for computing the original design stored in x
. Enter ?gsDesign
in the console for more information on what is going on here. The new arguments here are maxn.IPlan
which is the maximum planned sample size from the original design x
and the information at the two analyses provided in the argument n.I
. You can also see ?sfHSD
for documentation of the HwangShihDeCani spending functions used. To see the updated design, we use plot(y)
; note that the information is rounded up here (Figure 9.11):
plot(y)
9.7.5 Testing and confidence intervals
We assume at the first analysis we have 184 patients per treatment group and that the total of 67 events is divided into 40 in the control group and 27 in the experimental group. The test statistic and repeated confidence interval for this difference is computed below. The repeated confidence interval method is described by Jennison and Turnbull (2000) for details. Basically, we use the \(Z\)statistic for the efficacy bound at interim 1 and plug the corresponding confidence level into the function ciBinomial()
. ciBinomial()
computes confidence intervals for 1) the difference between two rates, 2) the riskratio for two rates, or 3) the oddsratio for two rates. This procedure provides inference that is consistent with testBinomial()
in that the confidence intervals are produced by inverting the testing procedures in testBinomial()
. The testing and confidence interval calculations are as described in Miettinen and Nurminen (1985). The Type I error alpha input to ciBinomial()
is always interpreted as 2sided.
testBinomial(
x1 = 40,
n1 = 184,
x2 = 27,
n2 = 184
)
#> [1] 1.756089
level < 2 * pnorm(y$upper$bound[1])
ciBinomial(
alpha = level,
x1 = 27,
n1 = 184,
x2 = 40,
n2 = 184,
scale = "RR"
)
#> lower upper
#> 1 0.3065992 1.467228
This \(Z = 1.76\) is between the efficacy (\(z = 3.54\)) and futility (\(Z = 0.07\)), so the trial continues. At the second, final analysis we assume equal sample sizes of 508 per arm and that the total of 183 events is divided between 108 in the control group and 75 in the experimental group. We compute the test statistic as
testBinomial(
x1 = 108,
n1 = 508,
x2 = 75,
n2 = 508
)
#> [1] 2.694094
which exceeds the final efficacy bound of 1.96, yielding a positive efficacy finding at the final analysis. In this case, the conservative spending functions yielded a final boundary of 1.961 compared to a bound of 1.960 if a fixed design had been used.
9.7.6 Alternate example
In the example above, we have also reduced the overall sample size needed from 1470 under the original assumption of a 15% control group event rate to 1016, a substantial savings. Note that if the overall event rate had been lower, a larger final sample size would have been suggested. Let’s test an example assuming that the low overall event rate is due to a larger than expected treatment effect.
We assume that after 184 per treatment group we have 28 events in the control group (15.2%) and 14 events (7.6%) in the experimental group. After 368 per treatment group we have 58 events in the control group (15.8%) and 27 events (7.3%) in the experimental group. This yields the following adaptation and positive efficacy finding at the second interim analysis:
# Compute information after IA's 1 and 2
info < 1 / c(
varBinomial(
x = 42, n = 368, scale = "RR", delta0 = log(2 / 3)
),
varBinomial(
x = 85, n = 736, scale = "RR", delta0 = log(2 / 3)
)
)
info
#> [1] 11.32031 22.94377
# After IA2, compute estimated maximum sample size needed
relinfo < info[2] / 51.11
nupdate < 736 / relinfo
nupdate
#> [1] 1639.529
# Update design based on observed information at IA1 and
# assumed final information
y < gsDesign(
k = 2,
test.type = 4,
alpha = 0.025,
beta = 0.2,
delta = 0.4054651,
sfu = sfHSD,
sfupar = 8,
sfl = sfHSD,
sflpar = 2,
maxn.IPlan = x$n.I[4],
n.I = info
)
# Compute Zvalues for IA's 1 and 2
testBinomial(
x1 = c(28, 58),
n1 = c(184, 368),
x2 = c(14, 27),
n2 = c(184, 368)
)
#> [1] 2.295189 3.575202
# Compare above to updated IA1 and IA2 efficacy bounds
y$upper$bound
#> [1] 3.938772 3.465813
We see that the updated sample size based on blinded data would be 1662. However, you can see that the second interim \(Z = 3.58\) is greater than the efficacy bound of 3.47 and thus the trial can stop for a positive efficacy finding at that point. Assuming there is an overrun of 280 patients as before, we again end with a final sample size of 1016, a savings compared to the planned maximum sample size. In combination, the two examples show a couple of ways that the trial can adapt to produce savings compared to the planned sample size of 1470.