8  Introduction to R coding

The gsDesign web interface generates code that you may save and reuse in the R programming language. This allows you to reproduce results from the web interface. There are many other capabilities available in R that are not in the web interface.

Topics Covered

  1. Chapter organization
  2. Installing gsDesign and designing and updating a time-to-event trial
  3. Designing a trial with normal outcomes
  4. Testing and confidence intervals
  5. Conditional and predictive power
  6. Updating a design based on conditional power
  7. Converting the normal outcome design to an information-based design
  8. Information-based design for a binary outcome

8.1 Introduction

We cover a lot of topics very briefly in this chapter. Hopefully the examples presented will be of some use.

We begin in the next section with notes on how to install R and the gsDesign package used in this book. This section also shows how to access help files and uses a time-to-event trial de- sign as an example. This includes how to update the design when the event counts at analyses differ from what is planned.

Much of the rest of the chapter is based on an example with normal outcomes. The next two sections show how to derive and then analyze data, respectively. Then we move on to show how to compute conditional and predictive power in order to predict what the outcome at the end of a trial will be based on interim results. Then we move on in the final two sections to show how to update a design based on an interim outcome. First, we show how to update a design based on conditional power. Then we finish with demonstrating how to update sample size based on an information-based approach.

Finally, we demonstrate information-based design for a trial with a binary outcome.

8.2 Some basics

8.2.1 Installing R and gsDesign

In order to use R code to generate the designs in this book, you need to install R software and the gsDesign package. R is available at https://cran.r-project.org/. Once R is installed, you should be able run

install.packages("gsDesign")

which will install not only the gsDesign package, but other packages that it depends on such as the ggplot2 package. Although not necessary, you also may wish to install the RStudio IDE (integrated development environment) for R available at https://posit.co/. All of this software is free.

8.2.2 Running an initial example

Once you have gone through the above installations, you are ready to run some code generated by the web interface. After you have opened the R console or RStudio IDE, you will need to load the gsDesign package with the command

library(gsDesign)

Now go to the “Report” tab in the output panel of the web interface. There you will see 4 blocks of code at the beginning of the file to 1) load the gsDesign package, 2) and 3) enter parameters for the design, and 4) derive the design. Following code blocks show how to get a text summary, a tabular bound summary, a plot of design boundaries and a plot of expected enrollment and endpoint accrual.

The note at the bottom of this code indicates which version of gsDesign and R are run as well as the time the code was generated. Comments (lines starting with the # symbol) tell you what each line of code will do. You will need to run the first line starting with “x <-” in order to derived the design before running the plot, tabular output, or design summary code that summarize the trial exactly as you have seen in the web interface.

Code generated for each endpoint type is different.

All but the “Time-to-event” selection use the gsDesign() function to derive the design. We will demonstrate that along with interim analysis and sample size re-estimation in the following sections.

If you want to know more about the gsDesign(), gsSurv(), nSurv(), nBinomial() or nNormal() functions from the gsDesign package, help files are available with the package or at the documentation web site https://keaven.github.io/gsDesign/index.html. For instance, in the lower-right hand corner of the Posit IDE you could get help for gsDesign as seen below; note that you can scroll down in the interface to see the remainder of the help text. You can also just enter help(gsDesign) in the console window to view the same help text. This chapter is just a brief introduction with examples for some of these and for other functions. The help files should help you understand the functions in more depth. They include descriptions, input and output variable descriptions, examples references and other details.

In order to get help on the plot command, you will need to enter help(plot.gsDesign) or enter plot.gsDesign in the search field in the help tab of the RStudio IDE in the following figure. The plot.gsDesign() command will plot designs produced by either the gsDesign() or, for time-to-event endpoints, gsSurv() commands. When coding, you can use the shorter plot(x) command as the plot() knows to invoke plot.gsDesign() when x is produced by gsDesign() or gsSurv().

Finally, to print a summary of the design you can use the gsBoundSummary(x) to print a tabular summary, summary(x) to get a short textual summary, or just enter x and return to get the summary you can see in the “Text” tab of the web interface. To get the fixed design sample size computed with the command

# Derive Normal Fixed Design
n <- nNormal(
  delta1 = 1,
  delta0 = 0,
  ratio = 1,
  sd = 1,
  sd2 = NULL,
  alpha = 0.025,
  beta = 0.1
)

enter the command n and return; this is equivalent to print(n).

8.2.3 Updating a design with a time-to-event endpoint

This same approach can be taken for the time-to-event design presented earlier, with n.I now representing the number of events at each analysis. We consider the first design presented in this section, which results in the following output indicating analyses after 57, 114, and 296 events:

# Derive Group Sequential Design
x <- gsSurv(
  k = 3,
  test.type = 4,
  alpha = 0.025,
  beta = 0.1,
  timing = c(1),
  sfu = sfLDOF,
  sfupar = c(0),
  sfl = sfLDOF,
  sflpar = c(0),
  lambdaC = log(2) / 6,
  hr = 0.6,
  hr0 = 1,
  eta = 0.01,
  gamma = c(2.5, 5, 7.5, 10),
  R = c(2, 2, 2, 6),
  S = NULL,
  T = 18,
  minfup = 6,
  ratio = 1
)
gsBoundSummary(x)
#>     Analysis              Value Efficacy Futility
#>    IA 1: 33%                  Z   3.7103  -0.6946
#>       N: 218        p (1-sided)   0.0001   0.7563
#>   Events: 57       ~HR at bound   0.3735   1.2024
#>    Month: 10   P(Cross) if HR=1   0.0001   0.2437
#>              P(Cross) if HR=0.6   0.0372   0.0044
#>    IA 2: 67%                  Z   2.5114   1.0024
#>       N: 296        p (1-sided)   0.0060   0.1581
#>  Events: 114       ~HR at bound   0.6242   0.8285
#>    Month: 13   P(Cross) if HR=1   0.0060   0.8434
#>              P(Cross) if HR=0.6   0.5845   0.0440
#>        Final                  Z   1.9930   1.9930
#>       N: 296        p (1-sided)   0.0231   0.0231
#>  Events: 171       ~HR at bound   0.7368   0.7368
#>    Month: 18   P(Cross) if HR=1   0.0233   0.9767
#>              P(Cross) if HR=0.6   0.9000   0.1000

Now suppose the actual interim analyses occur after 85 and 180 events. We adapt the design as follows. This results in the following updated design:

# Assuming actual analyses occurred after
# 85 and 180 events, reset bounds
y <- gsDesign(
  k = 2,
  n.I = c(85, 180),
  test.type = x$test.type,
  n.fix = n,
  sfu = x$upper$sf,
  sfupar = x$upper$param,
  sfl = x$lower$sf,
  sflpar = x$lower$param,
  maxn.IPlan = x$n.I[x$k]
)
gsBoundSummary(y)
#>   Analysis               Value Efficacy Futility
#>  IA 1: 50%                   Z   2.9660   2.5536
#>      N: 85         p (1-sided)   0.0015   0.0053
#>                ~delta at bound   0.6434   0.5540
#>            P(Cross) if delta=0   0.0015   0.9947
#>            P(Cross) if delta=1   0.9499   0.0199
#>      Final                   Z   1.9694   1.9694
#>     N: 180         p (1-sided)   0.0245   0.0245
#>                ~delta at bound   0.2936   0.2936
#>            P(Cross) if delta=0   0.0032   0.9968
#>            P(Cross) if delta=1   0.9801   0.0199

8.3 Design and analysis based on normal outcomes

We begin by loading the gsDesign package and then compute a fixed design sample size for comparing two normal means with a difference of 2 and within group standard deviation of 4. We assume equal sample sizes in the two groups and compute the total sample size with both arms combined:

library(gsDesign)

n <- nNormal(delta1 = 2, sd = 4, alpha = .025, beta = .1)
n
#> [1] 168.1188

We would round this up to 170 if actually running a trial with a fixed design. However, for extension to a group sequential design we convert the unrounded fixed design sample size to a group sequential design with a single interim analysis half way through. We use many defaults here. After running this code, try the commands summary(x) and gsBoundSummary(x) to examine further properties. See if you can reproduce this design in the web interface; the command summary(x) will give you critical information needed for this.

x <- gsDesign(k = 2, n.fix = n, delta1 = 2)

Once you have reproduced the above design, look at the code tab in the interface when looking at different types of plots to see how you can generate these plots. For instance, to produce a plot of spending functions, enter the following code:

plot(
  x,
  plottype = 5,
  xlab = "Proportion of total sample size",
  ylab = "Proportion of spending"
)

This code demonstrates how you can change plot types with plottype, as well as how you can change from default labels for the x- and y-axes.

Now we can print out a summary of the design as follows:

gsBoundSummary(x)
#>   Analysis               Value Efficacy Futility
#>  IA 1: 50%                   Z   2.7500   0.4122
#>      N: 88         p (1-sided)   0.0030   0.3401
#>                ~delta at bound   2.3496   0.3522
#>            P(Cross) if delta=0   0.0030   0.6599
#>            P(Cross) if delta=2   0.3412   0.0269
#>      Final                   Z   1.9811   1.9811
#>     N: 176         p (1-sided)   0.0238   0.0238
#>                ~delta at bound   1.1969   1.1969
#>            P(Cross) if delta=0   0.0239   0.9761
#>            P(Cross) if delta=2   0.9000   0.1000

Now we print a textual summary of the design. This provides information about Type I error, power and the spending functions used to determine the bounds.

cat(summary(x))
#> Asymmetric two-sided group sequential design with non-binding futility bound, 2 analyses, sample size 176, 90 percent power, 2.5 percent (1-sided) Type I error. Efficacy bounds derived using a Hwang-Shih-DeCani spending function with gamma = -4. Futility bounds derived using a Hwang-Shih-DeCani spending function with gamma = -2.

Next we look at the ‘standardized effect size’ for which the trial is powered and note where this comes from. The variable x$delta contains the standardized effect size. The variable x$theta contains the null treatment effect of 0 as well as the standardized effect size:

x$delta
#> [1] 0.25
x$theta
#> [1] 0.00 0.25

This can be derived using the treatment difference and within-group standard deviation. We are looking for a value theta that is a constant times the difference in means for which the trial is powered. The value is such that the estimate of the standardized treatment effect has variance \(1/n\) when the total sample size (both arms combined) is \(n\).

delta1 <- 2
sd <- 4
delta1 / (2 * sd)
#> [1] 0.25

8.3.1 Testing and repeated confidence interval estimation

Given the above data, we can compute a \(Z\)-statistic testing the null hypothesis that the difference in means is 0 by dividing the observed mean difference by its estimated standard error.

# Now assume we performed interim as planned,
# rounding up interim sample size to get an
# integer size per group
npergroup <- ceiling(x$n.I[1] / 2)

# Assume pooled standard deviation at IA 1
sdhat1 <- 4.2

# Compute standard error for estimated
# difference in means
sedelta1 <- sqrt(sdhat1^2 * 2 / npergroup)

# Assume observed treatment difference
deltahat1 <- 1.4
# Compute interim Z-statistic
z1 <- deltahat1 / sedelta1
z1
#> [1] 1.563472

We see that this is between the bounds at the interim analysis:

c(x$lower$bound[1], x$upper$bound[1])
#> [1] 0.4122102 2.7499659

Since no bound has been crossed, the trial should continue to the final analysis. Next, we use the upper \(Z\)-bound to compute a repeated confidence interval for the treatment effect:

deltahat1 + sedelta1 * c(-1, 1) * x$upper$bound[1]
#> [1] -1.062438  3.862438

You can see that the confidence interval is quite wide at this point, containing both the null hypothesis that the difference in means is 0 and the alternative hypothesis that the difference is 2.

8.4 Updating a design

Often analyses do not have exactly the originally-planned sample size or number of events performed. If the numbers are approximately as planned, this should not be an issue. However, the spending function approach taken by the gsDesign package allows you to adjust boundaries to appropriately control Type I error based on the work originally presented by Gordon Lan and DeMets (1983).

We assume actual analyses are performed after 100 and 196 patients instead of the planned 88 and 176. To update, we make another call to gsDesign. Defaults here set Type I and II error (alpha, beta), spending functions (sfu, sfl) and corresponding spending function parameters (sfu, sfl) to be the same as the original design. If any of these were changed in the call to gsDesign that provided the original design, the same parameters should be entered here. We also enter k, the updated design number of analyses, n.fix, the original fixed design sample size, maxn.IPlan, the originally planned maximum sample size for the group sequential design, and n.I, the actual sample sizes at analyses. You can see that the bounds have been updated slightly from the original design.

xa <- gsDesign(k = 2, n.fix = n, maxn.IPlan = x$n.I[x$k], n.I = c(100, 196))
gsBoundSummary(xa)
#>   Analysis               Value Efficacy Futility
#>  IA 1: 57%                   Z   2.6437   0.6659
#>     N: 100         p (1-sided)   0.0041   0.2527
#>                ~delta at bound   1.0575   0.2664
#>            P(Cross) if delta=0   0.0041   0.7473
#>            P(Cross) if delta=1   0.4429   0.0333
#>      Final                   Z   1.9916   1.9916
#>     N: 196         p (1-sided)   0.0232   0.0232
#>                ~delta at bound   0.5690   0.5690
#>            P(Cross) if delta=0   0.0231   0.9769
#>            P(Cross) if delta=1   0.9202   0.0798

8.5 Conditional power, predictive power, and probability of success

In this section, we cover topics related to predicting what the outcome of a trial will be, either from the start of the trial (probability of success) or at the time of an interim analysis (conditional power, predictive power, conditional probability of success, prediction intervals). All of the functions used in this section (and more) are documented in a single help file; the description from this help file is shown in the next column.

8.5.1 Conditional and predictive power

We begin with computing the power or probability of success (predictive power) of a trial given the interim result in a trial. We have previously computed an interim normal test statistic z1 for the trial we designed with normal outcomes. Now we wish to compute the power for the trial at the final analysis given this interim outcome. The function gsCP() performs this calculation. The default is

# Compute conditional power
gsCP(x, i = 1, z = z1)
#>              Lower bounds   Upper bounds
#>   Analysis N   Z   Nominal p  Z   Nominal p
#>          1 88 1.24    0.8922 1.24    0.1078
#> 
#> Boundary crossing probabilities and expected sample size assume
#> any cross stops the trial
#> 
#> Upper boundary
#>           Analysis
#>   Theta      1  Total E{N}
#>   0.167 0.6275 0.6275 87.7
#>   0.000 0.1078 0.1078 87.7
#>   0.250 0.8649 0.8649 87.7
#> 
#> Lower boundary
#>           Analysis
#>   Theta      1  Total
#>   0.167 0.3725 0.3725
#>   0.000 0.8922 0.8922
#>   0.250 0.1351 0.1351

This actually produces another group sequential design. The relevant table in the following output that is produced by the above gsCP() call is the upper boundary table which provides the power of a positive trial given the input interim \(Z\)-value assuming the interim estimated standardized treatment effect (theta = 0.167 with a conditional power of 0.6275), no treatment effect (theta = 0; in this case, the “conditional power” is actually the conditional Type I error of a positive result, 0.1078, since there is no treatment effect), and the originally targeted standardized effect size (theta = 0.25 with a conditional power of 0.8659).

From the repeated confidence interim in the previous section, we saw that there is a great deal of uncertainty about what the actual treatment effect is at the time of the interim analysis. To account for this uncertainty in predicting the outcome of the trial at the end of the trial, we compute predictive power. This requires assuming a prior distribution for the treatment effect at the beginning of the trial, updating this prior based on the interim treatment result, and averaging the conditional power of a positive finding across this updated prior (i.e., posterior) distribution. The calculation can be done with different prior distributions to evaluate the sensitivity of the result to the prior distribution assumed. Here, we assume a prior distribution for the standardized treatment effect that is ‘slightly positive’ in that we assume the mean of the prior is half-way between the null and alternative hypotheses. However, we assume the standard deviation for the prior is the same as the standardized effect size so that there could easily be either a negative treatment effect or a very positive treatment effect.

prior <- normalGrid(mu = 0.75 * x$delta, sigma = x$delta)

Now we proceed to compute the predictive power based on this prior and the interim result. As previously noted, the prior is updated by the interim result in this computation.

# Prior and interim result
gsPP(x = x, zi = z1, i = 1, theta = prior$z, wgts = prior$wgts)
#> [1] 0.6030295

We see that the resulting predictive probability is lower than either of the conditional probability calculations based on the current trend (conditional power of 0.63) or the originally planned effect size (conditional power of 0.86). If we substantially change the prior standard deviation to make the prior less informative by making it 5 times the alternate hypothesis standardized effect size and therefore the computation is more driven by the data, the predictive power is still 0.59.

8.5.2 Probability of success and prediction intervals

Assuming the same prior distribution as above, we can compute the probability of success of the trial before the trial starts as the probability of a positive trial averaged over the prior distribution.

gsPOS(x = x, theta = prior$z, wgts = prior$wgts)
#> [1] 0.5546512

Note that the previous predictive power computation increased the probability of success relative to this when given a positive interim trend. Now assume you know no bound was crossed, but are blinded to the interim \(Z\)-value. We can compute the predictive power based on this degree of uncertainty as follows:

gsCPOS(x = x, i = 1, theta = prior$z, wgts = prior$wgts)
#> [1] 0.5851146

We see that even with this minimal information about the interim analysis that we can slightly increase our predicted probability of success.

Finally, we compute a 90% prediction interval for the \(Z\)-value at the end of the trial based on the interim \(Z\)-value. This is based on the interim \(Z\)-value, the (updated) prior information and the uncertainty of results in the second half of the trial. We can see that the final upper bound of 1.98 is not close to either end of this prediction interval, consistent with the predictive power of 0.60 that we have computed.

gsPI(x = x, i = 1, j = 2, zi = z1, theta = prior$z, wgts = prior$wgts, level = 0.9)
#> [1] 0.6519354 3.8121586

8.6 Sample size adaptation

We continue our example with normal outcomes to demonstrate two methods of sample size adaptation. First, we show how to adapt the sample size and final statistical bound using the conditional power of the trial based on an interim result. Then we demonstrate information-based design that updates the sample size based on an interim estimate of the standard deviation ‘nuisance parameter’ assumption that was made.

8.6.1 Sample size adaptation based on conditional power

Based on the results of the previous section, the concept of updating the final sample size for a design based on conditional power at an interim analysis has been widely published. While this strategy remains controversial and often unsuccessful, the methods have improved with time so that they can be relatively efficient.

We begin by noting the planned stage 2 sample size per group:

ceiling(x$n.I[2] / 2) - ceiling(x$n.I[1] / 2)
#> [1] 44

Next, we reset stage 2 sample size and cutoff to make conditional power for stage 2 equal to originally planned power:

xupdate <- ssrCP(z1 = z1, x = x)

# Re-estimated stage 2 sample size per arm
ceiling(xupdate$dat$n2 / 2)
#> [1] 158

# Z-statistic required for stage 2 data only
as.numeric(xupdate$dat$z2)
#> [1] 1.238271

Finally, we look at how the sample size might have been adapted based on different values the interim test statistic \(Z_1\).

xu <- ssrCP(x = x, z1 = seq(0, 3, 0.001))
plot(xu)

You can see horizontal lines at sample size \(N = 88\) where \(Z_1\) is above the upper bound or below the lower bound at the interim analysis; in this case, we have just used the original group sequential design. We follow the method of Chen, DeMets, and Gordon Lan (2004) here which limits sample size adaptation to a ‘promising’ set of \(Z_1\) values. Above the lower horizontal lines are horizontal lines at \(N = 176\) where sample size is not adapted. For the smaller \(Z_1\) values, no adaptation is done because the results are not promising enough. For the larger \(Z_1\) values, the conditional power is at least the originally-planned 90% with the originally-planned final sample size. The curved portion of the line denotes a region where the sample size is adapted to achieve 90% conditional power assuming the treatment effect observed at the interim analysis. The short horizontal line at \(N = 352\) is a region where the results are promising, but increasing the sample size by more than a factor of 2 would be required to reach the targeted conditional power.

Now examine the three x-axes at the bottom of the graph.

There is a direct translation of \(Z_1\) values to conditional power (CP) and, asymptotically, to the observed interim effect size relative to the effect size for which the trial is powered (\(\hat{\theta}/\theta_1\)). Thus, the sample size adaptation can be considered to be based on any one of these values. See the help file for ssrCP for more on the options available.

8.6.2 Information-based design

Next we consider adaptation based on estimating the variance of the treatment effect at the interim analysis. Note that statistical information is one over the variance of the treatment effect estimate. An information-based design examines statistical information instead of sample size at each analysis. We plan the analysis based on the treatment effect that we wish to detect, which was a difference of 2 in the means of the two treatment groups in this case (delta = 2 below).

xI <- gsDesign(k = 2, delta = 2)

Recall that we previously computed the standard error of the interim estimate of the difference of means and stored it in sedelta1. The inverse of the square of this is, thus, the statistical information at the interim analysis.

info1 <- 1 / sedelta1^2
info1
#> [1] 1.247166

This is less than was planned for this case:

xI$n.I[1]
#> [1] 1.369775

We began using the sample sizes provided in the original design stored in the variable x. Using the interim sample size there and the ratio of information observed at the interim compared to what is required at the final analysis, we can estimate the final sample size required to get final planned information. Note that we are getting an integer number per group and doubling from the original design:

# Total planned sample size
n2 <- 2 * ceiling(x$n.I[2] * xI$n.I[1] / info1 / 2)

Now we move on to the final analysis and suppose final within group standard deviation observed is 4.1; based on this we compute the final statistical information observed:

info2 <- n2 / (4 * 4.1^2)

Based on the interim and final statistical information, we update the planned information-based design. Note that we could have done this calculation at the interim analysis based on an assumption for the final statistical information; the interim updated bounds would be the same as those computed here:

xIBD <- gsDesign(k = 2, delta = 4, maxn.IPlan = xI$n.I[2], n.I = c(info1, info2))
xIBD$upper$bound
#> [1] 2.818138 1.979606

Now we compute the final \(Z\)-statistic based on assuming an observed treatment difference of 1.4:

z2 <- 1.4 * sqrt(info2)
z2
#> [1] 2.378018

Since 2.38 is greater than the adjusted final efficacy bound of 1.98 computed just above (in xIBD$upper$bound), the null hypothesis can be rejected.

8.7 Information-based adaptation for a binomial design

8.7.1 Overview

The example we will develop here is a trial with a binomial outcome where the control group event rate is not known and we wish to power the trial for a particular relative risk regardless of the control rate.

This is, by far, the most complex section of the book. It may be skipped by most, but may provide a useful example for others.

8.7.2 Basic strategy

Information-based design can be done blinded or unblinded, but there may be some preference from regulators that it be done on a blinded basis to ensure that those adapting, presumably a statistician from the trial sponsor, will not be taking into account treatment effect when updating the sample size. Even if somebody independent of the sponsor does the adaptation, there may be some possibility for back-calculation of the treatment effect observed if unblinded sample size adaptation is used.

The basic design strategy for adapting the sample size is as follows:

  1. Design the study under an assumed value for the nuisance parameter.
  2. At each interim analysis, compute an estimate of the nuisance parameter and update the statistical information, study bounds and sample size appropriately.

Following is the specific sample size adaptation strategy at each analysis, slightly modified from Zhou et al. (2014):

  1. If current sample size is enough, meaning the planned final statistical information has been acquired, stop the trial regardless of whether efficacy or futility bound is crossed.
  2. Otherwise, if overrun of patients enrolled between the analysis cutoff and performance of the analysis is enough to provide the planned final statistical information, stop enrolling more patients but keep collecting data for enrolled patients.
  3. Otherwise, if the next planned sample size is enough, stop enrollment when that sample size is achieved.
  4. Otherwise, if the next planned sample size is not enough but the overrun likely at the next interim would provide more than the planned final statistical information, stop at the estimated overrun. This will normally provide a little more than the planned information to be a bit conservative.
  5. Otherwise, if the next planned sample size is not enough, but the original maximum sample size is, continue enrollment to the next planned interim.
  6. Otherwise, if the original maximum sample size is not enough, use the updated maximum sample size but with an upper limit depending on practical aspects of certain trials (e.g. two times the originally planned maximum sample size) and continue enrollment to the next analysis.

8.7.3 The design and planned information

The design is loosely based on the CAPTURE study (The CAPTURE Investigators, 1997) where abciximab was studied to reduce important cardiovascular outcomes following balloon angioplasty. The design assumed a control event rate of 15% and was powered at 80% to detect a 1/3 reduction to a 10% event rate in the experimental group. For this example, we assume 2.5%, one-sided Type I error and use default Hwang-Shih-DeCani bounds with \(\gamma = -8\) for efficacy and \(\gamma = -4\) for futility. Following is the code that is generated by the web interface for this trial:

# This is added to load the gsDesign package
library(gsDesign)

# Derive Binomial Fixed Design
n <- nBinomial(
  p1 = 0.15,
  p2 = 0.1,
  delta0 = 0,
  alpha = 0.025,
  beta = 0.2,
  ratio = 1
)

# Derive Group Sequential Design
x <- gsDesign(
  k = 4,
  test.type = 4,
  alpha = 0.025,
  beta = 0.2,
  timing = c(1),
  sfu = sfHSD,
  sfupar = c(-8),
  sfl = sfHSD,
  sflpar = c(-2),
  delta = 0,
  delta1 = 0.05,
  delta0 = 0,
  endpoint = "binomial",
  n.fix = n
)

Here is the resulting design with a planned sample size of 1469 and 3 equally-spaced interim analyses:

plot(x)

As noted, we wish to be able to power the trial for a relative risk of 2/3, e.g., to a 10% experimental group event rate when the control rate is 15% or an 8% experimental group event rate when the control rate is 12%. For relative risk the logarithm of the relative risk is used as the parameter of interest: \(\delta = -\log(2/3)\). We used minus the logarithm so that \(\delta\) is positive. If we used the same design parameters, but different control rates, we would get sample sizes as follows:

# Compute ratio of group sequential to fixed design sample size above
Inflate <- x$n.I[4] / n

# Compute fixed design sample sizes for different control rates
p1 <- seq(0.08, 0.2, 0.01)
nalt <- nBinomial(p1 = p1, p2 = p1 * 2 / 3, alpha = 0.025, beta = 0.2)
df <- data.frame(x = p1, y = nalt * Inflate)

# Plot sample sizes using inflation factor and the ggplot2 package
library(ggplot2)

ggplot(df, aes(x = x, y = y)) +
  geom_line() +
  xlab("Control event rate") +
  ylab("Sample size required")

Thus, you can see our sample size can vary from 1048 to almost 2940 depending on the value of the nuisance parameter. We will initially plan to do an analysis of the first 368 patients for interim 1 and adapt from there.

In the web interface, if we can get the information-based design by entering \(\delta\) = \(-\log(2/3)\) = 0.4054651 into the “Treatment effect” control:

Deleting a couple of unnecessary parameters from the code generated, we re-derive a design with the same bounds, but on the information scale. x$n.I shows the resulting planned statistical information at each analysis.

# Derive Group Sequential Design
x <- gsDesign(
  k = 4,
  test.type = 4,
  alpha = 0.025,
  beta = 0.2,
  timing = 1,
  sfu = sfHSD,
  sfupar = -8,
  sfl = sfHSD,
  sflpar = -2,
  delta = 0.4054651,
  delta1 = 0.4054651,
  delta0 = 0,
  endpoint = "info"
)
x$n.I
#> [1] 12.77867 25.55734 38.33600 51.11467

The plot for the bounds is now:

plot(x)

8.7.4 Sample size adaptation

We will give an example of how the adaptation would have proceeded. We use a large control event rate which leads to a smaller sample size and keeps the example short. Following the basic strategy we have laid out we assume the first two analyses result in continuing the trial and adapting bounds. We assume 368 are patients included in the first interim with an equal number of 184 patients in each arm. On a blinded basis, we assume 67 events (18.2%) at interim 1. We can compute the estimated statistical information based on blinded data using the function varBinomial() as follows:

1 / varBinomial(x = 67, n = 368, scale = "RR")
#> [1] 20.47841

The above information calculation is under the null hypothesis. This is slightly different than under the alternative hypothesis:

1 / varBinomial(x = 67, n = 368, scale = "RR", delta0 = -log(2 / 3))
#> [1] 19.48577

Computing under either assumption has been adequate in simulations performed. We will use the alternate hypothesis calculation here which assumes that the observed event rate in the experimental group is 2/3 of that in the control group. The proportion of final planned information we have indicates a final sample size of needed to obtain the planned information is 966, rounding up to an even sample size, based on the following:

# Divide IA1 information by final planned information
relinf <- 1 / varBinomial(x = 67, n = 368, scale = "RR", delta0 = -log(2 / 3)) / x$n.I[4]
368 / relinf
#> [1] 965.3298

This is about half-way between the planned second (\(n = 736\)) and third interims (\(n = 1102\)). Assuming an overrun of about 280 patients would occur by the time the \(n = 736\) patients were enrolled, we plan to enroll to \(n = 1016 = 736 + 280\) patients for the final analysis. This is slightly more than the estimated \(n = 966\) needed, but is used here to provide a ‘cushion’ to ensure an adequate sample size since the information at the final analysis is estimated based on the overall event rate remaining unchanged.

We will come back to the interim and final testing bounds momentarily, but assume that no bound is crossed at the interim. We assume that at the final sample size of 1016 that we observe 183 events for an event rate of 18%. The statistical information at this analysis is now computed as

1 / varBinomial(x = 183, n = 1016, scale = "RR", delta0 = -log(2 / 3))
#> [1] 53.10206

In spite of having a little lower event rate than at the interim, we still have more than the planned information (\(53.10 > 51.11\)).

We now update the code for the information-based design above based on the observed information at the two analyses above:

# Update design based on observed information
y <- gsDesign(
  k = 2,
  test.type = 4,
  alpha = .025,
  beta = .2,
  delta = 0.4054651,
  sfu = sfHSD,
  sfupar = -8,
  sfl = sfHSD,
  sflpar = -2,
  maxn.IPlan = x$n.I[4],
  n.I = c(20.47841, 53.10206)
)

In the above derivation of the updated design in y, we have just copied arguments for test.type, alpha, beta, delta, sfu, sfupar, sfl and sflpar from what was provided by the web interface for computing the original design stored in x. Enter help(gsDesign) in the console for more information on what is going on here. The new arguments here are maxn.IPlan which is the maximum planned sample size from the original design x and the information at the two analyses provided in the argument n.I. You can also see help(sfHSD) for documentation of the Hwang-Shih-DeCani spending functions used. To see the updated design, we use plot(y); note that the information is rounded up here:

plot(y)

8.7.5 Testing and confidence intervals

We assume at the first analysis we have 184 patients per treatment group and that the total of 67 events is divided into 40 in the control group and 27 in the experimental group. The test statistic and repeated confidence interval for this difference is computed below. The repeated confidence interval method is described by Jennison and Turnbull (1999) for details. Basically, we use the \(Z\)-statistic for the efficacy bound at interim 1 and plug the corresponding confidence level into the function ciBinomial(). ciBinomial() computes confidence intervals for 1) the difference between two rates, 2) the risk-ratio for two rates, or 3) the odds-ratio for two rates. This procedure provides inference that is consistent with testBinomial() in that the confidence intervals are produced by inverting the testing procedures in testBinomial(). The testing and confidence interval calculations are as described in Miettinen and Nurminen (Statistics in Medicine, 1985). The Type I error alpha input to ciBinomial() is always interpreted as 2-sided.

testBinomial(x1 = 40, n1 = 184, x2 = 27, n2 = 184)
#> [1] 1.756089

level <- 2 * pnorm(-y$upper$bound[1])
ciBinomial(alpha = level, x1 = 27, n1 = 184, x2 = 40, n2 = 184, scale = "RR")
#>       lower    upper
#> 1 0.3065992 1.467228

This \(Z = 1.76\) is between the efficacy (\(z = 3.54\)) and futility (\(Z = 0.07\)), so the trial continues. At the second, final analysis we assume equal sample sizes of 508 per arm and that the total of 183 events is divided between 108 in the control group and 75 in the experimental group. We compute the test statistic as

testBinomial(x1 = 108, n1 = 508, x2 = 75, n2 = 508)
#> [1] 2.694094

which exceeds the final efficacy bound of 1.96, yielding a positive efficacy finding at the final analysis.In this case, the conservative spending functions yielded a final boundary of 1.961 compared to a bound of 1.960 if a fixed design had been used.

8.7.6 Alternate example

In the example above, we have also reduced the overall sample size needed from 1470 under the original assumption of a 15% control group event rate to 1016, a substantial savings. Note that if the overall event rate had been lower, a larger final sample size would have been suggested. Let’s test an example assuming that the low overall event rate is due to a larger than expected treatment effect.

We assume that after 184 per treatment group we have 28 events in the control group (15.2%) and 14 events (7.6%) in the experimental group. After 368 per treatment group we have 58 events in the control group (15.8%) and 27 events (7.3%) in the experimental group. This yields the following adaptation and positive efficacy finding at the second interim analysis:

# Compute information after IA's 1 and 2
info <- 1 / c(
  varBinomial(x = 42, n = 368, scale = "RR", delta0 = -log(2 / 3)),
  varBinomial(x = 85, n = 736, scale = "RR", delta0 = -log(2 / 3))
)
info
#> [1] 11.32031 22.94377

# After IA2, compute estimated maximum sample size needed
relinfo <- info[2] / 51.11
nupdate <- 736 / relinfo
nupdate
#> [1] 1639.529

# Update design based on observed information at IA1 and
# assumed final information
y <- gsDesign(
  k = 2,
  test.type = 4,
  alpha = .025,
  beta = .2,
  delta = 0.4054651,
  sfu = sfHSD,
  sfupar = -8,
  sfl = sfHSD,
  sflpar = -2,
  maxn.IPlan = x$n.I[4],
  n.I = info
)

# Compute Z-values for IA's 1 and 2
testBinomial(x1 = c(28, 58), n1 = c(184, 368), x2 = c(14, 27), n2 = c(184, 368))
#> [1] 2.295189 3.575202

# Compare above to updated IA1 and IA2 efficacy bounds
y$upper$bound
#> [1] 3.938772 3.465813

We see that the updated sample size based on blinded data would be 1662. However, you can see that the second interim \(Z\) = 3.58 is greater than the efficacy bound of 4.48 and thus the trial can stop for a positive efficacy finding at that point. Assuming there is an overrun of 280 patients as before, we again end with a final sample size of 1016, a savings compared to the planned maximum sample size. In combination, the two examples show a couple of ways that the trial can adapt to produce savings compared to the planned sample size of 1470.