Skip to contents

This vignette describes the methodology used in the sample_size_nbinom function for calculating sample sizes when comparing two treatment groups with negative binomial outcomes. It covers two available methods and how to account for variable accrual rates.

Methodology

We assume the outcome YY follows a negative binomial distribution with mean μ\mu and dispersion parameter kk, such that the variance is given by:

Var(Y)=μ+kμ2 Var(Y) = \mu + k \mu^2

Note that in R’s rnbinom parameterization, k=1/𝚜𝚒𝚣𝚎k = 1/\texttt{size}.

We wish to test the null hypothesis H0:λ1=λ2H_0: \lambda_1 = \lambda_2 against the alternative H1:λ1λ2H_1: \lambda_1 \neq \lambda_2, where λ1\lambda_1 and λ2\lambda_2 are the event rates in the control and treatment groups, respectively.

The function implements two methods:

Method 1: Zhu and Lakkis (2014)

This method is based on the asymptotic normality of the log rate ratio. The sample size for the control group (n1n_1) is:

n1=(zα/s+zβ)2V(log(μ1/μ2))2 n_1 = \frac{(z_{\alpha/s} + z_{\beta})^2 \cdot V}{(\log(\mu_1/\mu_2))^2}

where VV is the variance component: V=(1μ1+k)+1r(1μ2+k) V = \left(\frac{1}{\mu_1} + k\right) + \frac{1}{r}\left(\frac{1}{\mu_2} + k\right)

where r=n2/n1r = n_2/n_1 is the allocation ratio, and μi=λit\mu_i = \lambda_i \cdot t is the expected mean count over exposure duration tt.

Method 2: Friede and Schmidli (2010) / Mütze et al. (2019)

This method uses a Wald test statistic and is commonly used in group sequential designs (as implemented in the gscounts package). The total sample size ntotaln_{total} is calculated as:

ntotal=(zα/s+zβ)2V(log(λ1/λ2))2 n_{total} = \frac{(z_{\alpha/s} + z_{\beta})^2 \cdot \bar{V}}{(\log(\lambda_1/\lambda_2))^2}

where V\bar{V} is the average variance per subject: V=1/μ1+kp1+1/μ2+kp2 \bar{V} = \frac{1/ \mu_1 + k}{p_1} + \frac{1/ \mu_2 + k}{p_2}

where p1=n1/ntotalp_1 = n_1/n_{total} and p2=n2/ntotalp_2 = n_2/n_{total} are the allocation proportions.

Note: For a fixed design with equal allocation, both methods yield identical sample sizes.

Average exposure with variable accrual and dropout

When the accrual rate is not constant or when the trial has a fixed duration with ongoing recruitment, the exposure time for patients will vary. The function calculates an average exposure time to use in the sample size formula. Additionally, if a dropout_rate is specified, the exposure is adjusted to account for patients leaving the study early.

Let TT be the total trial duration. Suppose recruitment occurs in JJ segments, where the jj-th segment has accrual rate RjR_j and duration DjD_j.

If dropout_rate is 0:

  1. The expected number of patients recruited in segment jj is Nj=RjDjN_j = R_j \cdot D_j.
  2. The start time of segment jj is Sj1=i=1j1DiS_{j-1} = \sum_{i=1}^{j-1} D_i (with S0=0S_0 = 0).
  3. The midpoint of recruitment for segment jj is Mj=Sj1+Dj/2M_j = S_{j-1} + D_j/2.
  4. The average follow-up (exposure) time for patients recruited in segment jj is approximately Ej=TMjE_j = T - M_j.

If dropout_rate (δ\delta) > 0, the average exposure is calculated by integrating the exposure function over the recruitment interval:

Ej=1Djuminumax1eδuδdu=1δ1δ2Dj(eδumineδumax) \begin{aligned} E_j &= \frac{1}{D_j}\int_{u_{min}}^{u_{max}} \frac{1 - e^{-\delta u}}{\delta} du \\ &= \frac{1}{\delta} - \frac{1}{\delta^2 D_j} \left( e^{-\delta u_{min}} - e^{-\delta u_{max}} \right) \end{aligned} where umaxu_{max} and uminu_{min} are the maximum and minimum potential follow-up times for patients in that segment (TSj1T - S_{j-1} and T(Sj1+Dj)T - (S_{j-1} + D_j) respectively).

Maximum follow-up

If max_followup (FF) is specified, the follow-up time for any individual is capped at FF. This creates three scenarios for a recruitment segment:

  1. All truncated: If uminFu_{min} \ge F, all patients in the segment have potential follow-up F\ge F, so their actual follow-up is FF (subject to dropout).
  2. None truncated: If umaxFu_{max} \le F, no patients reach the cap FF before the trial ends. The calculation is as above.
  3. Partial truncation: If umin<F<umaxu_{min} < F < u_{max}, patients recruited earlier in the segment are capped at FF, while those recruited later are followed until the trial end. The segment is split into two parts for calculation.

The overall average exposure used for the calculation is the weighted average:

t=j=1JNjEjj=1JNj \bar{t} = \frac{\sum_{j=1}^J N_j E_j}{\sum_{j=1}^J N_j}

Variance inflation for variable follow-up

When follow-up times are variable (due to accrual, dropout, or administrative censoring), simply using the average follow-up time t\bar{t} in the variance formula underestimates the true variance of the rate estimator. This is because the variance of the negative binomial distribution depends on the exposure time in a non-linear way (Var(Y)=μ+kμ2=λt+k(λt)2Var(Y) = \mu + k\mu^2 = \lambda t + k (\lambda t)^2).

To account for this, we apply a variance inflation factor QQ to the dispersion parameter kk, as derived by Zhu and Lakkis (2014):

Q=E[t2](E[t])2 Q = \frac{E[t^2]}{(E[t])^2}

The adjusted dispersion parameter used in the sample size calculation is kadj=kQk_{adj} = k \cdot Q. The function automatically calculates E[t]E[t] and E[t2]E[t^2] based on the accrual, dropout, and trial duration parameters.

Event gaps

In some clinical trials, there is a mandatory “dead time” or gap after an event during which no new events can occur (e.g., a recovery period). If an event_gap is specified, the effective exposure time for a subject is reduced by the time spent in these gaps.

The function approximates the effective event rate as: λeffλ1+λgap \lambda_{eff} \approx \frac{\lambda}{1 + \lambda \cdot \text{gap}} This adjusted rate is then used in the sample size calculations. The effective exposure time reported is also adjusted similarly.

Examples

Basic calculation (Zhu and Lakkis 2014)

Calculate sample size for:

  • Control rate λ1=0.5\lambda_1 = 0.5
  • Treatment rate λ2=0.3\lambda_2 = 0.3
  • Dispersion k=0.1k = 0.1
  • Power = 80%
  • Alpha = 0.025 (one-sided)
  • Accrual over 12 months
  • Trial duration 12 months (implying exposure approx 6 months)
sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.3,
  dispersion = 0.1,
  power = 0.8,
  alpha = 0.025,
  sided = 1,
  accrual_rate = 10, # arbitrary, just for average exposure
  accrual_duration = 12,
  trial_duration = 12,
  method = "zhu"
)
#> Sample size for negative binomial outcome
#> ==========================================
#> 
#> Method:          zhu
#> Sample size:     n1 = 35, n2 = 35, total = 70
#> Expected events: 168.0 (n1: 105.0, n2: 63.0)
#> Power: 80%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.5000, treatment = 0.3000 (RR = 0.6000)
#> Dispersion: 0.1000, Avg exposure (calendar): 6.00
#> Accrual: 12.0, Trial duration: 12.0

Using Friede and Schmidli (2010) method

sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.3,
  dispersion = 0.1,
  power = 0.8,
  alpha = 0.025,
  sided = 1,
  accrual_rate = 10,
  accrual_duration = 12,
  trial_duration = 12,
  method = "friede"
)
#> Sample size for negative binomial outcome
#> ==========================================
#> 
#> Method:          friede
#> Sample size:     n1 = 35, n2 = 35, total = 70
#> Expected events: 168.0 (n1: 105.0, n2: 63.0)
#> Power: 80%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.5000, treatment = 0.3000 (RR = 0.6000)
#> Dispersion: 0.1000, Avg exposure (calendar): 6.00
#> Accrual: 12.0, Trial duration: 12.0

Piecewise constant accrual

Consider a trial where recruitment ramps up:

  • 5 patients/month for the first 3 months
  • 10 patients/month for the next 3 months
  • Total trial duration is 12 months

The function automatically calculates the average exposure based on this accrual pattern.

sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.3,
  dispersion = 0.1,
  power = 0.8,
  accrual_rate = c(5, 10),
  accrual_duration = c(3, 3),
  trial_duration = 12
)
#> Sample size for negative binomial outcome
#> ==========================================
#> 
#> Method:          zhu
#> Sample size:     n1 = 26, n2 = 26, total = 52
#> Expected events: 176.8 (n1: 110.5, n2: 66.3)
#> Power: 80%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.5000, treatment = 0.3000 (RR = 0.6000)
#> Dispersion: 0.1000, Avg exposure (calendar): 8.50
#> Accrual: 6.0, Trial duration: 12.0

Accrual with dropout and max follow-up

Same design as above, but with a 5% dropout rate per unit time and a maximum follow-up of 6 months.

sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.3,
  dispersion = 0.1,
  power = 0.8,
  accrual_rate = c(5, 10),
  accrual_duration = c(3, 3),
  trial_duration = 12,
  dropout_rate = 0.05,
  max_followup = 6
)
#> Sample size for negative binomial outcome
#> ==========================================
#> 
#> Method:          zhu
#> Sample size:     n1 = 38, n2 = 38, total = 76
#> Expected events: 157.6 (n1: 98.5, n2: 59.1)
#> Power: 80%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.5000, treatment = 0.3000 (RR = 0.6000)
#> Dispersion: 0.1000, Avg exposure (calendar): 5.18
#> Dropout rate: 0.0500
#> Accrual: 6.0, Trial duration: 12.0
#> Max follow-up: 6.0

Calculating power for fixed design

Using the accrual rates and design from the previous example, suppose we want to calculate the power if the treatment effect is smaller (λ2=0.4\lambda_2 = 0.4 instead of 0.30.3). We use the accrual_rate computed in the previous step.

# Store the result from the previous calculation
design_result <- sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.3,
  dispersion = 0.1,
  power = 0.8,
  accrual_rate = c(5, 10),
  accrual_duration = c(3, 3),
  trial_duration = 12,
  dropout_rate = 0.05,
  max_followup = 6
)

# Use the computed accrual rates to calculate power for a smaller effect size
sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.4, # Smaller effect size
  dispersion = 0.1,
  power = NULL, # Request power calculation
  accrual_rate = design_result$accrual_rate, # Use computed rates
  accrual_duration = c(3, 3),
  trial_duration = 12,
  dropout_rate = 0.05,
  max_followup = 6
)
#> Sample size for negative binomial outcome
#> ==========================================
#> 
#> Method:          zhu
#> Sample size:     n1 = 38, n2 = 38, total = 76
#> Expected events: 177.3 (n1: 98.5, n2: 78.8)
#> Power: 26%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.5000, treatment = 0.4000 (RR = 0.8000)
#> Dispersion: 0.1000, Avg exposure (calendar): 5.18
#> Dropout rate: 0.0500
#> Accrual: 6.0, Trial duration: 12.0
#> Max follow-up: 6.0

Unequal allocation

Sample size with a 2:1 allocation ratio (n2=2n1n_2 = 2 n_1).

sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.3,
  dispersion = 0.1,
  ratio = 2,
  accrual_rate = 10,
  accrual_duration = 12,
  trial_duration = 12
)
#> Sample size for negative binomial outcome
#> ==========================================
#> 
#> Method:          zhu
#> Sample size:     n1 = 40, n2 = 80, total = 120
#> Expected events: 264.0 (n1: 120.0, n2: 144.0)
#> Power: 95%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.5000, treatment = 0.3000 (RR = 0.6000)
#> Dispersion: 0.1000, Avg exposure (calendar): 6.00
#> Accrual: 12.0, Trial duration: 12.0

Accounting for event gaps

In some recurrent event trials, there may be a mandatory “gap” period after each event during which no new events can be recorded (e.g., a recovery period or administrative window). This effectively reduces the time at risk.

If an event_gap (gg) is specified, the function adjusts the calculation as follows:

  1. Effective Rates: The event rates are adjusted to λ*=λ/(1+λg)\lambda^* = \lambda / (1 + \lambda g) for the sample size calculation (Zhu and Lakkis method).
  2. At-Risk Exposure: The function reports the “average at-risk exposure” Erisk=Ecal/(1+λg)E_{risk} = E_{cal} / (1 + \lambda g) alongside the standard calendar exposure. This provides transparency on the actual time subjects are at risk for events.

Since the gap reduction depends on the event rate (λ\lambda), the at-risk exposure differs between treatment groups if their rates differ.

Example with event gap

Calculate sample size assuming a 5-day gap after each event (approx 0.0137 years).

sample_size_nbinom(
  lambda1 = 0.5,
  lambda2 = 0.3,
  dispersion = 0.1,
  power = 0.8,
  accrual_rate = 10,
  accrual_duration = 12,
  trial_duration = 12,
  event_gap = 5 / 365.25
)
#> Sample size for negative binomial outcome
#> ==========================================
#> 
#> Method:          zhu
#> Sample size:     n1 = 35, n2 = 35, total = 70
#> Expected events: 167.0 (n1: 104.3, n2: 62.7)
#> Power: 80%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.5000, treatment = 0.3000 (RR = 0.6000)
#> Dispersion: 0.1000, Avg exposure (calendar): 6.00
#> Avg exposure (at-risk): n1 = 5.96, n2 = 5.98
#> Event gap: 0.01
#> Accrual: 12.0, Trial duration: 12.0

The output shows both the “Avg exposure (calendar)” and the “Avg exposure (at-risk)” for each group.

References

Friede, T, and H Schmidli. 2010. “Blinded Sample Size Reestimation with Negative Binomial Counts in Superiority and Non-Inferiority Trials.” Methods of Information in Medicine 49 (06): 618–24.
Mütze, Tobias, Ekkehard Glimm, Heinz Schmidli, and Tim Friede. 2019. “Group Sequential Designs for Negative Binomial Outcomes.” Statistical Methods in Medical Research 28 (8): 2326–47.
Zhu, Haiyuan, and Hassan Lakkis. 2014. “Sample Size Calculation for Comparing Two Negative Binomial Rates.” Statistics in Medicine 33 (3): 376–87.