Verification of sample size calculation via simulation
Source:vignettes/verification-simulation.Rmd
verification-simulation.RmdIntroduction
This vignette verifies the accuracy of the
sample_size_nbinom function by comparing its theoretical
predictions for average exposure, statistical information, and power
against results from a large-scale simulation study.
We specifically test a scenario with:
- Piecewise constant accrual rates.
- Piecewise exponential dropout (constant in this example).
- Negative binomial outcomes.
- Fixed follow-up design.
Simulation design
The study sample size suggested by the Friede method is used to evaluate the accuracy of that method. The parameters for both the theoretical calculation and the simulation study are as follows:
Parameters
- Rates: (Control), (Experimental).
- Dispersion: .
- Power: 90%.
- Alpha: 0.025 (one-sided).
- Dropout: 10% per year adjusted to monthly rate ().
- Trial Duration: 24 months.
- Max Follow-up: 12 months.
- Event Gap: 30 days (approx 0.082 years).
- Accrual: Piecewise linear ramp-up over 12 months (Rate for 0-6mo, for 6-12mo).
Theoretical calculation
First, we calculate the required sample size and expected properties
using sample_size_nbinom.
# Parameters
lambda1 <- 0.4
lambda2 <- 0.3
dispersion <- 0.5
power <- 0.9
alpha <- 0.025
dropout_rate <- 0.1 / 12
max_followup <- 12
trial_duration <- 24
event_gap <- 20 / 30.42 # 20 days
# Accrual targeting 90% power
# We provide relative rates (1:2) and the function scales them to achieve power
accrual_rate_rel <- c(1, 2)
accrual_duration <- c(6, 6)
design <- sample_size_nbinom(
lambda1 = lambda1, lambda2 = lambda2, dispersion = dispersion,
power = power,
alpha = alpha, sided = 1,
accrual_rate = accrual_rate_rel,
accrual_duration = accrual_duration,
trial_duration = trial_duration,
dropout_rate = dropout_rate,
max_followup = max_followup,
event_gap = event_gap,
method = "friede"
)
# Extract calculated absolute accrual rates
accrual_rate <- design$accrual_rate
print(design)
#> Sample size for negative binomial outcome
#> ==========================================
#>
#> Method: friede
#> Sample size: n1 = 211, n2 = 211, total = 422
#> Expected events: 1366.9 (n1: 763.1, n2: 603.8)
#> Power: 90%, Alpha: 0.025 (1-sided)
#> Rates: control = 0.4000, treatment = 0.3000 (RR = 0.7500)
#> Dispersion: 0.5000, Avg exposure (calendar): 11.42
#> Avg exposure (at-risk): n1 = 9.04, n2 = 9.54
#> Event gap: 0.66
#> Dropout rate: 0.0083
#> Accrual: 12.0, Trial duration: 24.0
#> Max follow-up: 12.0Simulation results
We simulated 3,600 trials using the parameters defined above from the
Friede method. This number of simulations was chosen to achieve a
standard error for the power estimate of approximately 0.005 when the
true power is 90%
().
The simulation script is located in
data-raw/generate_simulation_data.R.
# Load pre-computed simulation results
results_file <- system.file("extdata", "simulation_results.rds", package = "gsDesignNB")
if (results_file == "" && file.exists("../inst/extdata/simulation_results.rds")) {
results_file <- "../inst/extdata/simulation_results.rds"
}
if (results_file != "") {
sim_data <- readRDS(results_file)
results <- sim_data$results
design_ref <- sim_data$design
} else {
# Fallback if data is not available (e.g. not installed yet)
# This block allows the vignette to build without the data, but warns.
warning("Simulation results not found. Skipping verification plots.")
results <- NULL
design_ref <- design
}1. Average exposure verification
We compare the theoretical average exposure calculated by
sample_size_nbinom with the observed average exposure in
the simulation.
# Theoretical Exposure
theo_exposure <- design_ref$exposure
# Observed Exposure (Average across all trials and arms)
# Note: Exposure is the same for both arms in this design (randomized)
obs_exposure <- mean(c(results$exposure_control, results$exposure_experimental))
comparison_exp <- data.frame(
Metric = "Average Exposure",
Theoretical = theo_exposure,
Simulated = obs_exposure,
Difference = obs_exposure - theo_exposure,
Rel_Diff_Pct = 100 * (obs_exposure - theo_exposure) / theo_exposure
)
knitr::kable(comparison_exp, digits = 4, caption = "Comparison of Average Exposure")| Metric | Theoretical | Simulated | Difference | Rel_Diff_Pct |
|---|---|---|---|---|
| Average Exposure | 11.4195 | 11.3598 | -0.0597 | -0.5232 |
The theoretical exposure should closely match the simulated average exposure.
2. Statistical information and variance
The theoretical variance of the log rate ratio estimator (Wald test) is given by:
where is the expected number of events per subject in group (using effective rates adjusted for event gaps), and is the dispersion parameter inflated for variable follow-up.
The dispersion parameter directly increases the variance of the estimator. In a standard Poisson model, , and the variance depends only on the expected number of events. With negative binomial dispersion (), the variance is inflated, reflecting the extra variability (overdispersion) in the data.
We compare this with the variance of the estimates from the simulation.
# Theoretical Variance (from design object)
theo_var <- design_ref$variance
# Empirical Variance of the Log Hazard Ratio estimates
emp_var <- var(results$estimate, na.rm = TRUE)
# Average Squared Standard Error (mean of the estimated variances)
avg_se_sq <- mean(results$se^2, na.rm = TRUE)
comparison_var <- data.frame(
Metric = c("Variance of Estimator"),
Theoretical = theo_var,
Empirical_Var = emp_var,
Avg_Estimated_Var = avg_se_sq
)
knitr::kable(comparison_var, digits = 5, caption = "Comparison of Variance")| Metric | Theoretical | Empirical_Var | Avg_Estimated_Var |
|---|---|---|---|
| Variance of Estimator | 0.00786 | 0.00755 | 0.00728 |
- Empirical Var: The actual variability of the estimated log rate ratios across 3,600 trials.
- Avg Estimated Var: The average of the variance estimates () produced by the Wald test in each trial.
While there is reasonable agreement, the sample size formula may slightly underestimate the variability of the test statistic and thus may overstate power.
3. Power verification
Finally, we compare the theoretical power with the empirical power (proportion of trials rejecting the null hypothesis).
# Theoretical Power
theo_power <- design_ref$power
# Empirical Power
emp_power <- mean(results$p_value < design_ref$inputs$alpha, na.rm = TRUE)
comparison_pwr <- data.frame(
Metric = "Power",
Theoretical = theo_power,
Simulated = emp_power,
Difference = emp_power - theo_power
)
knitr::kable(comparison_pwr, digits = 4, caption = "Comparison of Power")| Metric | Theoretical | Simulated | Difference |
|---|---|---|---|
| Power | 0.9 | 0.8739 | -0.0261 |
A 95% confidence interval for the empirical power can be calculated to check if the theoretical power falls within the simulation error bounds.
binom.test(sum(results$p_value < design_ref$inputs$alpha, na.rm = TRUE), nrow(results))$conf.int
#> [1] 0.8626014 0.8845653
#> attr(,"conf.level")
#> [1] 0.95Conclusion
The simulation results confirm that sample_size_nbinom
reasonably predicts average exposure, variance (information), and power
for this complex design with piecewise accrual and dropout. However,
given the slight underpowering that the simulation study suggests, it
may be useful to consider a larger sample size than the Friede
approximation suggests, with power verified by simulation.