The goal in every trial is to conduct robust inference that produces reliable results about the benefits and harms of new treatments. Specifically, the focus is on obtaining the best (i.e., reliable) estimate of the treatment effect and its uncertainty, together with other relevant measures of evidence for testing hypotheses of interest (e.g., p-values) ^{1}. The analysis of group sequential trials focuses on two elements ^{1, 2, 3}:

- measures of interim results that should be reported at each stage,
- overall final results that should be reported once the trial has been stopped.

Stopping rules and repeated significance testing introduce problems during analysis. First, researchers stop trials early after observing overwhelming or extreme results ^{17}. As a consequence, early interim results tend to exaggerate the true effects, and these results tend to have large variance because less information has accrued (see the RATPAC example in Figure 5, or Figure 1 here). Second, the sampling distribution of the point estimate, the calculation of treatment-related quantities, and their properties are altered. For example, traditional confidence intervals designed for fixed trial designs tend to have coverage exceeding the desired level ^{4, 5}, and the calculation of p-values depends on how the sample space is ordered to classify which results are more extreme than the one observed ^{6}. Below is a summary of specialised methods that have been developed to address these issues.
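The exaggeration of effects in trials stopped early can be illustrated with a short simulation. This is a minimal sketch, not taken from any trial cited here: the effect size, per-stage sample size, and efficacy boundary are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch: why estimates from early-stopped trials exaggerate the
# true effect. A hypothetical design stops for efficacy when the interim
# z-statistic crosses an illustrative boundary of 2.5.
rng = np.random.default_rng(42)
true_delta = 0.3          # true standardised treatment effect (assumed)
n_per_stage = 50          # observations available at the interim look
n_trials = 100_000        # number of simulated trials
boundary = 2.5            # illustrative efficacy boundary on the z-scale

# Interim effect estimate in each simulated trial
interim_mean = rng.normal(true_delta, 1.0, (n_trials, n_per_stage)).mean(axis=1)
z1 = interim_mean * np.sqrt(n_per_stage)

# Trials that stop early are selected precisely because they look extreme,
# so their effect estimates are biased upwards relative to the truth
early = interim_mean[z1 > boundary]
print(f"true effect:                 {true_delta:.3f}")
print(f"mean estimate (early stops): {early.mean():.3f}")  # noticeably larger
```

The selection effect, not any flaw in the estimator itself, drives the bias: conditioning on crossing the boundary truncates the sampling distribution from below.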

First, repeated confidence intervals, reported at each interim analysis, ensure that the overall coverage is as desired ^{7}. Second, several treatment effect estimators with different properties exist, differing chiefly in the magnitude of bias and variance: the bias-adjusted mean estimator ^{5}, the Rao-Blackwell adjusted estimator ^{8, 9}, and the median unbiased estimator ^{8}. The latter is based on a specific ordering of the sample space, such as stagewise, likelihood ratio or z-score, mean or maximum likelihood estimate, and score test ordering ^{2, 10, 11}. The same sample space ordering approach is also used to compute confidence intervals and p-values for final reporting when the trial is stopped ^{2, 6, 11, 12}. The literature discourages the use of score test ordering as it may produce results inconsistent with the stopping decision ^{6, 13}. Only p-values from stagewise ordering meet all of the following essential properties (other methods guarantee only the first) ^{11, 12}:

- are uniformly distributed;
- are consistent with stopping rules (e.g. the p-value must not exceed the planned nominal level when an efficacy boundary is crossed and the trial is stopped early);
- do not depend on the timing and/or frequency of future interim analyses;
- are the same as those obtained from a fixed trial design (without any interim analysis) if a trial is stopped at the first interim analysis.
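As a concrete illustration, a stagewise-ordered p-value for a hypothetical two-stage design can be sketched as follows. This is a minimal sketch under stated assumptions: a one-sided design with an efficacy boundary only (no futility stopping), an interim at half the information, and an illustrative O'Brien-Fleming-type interim boundary of 2.797.

```python
from scipy.stats import norm, multivariate_normal

def stagewise_p(stage, z_obs, c1, info_frac=0.5):
    """Stagewise-ordered p-value for a hypothetical two-stage design.

    stage:     1 if stopped at the interim, 2 if the trial ran to the end
    z_obs:     observed z-statistic at the stopping stage
    c1:        efficacy boundary at the interim look
    info_frac: information fraction at the interim (default 0.5)
    """
    if stage == 1:
        # Stopping earlier with a larger z is "more extreme": tail beyond z_obs.
        # This equals the fixed-design p-value, matching the last property above.
        return norm.sf(z_obs)
    # Under the null, corr(Z1, Z2) = sqrt(info fraction)
    rho = info_frac ** 0.5
    bvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    # "More extreme" = stopped at stage 1, OR continued and reached Z2 >= z_obs
    p_continue_extreme = norm.cdf(c1) - bvn.cdf([c1, z_obs])
    return norm.sf(c1) + p_continue_extreme

# A trial stopping exactly at the interim boundary gets p at the nominal level
print(stagewise_p(stage=1, z_obs=2.797, c1=2.797))  # ~0.0026
print(stagewise_p(stage=2, z_obs=2.0, c1=2.797))
```

Note that future looks never enter the calculation: the stage-1 p-value depends only on the stage-1 statistic, which is exactly why stagewise ordering combines well with flexible stopping rules.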

Stagewise ordering is widely preferred because it can be implemented together with frequently used flexible stopping rules, as it is not influenced by future results ^{11, 12}. Of note, some authors have also recommended likelihood ratio ordering, suggesting that it yields desirable confidence intervals ^{14} and captures the level of evidence more reliably than other methods ^{6}. All these methods except score ordering are implemented in the R package “*RCTdesign*” ^{15} via the modules “*seqMonitor()*” and “*seqInference()*”. The R package “*rpact*” ^{16} and commercial software such as *ADDPLAN* and *East* offer stagewise ordering to produce median unbiased estimates with related confidence intervals and p-values. Recent literature addresses the performance of these estimators ^{18}.
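The repeated confidence interval approach cited above ^{7} amounts to replacing the fixed-design critical value 1.96 with a wider boundary value at each look, so that all the intervals jointly achieve the desired coverage. A minimal sketch follows; the Pocock constant 2.413 for five equally spaced looks at overall two-sided α = 0.05 is a standard tabulated value, and the estimate and standard error are made-up numbers.

```python
# Minimal sketch of a repeated confidence interval at one interim look.
# Assumptions: five equally spaced looks, overall two-sided alpha = 0.05,
# Pocock boundary (constant critical value 2.413 at every look).
Z_FIXED = 1.96       # naive fixed-design critical value
Z_POCOCK_5 = 2.413   # tabulated Pocock constant for K = 5 looks

def repeated_ci(estimate, se, z_boundary=Z_POCOCK_5):
    """Return the (lower, upper) repeated confidence interval at one look."""
    return estimate - z_boundary * se, estimate + z_boundary * se

est, se = 0.30, 0.12  # made-up interim estimate and standard error
print("naive fixed-design CI:", repeated_ci(est, se, Z_FIXED))
print("repeated CI (wider):  ", repeated_ci(est, se))
```

The price of valid simultaneous coverage is a wider interval at every look, which is why repeated confidence intervals are reported alongside, not instead of, the final adjusted analysis.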

PANDA users should note that if a trial is stopped early at the first interim analysis, the final results to be reported based on any sample space ordering method will be the same as that obtained using traditional analysis that assumes a fixed trial design.

1. Todd *et al*. Interim analyses and sequential designs in phase III studies. *Br J Clin Pharmacol*. 2001;51(5):394–9.

2. Jennison *et al*. Analysis following a sequential test. In: Group sequential methods with applications to clinical trials. *Chapman & Hall/CRC*. 2000;171–87.

3. Whitehead. The analysis of a sequential trial. In: The design and analysis of sequential clinical trials. *John Wiley & Sons Ltd*. 1997;135–81.

4. Jennison *et al*. Repeated confidence intervals. In: Group sequential methods with applications to clinical trials. *Chapman & Hall/CRC*. 2000;89–204.

5. Whitehead. On the bias of maximum likelihood estimation following a sequential test. *Biometrika*. 1986;73(3):573–81.

6. Cook. P-value adjustment in sequential clinical trials. *Biometrics*. 2002;58(4):1005–11.

7. Jennison *et al*. Interim analyses: The repeated confidence interval approach. *J R Stat Soc Ser B*. 1989;51(3):305–61.

8. Emerson *et al*. Parameter estimation following group sequential hypothesis testing. *Biometrika*. 1990;77(4):875–92.

9. Emerson *et al*. A computationally simpler algorithm for the UMVUE of a normal mean following a group sequential trial. *Biometrics*. 1997;53(1):365–9.

10. Tsiatis *et al*. Exact confidence intervals following a group sequential test. *Biometrics*. 1984;40(3):797–803.

11. Proschan *et al*. Inference following a group sequential trial. In: Statistical monitoring of clinical trials - A unified approach. *Springer*. 2006;113–35.

12. Wassmer *et al*. Group sequential and confirmatory adaptive designs in clinical trials. *Springer*. 2016.

13. Chang *et al*. P-values for group sequential testing. *Biometrika*. 1995;82(3):650.

14. Rosner *et al*. Exact confidence intervals following a group sequential trial: A comparison of methods. *Biometrika*. 1988;75(4):723.

15. Gillen *et al*. Designing, monitoring, and analyzing group sequential clinical trials using the “RCTdesign” package for R. 2012.

16. Lakens *et al*. Group sequential designs: A tutorial. *Preprint*. 2021;1–13.

17. Zhang *et al*. Overestimation of the effect size in group sequential trials. *Clin Cancer Res*. 2012;18(18):4872–6.

18. Robertson *et al*. Point estimation for adaptive trial designs. In peer review. 2021.

©2024 The University of Sheffield

In collaboration with epiGenesys
