General considerations about adaptive trials

Analysis

Every trial should generate reliable evidence to inform subsequent research and/or clinical practice. Likewise, following an adaptive trial, it is essential that the results are reliable enough to support credible conclusions about the effects of study treatments. Trial adaptations add a layer of complexity which should be taken into account at the design stage and incorporated into the analysis. Adaptive designs can increase the risk of making erroneous claims about the effect of a treatment if the impact of trial adaptations is not thought through carefully and accounted for properly. In this section we outline considerations for the appropriate analysis of trials incorporating adaptations.

Methods for statistical inference in adaptive trials can be broadly organised around three key elements:

  1. control of the desired decision error rates when deciding whether a study treatment is beneficial or not,
  2. use of interim results to inform trial adaptations such as early stopping of a trial or treatment arm(s), and,
  3. estimation of treatment effects when the trial ends.

1. Control of decision error rates

Statistical methods underpinning adaptive designs (especially in a traditional frequentist framework) tend to focus on ensuring that decision error rates, such as relevant type I error rates, are controlled at desired levels. This is done at the design stage, when the sample sizes required to answer the research questions are estimated. For example, the stopping boundaries for triggering early stopping decisions can be set up so that the decision error rates are as desired for given sample sizes. The choice of methods depends on the type and scope of the trial adaptations considered and the statistical framework used (e.g. frequentist or Bayesian); in addition, any sources of multiple hypothesis testing due to interim analyses should be accounted for in decision-making. Multiple hypothesis testing occurs, for example, when the same hypothesis is tested repeatedly at interim and final analyses, which is often the case in adaptive designs. Other sources of multiple hypothesis testing arise when study treatments or study populations are selected, as in multi-arm multi-stage (MAMS) and adaptive population enrichment (APE) trials, respectively.

Before a trial begins, researchers choose appropriate statistical methods to ensure that the desired error rates are controlled in the presence of the trial adaptations considered and any sources of multiple hypothesis testing. In PANDA, some of these methods are summarised under the section on “Underpinning statistical methods” within each type of adaptive design under “Planning and Design”. For certain adaptive designs (e.g. those with multiple and complex trial adaptations), it may not be possible to calculate the exact error rates, and statistical simulation under appropriate scenarios may be needed to quantify the error rates in decision-making (see 1, 2).
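
As a toy illustration of this simulation-based approach, the sketch below (not from PANDA; all design parameters are hypothetical) uses Monte Carlo simulation to estimate the overall one-sided type I error of a simple two-stage design with an efficacy-only interim look, using boundary values often quoted for a two-stage O'Brien-Fleming-type design:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 500_000

# Stage-wise z-statistics under the null (no treatment effect); with two
# equally sized stages the cumulative final z-statistic is (z1 + z2)/sqrt(2).
z1 = rng.standard_normal(n_sims)
z2 = rng.standard_normal(n_sims)
z_final = (z1 + z2) / np.sqrt(2)

# Boundary values often quoted for a two-stage O'Brien-Fleming-type design
# at one-sided alpha = 0.025.
c1, c2 = 2.797, 1.977

stop_early = z1 > c1                        # efficacy stop at the interim look
reject_final = ~stop_early & (z_final > c2)
print(f"Estimated type I error: {np.mean(stop_early | reject_final):.4f}")
```

Richer designs (futility rules, unequal stage sizes, multiple arms) can be explored by extending the same simulation loop across appropriate scenarios.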
 
Of note, Bayesian adaptive designs are usually less concerned with frequentist error rate control. Nonetheless, frequentist operating characteristics such as type I error probabilities can be obtained for Bayesian designs, which enhances comparability with frequentist designs.
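
The same simulation approach can be used to attach frequentist operating characteristics to a Bayesian stopping rule. In the minimal sketch below, which assumes a flat (improper) prior on the treatment effect and a normal likelihood, the posterior probability that the effect exceeds zero reduces to Phi(z), so the rule "declare efficacy when this posterior probability exceeds 0.99" can be evaluated under the null; the threshold and the two-look structure are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_sims = 500_000

z1 = rng.standard_normal(n_sims)                   # interim z under the null
z_final = (z1 + rng.standard_normal(n_sims)) / np.sqrt(2)

# With a flat prior and a normal likelihood, P(effect > 0 | data) = Phi(z),
# so the Bayesian efficacy rule can be expressed on the z-scale.
threshold = 0.99
efficacy = (norm.cdf(z1) > threshold) | (norm.cdf(z_final) > threshold)

# A single look at this rule would spend 1% type I error; two looks spend more.
print(f"Frequentist type I error of the Bayesian rule: {np.mean(efficacy):.4f}")
```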
 

References

1. Mayer et al. Simulation practices for adaptive trial designs in drug and device development. Stat Biopharm Res. 2019;11(4):325-335.
2. FDA. Adaptive designs for clinical trials of drugs and biologics guidance for industry. 2019.

2. Interim decisions

Trial adaptation decisions, especially those leading to early stopping of a trial or of treatment arm(s), should be convincing to consumers of the research and informed by good-quality interim data. Trial adaptations based on very noisy or highly uncertain data may result in costly and/or irreversible incorrect treatment decisions (e.g. dropping a truly beneficial treatment arm) and leave too much uncertainty in the conclusions. In addition, the calculation or estimation of the statistical quantities used to inform trial adaptations, such as interim measures of treatment effects (e.g. conditional power), should be clearly specified. A robust interim decision-making process often involves setting up an independent data monitoring committee 1, 2, 3 (see discussion under general considerations).
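
As one concrete example of such a quantity, the sketch below implements the standard normal-approximation formula for conditional power in a two-stage setting (the numbers in the usage line are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

def conditional_power(z_interim: float, info_frac: float,
                      drift: float, z_crit: float = 1.96) -> float:
    """Probability that the final z-statistic crosses z_crit, given the
    interim z-statistic at information fraction t and an assumed drift
    (the expected final z-statistic under the assumed treatment effect)."""
    t = info_frac
    shortfall = z_crit - sqrt(t) * z_interim - drift * (1 - t)
    return 1 - norm.cdf(shortfall / sqrt(1 - t))

# Hypothetical interim: halfway through (t = 0.5) with z = 1.0, and a drift
# of 2.8 (roughly the design value for 80% power at one-sided alpha 2.5%).
print(f"Conditional power: {conditional_power(1.0, 0.5, drift=2.8):.2f}")
```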

It is important that, after the trial, the interim decisions made are not controversial and do not raise questions about the credibility of the results, which would limit their ability to influence clinical practice and/or inform future research. Before the trial begins, researchers are encouraged to engage key multidisciplinary stakeholders to get input and buy-in around the adaptation decision-making criteria. These stakeholders could include clinicians, statisticians, patient representatives, regulators, and funders/sponsors. In PANDA, aspects of adaptation decision rules, such as when they are made and how frequently, are discussed under general considerations and within each type of adaptive design.

Finally, the integrity of an adaptive trial is enhanced by ensuring that the adaptation decisions made given the observed interim data are consistent with what researchers planned before the trial began. Major deviations from pre-planned decision-making criteria and methods can severely affect the credibility and validity of trial results.

References

1. Sanchez-Kam et al. A practical guide to data monitoring committees in adaptive trials. Ther Innov Regul Sci. 2014;48(3):316-326.
2. Bhattacharyya et al. The changing landscape of data monitoring committees—Perspectives from regulators, members, and sponsors. Biometrical J. 2019;61(5):1232-1241.
3. Chow et al. On the independence of data monitoring committee in adaptive design clinical trials. J Biopharm Stat. 2012;22(4):853-867.

3. Analysis following an adaptive trial

The estimation of treatment effects is essential regardless of whether trial adaptations were triggered at interim analyses or not 1, 2, 3. This encompasses:

  1. point estimates of treatment effects,
  2. uncertainty or precision around point estimates, and
  3. other quantities that are used to measure the level of evidence such as p-values or posterior probabilities of the treatment effect being greater than some value, where relevant.

Statistical methods that do not account for trial adaptations and the adaptation decisions made can result in biased point estimates of treatment effects, confidence intervals with incorrect coverage, or p-values that are smaller than they should be 3. Such statistical biases can be introduced in several ways. For example, when a trial is stopped early because extreme results have been observed, point estimates derived from traditional maximum likelihood (ML) methods tend to exaggerate the true treatment effect 4. Similarly, in multi-arm multi-stage (MAMS) trials, treatments are carried forward because they look promising at an interim analysis, so they are more likely to demonstrate efficacy at the final analysis even if all the treatments work equally well; estimates for the selected treatments will therefore, on average, exaggerate their true effects relative to those that were dropped 5. Similar issues arise with the population selection that occurs in adaptive population enrichment (APE) trials.
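
A minimal simulation (with hypothetical design values) makes the first of these mechanisms concrete: among the replications that stop early for efficacy, the naive ML estimate is, on average, well above the true effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n_sims, true_effect = 500_000, 0.2
se1 = 0.15                                   # interim standard error (assumed)

est1 = rng.normal(true_effect, se1, n_sims)  # interim ML estimates
stop_early = est1 / se1 > 2.797              # efficacy boundary as before

# Conditioning on early stopping selects the extreme estimates:
print(f"True effect: {true_effect}")
print(f"Mean ML estimate among early stoppers: {est1[stop_early].mean():.3f}")
print(f"Proportion stopping early: {stop_early.mean():.3f}")
```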

PANDA users should note that the level of bias depends on several elements of the design, such as the type and scope of trial adaptations, the adaptation rules, which adaptations were triggered at interim analyses, the timing and frequency of interim analyses, and the size of the underlying true treatment effects. An adaptive design whose possible adaptations were not triggered at interim analyses can also lead to bias if not properly accounted for in the analysis, as can the overruling of an adaptation rule (unless the rule is non-binding). There are situations where the bias is small or negligible and traditional ML-based estimation methods can be used without affecting conclusions 6. In other situations, however, the bias can be substantial enough to impact conclusions 5. Bias can also extend to secondary outcomes, depending on how they are correlated with the primary outcome(s) 7, 8, and may affect other objectives such as health economic evaluation 9 and evidence synthesis 10, 11.
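
To see how the scope of an adaptation drives the size of the bias, the following sketch (hypothetical numbers again) simulates a simple MAMS-style selection rule in which the arm with the largest interim estimate is carried forward; even though all arms share the same true effect, the selected arm's estimate is biased upwards, and the bias grows with the number of arms:

```python
import numpy as np

rng = np.random.default_rng(4)
n_sims, true_effect, se = 200_000, 0.2, 0.15

for n_arms in (2, 4, 8):
    # All arms share the same true effect; the arm with the largest interim
    # estimate is selected, mimicking a simple MAMS-style selection rule.
    ests = rng.normal(true_effect, se, (n_sims, n_arms))
    selected = ests.max(axis=1)
    print(f"{n_arms} arms: mean estimate of selected arm = "
          f"{selected.mean():.3f} (true effect {true_effect})")
```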

Several methods have been developed to remove or reduce bias in the estimation of treatment effects in adaptive designs (see 18), and research in this area is ongoing. Notably, there is a trade-off between reducing bias and retaining precision around the point estimate. For example, a point estimate can be unbiased, or have a very small bias, but be highly imprecise (i.e. have a large variance); or it can be highly precise (very small variance) but biased. In adaptive designs, it is often challenging to find an estimator that is unbiased, or has very small bias, and is also highly precise. The literature discusses these issues in detail, including a review of available treatment effect estimators with their pros and cons, as well as guidance for researchers 18. Computing confidence intervals with correct coverage can also be challenging for adaptive designs, but solutions have been established for certain designs (e.g. 12, 13, 14, 15, 16).
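
The following sketch illustrates this trade-off in the selection setting above by comparing two simple estimators for the selected arm: one based only on independent second-stage data (unbiased, because the selection did not use its data, but imprecise) and the pooled ML estimate across both stages (more precise but biased). Both estimators and all parameters are illustrative, not the specific methods reviewed in 18:

```python
import numpy as np

rng = np.random.default_rng(5)
n_sims, n_arms, true_effect = 200_000, 4, 0.2
se1 = se2 = 0.15                 # stage-wise standard errors (assumed equal)

stage1 = rng.normal(true_effect, se1, (n_sims, n_arms))
pick = stage1.argmax(axis=1)                    # select the best-looking arm
sel1 = stage1[np.arange(n_sims), pick]          # its stage-1 estimate
stage2 = rng.normal(true_effect, se2, n_sims)   # fresh stage-2 data, no selection

# Inverse-variance pooling of the selected arm's two stage-wise estimates.
w1, w2 = 1 / se1**2, 1 / se2**2
pooled = (w1 * sel1 + w2 * stage2) / (w1 + w2)

for name, est in (("stage 2 only", stage2), ("pooled ML", pooled)):
    print(f"{name:>12}: bias = {est.mean() - true_effect:+.3f}, "
          f"sd = {est.std():.3f}")
```

The stage-2-only estimator discards information and so has the larger standard deviation, while the pooled estimator inherits part of the selection bias: neither dominates on both criteria.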

In summary, researchers need to understand and explore potential biases and choose estimation methods that are appropriate for their adaptive trial situation 1 (see case study 17). The estimation methods used should be clearly stated when reporting trial results 2. It is also helpful to consumers of research to report both the ML estimates and bias-adjusted results where possible, so that they can assess for themselves whether bias is a concern. In PANDA, specific analysis issues in the estimation of treatment effects are discussed, and available statistical methods summarised, under the “Analysis” section within each type of adaptive design.

References

1. FDA. Adaptive designs for clinical trials of drugs and biologics guidance for industry. 2019.
2. Dimairo et al. The Adaptive designs CONSORT Extension (ACE) statement: a checklist with explanation and elaboration guideline for reporting randomised trials that use an adaptive design. BMJ. 2020;369:m115.
3. Pallmann et al. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Med. 2018;16(1):29.
4. Zhang et al. Overestimation of the effect size in group sequential trials. Clin Cancer Res. 2012;18(18):4872-4876.
5. Bauer et al. Selection and bias-two hostile brothers. Stat Med. 2010;29(1):1-13.
6. Pritchett et al. Sample size re-estimation designs in confirmatory clinical trials - Current state, statistical considerations, and practical guidance. Stat Biopharm Res. 2015;7(4):309-321.
7. Whitehead. Supplementary analysis at the conclusion of a sequential clinical trial. Biometrics. 1986;42(3):461.
8. Liu. Unbiased estimation of secondary parameters following a sequential test. Biometrika. 2001;88(3):895-900.
9. Flight. The use of health economics in the design and analysis of adaptive clinical trials. Thesis. 2021.
10. Cameron et al. The importance of considering differences in study design in network meta-analysis: An application using anti-tumor necrosis factor drugs for ulcerative colitis. Med Decis Mak. 2017;37(8):894-904.
11. Todd. Incorporation of sequential trials into a fixed effects meta-analysis. Stat Med. 1997;16(24):2915-2925.
12. Magirr et al. Simultaneous confidence intervals that are compatible with closed testing in adaptive designs. Biometrika. 2013;100(4):985-996.
13. Jaki et al. Considerations on covariates and endpoints in multi-arm multi-stage clinical trials selecting all promising treatments. Stat Med. 2013;32(7):1150-1163.
14. Brannath et al. A new class of powerful and informative simultaneous confidence intervals. Stat Med. 2014;33(19):3365-3386.
15. Bebu et al. Confidence intervals for confirmatory adaptive two-stage designs with treatment selection. Biometrical J. 2013;55(3):294-309.
16. Kimani et al. Point and interval estimation in two-stage adaptive designs with time to event data and biomarker-driven subpopulation selection. Stat Med. 2020;39(19):2568-2586.
17. Steg et al. Design and rationale of the treatment of acute coronary syndromes with otamixaban trial: a double-blind triple-dummy 2-stage randomized trial comparing otamixaban to unfractionated heparin and eptifibatide in non-ST-segment elevation acute coronary syndrome. Am Heart J. 2012;164(6):817-24.e13.
18. Robertson et al. Point estimation for adaptive trial designs I: A methodological review. Stat Med. 2022.