Multi-Arm Multi-Stage (MAMS)


Statistical methods for monitoring and analysis

For treatment selection at an interim analysis, the analysis of these designs is usually straightforward. Appropriate stagewise measures of the treatment effects (e.g., the estimated treatment effect, a test statistic, or a p-value) are obtained from a statistical model and compared with the corresponding selection boundaries 15. This is similar to monitoring in two-arm group sequential designs. Thus, in most cases for cumulative MAMS, specialised software is not needed for the analysis, since traditional test statistics (e.g., from a linear regression model) can be obtained using standard software. Unplanned changes may occur due to unforeseeable circumstances; for example, an interim analysis may be delayed, and the selection boundaries may then need to be updated to match the actual timing.

For stagewise MAMS, the independent data contributing to each stage are analysed separately to produce independent summary statistics and corresponding p-values, which are adjusted for multiple testing using a prespecified method (e.g., Bonferroni, Šidák, Simes or Dunnett 1). These stagewise multiplicity-adjusted p-values for each treatment comparison are then combined across the stages with available data using a prespecified weighted combination approach (e.g., the inverse normal method 2) to produce a summary test statistic that is monitored against decision boundaries. To claim evidence of benefit, a closed testing procedure is often applied to control the familywise type I error rate 3: the null hypothesis for a treatment comparison is rejected only if that null hypothesis and every intersection null hypothesis that includes it are rejected at the same specified significance level. For example, see 1 (Figure 6.2 on page 223) for how closed testing is applied in a 4-arm 2-stage design using the Simes procedure.
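The steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular trial or package: the function names (`simes`, `inverse_normal`, `closed_test`), the arm labels, the equal stage weights, the one-sided level of 0.025 and the p-values are all hypothetical, and early stopping boundaries are ignored so that the combined p-value is compared with a single final threshold. Stagewise p-values are assumed to be one-sided.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def simes(pvalues):
    """Simes p-value for the intersection of the hypotheses behind `pvalues`."""
    ordered = sorted(pvalues)
    m = len(ordered)
    return min(1.0, min(m * p / rank for rank, p in enumerate(ordered, start=1)))


def inverse_normal(p1, p2, w1, w2):
    """Combine two one-sided stagewise p-values (requires w1**2 + w2**2 == 1)."""
    # Clamp away from 0 and 1 so the normal quantile stays finite.
    p1, p2 = (min(max(p, 1e-12), 1 - 1e-12) for p in (p1, p2))
    z = w1 * _STD_NORMAL.inv_cdf(1 - p1) + w2 * _STD_NORMAL.inv_cdf(1 - p2)
    return 1 - _STD_NORMAL.cdf(z)


def closed_test(stage1, stage2, alpha=0.025, w1=sqrt(0.5), w2=sqrt(0.5)):
    """Return {arm: rejected?} for every arm carried into stage 2.

    stage1 maps every arm to its stage-1 p-value; stage2 maps only the
    selected arms to their stage-2 p-values (computed from stage-2 data alone).
    """
    arms = sorted(stage1)
    decisions = {}
    for arm in stage2:
        rejected = True
        # Visit every intersection hypothesis that includes this arm.
        for size in range(1, len(arms) + 1):
            for subset in combinations(arms, size):
                if arm not in subset:
                    continue
                p1 = simes([stage1[a] for a in subset])
                # At stage 2 only the selected arms in the subset contribute.
                p2 = simes([stage2[a] for a in subset if a in stage2])
                if inverse_normal(p1, p2, w1, w2) > alpha:
                    rejected = False
        decisions[arm] = rejected
    return decisions


# Hypothetical 4-arm (three treatments vs control), 2-stage example:
# arm C is selected at the interim and is the only arm with stage-2 data.
stage1_p = {"A": 0.20, "B": 0.04, "C": 0.01}
stage2_p = {"C": 0.01}
print(closed_test(stage1_p, stage2_p))  # {'C': True}
```

Because arm C must survive every intersection hypothesis that contains it, a weak stage-2 result cannot be rescued by a strong stage-1 result alone; for instance, replacing the stage-2 p-value with 0.5 leads to no rejection in this sketch.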

Another issue is that selecting arms, and possibly stopping early, leads to bias in the usual treatment effect estimators. Several methods have been proposed to overcome this issue 4, 5, 6, 7, 8, 9, 13, 14, although in general there is a trade-off between unbiasedness and variability, which often means that the mean squared error is still smallest for the traditional estimators. For example, Stallard et al 13 illustrated how to obtain uniformly minimum variance conditionally unbiased estimators (UMVCUEs) of treatment effects following treatment selection, using the ADVENT trial as a case study with a binary outcome. In this case study, the naïve difference (maximum likelihood estimate) in clinical response rate between the selected 125mg dose and placebo was ~9.7% and the UMVCUE was ~11.4%. Here, statistical bias was small and the point estimates were similar (within 2 percentage points). This UMVCUE method was implemented in the PROVE trial. Similarly, the construction of appropriate confidence intervals is more complex and specialised methods need to be considered (e.g., see 10, 14).

There are situations when bias is negligible and will have little influence on the interpretation of results. Simulation work may help to explore the extent of bias and whether it is necessary to adjust for it using these specialised methods (e.g., see 11). The extent of bias and its implications, the robustness of different estimators, and the availability of user-friendly open-access statistical implementation resources are areas of ongoing research and debate. A detailed discussion of point estimation in adaptive trials, including the pros and cons of methods relevant to the MAMS setting and guidance for researchers, is given in 12.
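As a rough illustration of the kind of simulation work mentioned above, the following Python sketch estimates the bias and mean squared error of two estimators in a simplified two-stage design that selects the best-performing arm at the interim. Everything here is an assumption made for illustration: the function name `simulate_selection_bias`, normally distributed outcomes with known standard deviation, a shared control arm, equal per-arm sample sizes within each stage, selection of the single arm with the largest stage-1 effect, and no early stopping. It is not the method of any of the cited papers.

```python
import random
from statistics import fmean


def simulate_selection_bias(true_effects, n1, n2, sigma=1.0, n_sims=20000, seed=1):
    """Bias and MSE of two estimators after selecting the best arm at stage 1.

    true_effects: true mean differences versus control, one per treatment arm.
    n1, n2: per-arm sample sizes in stage 1 and stage 2.
    """
    rng = random.Random(seed)
    sd1 = sigma / n1 ** 0.5  # SD of an arm's sample mean in stage 1
    sd2 = sigma / n2 ** 0.5  # SD of an arm's sample mean in stage 2
    naive_err, stage2_err = [], []
    for _ in range(n_sims):
        # Stage 1: a shared control mean and one mean per treatment arm.
        control1 = rng.gauss(0.0, sd1)
        diffs1 = [rng.gauss(d, sd1) - control1 for d in true_effects]
        best = max(range(len(diffs1)), key=diffs1.__getitem__)
        # Stage 2: only the selected arm and control continue.
        diff2 = rng.gauss(true_effects[best], sd2) - rng.gauss(0.0, sd2)
        # Naive estimator pools both stages; the alternative uses stage 2 only.
        naive = (n1 * diffs1[best] + n2 * diff2) / (n1 + n2)
        naive_err.append(naive - true_effects[best])
        stage2_err.append(diff2 - true_effects[best])
    return {
        "naive_bias": fmean(naive_err),
        "naive_mse": fmean(e * e for e in naive_err),
        "stage2_only_bias": fmean(stage2_err),
        "stage2_only_mse": fmean(e * e for e in stage2_err),
    }


# Hypothetical scenario: three equally effective arms, 50 patients per arm per stage.
results = simulate_selection_bias([0.0, 0.0, 0.0], n1=50, n2=50)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

In this scenario the pooled (naive) estimator is biased upwards because the selected arm is the one that happened to look best at stage 1, whereas the stage-2-only estimator is unbiased but discards data. Comparing the two mean squared errors shows the trade-off between unbiasedness and variability; varying `true_effects`, `n1` and `n2` indicates when the bias is large enough to matter.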


1. Maurer et al. Adaptive designs and confirmatory hypothesis testing. In: Multiple testing problems in pharmaceutical statistics. Chapman and Hall/CRC. 2009.
2. Lehmacher et al. Adaptive sample size calculations in group sequential trials. Biometrics. 1999;55:1286–90.
3. Hommel. Adaptive modifications of hypotheses after an interim analysis. Biometrical J. 2001;43(5):581–9.
4. Bowden et al. Unbiased estimation of selected treatment means in two-stage trials. Biometrical J. 2008;50(4):515–27.
5. Stallard et al. Point estimates and confidence regions for sequential trials involving selection. J Stat Plan Inference. 2005;135(2):402–19.
6. Whitehead et al. Estimation of treatment effects following a sequential trial of multiple treatments. Stat Med. 2020;39(11):1593–609.
7. Carreras et al. Shrinkage estimation in two-stage adaptive designs with midtrial treatment selection. Stat Med. 2013;32(10):1677–90.
8. Robertson et al. Unbiased estimation in seamless phase II/III trials with unequal treatment effect variances and hypothesis-driven selection rules. Stat Med. 2016;35(22):3907–22.
9. Brückner et al. Estimation in multi-arm two-stage trials with treatment selection and time-to-event endpoint. Stat Med. 2017;36(20):3137–53.
10. Magirr et al. Simultaneous confidence intervals that are compatible with closed testing in adaptive designs. Biometrika. 2013;100(4):985–96.
11. Choodari-Oskooei et al. Impact of lack-of-benefit stopping rules on treatment effect estimates of two-arm multi-stage (TAMS) trials with time to event outcome. Trials. 2013;14:23.
12. Robertson et al. Point estimation for adaptive trial designs I: A methodological review. Stat Med. 2022.
13. Stallard et al. Uniformly minimum variance conditionally unbiased estimation in multi-arm multi-stage clinical trials. Biometrika. 2018;105(2):495–501.
14. Jazić et al. Design and analysis of drop-the-losers studies using binary endpoints in the rare disease setting. J Biopharm Stat. 2021;31(4):507–22.
15. Lee et al. The benefits of covariate adjustment for adaptive multi-arm designs. Stat Methods Med Res. 2022;9622802221114544.