
Sample Size

Item 16b: Explanation of any interim analyses and stopping guidelines

Examples

“Interim analyses of effectiveness and safety endpoints were performed on behalf of the data monitoring committee on an approximately annual basis during the period of recruitment. These analyses were done with the use of the Haybittle–Peto principle and hence no adjustment was made in the final p values to determine significance [255]."

 

“One interim analysis of the primary endpoint and safety data was planned for when approximately 50% of the participants had completed D28 [day 28]. Statistical significance and futility boundaries were estimated for the interim and final analysis based on 50,000 simulations from the PASS® software (NCSS, Kaysville, Utah) by simulating a group sequential test for two means assuming normality testing. At the interim analysis, the two-sided significance boundary for clinical efficacy was 0.00312 and for futility of detecting μAUT00063 > μplacebo, the one-sided O’Brien-Fleming boundary was 0.39141. Hence, at the final analysis, the two-sided significance boundary for clinical efficacy would be 0.04761. The Independent Data Monitoring Committee (IDMC) was advised to consider making recommendations for early termination only where there was a clear demonstration of futility [256]."

 

“Three planned analyses (two interim analyses and one final analysis) were performed when the observed number of events were 25, 47, and 84, respectively. Data were released by DSMC [data and safety monitoring committee] after final analysis. Efficacy stopping boundaries were based on the O’Brien-Fleming spending function. Futility boundaries were based on testing the alternative hypothesis at the 0.039 level [257]."

 

“Two interim analyses to be performed using the Haybittle-Peto approach were scheduled, after enrolment of 1000 and 2000 patients, respectively. The significance level associated with both interim analyses was 0.001 and the significance level associated with the final analysis was 0.049. With this method, the overall risk of type 1 error was 5% [258]."

 

Explanation

Numerous randomised trials enrol participants over extended periods of time. If an intervention demonstrates exceptional efficacy, the study might require early termination on ethical grounds. To mitigate this concern, assessing results as data accumulate is advisable, ideally through an independent data monitoring committee (DMC), sometimes referred to as a data and safety monitoring board (DSMB) [259]. However, conducting multiple statistical evaluations on accruing data without proper adjustment can yield misleading conclusions. For instance, examining data from a trial at five interim analyses using an unadjusted P value threshold of 0.05 would elevate the overall false-positive rate closer to 19% than to the nominal 5%. Beyond stopping early for efficacy, interim analyses can be used to evaluate (1) futility, to assess whether a trial is likely to meet its objectives; or (2) safety, to assess whether there is evidence of an increased risk of harms in the intervention group relative to the comparator group [259]. Interim analyses can also be used to reassess the sample size, using updated information from interim trial data (eg, through an internal pilot), to ensure the trial remains adequately powered.
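The inflation of the type 1 error from repeated unadjusted testing can be illustrated with a short Monte Carlo simulation. This is a sketch for illustration only; the number of simulated trials, group sizes, number of looks, and random seed are arbitrary assumptions, not drawn from any of the cited trials.

```python
# Illustration (assumed setup): estimate the overall false-positive rate
# when accruing trial data are tested repeatedly at an unadjusted
# two-sided alpha of 0.05, with no true treatment effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 10_000        # simulated trials under the null hypothesis
n_per_look = 40        # participants added per group between analyses
n_looks = 5            # analyses of the accruing data

false_positive = 0
for _ in range(n_sims):
    a = rng.normal(size=n_looks * n_per_look)  # intervention group (null)
    b = rng.normal(size=n_looks * n_per_look)  # comparator group
    for k in range(1, n_looks + 1):
        n = k * n_per_look
        p = stats.ttest_ind(a[:n], b[:n]).pvalue
        if p < 0.05:   # trial "stopped" for spurious efficacy
            false_positive += 1
            break

rate = false_positive / n_sims
print(f"overall type 1 error over {n_looks} looks: {rate:.3f}")
# well above the nominal 0.05
```

The exact inflation depends on the number, timing, and correlation of the analyses, but any repeated unadjusted testing pushes the overall error rate well above 5%.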

 

Various group sequential statistical approaches exist to adjust for multiple looks (ie, analyses) at the data, and these should be prespecified in the trial protocol (see item 27a of the SPIRIT 2025 statement [78]). Using these methods, the data are tested at each interim analysis, and a P value below the critical value specified by the chosen group sequential method signifies statistical significance. Some researchers view group sequential methods as a tool for decision making, while others regard them as a definitive stopping rule, intending to halt the trial if the observed P value falls below the critical threshold.
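As an illustration of one such approach, the sketch below evaluates the Lan-DeMets alpha-spending function that approximates O’Brien-Fleming boundaries; it shows how much of the overall two-sided type 1 error is "spent" by each information fraction. The information fractions chosen are arbitrary assumptions, and the boundary z-values themselves would require multivariate normal integration (eg, via dedicated software such as the gsDesign or rpact R packages), so only the spending function is shown.

```python
# Illustration (assumed fractions): Lan-DeMets O'Brien-Fleming-type
# alpha-spending function for a two-sided overall alpha of 0.05.
from scipy.stats import norm

alpha = 0.05                       # overall two-sided significance level
z = norm.ppf(1 - alpha / 2)        # approx 1.96 for alpha = 0.05

def obf_alpha_spent(t: float) -> float:
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1)."""
    return 2 * (1 - norm.cdf(z / t ** 0.5))

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t = {t:.2f}: cumulative alpha spent = {obf_alpha_spent(t):.5f}")
```

The characteristic pattern (a very stringent threshold early on, leaving almost the full 5% for the final analysis) mirrors the boundaries quoted in the examples above, such as an interim significance boundary of 0.00312 with a final boundary of 0.04761.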

 

Authors should disclose whether they or the DMC/DSMB performed multiple looks at the data (interim analyses). If such multiple looks occurred, it is important to specify their frequency; the triggers prompting them; the statistical methods applied (including any formal stopping rules); and whether these procedures were planned and documented in the trial protocol before the trial commenced, before the DMC examined any interim data, or at a later stage. Authors should also report the time point at which any interim analyses were conducted (and by whom), state who decided to continue, stop, or modify the trial, and whether they were blinded to the treatment allocation. Unfortunately, the reporting of interim analyses and stopping rules is frequently inadequate in published trial reports [260], even for trials that did stop earlier than originally planned.


The 2025 update of SPIRIT and CONSORT, and this website, are funded by the MRC-NIHR: Better Methods, Better Research [MR/W020483/1]. The views expressed are those of the authors and not necessarily those of the NIHR, the MRC, or the Department of Health and Social Care.
