

Open Science
Introduction
Methods
Results
Discussion
Statistical methods
Item 21a: Statistical methods used to compare groups for primary and secondary outcomes, including harms
Examples
“The primary outcome was analysed using a mixed effects log-binomial model to generate an adjusted risk ratio (RR) and an adjusted risk difference (using an identity link function), including centre as a random effect. Statistical significance of the treatment group parameter was determined (p value generated) through examination of the associated χ2 statistic (obtained from the log-binomial model which produced the RR). Binary secondary outcomes were analysed as per the primary outcome. Time to hCG [human chorionic gonadotrophin] resolution was considered in a competing risk framework to account for participants who had surgical intervention for their ectopic pregnancy. A cumulative incidence function was used to estimate the probability of occurrence (hCG resolution) over time. A Fine and Gray model was then used to estimate a subdistribution adjusted hazard ratio (HR) directly from the cumulative incidence function. In addition, a further Cox proportional hazard model was fitted and applied to the cause-specific (non-surgical resolution) hazard function and used to generate an adjusted HR. Return to menses was analysed using a Cox regression model. Number of hospital visits associated with treatment was analysed using a Poisson regression model, including centre as a random effect to generate an adjusted incidence ratio [255]."
“For the primary continuous outcome and secondary outcomes, linear mixed-effect models were used, with outcome measurement (at the two follow-up timepoints) as the dependent variable. The models included fixed effects for timepoint, treatment, timepoint by treatment interactions, the baseline measure of the outcome, and therapist, assuming a linear relationship between baseline and outcome. The dichotomous outcome of recovery in the delusion was analysed using a logistic mixed-effect model. Persecutory delusion conviction was analysed as a continuous and also as a dichotomous (recovery) variable. The models included a random intercept for participant, an unstructured correlation matrix for the residuals, and were fitted using restricted maximum likelihood estimation . . . For each outcome and timepoint, we report the treatment effect estimate as the adjusted mean difference between groups, its SE [standard error], 95% CIs [confidence intervals], and p value. In addition, we report estimates for Cohen’s d effect sizes as the adjusted mean difference of the outcome (between the groups) divided by the sample SD [standard deviation] of the outcome at baseline [361]."
“Analyses followed a prespecified statistical analysis plan. The primary outcome (ODQ [Oswestry Disability Questionnaire] score at 18 weeks after randomisation) was compared between groups with a linear regression model, adjusted for baseline ODQ, with centre as a random effect. ODQ score, visual analogue scores (VAS) for back pain, VAS for leg pain, MRM [modified Roland-Morris] outcome score, and COMI [Core Outcome Measures Index] score at all follow-up visits were analysed with a repeated measures mixed-effects model adjusting for baseline outcome measure, treatment group, time (as a continuous variable), and a time-treatment arm interaction (if significant). Centre and participant were random effects in the repeated measures models. A second model adjusted for other prespecified variables, age, sex, duration of symptoms, body-mass index, and size of disc prolapse (as a percentage of the diameter of the spinal canal, categorised as <25%, 25–50%, or >50%) [362]."
“We analysed the primary outcome (between-group difference in the SPPB [short physical performance battery] at 12 months) using linear mixed models, adjusted for baseline measurements, minimisation variables (age, sex and CKD [chronic kidney disease] category) and a random effect variable for recruitment site. We analysed secondary outcomes using repeated measures mixed models, including all participants and including data from all available timepoints. Models were adjusted for baseline values and the minimisation variables. We conducted time-to-event analyses (time to death, time to commencing renal replacement therapy) using Cox proportional hazards models adjusted for minimisation variables. All participants were included in these analyses, with participants censored at the point of dropout or truncation of follow-up for those not reaching the analysis endpoint before 24 months. For all analyses, we took a two-sided p value of < 0.05 as significant with no adjustment for multiple testing [363]."
Explanation
Various methods can be used to analyse data, and it is crucial to ensure that the chosen approach is suitable for the specific context. Specifying the statistical procedures and software used for each analysis is essential, and additional clarification may be required in the results section of the report. Authors should describe the statistical methods insufficient detail to allow a knowledgeable reader with access to the original data to verify the reported results, as emphasised by the ICMJE (https://www.icmje.org/). It is also important to elaborate on specific aspects of the statistical analysis, such as the intention-to-treat approach.
Details of all statistical analyses are frequently prespecified in a statistical analysis plan, a document that accompanies the trial protocol. In the report of the trial results, authors should detail and justify any deviation from the statistical analysis plan or from the protocol if no statistical analysis plan was developed. They should clarify which analyses were prespecified and which were post hoc.
Most analysis approaches provide an estimate of the treatment effect, representing the difference in outcomes between comparison groups, and authors should also indicate the effect measure (eg, absolute risk) considered. Authors should accompany this with a CI for the estimated effect, delineating a central range of uncertainty regarding the actual treatment effect. The CI may be interpreted as the range of values for the treatment effect that is compatible with the observed data. Typically, a 95% CI is presented, signifying the range anticipated to encompass the true value in 95 of 100 similar studies.
Study findings can also be assessed in terms of their statistical significance. The P value represents the probability that the observed data (or a more extreme result) could have arisen by chance when the interventions did not truly differ. The statistical significance level that will be used should be reported. In the results section, actual P values (for example, P=0.001) are strongly preferable to imprecise threshold reports such as P<0.05 [364, 365].
Some trials may use bayesian methods [366-369]. In this case, the choices of priors, computational decisions, and any modelling methods used should be described. Most bayesian trials so far have been for early phases of drug development, but this approach can be applicable to any phase. Typically, results are presented as treatment effects along with credible intervals.
Where an analysis lacks statistical power (eg, harms outcomes), authors may prefer descriptive approaches over formal statistical analysis.
While the necessity for covariate adjustments is generally reduced in randomised trials compared with epidemiological studies, considering an adjusted analysis can have value in terms of increased power and precision, particularly if there is an indication that one or more variables may have prognostic value [370]. It is preferable for adjusted analyses to be explicitly outlined in the study protocol (item 3). For instance, it is often advisable to make adjustments for stratification variables,[371] in keeping with the principle that the analysis strategy should align with the study design. In the context of randomised trials, the decision to make adjustments should not be based on whether there are baseline covariates that are statistically significantly different between randomised groups. The testing of baseline imbalance in covariates should be avoided,[370] as if randomisation is properly conducted, then by definition, any differences in baseline covariates between treatment arms are random. The rationale for any adjusted analyses and the statistical methods used should be specified, along with clarifying the choice of covariates that were adjusted for, indicating how continuous variables were handled (eg, linear, modelled with splines),[372] and specifying whether the analysis was planned or post hoc. Reviews of published studies show that reporting of adjusted analyses is inadequate with regard to all of these aspects [373, 374].
Multiplicity issues are prevalent in trials and merit special consideration, especially in cases involving multiple primary outcomes, multiple time points stemming from repeated assessments of an outcome, multiple planned analyses for an outcome (such as interim or subgroup analyses (item 21d)), or analyses of numerous secondary outcomes (see CONSORT outcomes extension for more details) [21]. Any methods used to mitigate or account for multiplicity should be described. If no methods have been used to account for multiplicity (eg, not applicable, or not considered), then this should also be reported, particularly when a large number of analyses has been carried out.