

Statistical methods
Item 21d: Methods for any additional analyses (eg, subgroup and sensitivity analyses), distinguishing prespecified from post hoc
Examples
“We conducted prespecified sensitivity analyses to examine the effect of our assumption that participants who withdrew or were lost to follow-up returned to smoking: (1) a complete case analysis and (2) multiple imputation to impute missing smoking abstinence and reduction data. Multiple imputation was performed using the fully conditional specification approach with 5 imputed data sets and results combined using the Rubin rules (eMethods in Supplement 2). Other prespecified sensitivity analyses examined the effect of imbalances in baseline participant characteristics using multiple logistic regression models to estimate odds ratios and 95% CIs [confidence intervals] for point prevalence abstinence at 12 and 24 weeks, adjusting for characteristics for which the absolute value of the standardized difference was 0.1 or greater. We conducted additional post hoc analyses: (1) to examine potential clustering by site using generalized linear mixed models with a random effect for site to estimate odds ratios and 95% CIs for point prevalence abstinence at 12 and 24 weeks, and (2) to compare the baseline characteristics of participants with self-reported smoking data at 12 weeks (primary end point) with those of participants without self-reported smoking data. Statistical analyses were performed using SAS statistical software (version 9.4; SAS Institute) [406].”
“Several prespecified sensitivity analyses were done. First, assessment of the effect of missing data on the primary outcome was done using multiple imputation by chained equations method (MICE). This imputation model included all the variables in the primary ITT [intention to treat] analysis, secondary outcomes (from each timepoint), and baseline variables associated with the missingness of the primary outcome. 20 imputed datasets were generated and combined using Rubin’s rules, and the primary analysis model was then repeated using the imputed data. We specified a priori the following potential exploratory analyses to assess effect modification on the primary outcome: baseline hypertension, baseline MMSE [Mini-Mental State Examination], baseline age, time since Alzheimer’s disease diagnosis, baseline brain volume, and change in systolic blood pressure. A post-hoc analysis was also done to investigate for differences between aggregated and disaggregated MRI [magnetic resonance imaging] data (according to MRI scanner modality) for the primary outcomes [407].”
“Four sensitivity analyses were done examining the primary outcome: restricted to women who had not received antibiotics in the 7 days before delivery, to examine whether any masking of a prophylactic effect was occurring by inclusion of pretreated women; excluding women prescribed antibiotics (other than the trial intervention) within the first 24 h after delivery, and who might therefore already have had an infection at the time of administration of the intervention; restricted to women whose primary outcome was obtained between weeks 6 and 10 after delivery to exclude any biases by over-reporting of outcomes from data returned at a later timepoint or under-reporting of outcomes in data returned at an earlier timepoint; and including centre as a random effect. No subgroup analyses were planned; however, we did a post-hoc subgroup analysis of the primary outcome according to mode of birth (forceps or vacuum extraction). More stringent 99% CIs [confidence intervals] are presented for the estimate of RR [risk ratio] for this post-hoc subgroup analysis [408].”
“A prespecified subgroup analysis for the primary outcomes, testing for an interaction for baseline anxiety, depression, and opioid use, defined using their median values was completed. Prespecified sensitivity analyses for the primary outcome, excluding participants included in process evaluation interviews, adjusting for the imbalance of death, and split by baseline pain disorders were also completed. Because of the potential for type I error due to multiple comparisons, findings for analyses of secondary end points should be interpreted as exploratory. Statistical analyses were conducted using Stata version 16.1 (StataCorp) [409].”
Explanation
Sensitivity analyses can be important additional analyses to examine the robustness of the primary trial results under a range of assumptions about the data, methods, and models that differ from those of the primary analysis. When the findings from a sensitivity analysis are consistent with the primary trial findings, trialists can be confident that any assumptions in the primary analysis had little impact, which strengthens the trial results. Morris and colleagues provide a principled approach to guide any sensitivity analyses by posing three questions to trialists: does the proposed sensitivity analysis address the same question as the primary analysis; is it possible for the proposed sensitivity analysis to return a different result from the primary analysis; and if the results do differ, is there any uncertainty as to which will be believed [402, 410].
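Two of the quoted examples combine results from multiply imputed data sets using Rubin's rules. As a rough illustration of what that pooling step involves, the following minimal sketch averages the per-imputation estimates and adds the within-imputation and between-imputation variances. The function name `pool_rubin` and all numbers are hypothetical and are not taken from any of the cited trials; the degrees-of-freedom adjustment usually applied when forming confidence intervals is omitted for brevity.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)         # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical log odds ratios and squared standard errors from 5 imputed data sets.
log_ors = [0.40, 0.46, 0.38, 0.44, 0.42]
ses_squared = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled_log_or, pooled_se = pool_rubin(log_ors, ses_squared)
print(f"pooled log OR = {pooled_log_or:.3f}, SE = {pooled_se:.3f}")
```

The pooled standard error is larger than any single-imputation standard error because it also carries the uncertainty due to the missing data, which is why the number of imputations and the combination rule are worth reporting.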
Subgroup analyses are another set of additional analyses that are widely carried out and reported [411-414]. Here, the focus is on those analyses that look for evidence of a difference in treatment effect in complementary subgroups (eg, older and younger participants), a comparison known as a test of interaction [415]. Empirical analyses of subgroup difference claims for factors such as age, sex, race, and ethnicity show selective reporting, frequent lack of proper statistical support, and poor independent corroboration [416-418].
A common but misleading approach is to compare P values for separate analyses of the treatment effect in each group. It is incorrect to infer a subgroup effect (interaction) from one significant P value (in one subgroup) and one non-significant P value (in another subgroup). Categorising continuous variables to create subgroups is often done for simplicity and because it is perceived as easier to understand and communicate. A major limitation of this practice is that splitting a continuous variable into discrete subgroups at arbitrarily chosen cut-off points that lack clinical or biological plausibility loses information and thus reduces statistical power [419]. Choosing cut-off points based on achieving statistical significance should be avoided. The rationale for any subgroups should be outlined (including how they are defined), along with whether the subgroups were specified a priori in the protocol or statistical analysis plan or were done post hoc. Because of the high risk of spurious findings, subgroup analyses are often discouraged. Post hoc subgroup comparisons (analyses done after looking at the data) are especially likely not to be confirmed by further studies. Most of these analyses do not have substantial credibility.
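The warning about comparing subgroup P values can be illustrated with a small numerical sketch. It assumes two complementary, non-overlapping subgroups whose estimates are independent and approximately normal on the log odds ratio scale. The function names and all numbers are hypothetical. The interaction is tested by comparing the two estimates directly rather than by comparing their separate P values.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))


def interaction_test(est_a, se_a, est_b, se_b):
    """Compare two independent subgroup estimates (eg, log odds ratios)."""
    diff = est_a - est_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)   # independence assumed
    return diff, se_diff, two_sided_p(diff / se_diff)


# Hypothetical subgroup results: (log odds ratio, standard error).
older = (-0.50, 0.20)
younger = (-0.30, 0.25)

p_older = two_sided_p(older[0] / older[1])
p_younger = two_sided_p(younger[0] / younger[1])
diff, se_diff, p_interaction = interaction_test(*older, *younger)

print(f"older:       P = {p_older:.3f}")
print(f"younger:     P = {p_younger:.3f}")
print(f"interaction: difference = {diff:.2f}, SE = {se_diff:.2f}, P = {p_interaction:.3f}")
```

With these invented numbers the effect is statistically significant in one subgroup and not in the other, yet the test of interaction gives little evidence that the treatment effect differs between them.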
An alternative and stronger approach, which avoids the need to specify cut-off points when assessing the interaction between a continuous variable (eg, age) and treatment effect, is to fit a regression model, the results of which can be presented graphically to examine how the estimated treatment effect varies with the level of the variable [420]. These analyses are more complex, requiring model assumptions to capture the relationship (linear or non-linear) between the variable and the treatment effect. Authors should clearly describe the statistical methods used to explore the treatment-covariate interaction.
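As a minimal sketch of how such a model can be summarised, suppose a regression with a linear treatment-by-age interaction has already been fitted, so that the treatment effect at covariate value x is b_treat + b_inter * x. The code below computes that effect and an approximate 95% CI across a range of ages, which is the information a plot of treatment effect against age would display. The coefficient values, variances, covariance, and the choice to centre age at 60 years are all hypothetical, and a linear interaction is only one of the possible model assumptions.

```python
from math import sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def effect_at(x, b_treat, b_inter, var_treat, var_inter, cov_treat_inter):
    """Treatment effect and 95% CI at covariate value x under a linear interaction."""
    estimate = b_treat + b_inter * x
    # Variance of a linear combination of the two coefficients.
    var = var_treat + x ** 2 * var_inter + 2 * x * cov_treat_inter
    half_width = Z_95 * sqrt(var)
    return estimate, estimate - half_width, estimate + half_width


# Hypothetical fitted values on the log odds ratio scale, age centred at 60 years.
FIT = dict(
    b_treat=-0.40,           # treatment effect at age 60
    b_inter=-0.02,           # change in treatment effect per additional year
    var_treat=0.04,
    var_inter=0.0001,
    cov_treat_inter=0.0005,
)

for age in (40, 50, 60, 70, 80):
    est, low, high = effect_at(age - 60, **FIT)
    print(f"age {age}: log OR {est:+.2f} (95% CI {low:+.2f} to {high:+.2f})")
```

The evidence for an interaction comes from the interaction coefficient itself (here `b_inter` and its variance), not from whether individual intervals along the curve exclude zero.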