

Ancillary analyses
Item 28: Any other analyses performed, including subgroup and sensitivity analyses, distinguishing prespecified from post hoc
Examples
“In a [prespecified] sensitivity analysis to support the primary binary endpoint, the NRS [numerical rating score] pain score at 1 month was also analyzed using the constrained longitudinal data analysis model . . . Primary Outcome: At 1 month after the intervention, the percentage of responders (Low Back Pain intensity <40) was higher in the glucocorticoid intradiscal injection (GCIDI) group (36 of 65 [55.4%]) than the control group (21 of 63 [33.3%]) (absolute risk difference, 22.1 percentage points [CI, 5.5 to 38.7 percentage points]; P=0.009 [after multiple imputation]) . . . In the sensitivity analysis, the mean reduction in LBP [low back pain] intensity from baseline to 1 month was greater in the GCIDI group (−32.5 [CI, −38.2 to −26.8]) than the control group (−17.5 [CI, −23.3 to −11.7]) (absolute difference, −15.0 [CI, −22.9 to −7.1]; P<0.001)” [422].
“Owing to the later inclusion of parent cosmetic appearance assessments (to assist with trial conduct), it was decided to perform a post hoc subgroup analysis to determine whether the scores given by the assessors and parents differed between treatment groups [table 15] . . . The assessor scores did not indicate a difference between the nail-replaced and nail-discarded groups. However, the scores given by the parents suggested that there was a statistically significant difference in favour of the nail-discarded group. The treatment by subgroup interaction term was statistically significant (OR [odds ratio] 0.24, 95% CI [confidence interval] 0.06 to 0.96; P=0.044).”
Explanation
Multiple analyses of the same data create a risk of false-positive findings [480]. Authors should especially resist the temptation to perform many subgroup analyses [481-483]. Analyses that were prespecified in the trial protocol (item 3) are much more reliable than those suggested by the data, and therefore authors should report which analyses were prespecified. If subgroup analyses were undertaken, authors should report which subgroups were examined, why, whether they were prespecified, and how many were prespecified. Selective reporting of subgroup analyses could lead to bias [484]. When evaluating a subgroup, the question is not whether the subgroup demonstrates a statistically significant result but whether the subgroup treatment effects are significantly different from each other. To determine this, a test of interaction is helpful, although the power for such tests is typically low. If formal evaluations of interaction are undertaken (item 21d), they should be reported as the estimated difference in the intervention effect in each subgroup (with a CI), not just as P values.
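The two statistical points in this explanation can be illustrated with a minimal sketch. It is not taken from any of the cited trials: the function names and the subgroup estimates are hypothetical, the tests are assumed to be independent, and the subgroup effects are assumed to be independent, approximately normal estimates (for example, log odds ratios). The first function shows how the chance of at least one false-positive finding grows with the number of analyses. The second compares two subgroup effects directly and returns the estimated difference with a CI rather than only a P value.

```python
import math


def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false-positive result among
    n_tests independent tests, each performed at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def interaction_test(effect_a, se_a, effect_b, se_b, z_crit=1.959964):
    """Compare two independent subgroup effect estimates
    (assumed approximately normal, e.g., log odds ratios).

    Returns the estimated difference in effect, its 95% CI,
    the z statistic, and the two-sided P value.
    """
    diff = effect_a - effect_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return {"difference": diff, "ci": ci, "z": z, "p": p_value}


# Hypothetical log odds ratios (illustrative values only).
# Subgroup A alone is "statistically significant" (z = -3.0);
# subgroup B alone is not (z is about -1.1).
result = interaction_test(effect_a=-0.60, se_a=0.20,
                          effect_b=-0.25, se_b=0.22)

print(f"Chance of >=1 false positive in 10 tests: "
      f"{familywise_error_rate(10):.2f}")
print(f"Difference in log OR: {result['difference']:.2f} "
      f"(95% CI {result['ci'][0]:.2f} to {result['ci'][1]:.2f}), "
      f"P = {result['p']:.2f}")
```

With these hypothetical inputs, one subgroup is significant on its own and the other is not, yet the CI for the difference between them spans zero. This is the situation the explanation warns about: the within-subgroup P values alone do not show that the treatment effects differ.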
In one survey [481], 35 of 50 trial reports included subgroup analyses, of which only 42% used tests of interaction. It was often difficult to determine whether subgroup analyses had been specified in the protocol. In another survey of surgical trials published in high-impact journals, 27 of 72 trials reported 54 subgroup analyses, of which 91% were post hoc, and only 6% of subgroup analyses used a test of interaction to assess whether a subgroup effect existed [485].