

Sample Size
Item 16a: How sample size was determined, including all assumptions supporting the sample size calculation
Examples
“We expected an improvement in PFS [progression free survival], in favor of avelumab, with a hazard ratio (HR) of 0.58. Considering a fixed design with a 2-sided α risk of 5% and a power of 80%, 106 events (progression or death) are needed to demonstrate this difference based on the Schoenfeld method. With an estimated recruitment rate of 3 patients per month, a follow-up period for each patient of 24 months, and a percentage of patients lost to follow-up or not evaluable of 15%, 132 patients had to be randomized, and we planned to enroll a total of 66 patients per group.” [239]
“The target sample size was 300 (150 per arm) over a 3-and-a-half-year recruitment period. This was based on an assumed proportion of individuals with clinically meaningful improvement in VA [visual acuity] (>10 letters) of 55% in the standard care arm and a 20% increase in the adjunct group to 75%, with approximately 7% loss to follow-up, at least 90% power and two-sided 5% type 1 error.” [240]
“In order to detect a minimum clinically important difference (MCID) in mean volume of daily PA [physical activity] of 2.1 mg [milligravity] at 12 months, and assuming a standard deviation (SD) of 5.3 mg, power of 80%, and a statistical significance level of 5%, a total of 202 participants were required. Allowing for 20% loss to follow-up and 20% non-compliance of accelerometer/intervention attendance meant that at least 338 participants were required (169 per group). The value of 2.1 mg was chosen as it represents an increase in PA that is equivalent to walking at the threshold between light intensity and moderate intensity (for example, 4 km per hour) for 30 min per day or 10–15 min of brisk walking per day.” [241]
“Sample size was based on the primary outcome measure, HOS ADL [hip outcome score activities of daily living subscale] at eight months post-randomisation, and was calculated using a minimum clinically important difference between groups of 9 points. We estimated the standard deviation to be 14 points; however, summaries presented at a planned interim data monitoring meeting found that the standard deviation was 18 points. A revised calculation (significance level 5%, power 90%, loss to follow-up 20%) gave a sample size of 214 (107 participants in each group). The data monitoring committee approved the sample size increase from 120 to 214 participants.” [242]
Explanation
Sample size calculations are a key design component of a trial and need careful planning. They must balance ethical and logistical considerations alongside medical and statistical ones, so that the scientific question can be answered reliably and precisely in a timely manner without unnecessarily exposing individuals to ineffective or harmful interventions. They are generally based on one primary outcome. A trial should be large enough to have a high probability (power) of detecting a clinically important difference of a prespecified size, at a prespecified level of statistical significance, if such a difference exists. The required sample size is inversely related to the magnitude of the effect to be detected; that is, larger sample sizes are needed to detect smaller differences. Moreover, the relationship is not linear: detecting very small differences with good power requires enormous sample sizes.
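The non-linear relationship can be illustrated with a minimal Python sketch. It assumes a two-group parallel trial with 1:1 allocation, a continuous outcome, a two-sided test, and the usual normal-approximation formula, in which the per-group size is 2(z_{1-α/2} + z_{1-β})² σ²/δ² for target difference δ and common standard deviation σ. The function name `n_per_group_means` and the input values are illustrative assumptions; exact methods (for example, those based on the t distribution) give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (normal approximation).

    Assumes 1:1 allocation, a two-sided test, and a common standard deviation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the power
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)


# Illustrative inputs: SD of 18 points, 90% power, two-sided alpha of 5%.
for delta in (9, 4.5, 2.25):
    print(delta, n_per_group_means(delta, sd=18, power=0.90))
```

Under these assumptions, each halving of the target difference roughly quadruples the required size: 85, 337, and 1345 participants per group for differences of 9, 4.5, and 2.25 points, before any allowance for attrition.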
All details of how the sample size was determined should be reported to allow replication (in principle). The elements of the sample size calculation that need to be specified are the primary outcome (and time point) on which the calculation was based (item 14); the anticipated values for the outcome in each trial group (which imply the clinically important target difference between the intervention groups) at a specific time point, with the rationale or provenance of all quantities, including any relevant citations; for continuous outcomes, the standard deviation of the measurements [243]; the statistical test; the α (type I error) value and whether it is two sided; the statistical power (or the β (type II error) value); and the resulting target sample size per trial group (box 4). Details should be given of any inflation of the sample size made for attrition or non-adherence during the study. Any formulas or software packages used for the sample size calculation should be referenced. There are additional reporting considerations for crossover trials [167], factorial trials [138], cluster trials [168], multi-arm trials [166], within-person trials [245], and non-inferiority and equivalence trials [129].
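As a hedged illustration of the inflation step, the sketch below divides the number of evaluable participants needed by the proportion expected to remain in the trial. The function name `inflate_for_attrition` is hypothetical, and the starting figure of 85 per group is an assumed pre-inflation value (it is what a normal-approximation calculation gives for a 9-point difference, a standard deviation of 18 points, and 90% power), not a number reported by any of the trials quoted above.

```python
from math import ceil


def inflate_for_attrition(n_per_group, attrition):
    """Inflate a per-group sample size so that, after the expected proportion
    of participants is lost, about n_per_group remain for analysis."""
    if not 0 <= attrition < 1:
        raise ValueError("attrition must be a proportion in [0, 1)")
    return ceil(n_per_group / (1 - attrition))


# Assumed input: 85 evaluable participants per group, 20% loss to follow-up.
per_group = inflate_for_attrition(85, 0.20)
print(per_group, 2 * per_group)  # 107 214
```

Dividing by the retained proportion, rather than multiplying by one plus the attrition rate, is the design choice here: it targets the number of participants still available for analysis. Reporting the rate and the method lets readers repeat this step.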
Box start
Box 4: DELTA2 recommended reporting items for the sample size calculation of a randomised controlled trial with a superiority question*[244]
Core items
1. Primary outcome (and any other outcome on which the calculation is based). If a primary outcome is not used as the basis for the sample size calculation, state why
2. Statistical significance level and power
3. Express the target difference according to outcome type
(a) Binary—state the target difference as an absolute or relative effect (or both), along with the intervention and control group proportions. If both an absolute and a relative difference are provided, clarify if either takes primacy in terms of the sample size calculation
(b) Continuous—state the target mean difference on the natural scale, common standard deviation, and standardised effect size (mean difference divided by the standard deviation)
(c) Time to event—state the target difference as an absolute or relative difference (or both); provide the control group event proportion, planned length of follow-up, intervention and control group survival distributions, and accrual time (if assumptions regarding these values are made). If both an absolute and relative difference are provided for a particular time point, clarify if either takes primacy in terms of the sample size calculation
4. Allocation ratio. If an unequal ratio is used, the reason for this should be stated
5. Sample size based on the assumptions as per above
(a) Reference the formula/sample size calculation approach, if standard binary, continuous, or survival outcome formulas are not used. For a time-to-event outcome, the number of events required should be stated
(b) If any adjustments (eg, allowance for loss to follow-up, multiple testing) that alter the required sample size are incorporated, they should also be specified, referenced, and justified along with the final sample size
(c) For alternative designs, additional input should be stated and justified
(d) Provide details of any assessment of the sensitivity of the sample size to the inputs used
Additional items for grant application and trial protocol
6. Underlying basis used for specifying the target difference (an important or realistic difference)
7. Explain the choice of target difference—specify and reference any formal method used or relevant previous research
Additional item for trial results paper
8. Reference the trial protocol.
*Taken from Cook et al [244].
Box end
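To show how the core items for binary and time-to-event outcomes feed into a calculation, here is a minimal Python sketch under stated assumptions: 1:1 allocation, two-sided tests, an unpooled-variance normal approximation for two proportions, and Schoenfeld's approximation for the number of events. The function names are hypothetical. The quoted trials do not fully describe their methods, so this is not a reconstruction of their calculations, although the events formula does return the 106 events reported in the first example.

```python
from math import ceil, log
from statistics import NormalDist

_Z = NormalDist()


def _z_sum_sq(alpha, power):
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    return (_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)) ** 2


def n_per_group_binary(p_control, p_intervention, alpha=0.05, power=0.80):
    """Per-group size for comparing two proportions (unpooled variance)."""
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    diff = p_intervention - p_control
    return ceil(_z_sum_sq(alpha, power) * variance / diff ** 2)


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80):
    """Total events needed under Schoenfeld's approximation, 1:1 allocation."""
    return ceil(4 * _z_sum_sq(alpha, power) / log(hazard_ratio) ** 2)


print(schoenfeld_events(0.58))                     # 106
print(n_per_group_binary(0.55, 0.75, power=0.90))  # 115
```

The binary result of 115 per group is before any allowance for loss to follow-up and depends on the approximation chosen, which is why the box asks authors to reference the formula or approach when it is not a standard one.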
Transparent reporting of the sample size calculation shows readers the power of the trial and gives them a measure by which to assess whether the trial attained its planned size. Any differences from the planned sample size described in the trial registration (item 2), study protocol (item 3), or statistical analysis plan should be explained.
Interim analyses are used in some trials to help decide whether to stop early or to continue recruiting, sometimes beyond the planned trial end (item 16b). If the actual sample size differed from the originally intended sample size for some other reason (eg, because of poor recruitment or revision of the target sample size), an explanation should be given alongside details of the revised sample size. Many reviews have found that few authors report how they determined the sample size [222, 246-252].
There is no value in conducting and reporting a post hoc calculation of statistical power using the results of a trial (for example, as a pretext to explain non-significant findings); doing so may even mislead and confuse readers [253, 254].
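One way to see why such a calculation adds nothing is the following sketch. It assumes a simple two-sided z-test and treats the observed effect as if it were the true effect; the function name `observed_power` is hypothetical. Under these assumptions, the post hoc power is a fixed function of the P value already reported, so it carries no new information.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """'Observed' power of a two-sided z-test, treating the observed effect
    as if it were the true effect."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_value / 2)  # |z| implied by the two-sided P value
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value at level alpha
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)


for p in (0.05, 0.20, 0.50):
    print(p, round(observed_power(p), 2))
```

In this simplified setting, a result exactly at P=0.05 always corresponds to an observed power of about 50%, and any non-significant result necessarily has lower observed power, whatever the trial's size or design.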