Item 19: Sample size | consort-spirit.org

Sample size

Item 19: How sample size was determined, including all assumptions supporting the sample size calculation

Example

“The sample size for this study is 1128 participants. This full trial sample size is based on the SD of the EQ-5D-5L at 4 months post surgery of 0.3 points [reference] and a minimal clinically important difference of 0.075 [reference] with 2-sided significance of 5% requiring 506 with the primary outcome for 80% power or 676 with the primary outcome for 90% power.

In this population, we expect considerable loss to follow-up. Previous WHiTE trials have indicated that these losses are due mainly to patients declining consent to further follow-up, incapacity, and death [references]. We are able to account for participants who have died in our primary outcome measure and have assumed that only 60% of recruited study participants will be available at the definitive endpoint at 4 months. With a significance level of 5%, this inflates the sample size to 844 for 80% power and 1128 for 90% power. Conservatively, we aim to randomise 1128 in order to ensure a minimum of 676 participants with the primary outcome which will ensure 90% power based on these assumptions” [289].
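The arithmetic in this example can be approximately reproduced with a short sketch. It assumes a two-group comparison of means with equal allocation and uses the normal approximation; the function names `n_per_group` and `inflate_total` are hypothetical, and the example does not state which method or software the authors used. The normal approximation gives totals slightly below the quoted 506 and 676, a gap that would be consistent with a t-distribution-based calculation, although that is an assumption. The inflation step, applied per arm to the quoted totals, does reproduce 844 and 1128.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd: float, delta: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-group n for a two-sided comparison of two means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)


def inflate_total(total_with_outcome: int, retention: float) -> int:
    """Inflate a two-arm total for expected loss to follow-up, rounding up within each arm."""
    per_arm = total_with_outcome / 2
    return 2 * ceil(per_arm / retention)


# Normal-approximation totals (two arms) for SD 0.3 and target difference 0.075
print(2 * n_per_group(sd=0.3, delta=0.075, power=0.80))  # 504
print(2 * n_per_group(sd=0.3, delta=0.075, power=0.90))  # 674

# Inflating the quoted totals for 60% retention at 4 months
print(inflate_total(506, 0.60))  # 844
print(inflate_total(676, 0.60))  # 1128
```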

“The trial is designed to detect a reduction in hospitalisation or mortality rate of 7.5% at 1 year in patients identified with cardiac dysfunction from 15% anticipated in patients randomised to the standard pathway [reference]. Given approximately one third of patients in both randomised arms are estimated to have cardiac dysfunction, a 7.5% reduction would be diluted to a 6% overall reduction in the enhanced pathway arm. To detect a reduction in events from 15% to 9% (equivalent to a HR equal to 0.58) using log-rank analysis with an overall type 1 error rate of 0.05 (two-sided analysis) and a power of 0.90 requires a total of 146 events to be observed in at least 1070 participants (nQuery Advisor assuming 18-month recruitment) inflated to 1200 in anticipation of minimal drop-out” [290].
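The hazard ratio quoted in this second example follows from the two event proportions if proportional hazards are assumed, and the required number of events can be roughly checked with the Schoenfeld approximation. The sketch below is illustrative only: the function names are hypothetical, and the approximation is not necessarily the method implemented in the software the authors cite, which may explain why it yields 142 events rather than the 146 quoted.

```python
from math import ceil, log
from statistics import NormalDist


def hazard_ratio(p_control: float, p_treat: float) -> float:
    """HR implied by two event proportions, assuming proportional hazards."""
    return log(1 - p_treat) / log(1 - p_control)


def schoenfeld_events(hr: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Total events for a 1:1 two-sided log-rank comparison (Schoenfeld approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(4 * z_sum ** 2 / log(hr) ** 2)


hr = hazard_ratio(p_control=0.15, p_treat=0.09)
print(round(hr, 2))           # 0.58
print(schoenfeld_events(hr))  # 142
```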

Explanation

A key component in the design of a randomised trial is the sample size calculation [291, 292]. Sample size calculations need to balance ethical, logistical, clinical, and statistical considerations to ensure the scientific question can be reliably and precisely answered without unnecessarily exposing individuals to ineffective or harmful interventions. The sample size calculation is generally based on one primary outcome. For trials with more than one primary outcome, a separate calculation can be performed for each, and the largest sample size used.

The sample size should be sufficiently large to have a high probability (power) of detecting a clinically important difference of a prespecified magnitude that meets a criterion of statistical significance. The relationship between sample size and detectable difference is not linear: very small differences require enormous sample sizes if a trial is to be sufficiently powered to detect them. A trial might knowingly be undertaken despite being underpowered, when the intent is for the trial to be incorporated into a prospectively-planned meta-analysis [293].
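For a two-group comparison of means with equal allocation, the standard normal-approximation formula makes this non-linearity explicit. The symbols are introduced here for illustration: σ is the common standard deviation, δ the target difference, α the two-sided significance level, and 1 − β the power.

```latex
n_{\text{per group}} \approx \frac{2\,\sigma^{2}\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\delta^{2}}
```

Because the required sample size scales with 1/δ², halving the target difference roughly quadruples the number of participants needed.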

A complete description of the sample size calculation in the protocol enables an assessment of whether the trial will be adequately powered to detect a minimal clinically important difference [294]. For transparency and reproducibility, the protocol should include the following (Box 3): the outcome (Item 16); the values assumed for the outcome in each study group (e.g., proportion with event, or a measure of central tendency and spread, such as mean and standard deviation or median and interquartile range); the statistical test (Item 27a); alpha (type 1 error level); power; and the calculated sample size per group – both assuming no loss of data and, if relevant, after any inflation for anticipated missing data (Item 27c). Trial investigators are encouraged to also provide a rationale or reference for the outcome values assumed for each study group (thereby defining the target difference deemed important to detect), and to name the software used [291].

The target difference in a superiority trial is the difference in the primary outcome value between the compared groups that the study is designed to detect. This reflects the two distinct concepts of statistical significance and clinical relevance. The target difference should ideally be the smallest clinically important difference, i.e., the minimum clinically important difference [295], though some trials plan for a target difference that is realistically achievable.

The values of certain pre-specified variables tend to be inappropriately inflated (e.g., clinically important target difference) or underestimated (e.g., standard deviation for continuous outcomes) [296], leaving trials with less power than originally intended. References to support the sample size formula or approach should be given. When uncertainty about a sample size estimate is acknowledged, methods exist for sample size re-estimation [297]. The rationale, intended use, and details of such an adaptive design approach should be described in the protocol. If the sample size has been determined based on a series of simulations, it is essential to describe this method in enough detail to ensure a comparable level of transparency and evaluation.
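As an illustration of what a simulation-based approach involves, and therefore what a protocol would need to describe (data-generating model, analysis method, number of simulations, random seed), the following is a minimal sketch. It assumes normally distributed outcomes in two equal arms analysed with a large-sample z-test; the function name `simulated_power` and all parameter values are hypothetical, with the SD and target difference borrowed from the first example purely for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def _mean_and_variance(values: list[float]) -> tuple[float, float]:
    """Sample mean and unbiased sample variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var


def simulated_power(n_per_group: int, sd: float, delta: float,
                    alpha: float = 0.05, n_sims: int = 1000, seed: int = 2025) -> float:
    """Estimate power by simulating two-arm trials and counting significant results."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(delta, sd) for _ in range(n_per_group)]
        mean_c, var_c = _mean_and_variance(control)
        mean_t, var_t = _mean_and_variance(treated)
        se = sqrt(var_c / n_per_group + var_t / n_per_group)
        if abs((mean_t - mean_c) / se) > z_crit:
            rejections += 1
    return rejections / n_sims


power = simulated_power(n_per_group=338, sd=0.3, delta=0.075)
print(f"Estimated power: {power:.2f}")
```

In practice the candidate sample size would be varied until the estimated power reaches the target, and the Monte Carlo error of the estimate would be reported alongside it.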

Among randomised trial protocols that describe a sample size calculation, studies often do not state all components necessary to understand and reproduce it (including the derivation of the target difference and where estimated values come from) [9, 10]. Also, a systematic review of articles comparing protocols and published reports (the vast majority being clinical trials or systematic reviews) found discrepancies regarding sample size in 26% to 44% of studies [64].

For trial designs other than parallel-group superiority trials, additional elements should be reported when describing the sample size calculation. For example, an estimate of the standard deviation of within-person changes from baseline should be included for crossover trials [299]; the intra-cluster correlation coefficient for cluster randomised trials [196]; and the equivalence or non-inferiority margin for equivalence or non-inferiority trials, respectively [197]. Such elements are often not described in final trial reports [304]. For pilot or feasibility trials where sample size may not be guided by a formal sample size calculation, authors should report how the sample size was determined [305-307].
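To show why the intra-cluster correlation coefficient (ICC) matters for cluster randomised trials, the sketch below applies the standard design effect, 1 + (m − 1) × ICC, for equal cluster sizes m. The function names and the example values (338 participants per group, clusters of 20, ICC of 0.05) are hypothetical and chosen only for illustration.

```python
from math import ceil


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def cluster_adjusted_n(n_individual: int, cluster_size: int, icc: float) -> int:
    """Per-group sample size after inflating an individually randomised n."""
    return ceil(n_individual * design_effect(cluster_size, icc))


# Hypothetical inputs: 338 per group if individually randomised, 20 per cluster, ICC 0.05
print(f"{design_effect(cluster_size=20, icc=0.05):.2f}")    # 1.95
print(cluster_adjusted_n(338, cluster_size=20, icc=0.05))   # 660
```

Even a modest ICC nearly doubles the required sample size here, which is why the assumed value and its source need to be reported.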

Box 3: Reporting items for the sample size calculation in the protocol of a randomised superiority trial

Summary of key elements to address

For sample size calculations:

Primary outcome (and any other outcome) on which the calculations are based
Outcome values (e.g., proportion) assumed for each group, with rationale
Target difference in outcome values between trial groups (including common standard deviation for continuous outcomes), with rationale
Statistical significance level or α (type I) error
Statistical power or β (type II) error
Any upward adjustments (e.g., accounting for missing data or non-adherence)
Target sample size per trial group
Any software used
