
Statistical methods

Item 27c: How missing data will be handled in the analysis

Example

“We anticipate two types of missing data in this trial. The first are those that are due to mortality during the follow-up period, and the second is due to loss to follow-up. Our previous work in ICU [intensive care unit] survivors and other work in the surgical literature has shown that most post-ICU deaths happen within the first 30 days of discharge. We do not expect death rates to be different between randomization groups, and we will monitor death closely during the trial using both follow-up contact and information from the EMR [Electronic Medical Record]. The second type of missing data comes from participant withdrawal during follow-up or inability to contact. This may be more frequent in the usual care group than in the intervention group due to infrequent contact when compared to the intervention group. The mixed effects approach we propose is robust under the missing-at-random assumption (i.e., the probability of missing is unrelated to the missing outcomes). However, we will compare the baseline characteristics of patients with missing outcomes to those with complete outcome ascertainment to detect violation of this assumption. We will also perform sensitivity analyses using various methods of imputation or a full parametric likelihood approach assuming various patterns of missing data.”[426]
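The example protocol plans to compare the baseline characteristics of participants with and without missing outcomes. One common way to summarise such a comparison for a continuous variable is the standardized mean difference; the Python sketch below is a minimal illustration, and the function name and the ages are hypothetical. A widely used rule of thumb treats absolute values above about 0.1 as a meaningful imbalance. A difference of this kind shows that missingness is related to observed characteristics (i.e., the data are not missing completely at random); it cannot by itself distinguish "missing at random" from "not missing at random", because the unobserved outcomes are by definition unavailable.

```python
import math
import statistics


def standardized_difference(group_a, group_b):
    """Standardized mean difference of a continuous baseline variable between two groups.

    Uses the average of the two sample variances as the pooled variance.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt((var_a + var_b) / 2)
    return (mean_a - mean_b) / pooled_sd


# Hypothetical baseline ages (years), split by whether the follow-up outcome was observed
age_missing_outcome = [71, 68, 75, 80, 66, 73]
age_complete_outcome = [62, 59, 70, 65, 58, 64, 61, 67]

smd = standardized_difference(age_missing_outcome, age_complete_outcome)
print(f"Standardized difference in baseline age: {smd:.2f}")
```

In practice the same comparison would be repeated for each prognostic baseline variable, and for binary variables a difference in proportions scaled by the pooled standard deviation would be used instead.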

Explanation

Most randomised trials encounter some degree of missing outcome and covariate data [423, 427]. Missing data negatively impact trials by reducing statistical power and introducing potential bias [427]. Methods for handling missing data (including analysis of complete cases only) typically rely on unverifiable assumptions. It is thus important to develop and document strategies to maximise completeness of follow-up and prevent missing data from arising in the first place (Item 25b).

Missing values can occur for various reasons, such as when participants withdraw consent for further data collection or fail to attend follow-up visits. Some reasons for missingness could be related to the treatment allocation, prognostic factors, or experiencing a particular adverse event or health outcome [428]. The mechanism causing the data to be missing affects the risk of bias and decisions on how best to handle the missing data [429, 430]. Assumptions about this mechanism are usually described with the following somewhat confusing terminology. “Missing completely at random” (MCAR) means that the probability of data being missing is unrelated to both the observed and the unobserved data, so there is no systematic difference between missing and observed values; they have the same distribution. “Missing at random” (MAR) means that the probability of data being missing depends only on observed data (e.g., baseline characteristics or earlier measurements), so that, conditional on those observed data, missingness is unrelated to the unobserved values; this allows the missing data to be accounted for through statistical modelling. Both differ from the third category, “not missing at random” (NMAR), in which the probability of data being missing depends on the unobserved values themselves, even after accounting for the observed data [431].
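To make the three mechanisms concrete, the Python sketch below simulates one hypothetical group in which the true mean outcome is 0, then deletes outcomes under each mechanism. All names, effect sizes and missingness probabilities are illustrative assumptions. It reports the complete-case mean and a mean adjusted for an always-observed baseline covariate `x` (a least-squares regression of `y` on `x` among complete cases, averaged over all participants). Under MCAR both are approximately unbiased; under MAR the complete-case mean is biased but the correctly specified `x`-adjusted mean is not; under NMAR neither is, because missingness depends on the unobserved outcome itself.

```python
import math
import random
import statistics


def simulate(mechanism, n=20000, seed=1):
    """Return (complete-case mean, x-adjusted mean) of outcome y; the true mean of y is 0."""
    rng = random.Random(seed)
    xs, xs_obs, ys_obs = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                # baseline covariate, always observed
        y = 0.8 * x + rng.gauss(0, 0.6)    # outcome ~ N(0, 1), correlated with x
        if mechanism == "MCAR":
            p_missing = 0.3                              # unrelated to x or y
        elif mechanism == "MAR":
            p_missing = 1 / (1 + math.exp(-2 * x))       # depends only on observed x
        elif mechanism == "NMAR":
            p_missing = 1 / (1 + math.exp(-2 * y))       # depends on the unobserved y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        xs.append(x)
        if rng.random() >= p_missing:      # outcome observed
            xs_obs.append(x)
            ys_obs.append(y)

    cc_mean = statistics.mean(ys_obs)
    # Least-squares fit of y on x among complete cases, then average predictions over everyone
    mx = statistics.mean(xs_obs)
    slope = (sum((a - mx) * (b - cc_mean) for a, b in zip(xs_obs, ys_obs))
             / sum((a - mx) ** 2 for a in xs_obs))
    intercept = cc_mean - slope * mx
    adjusted_mean = statistics.mean(intercept + slope * x for x in xs)
    return cc_mean, adjusted_mean


for mechanism in ("MCAR", "MAR", "NMAR"):
    cc, adj = simulate(mechanism)
    print(f"{mechanism}: complete-case mean {cc:+.3f}, x-adjusted mean {adj:+.3f}")
```

The adjustment works under MAR only because the regression model includes the variable that drives missingness and is correctly specified; no analysis of the observed data alone can rule out NMAR.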

When planning to handle anticipated missing data, protocol authors can choose among various methods, including imputing missing values, fitting a mixed effects model to repeated measures data, or omitting participants with missing data [429, 430]. When the amount of missing outcome data is small, the primary analysis can be planned to include all randomised participants whose outcome was observed (a “complete case” population), under a plausible assumption about the missing data mechanism (e.g., missing at random). Sensitivity analyses can then be planned to explore departures from this assumption, so that all randomised participants are used at least in the sensitivity analyses (Item 27d) [432]. It is often recommended that participants with missing data be included in the analysis using multiple imputation, wherein the missing outcomes or covariates are estimated from other variables [432]. While imputation of missing outcomes allows an intention-to-treat analysis, it does not guarantee avoidance of bias: it relies on strong assumptions about the missing data mechanism, which is generally unknown and may be difficult to justify or verify.
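After each of m imputed datasets has been analysed separately, the results are typically combined using Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance adds the average within-imputation variance W to (1 + 1/m) times the between-imputation variance B. The Python sketch below assumes that the imputations and the per-dataset analyses have already been carried out elsewhere (for example, with chained equations); the function name and the five treatment-effect estimates and standard errors are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputed datasets and one variance per estimate")
    q_bar = statistics.mean(estimates)   # pooled point estimate
    w = statistics.mean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    t = w + (1 + 1 / m) * b              # total variance
    return q_bar, math.sqrt(t)


# Hypothetical treatment effects from m = 5 imputed datasets, each with standard error 0.5
estimates = [2.1, 2.4, 1.9, 2.3, 2.2]
variances = [0.5 ** 2] * 5

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(f"pooled estimate {pooled_estimate:.2f}, pooled SE {pooled_se:.3f}")
```

Here the pooled standard error (about 0.54) exceeds the standard error from any single imputed dataset (0.5), reflecting the additional uncertainty due to the missing data. A complete implementation would also compute Rubin's degrees of freedom for the confidence interval; that step is omitted in this sketch.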

Simple (single) imputation methods (e.g., last observation carried forward) may seem appealing but are not advisable: they can introduce bias and, because imputed values are analysed as if they had been observed, they ignore the uncertainty induced by missing data, leading to confidence intervals that are too narrow [434].
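The bias of last observation carried forward (LOCF) can be seen in a small hypothetical example in which every participant's score improves by 5 points per visit and two participants drop out early. The Python sketch below implements LOCF for one participant's visit sequence; the function name and the data are illustrative. Carrying earlier, lower scores forward pulls the final-visit mean down to 55.5, compared with 59.25 had the dropouts continued on the assumed trajectory, and a naive analysis would then count all four final values as observed, overstating precision. The direction of LOCF bias depends on the outcome trajectory, so it is not always towards the null.

```python
import statistics


def locf(visits):
    """Fill each missing visit (None) with the most recent observed value."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


# Hypothetical scores at three visits; None marks a missed visit after dropout
observed = [
    [50, 55, 60],
    [48, 53, 58],
    [52, None, None],  # assumed true trajectory: 52, 57, 62
    [47, 52, None],    # assumed true trajectory: 47, 52, 57
]
assumed_true_final = [60, 58, 62, 57]

locf_final_mean = statistics.mean(locf(p)[-1] for p in observed)
true_final_mean = statistics.mean(assumed_true_final)
print(f"LOCF final-visit mean {locf_final_mean}, assumed true mean {true_final_mean}")
```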

If randomised participants with missing data are omitted (a complete case analysis), then the analysis deviates from the intention-to-treat principle and can introduce bias (Item 27b) depending on the amount of missing data and the mechanism causing the data to be missing. A complete case analysis also diminishes statistical power by reducing the sample size.
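The loss of power from a complete case analysis can be approximated with the usual normal-approximation formula for comparing two means. This Python sketch assumes a hypothetical trial designed with 100 participants per arm to detect a standardized effect of 0.4 at a two-sided 5% significance level, in which 20% of outcomes turn out to be missing in each arm; the function name is illustrative. It captures only the reduction in sample size, not bias, which depends on the missing data mechanism.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_arm, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison of means.

    effect_size is the true difference in means divided by the common SD.
    The negligible probability of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size / (2 / n_per_arm) ** 0.5
    return z.cdf(noncentrality - z_crit)


planned = approx_power(0.4, 100)
complete_case = approx_power(0.4, 80)  # hypothetical: 20% of outcomes missing per arm
print(f"planned power {planned:.2f}, complete-case power {complete_case:.2f}")
```

Under these assumptions, power falls from about 0.81 to about 0.72.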

Despite the high prevalence and substantial impact of missing data, in two samples of trial protocols approved in 2016, only 69% and 74%, respectively, addressed statistical methods to handle missing data (and the analysis population relating to protocol non-adherence) [9, 10].

The protocol should outline how missing data will be handled in the analysis, including planned methods for imputing missing data, and details of the variables that will be used in the imputation process, if applicable (Box 6). It is also important to describe any planned sensitivity analyses that will explore the extent to which trial results vary under different missing data assumptions (Item 27d).