

Data management
Item 26: Plans for data entry, coding, security, and storage, including any related processes to promote data quality (e.g., double data entry; range checks for data values). Reference to where details of data management procedures can be accessed, if not in the protocol
Example
“Household data, clinical measurements in cohort incidence study, and entomological data collected during the cross-sectional surveys will be captured on electronic forms using smartphones installed with ODK Collect. The data will be stored on a secure server located at LSHTM [London School of Hygiene and Tropical Medicine] and all data management and manipulation will be done using Stata (Stata Corp). Laboratory data output will be available directly from the analyser … and imported into a database. Data extractions will be converted into Stata format for querying and analysis. It will be possible to share de-identified data in several widely used formats.
Data quality and control
Paper case report forms (CRFs) will have numbered and coded items to ensure straightforward and accurate data entry and processing, and drafts will be reviewed by the study team before finalisation. Standard Operating Procedures (SOPs) for data collection will be developed and field staff will be appropriately trained to ensure rigorous data collection. This will include quality control (QC) of their own performance by checking for missing data or implausible responses. Furthermore, more QC checks will be performed by a supervisor to check for data completeness and internal consistency of responses within a few hours of data collection. Corrections, when appropriate, will be done before the CRFs are submitted for data entry. Electronic CRFs will have built in checks for missing data, implausible responses, and internal consistency; data collected using electronic CRFs will include the device serial number and date/time stamp and the device will be password protected. All quantitative data collected on paper CRFs will be double-entered into a database independently by two data clerks. The database will maintain an audit trail with time-date stamps of data entry and all changes that are made to the data.
…
Data security
Every effort will be made to ensure data security, particularly relating to sensitive participant information. All data will be uploaded onto a secure server on the LSHTM cloud. All data will be stored encrypted and will be accessible only by password and encryption keys held by the data manager. In the study database, we will not store any information that could be used to identify individual study participants. We will use anonymised study numbers as our unique participant identifier.
...
Data storage
Upon completion of the study, electronic files will be stored on a server and also copied to encrypted USB and stored offsite in a safebox. CRFs will be stored in the secure archive, which is equipped with locked cabinets for long-term storage of CRFs and documents. All paper source records will be retained for a minimum of 10 years from the point of publication of data on the primary outcome. Electronic data will be stored for a minimum of 10 years following study completion, with regular checks to make sure that the data are still readable” [400].
Explanation
Plans to handle the data collected from trial participants help to promote data validity and integrity. A Data Management Plan details how the data will be collected, processed, secured, stored, and shared during and after a trial. Guidance is available on the content of Data Management Plans [401-403].
Differences in data entry methods can affect the trial in terms of data accuracy, cost, and efficiency. For example, when compared with paper case report forms, electronic data capture can reduce the time required for data entry, and allow for efficient data validation, query resolution, and database release by combining data entry with data collection (Item 25a) [386, 404]. When data are collected on paper forms, data entry can be performed locally or at a central site. Local data entry can enable fast correction of missing or inaccurate data, while central data entry facilitates blinding (masking), standardisation, and training of a core group of data entry personnel.
Raw, non-numeric data are usually coded for ease of data storage, review, tabulation, and analysis. It is important to define standard coding practices to reduce errors and observer variation. When data entry and coding are performed by different individuals, it is particularly important that the personnel use unambiguous, standardised terminology and abbreviations to avoid misinterpretation.
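As an illustration of a standard coding practice, the following minimal Python sketch maps free-text responses to numeric codes through a single shared codebook. The variable (smoking status), the code values, the missing-data code, and the function name are all hypothetical and are not drawn from any cited protocol.

```python
# Hypothetical codebook: one agreed mapping shared by everyone who codes data.
SMOKING_CODES = {"never": 0, "former": 1, "current": 2}

# Hypothetical code reserved for a blank response.
MISSING = -9


def code_response(raw: str, codebook: dict[str, int]) -> int:
    """Convert a raw text response into its numeric code.

    Whitespace and letter case are normalised first, so that " Former "
    and "former" receive the same code. A response that is not in the
    codebook raises an error instead of being guessed at.
    """
    key = raw.strip().lower()
    if key == "":
        return MISSING
    if key not in codebook:
        raise ValueError(f"Response {raw!r} is not in the codebook")
    return codebook[key]


if __name__ == "__main__":
    for response in [" Former ", "current", ""]:
        print(repr(response), "->", code_response(response, SMOKING_CODES))
```

Rejecting unrecognised responses, rather than assigning them a default code, forces ambiguous entries to be resolved against the source document.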
Standard processes are often implemented to improve the accuracy of data entry and coding [391, 405]. Common examples include double data entry [406]; verification that the data are in the proper format (e.g., integer) or within an expected range of values; and independent source document verification of a random subset of data to identify missing or apparently erroneous values (Item 29). Although independent double data entry from paper forms is performed to detect data entry errors, its time and costs need to be weighed against the magnitude of the reduction in error rates compared with single data entry.
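The format, range, and double data entry checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the field names, expected types, and plausible ranges are hypothetical, and a real trial would take them from its own case report forms and Data Management Plan.

```python
from typing import Any

# Hypothetical validation rules: field -> (expected type, minimum, maximum).
RULES = {
    "age_years": (int, 0, 120),
    "haemoglobin_g_dl": (float, 3.0, 25.0),
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return the missing-value, format, and range problems in one record."""
    problems = []
    for field, (expected_type, low, high) in RULES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        # bool is a subclass of int in Python, so it is excluded explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems


def compare_entries(first: dict[str, Any], second: dict[str, Any]) -> list[str]:
    """Return the fields on which two independent entries of a form disagree."""
    all_fields = first.keys() | second.keys()
    return sorted(f for f in all_fields if first.get(f) != second.get(f))


if __name__ == "__main__":
    # The same paper form, entered independently by two data clerks.
    clerk_a = {"age_years": 34, "haemoglobin_g_dl": 12.5}
    clerk_b = {"age_years": 43, "haemoglobin_g_dl": 12.5}
    print("Discrepant fields:", compare_entries(clerk_a, clerk_b))
    print("Problems in first entry:", check_record(clerk_a))
```

Note that both entries in this usage example pass the range check even though they disagree on age. Range checks catch implausible values, whereas double entry catches plausible but mistyped ones, so the two methods detect different kinds of error.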
For trials in which both trial participants and personnel are blinded, it is important to plan the timing and procedures for unblinding the trial (e.g., after the creation of a cleaned and locked data file).
In two samples of trial protocols approved in 2016, 64% to 75% reported the data entry and coding processes [9, 10]. The protocol should fully describe the plans for data entry and processing, along with measures to promote their quality, or outline key elements with a reference to the Data Management Plan where full information can be found. These details are particularly important for the primary outcome data. The protocol should also document data security measures to prevent unauthorised access to or loss of participant data, as well as plans for data storage (including time frame) during and after the trial. This information facilitates an assessment of adherence to applicable standards and regulations.
Summary of key elements to address
- Processes for data management, including:
  - Data entry and coding, including measures to reduce errors (e.g., double data entry, range checks for data values)
  - Data security
  - Data storage, including time frame
- Reference to where full information can be found (e.g., Data Management Plan), if not in the protocol