
Reducing bias and properly reflecting uncertainty from missing data

Missing data in CTU trials and observational studies are inevitable but may lead to bias. Even modest amounts of missing data (10-15%) can make study conclusions unreliable.

Simple approaches to dealing with missing data in trials, such as ‘missing equals failure’ and ‘last observation carried forward’, are known to be biased and lead to over-precise estimates of treatment effects. Nevertheless, they are still used in the analysis of trials and are recommended by regulatory agencies.
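For readers unfamiliar with these two simple approaches, the following minimal Python sketch shows what each one does to a record with missing values. The function names and the example readings are hypothetical and chosen only for illustration; they are not taken from any CTU software.

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value. Leading missing values stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def missing_equals_failure(outcomes: List[Optional[int]]) -> List[int]:
    """Treat every missing binary outcome as a failure (coded 0)."""
    return [0 if o is None else o for o in outcomes]


# Hypothetical repeated measurements for one participant who drops out
# after the second visit.
visits = [140.0, 135.0, None, None]
print(locf(visits))

# Hypothetical binary outcomes (1 = success) for five participants.
print(missing_equals_failure([1, None, 0, 1, None]))
```

Both functions return a completed record that is then analysed as if every value had been observed, which is why the resulting estimates can appear more precise than the data justify.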

Better approaches include analysis of all available measurements in a likelihood-based analysis, inverse-probability weighting-based estimating equations, and methods based on multiple imputation of the missing data. These methods are valid when the missing data are ‘missing at random’.
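As an illustration of the final step of multiple imputation, the sketch below combines the estimates obtained from m imputed datasets using Rubin's rules. It is written in Python rather than Stata, the function name pool_rubin and the input numbers are hypothetical, and it covers only the pooled point estimate and total variance, not the degrees of freedom used for confidence intervals.

```python
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, total variance), where
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical treatment-effect estimates from three imputed datasets.
estimate, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_var)
```

The between-imputation term is what carries the extra uncertainty due to the missing data into the final standard error.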

The CTU has contributed to these approaches, and in particular to the practical implementation of multiple imputation through the development of the ice and mim software packages in Stata.  

Several aspects of multiple imputation methods deserve further exploration and development. We plan to assess the robustness of the methods that assume ‘missing at random’ through a sensitivity analysis varying the degree to which the missing data are related to their unobserved values, i.e. ‘missing not at random’. We are also concerned with definitions and methods to produce an intention-to-treat analysis when there are missing data, and plan to further develop Stata software for the practical use of multiple imputation.
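One common way to carry out such a sensitivity analysis is a delta adjustment: values are first imputed under ‘missing at random’, then shifted by a fixed amount delta before the analysis is repeated. The Python sketch below illustrates the idea for a simple mean. It is an assumption about how the analysis might look, not a description of the Unit's planned methods; the function name and all data values are hypothetical, and pooling across multiple imputations is omitted for brevity.

```python
from statistics import mean
from typing import List


def delta_adjusted_mean(observed: List[float],
                        imputed_mar: List[float],
                        delta: float) -> float:
    """Mean outcome after shifting the imputed values by delta.

    delta = 0 reproduces the 'missing at random' analysis; non-zero delta
    represents a departure in which unobserved values differ systematically
    from what the observed data predict.
    """
    shifted = [x + delta for x in imputed_mar]
    return mean(observed + shifted)


# Hypothetical observed outcomes and one value imputed under MAR.
observed = [10.0, 12.0, 11.0]
imputed_mar = [11.0]

# Repeat the analysis over a range of plausible departures from MAR.
for delta in (-4.0, -2.0, 0.0, 2.0):
    print(delta, delta_adjusted_mean(observed, imputed_mar, delta))
```

If the conclusions change little across the plausible range of delta, the ‘missing at random’ analysis can be regarded as robust to that form of departure.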

The Unit collaborates with leading missing data methods researchers at the London School of Hygiene and Tropical Medicine, University of Bristol, University of Melbourne (Australia), MRC Biostatistics Unit and beyond, and contributes to workshops on imputation.

 

Key projects

  • Further extensions to the widely used multiple imputation software in Stata: development of the underlying theory, practical strategies around model building and estimation with multiply imputed data
  • Methods to provide a sensitivity analysis for data that are ‘missing not at random’

 

Selected publications

  • Vergouwe Y, Royston P, Moons KG, Altman DG. Development and validation of a prediction model with missing predictor data: a practical approach. Journal of Clinical Epidemiology 2010; 63:205-214
  • White IR, Royston P. Imputing missing covariate values for the Cox model. Statistics in Medicine 2009; 28:1982-1998
  • Wood AM, White IR, Royston P. How should variable selection be performed with multiply imputed data? Statistics in Medicine 2008; 27:3227-3246
  • Seaman SR, White IR, Copas AJ, Li L. Combining multiple imputation and inverse-probability weighting. Biometrics (in press)