Details of Award

NERC Reference : NE/T013982/1


Artificial Intelligence for Missing Data Imputation in Electronic Medical Records

Grant Award

Principal Investigator:: Professor A Denniston, University of Birmingham, Institute of Inflammation and Ageing
Grant held at:: University of Birmingham, Institute of Inflammation and Ageing

Science Area:: None
Overall Classification:: Unknown

ENRIs:: None
Science Topics:: Biomedical Informatics; Bioinformatics

Abstract:: Health systems in the UK and Canada have made extensive use of Electronic Medical Records (EMR) for many years as an integral part of their operations. However, whilst digitally recorded data exist, their use as the basis of a "learning health system", in which patient experience, hospital operations, and quality of care are continuously improved by collating and examining data and evidence, has yet to be fully realised. Real-world EMR data can be very challenging to handle, and one significant contributor to these difficulties is data quality. Missing data is a particular issue, with rates of missingness of between 10% and 30% for some records. Properly addressing the missing data issue in EMR data is complicated by the fact that it can be difficult to differentiate between genuinely missing data (data that was not recorded in the system) and a non-applicable response (e.g. the test was not appropriate and therefore was not done). Data can be missing-at-random (MAR) or missing-not-at-random (MNAR); in the latter case, an underlying factor determines the missingness patterns. Certain types of missingness can therefore be "informative": if a clinician decided not to order certain tests, this indicates an implicit belief about the perceived health state of the patient. Failure to account for these sources of bias may lead to incorrect inferences. Artificial Intelligence technologies are seen as an important tool in unlocking the information wealth held in our electronic medical records. This project will contribute to the maturation of these technologies to account for the real-world complexities of EMR datasets. The research proposed here will develop algorithms for data imputation that seek to be more robust, reliable and generalisable. We have chosen to focus initially on automated sepsis diagnosis, a pressing area of biomedical research given that sepsis accounts for around 44,000 deaths each year in the UK alone.
Therefore, by applying modern machine learning approaches to large EMR datasets, we promise to tackle this problem in a unique way that could have meaningful real-world impact. However, as many AI prediction models require complete datasets as input, one popular strategy for handling missing data is "data imputation", whereby an algorithm is used to fill in missing data values. These methods vary in complexity, from simply filling in missing values with the average of the observed values over the entire dataset through to more advanced methods that attempt to elicit the underlying patterns in the data. However, many current imputation methods are designed for only certain types of EMR data (e.g. clinical time series of molecular measurements), fail to account for sources of bias, and do not provide measures of certainty about the quality of the imputed data. The overall goal of this project is to develop novel machine learning methods for missing data imputation in EMRs that account for biases and statistical uncertainty in the imputation.
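
As an illustration only, the simplest strategy mentioned in the abstract (filling each missing value with the average of the observed values for that field) might be sketched as follows. This is not the project's method: the function name `mean_impute`, the field names, and the toy values are all hypothetical. The sketch also returns a mask recording which values were missing, since the abstract notes that missingness itself can be informative.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# A record maps a field name to a measured value, or None if missing.
Record = Dict[str, Optional[float]]


def mean_impute(
    records: List[Record],
) -> Tuple[List[Dict[str, float]], List[Dict[str, bool]]]:
    """Fill missing values with the per-field mean of the observed values.

    Returns the imputed records and a parallel mask that is True
    wherever the original value was missing.
    """
    fields = sorted({name for rec in records for name in rec})

    # Per-field means, computed over observed (non-missing) values only.
    field_means: Dict[str, float] = {}
    for name in fields:
        observed = [rec[name] for rec in records if rec.get(name) is not None]
        if not observed:
            raise ValueError(f"no observed values for field {name!r}")
        field_means[name] = mean(observed)

    imputed: List[Dict[str, float]] = []
    mask: List[Dict[str, bool]] = []
    for rec in records:
        missing = {name: rec.get(name) is None for name in fields}
        mask.append(missing)
        imputed.append(
            {
                name: field_means[name] if missing[name] else rec[name]
                for name in fields
            }
        )
    return imputed, mask


# Toy data, invented purely for illustration (not from any real EMR).
records: List[Record] = [
    {"lactate": 1.0, "heart_rate": 80.0},
    {"lactate": None, "heart_rate": 100.0},
    {"lactate": 3.0, "heart_rate": None},
]
imputed, mask = mean_impute(records)
```

Mean imputation of this kind ignores why a value is missing and gives no indication of how uncertain the filled-in value is, which are the limitations the abstract highlights.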

Period of Award:: 1 Apr 2020 - 30 Sep 2021
Value:: £10,312


Authorised funds only

NERC Reference:: NE/T013982/1
Grant Stage:: Completed
Scheme:: NC&C NR1
Grant Status:: Closed
Programme:: Globalink Placement

This grant award has a total value of £10,312


FDAB - Financial Details (Award breakdown by headings)

Exception - Other Costs
£10,312
