Smart Data Management
Acute pancreatitis (AP) is a disease of extremes. It strikes suddenly and then, for three quarters of patients, resolves within days. For the remaining quarter, the disease rapidly escalates, often resulting in multiple organ failure and even death. There are currently no reliable clinical tests to predict which course a patient is likely to follow. This is unfortunate in two regards. Firstly, prediction would allow a clinician to choose between a “wait and see” approach and aggressive treatment with its attendant risks. Secondly, prediction would promote investment in drugs tailored to the severe form of the disease.
In the hunt for better molecular biomarkers for acute pancreatitis, GSK Discovery Partnerships with Academia (DPAc) collaborated with the University of Edinburgh Department of Clinical Surgery on a prospective study. As is typical of molecular biomarker discovery, the patient cohort was small (under 100) but the number of measurements per patient was large (over 1,000), spanning detailed clinical, protein and metabolomic data, some of it longitudinal. The challenge was twofold: to identify features in this valuable and unique dataset that correlate with disease progression, and to build models that discriminate between severe and mild disease. High-dimensional data of this nature, with its mix of continuous and discrete response variables, presents significant problems for traditional machine learning analysis, often leading to false positives and correlations that cannot be reproduced later.
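The false-positive risk in a small cohort with many measurements can be illustrated with a short simulation. The sketch below is not taken from the study: the cohort size (80), feature count (1,000), the 20/60 severe/mild split and the function names are all illustrative assumptions. Every feature is pure noise, yet a sizeable number still appear to correlate with the outcome.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spurious_hits(n_patients=80, n_features=1000, threshold=0.22, seed=0):
    """Count noise features whose |r| with the outcome exceeds the threshold.

    Assumption: 0.22 is roughly the |r| needed for nominal p < 0.05
    (two-sided) with 80 patients.
    """
    rng = random.Random(seed)
    n_severe = n_patients // 4  # hypothetical: one quarter severe, as in the text
    outcome = [1.0] * n_severe + [0.0] * (n_patients - n_severe)
    correlations = []
    for _ in range(n_features):
        # Each feature is independent Gaussian noise, unrelated to the outcome.
        feature = [rng.gauss(0, 1) for _ in range(n_patients)]
        correlations.append(abs(pearson(feature, outcome)))
    hits = sum(1 for r in correlations if r > threshold)
    return hits, max(correlations)


if __name__ == "__main__":
    hits, strongest = spurious_hits()
    print(f"{hits} of 1000 noise features pass the nominal 5% cut-off")
    print(f"strongest spurious |r| = {strongest:.2f}")
```

At a nominal 5% level, about 50 of 1,000 unrelated features would be expected to pass by chance alone, which is why uncorrected feature-by-feature screening tends to produce findings that do not replicate.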
The GSK project team knew of Eagle through existing industry-academic collaborations, and was aware of Eagle’s track record of finding disease biomarkers in complex biological data. Initial discussions suggested that Eagle’s systematic and technology-agnostic approach to machine learning would be ideal for the complex multi-dimensional task in hand. Eagle’s professional services organisation was also able to fit flexibly around GSK timelines and provide the analysis for a reasonable up-front cost.
In brief, the Eagle platform for biomarker discovery involved:
Curation of the clinical/molecular dataset against biomedical standards including ISA and EFO and loading into the e[catalog] data catalogue.
Systematic application of multiple machine learning methods (support vector machine, penalized ordinal regression, logistic regression, etc.) to the catalogued data to maximise the chance that biological signals, if present in the dataset, would be discovered.
Feature selection to identify and annotate biomarkers with discriminative accuracy indicative of clinical utility.
Comprehensive reporting of results and identification of next actions.
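The middle two steps, running several methods over the same data and selecting features, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Eagle's platform: a nearest-centroid classifier and a majority-class baseline stand in for the methods named above, the effect-size ranking is a simple stand-in for feature selection, and the toy cohort and all function names are hypothetical. The point it shows is that feature selection is repeated inside every cross-validation fold, so the reported accuracy is not inflated by features chosen with knowledge of the test patients.

```python
import random
import statistics


def effect_size(values, labels):
    """Absolute standardised difference in means between the two classes."""
    severe = [v for v, l in zip(values, labels) if l == 1]
    mild = [v for v, l in zip(values, labels) if l == 0]
    spread = statistics.pstdev(values) or 1.0  # guard against constant features
    return abs(statistics.fmean(severe) - statistics.fmean(mild)) / spread


def select_features(X, y, k):
    """Rank features using the given rows only; return the top-k column indices."""
    n_features = len(X[0])
    scores = [effect_size([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def fit_nearest_centroid(X, y, cols):
    """Stand-in classifier: assign each patient to the closer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, l in zip(X, y) if l == label]
        centroids[label] = [statistics.fmean([r[j] for r in rows]) for j in cols]

    def predict(row):
        def dist(label):
            return sum((row[j] - c) ** 2 for j, c in zip(cols, centroids[label]))
        return min((0, 1), key=dist)

    return predict


def fit_majority_class(X, y, cols):
    """Baseline: always predict the most common class in the training rows."""
    majority = 1 if sum(y) * 2 > len(y) else 0
    return lambda row: majority


METHODS = {"nearest_centroid": fit_nearest_centroid,
           "majority_class": fit_majority_class}


def cross_validate(X, y, k_features=5, n_folds=5):
    """Return {method: accuracy}; feature selection is redone inside every fold."""
    correct = {name: 0 for name in METHODS}
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        cols = select_features(train_X, train_y, k_features)
        for name, fit in METHODS.items():
            predict = fit(train_X, train_y, cols)
            correct[name] += sum(predict(X[i]) == y[i] for i in test_idx)
    return {name: hits / len(X) for name, hits in correct.items()}


def make_toy_cohort(n_patients=80, n_features=200, n_informative=3, seed=1):
    """Synthetic data: one patient in four is severe; a few features carry signal."""
    rng = random.Random(seed)
    y = [1 if i % 4 == 0 else 0 for i in range(n_patients)]
    X = []
    for label in y:
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label  # shift the first few features in severe patients
        X.append(row)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_cohort()
    for name, accuracy in cross_validate(X, y).items():
        print(f"{name}: {accuracy:.2f}")
```

Registering methods in a dictionary keeps the evaluation loop identical for every method, so adding another model means adding one fitting function. Comparing each method against the majority-class baseline matters here because, with three quarters of patients mild, a model that always predicts "mild" already scores 75%.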