Hunting metabolomic biomarkers of acute pancreatitis by machine learning
Acute pancreatitis (AP) is a disease of extremes. It strikes suddenly and then, for three quarters of patients, resolves within days. For the remaining quarter the disease rapidly escalates, often resulting in multiple organ failure and even death.
There are currently no reliable clinical tests to predict which course a patient is likely to follow. This is unfortunate in two regards. Firstly, prediction would allow a clinician to choose between a “wait and see” approach and aggressive treatment with its attendant risks. Secondly, prediction would promote investment in drugs tailored to the severe form of the disease.
In the hunt for better molecular biomarkers for acute pancreatitis, GSK Discovery Partnerships with Academia (DPAc) collaborated with the University of Edinburgh Department of Clinical Surgery on a prospective study.
As is typical of molecular biomarker discovery, the patient cohort was small (under 100) while the number of measurements per patient was large (over 1,000), spanning detailed clinical, protein and metabolomic data types, including longitudinal data.
The challenge was twofold: to identify features in this valuable and unique dataset that correlate with disease progression, and to build models to discriminate between severe and mild disease.
High-dimensional data of this nature, with its mix of continuous and discrete response variables, presents significant problems to traditional machine learning analysis, often leading to false positives and correlations that cannot be reproduced later on.
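The false-positive risk can be illustrated with a small simulation. The sketch below is not taken from the study: the cohort shape (80 patients, 1,000 features, a quarter of them severe) only mirrors the figures quoted above, every feature is pure random noise, and the function names are hypothetical. It screens each feature against the outcome using a normal approximation to the correlation p-value, then applies the Benjamini-Hochberg procedure as one common way of controlling false discoveries.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform (an approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of the tests retained at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return order[:cutoff]


def simulate_noise_screen(n_patients=80, n_features=1000, seed=0, alpha=0.05):
    """Screen pure-noise features; return (unadjusted hits, adjusted hits)."""
    rng = random.Random(seed)
    n_severe = n_patients // 4  # assumption: one quarter severe
    outcome = [1] * n_severe + [0] * (n_patients - n_severe)
    p_values = []
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_patients)]  # no real signal
        p_values.append(approx_p_value(pearson_r(feature, outcome), n_patients))
    raw_hits = sum(p <= alpha for p in p_values)
    adjusted_hits = len(benjamini_hochberg(p_values, alpha))
    return raw_hits, adjusted_hits


raw, adjusted = simulate_noise_screen()
print(f"noise features with p <= 0.05: {raw}")
print(f"noise features surviving Benjamini-Hochberg: {adjusted}")
```

With no true signal in the data, roughly 5% of the features would still be expected to pass an unadjusted 0.05 threshold, while few or none should survive the correction. Exact counts depend on the random seed.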
The GSK project team knew of Eagle Genomics through existing industry-academic collaborations, and was aware of Eagle’s track record of finding disease biomarkers in complex biological data. Initial discussions suggested that Eagle’s systematic and technology-agnostic approach to machine learning would be ideal for the complex multi-dimensional task in hand. Eagle Genomics was also able to fit flexibly around GSK’s timelines and provide the analysis for a reasonable up-front cost.
In brief, the e[datascientist] platform for biomarker discovery involved:
Systematic application of multiple machine learning methods (support vector machine, penalized ordinal regression, logistic regression, etc.) to the catalogued data to maximize the chance that biological signals, if present in the dataset, would be discovered.
Feature selection to identify and annotate biomarkers with discriminative accuracy indicative of clinical utility.
Comprehensive reporting of results and identification of next actions.
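One reason feature selection needs careful handling in a cohort of this shape can be sketched in a few lines. The example below is a generic illustration, not the e[datascientist] implementation: the data are pure noise, classes are balanced for simplicity, a nearest-centroid classifier stands in for the methods listed above, and all function names are hypothetical. It compares cross-validated accuracy when the top features are chosen once on all patients (a leak) against choosing them inside each training fold.

```python
import random


def mean(values):
    return sum(values) / len(values)


def top_k_features(X, y, rows, k):
    """Rank features by absolute class-mean difference, using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]

    def score(j):
        return abs(mean([X[i][j] for i in pos]) - mean([X[i][j] for i in neg]))

    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def nearest_centroid_predict(X, y, train, test, features):
    """Assign each test row to the closer class centroid (squared distance)."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroids[label] = [mean([X[i][j] for i in members]) for j in features]
    predictions = []
    for i in test:
        dist = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(features, centre))
            for label, centre in centroids.items()
        }
        predictions.append(min(dist, key=dist.get))
    return predictions


def cv_accuracy(X, y, k_features, n_folds, select_inside_fold, rng):
    """Cross-validated accuracy with leaky or per-fold feature selection."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    leaked = top_k_features(X, y, idx, k_features)  # has seen every patient
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        if select_inside_fold:
            features = top_k_features(X, y, train, k_features)
        else:
            features = leaked
        predictions = nearest_centroid_predict(X, y, train, test, features)
        correct += sum(p == y[i] for p, i in zip(predictions, test))
    return correct / len(y)


def compare_selection_strategies(n_patients=80, n_features=1000,
                                 k_features=10, n_folds=5, seed=0):
    """Return (leaky accuracy, honest accuracy) on pure-noise data."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_patients)]
    y = [1] * (n_patients // 2) + [0] * (n_patients - n_patients // 2)
    # Same fold seed for both runs, so only the selection strategy differs.
    leaky = cv_accuracy(X, y, k_features, n_folds, False, random.Random(seed + 1))
    honest = cv_accuracy(X, y, k_features, n_folds, True, random.Random(seed + 1))
    return leaky, honest


leaky, honest = compare_selection_strategies()
print(f"selection on all patients (leaky): {leaky:.2f}")
print(f"selection inside each fold:        {honest:.2f}")
```

Because the data contain no signal, an honest estimate should sit near chance (0.5). The leaky variant typically looks far better than that, which is the kind of result that fails to reproduce in a follow-up cohort.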
This study has provided the partners with biologically relevant insight that improves their molecular understanding of AP progression. The results are being used to justify investment into, and guide the design of, ambitious follow-up studies with much greater statistical power. By adopting the Eagle e[datascientist] platform for these studies, the partners made the data findable, accessible, interoperable and reusable (FAIR) by design.
This study represents significant progress towards a prognostic test for acute pancreatitis – a test that will enable better treatment and better drugs for this common and often fatal disease.