Providing users with datasets prioritised by scientific value improves data selection and encourages data reuse, which in turn makes the datasets more valuable.
Systematic data prioritisation is at the heart of Eagle’s translational medicine platform. In this case study we show how our platform was used to prioritise data in the context of a specific customer project, namely the identification of genetic (haplotype) associations with skin cancer prognosis from publicly available information.
Our starting point for this project was the International Cancer Genome Consortium (ICGC) dataset, with over 20,000 patient donors. ICGC is unique in providing links to primary sequence data across many contributing projects. This allowed our association analysis to include a greater number of samples than any single project, such as The Cancer Genome Atlas (TCGA), could provide.
The general process for the translational medicine platform is shown in Figure 1. Several software components are used: e[catalog] for cataloguing the datasets, e[discover] for valuing and prioritising the data, and e[hive] for running the association analysis. This case study focuses on e[discover].