Smart Data Management; Microbiomics
Comparing nucleic acid sequences of genes between tissues with disease and healthy controls can tease out vital clues about cancer prognosis and progression, supporting development of new therapeutics and diagnostics.
However, decoding genes using modern Next Generation Sequencing (NGS) approaches can generate terabytes of data, and it’s difficult to find the nuggets of useful information within the noise. At the start of the project the world’s leading research-based pharmaceutical company had already carried out a large NGS cancer study generating 50 to 80 million fragments of expressed RNA (sequence reads) per sample, and needed a reliable, scalable solution to ensure that the researchers could compare this data across disease and control quickly, accurately and manageably.
The global pharmaceutical company turned to Eagle Genomics for help. After reviewing the project’s aims and goals, Eagle’s team analysed the company’s current approach, including its existing technology and systems. The next step was to build a trial workflow based on this approach, i.e. using expression profiling of RNA sequence reads (RNA-Seq). The company then used the findings from this pilot to develop a process to compare the RNA sequences of the 100 tumours in the clinical study accurately and efficiently.
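To give a feel for what comparing expression between disease and control can involve, here is a minimal Python sketch. It is an illustration only, not the workflow Eagle or the company built: the gene names, read counts, function names and the pseudocount are all hypothetical, and it uses a simple counts-per-million normalisation followed by a log2 fold change rather than a full statistical method.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(tumour, control, pseudocount=1.0):
    """Return the log2 ratio of tumour to control expression per gene.

    The pseudocount (an assumed value) avoids taking the log of zero.
    Only genes present in both samples are compared.
    """
    tumour_cpm = counts_per_million(tumour)
    control_cpm = counts_per_million(control)
    shared_genes = sorted(tumour_cpm.keys() & control_cpm.keys())
    return {
        gene: math.log2(
            (tumour_cpm[gene] + pseudocount) / (control_cpm[gene] + pseudocount)
        )
        for gene in shared_genes
    }


# Made-up read counts for three hypothetical genes.
tumour = {"GENE_A": 900, "GENE_B": 50, "GENE_C": 50}
control = {"GENE_A": 300, "GENE_B": 400, "GENE_C": 300}

changes = log2_fold_change(tumour, control)
for gene, value in changes.items():
    print(f"{gene}: {value:+.2f}")
```

A positive value suggests the gene is more highly expressed in the tumour sample than in the control, and a negative value the reverse. A real study with tens of millions of reads per sample would add replicate handling and statistical testing on top of this.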
There was still a final challenge – how to access sufficient computational resources to be able to store, process and access the terabytes of data. Eagle solved this by developing a solution in the securely-protected Amazon cloud and transferring it onto the company’s servers only once it was finalised and ready to go. Using the e[hive] analytics workflow solution provided a safe, secure and seamless transition from development to production deployment.
After just two months, the project moved into its next phase: the pharmaceutical team was confident enough to ask Eagle to scale it up. This was a critical test of the workflow, as in addition to the increased data coming into the system, the process had to be able to manage a large volume of temporary data files. This required constant communication between Eagle and the pharma company, and Eagle’s support and fine-tuning ensured that this was a smooth and problem-free process from start to finish.
The scale-up step meant that the company was able to process and interpret expression levels in the RNA-Seq data from hundreds of tumours. The project took just fourteen months from beginning to end, and has provided a template for the global pharma company to apply to other big data projects throughout the company.