
Tried, tested and trusted

Industry Biopharma | Product e[datascientist]

Global Pharma


Comparing nucleic acid sequences of genes between tissues with disease and healthy controls can tease out vital clues about cancer prognosis and progression, supporting development of new therapeutics and diagnostics.

However, decoding genes using modern Next Generation Sequencing (NGS) approaches can generate terabytes of data, and it is difficult to find the nuggets of useful information within the noise. At the start of the project, the world’s leading research-based pharmaceutical company had already carried out a large NGS cancer study generating 50 to 80 million fragments of expressed RNA (sequence reads) per sample. It needed a reliable, scalable solution so that researchers could compare this data across disease and control samples quickly, accurately and manageably.


The global pharmaceutical company turned to Eagle Genomics for help. After reviewing the project’s aims, Eagle Genomics’ team analysed the company’s current approach, including its existing technology and systems. The next step was to build a trial workflow based on this approach, i.e. using expression profiling of RNA sequence reads (RNA-Seq). The company then used the findings from this pilot to develop a process to compare the RNA sequences of the 100 tumours in the clinical study accurately and efficiently.

One challenge remained: how to access sufficient computational resources to store, process and access the terabytes of data. Eagle solved this by developing the solution in a secure cloud environment and transferring it onto the company’s servers only once it was finalised and ready to go. This approach provided a safe, secure and seamless transition from development to production deployment.



After just two months, the project went into the next phase: the pharmaceutical team was confident enough to ask Eagle Genomics to scale it up. This was a critical test of the workflow because, in addition to the increased data coming into the system, the process had to manage a large volume of temporary data files. This required constant communication between Eagle and the pharma company, and Eagle’s support and fine-tuning ensured that the scale-up was smooth and problem-free from start to finish.

The scale-up step meant that the company was able to process and interpret expression levels in the RNA-Seq data from hundreds of tumours. The project took just fourteen months from beginning to end, and has provided a template for the global pharma company to apply to other big data projects.


  • Development of a process to compare the RNA sequences of the 100 tumours in a clinical study accurately and efficiently
  • Solution developed in the highly secure Amazon cloud, giving access to sufficient computational resources to store, process and access the terabytes of data
  • Project scaled up after only 2 months. This next project phase meant the company was able to process and interpret expression levels in the RNA-Seq data from hundreds of tumours
  • This project template has formed the basis for other big data projects within the company



  • Accurate, fast and efficient comparison of RNA molecules in large volumes of data
  • Access to secure cloud storage and processing power
  • Seamless transition from development to production deployment
  • A process that can be applied across other big data projects

Through our collaboration with Eagle Genomics, we have been able to successfully undertake RNA-Seq studies on a large population as part of our oncology trials. The technology is being extended and applied in other projects within the organisation.
Senior Scientist - Global Pharma company
Topics: Next Generation Sequencing (NGS), RNA-Seq, cancer, therapeutics, diagnostics
