Pipeline Design

  • Are you looking to automate a manual analysis or annotation/curation process?
  • Is your dataset potentially massive and beyond the capacity of your existing systems or procedures?
  • Looking to run occasional large-scale analyses but not regularly enough to justify the purchase of in-house hardware?
  • Need to scale up your R&D genomics data management and analysis capacity to cope with the larger datasets from next-generation sequencing (NGS)?
  • Want to run larger and more complex genomic analyses but find that your in-house compute cluster is too small?
  • Are you interested in microRNA discovery, SNP annotation, comparative genomics, NGS assembly, genome annotation and gene prediction, biomarkers, short read mapping to the genome (for microarray probes), or a related area?

Whether the analysis is a one-off or something that needs to be run regularly, Eagle Genomics' pipelines can help. Pipelines can be executed remotely on Eagle-managed resources, with the results delivered back to the customer, or they can be installed locally for direct use by the end user: either behind a web interface or on the command line, and on in-house hardware or in the cloud.

What are pipelines?

Pipelines, also known as workflows, are defined series of analyses through which data is processed in a repeatable, consistent manner. The output of the first analysis becomes the input to the second, and so on. Complex pipelines include branch conditions that split the data flow and send each part along a different route depending on the data content or on the output of a particular analysis. Eagle specialises in engineering pipelines that are as efficient and straightforward as possible.
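To make the idea concrete, here is a minimal, generic sketch in Python of a pipeline whose steps are chained (each step's output is the next step's input), with one branch condition that sends part of the data down a different route. It is purely illustrative: it is not eHive code, and the step functions (`normalise`, `drop_short`, `flag_ambiguous`, `flag_clean`) and the toy read data are hypothetical.

```python
from typing import Callable, List

# A step takes a batch of data and returns a transformed batch.
Step = Callable[[list], list]


def run_linear(steps: List[Step], data: list) -> list:
    """Run steps in order, feeding each step's output into the next."""
    for step in steps:
        data = step(data)
    return data


def branch(condition: Callable[[str], bool],
           if_true: List[Step], if_false: List[Step]) -> Step:
    """Split the data on a condition, route each part separately, then merge."""
    def step(data: list) -> list:
        matched = [x for x in data if condition(x)]
        rest = [x for x in data if not condition(x)]
        return run_linear(if_true, matched) + run_linear(if_false, rest)
    return step


# Hypothetical analysis steps operating on toy sequence reads.
def normalise(reads):
    return [r.strip().upper() for r in reads]


def drop_short(reads, min_len=4):
    return [r for r in reads if len(r) >= min_len]


def flag_ambiguous(reads):
    return [r + ":ambiguous" for r in reads]


def flag_clean(reads):
    return [r + ":clean" for r in reads]


pipeline = [
    normalise,
    drop_short,
    # Reads containing an ambiguous base (N) take a different route.
    branch(lambda r: "N" in r, [flag_ambiguous], [flag_clean]),
]

result = run_linear(pipeline, [" acgt ", "ac", "acnnt", "ggcc"])
print(result)
```

Here `result` is `["ACNNT:ambiguous", "ACGT:clean", "GGCC:clean"]`: the short read is dropped, and the remaining reads are routed according to their content. A production workflow platform adds scheduling, persistence and restart on top of this basic data-flow idea.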

Pipelines are ideal for tasks such as NGS analysis, where large amounts of data need the same steps applied to each individual element in turn, with precisely the same parameters. In effect, they automate processes that used to be done by hand, often by a bioinformatician entering commands at a computer terminal one at a time. Compared with that manual approach, pipelines are much better at enforcing a consistent analysis and at handling the failures, exceptions and edge cases that may arise during processing.
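As a rough illustration of the consistency and failure-handling points above, the Python sketch below fixes an analysis step's parameters once, applies that step to every element in turn, and records failures per element instead of letting one bad input stop the whole run. The `gc_content` step, its `min_length` parameter and the sample reads are hypothetical examples, not part of any Eagle pipeline.

```python
from dataclasses import dataclass, field
from functools import partial
from typing import Callable, Dict


@dataclass
class RunReport:
    results: Dict[str, object] = field(default_factory=dict)
    failures: Dict[str, str] = field(default_factory=dict)


def gc_content(seq: str, min_length: int) -> float:
    """Hypothetical analysis step: fraction of G/C bases in a sequence."""
    if len(seq) < min_length:
        raise ValueError(f"sequence shorter than {min_length}")
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected bases: {sorted(invalid)}")
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_on_each(items: Dict[str, str], step: Callable[[str], object]) -> RunReport:
    """Apply the same step to every element; log failures and carry on."""
    report = RunReport()
    for name, item in items.items():
        try:
            report.results[name] = step(item)
        except ValueError as err:
            report.failures[name] = str(err)
    return report


# Bind the parameters once so every element is processed identically.
step = partial(gc_content, min_length=4)
report = run_on_each(
    {"read1": "GGCC", "read2": "ACGTAT", "read3": "AC", "read4": "ACXT"}, step
)
print(report.results)
print(report.failures)
```

Two reads are analysed, while the too-short read and the read with an unexpected base are reported as failures with a reason, so they can be inspected or rerun without repeating the successful work.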

Pipelines are to bioinformatics analysis what Henry Ford's Model T production lines were to the car industry. However, unlike Ford's somewhat restrictive offering ("you can have any colour you like, as long as it's black"), Eagle's pipelines are tailored to the specific needs of each customer.

Features

Eagle's chosen pipeline platform, eHive, gives full access to the entire flow of data and logic that makes up the workflow's design and execution. This level of control allows Eagle to integrate almost any analysis or tool into a pipeline and to build flexible, targeted solutions that answer very specific research questions.

Eagle can configure eHive to work with any common grid or cluster management software, including LSF and Condor. eHive can also be configured to run in the cloud on Amazon EC2, or even on your local desktop if the analysis is small.

Once the pipeline is built, Eagle can either maintain and run it for you in the cloud and securely deliver the analysis results by email or on CD/DVD, or install it locally with instructions for the customer to run and reconfigure it themselves as required. Output can be delivered as files or loaded directly into existing databases or systems, e.g. as a custom annotation track in Ensembl.

If you're not sure about outsourcing this kind of work, or would like to read around the subject, why not start with our white paper on successful bioinformatics outsourcing? Alternatively, you might like to use our Elastic Eagle service to try out ideas before committing to a larger project.

More Details

Every pipeline is unique, and although Eagle keeps costs down by reusing as much as possible, it is rare for an existing design to do exactly what the customer needs. Each project is therefore treated and priced individually. To discuss the complexity of your pipeline and how much it might cost to implement, get in touch with Richard Holland (richard.holland@eaglegenomics.com or phone +44 (0)1223 654481 x3).