ENCODE: A beachcomber’s guide to the genome

ENCODE press coverage focused on their ‘de-junking’ of the genome. But semantic wrangling apart, what of the ENCODE legacy?

The public release of ENCODE (ENCyclopedia of DNA Elements) last week provides the likes of Eagle and our customers with a veritable cornucopia of new toys to play with. Of particular interest is the ENCODE Virtual Machine which contains much of the software used in the analysis. Although supplied as a VirtualBox VM, migration to AWS is reasonable. Whether ENCODE follows modENCODE and 1000 Genomes to EC2 remains to be seen, but is clearly to be encouraged.

Much of the immediate scientific reaction to ENCODE (see the roundup from OpenHelix) concerns their ‘de-junking’ of DNA, claiming that “80% of the genome is functional”. Really? Much of the argument is caused by ambiguities in the term ‘functional’. Following ENCODE it’s now clear that most genomic DNA ‘functions’ (verb) in a biochemical sense (i.e. sticks to the cellular machinery), but it does not follow that all these interactions have a ‘function’ (noun) in a biological sense (implying some wider purpose). For a scientific context see Sean Eddy’s excellent post on the subject. This distinction between ‘functional (v)’ and ‘functional (n)’ has proved problematic in the past. For example, the EBI Functional Genomics Group annotates the biological function of genes, whereas Ensembl FuncGen (also at the EBI) focuses on DNA binding. The latter is now referred to as “Ensembl Regulation” to avoid confusion.

Such nuances in definition are not new to genomics; the term ‘gene’, for example, is variously used to refer to a ‘unit of heredity’ or a ‘genomic region producing an mRNA’. The distinct advantage of the latter is that it’s much easier to comprehensively identify genes in genomic DNA using mRNA evidence than it is using the classical genetics definition. Which brings me to the reason I’m excited by ENCODE: much like RNA-Seq data is used to predict genes, the ENCODE data will be used to build a comprehensive catalogue of ‘regulants’ (genomic regions putatively regulating mRNA transcription). The Ensembl Regulatory Build is, of course, the epitome of this approach. Such catalogues will form invaluable frameworks for the systematic annotation of epigenetic processes in genomic context. We derive, therefore, a neat solution to satisfy both ends of the functional vs. functional debate.
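To make the idea of a catalogue a little more concrete, here is a minimal Python sketch. It assumes the raw evidence arrives as (chromosome, start, end) intervals, such as binding-site peaks, and simply merges overlapping intervals into candidate regions. The function name `merge_regions` and the example coordinates are hypothetical, and this is an illustration only, not how the Ensembl Regulatory Build is computed.

```python
from collections import defaultdict


def merge_regions(intervals):
    """Merge overlapping (chrom, start, end) intervals into candidate regions."""
    # Group the intervals by chromosome so each one is merged independently.
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        # Sorting by start position lets a single pass find every overlap.
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps the open region: extend it.
                current_end = max(current_end, end)
            else:
                # Gap found: close the open region and start a new one.
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        if current_start is not None:
            merged.append((chrom, current_start, current_end))
    return merged


# Hypothetical peaks from two experiments (coordinates are made up).
peaks = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 500, 600),
    ("chr2", 50, 80),
]
catalogue = merge_regions(peaks)
```

A real catalogue would also need to record which experiments support each region, but the merging step above is the basic move from scattered evidence to a list of annotated regions.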

To end with the thoughts of Chief ENCODE-ian Ewan Birney: “The real measure of a foundational resource such as ENCODE is not the press reaction, nor the papers, but the use of its data by many scientists in the future.”

Genetics ‘cloud’ to create new opportunities for researchers and clinicians

Gene sequencing and analysis could be dramatically speeded up, leading to patients receiving a quicker and more accurate diagnosis, thanks to research led by Eagle Genomics Ltd.

Using cloud computing technology, the researchers have found they can slash the amount of time it takes to store the huge amounts of information produced when individual genes are sequenced and analysed.

Whereas at the moment this process can take up to three months, the scientists believe their new technique could mean results are produced in about a week.

Eagle Genomics, a leading open-source bioinformatics service provider, is carrying out the research in collaboration with The University of Manchester, and Cytocell Ltd., with assistance from NGRL, based at the Central Manchester University Hospitals NHS Foundation Trust, and the NIHR Manchester Biomedical Research Centre. The £500,000 project is part-funded by the UK’s national innovation agency, the Technology Strategy Board.

Access to the information analysed and stored by the cloud will enable medical researchers who are developing and testing new treatments to compare large amounts of information and find common genetic links.

The technology will also help clinicians to look at an individual patient’s genetic make-up to aid diagnosis and ongoing treatment.

Rather than simply testing a patient for one suspected condition, using the cloud technology could allow clinicians to test for a much wider range of complaints.

Currently, the NHS IT systems do not have the resources to cope with the huge demands required. The cloud system can be accessed from a separate site, away from hospitals, freeing up space.

The project will build upon the success of the Taverna Workflow Management System software developed by Professor Carole Goble’s myGrid team at The University of Manchester. Eagle Genomics will work with the University to adapt Taverna to allow non-IT experts to easily add and extract information and share it with their colleagues.

“Taverna is ideal for this project because it allows you to systematically automate the analysis processes of expert geneticists and make them easily available for others to use at the press of a button,” said Professor Andy Brass of The University of Manchester.

Example applications identified and described by NGRL and Cytocell will provide a significant and valuable resource to help develop and demonstrate the efficacy of the resulting system.

“Genetic sequencing is an increasingly important diagnostic tool as well as being fundamental to many areas of research,” said Professor Graeme Black, Director of the NIHR Manchester Biomedical Research Centre and a consultant at Manchester Royal Eye Hospital. “By storing genetic data in the ‘cloud’ indefinitely, we can use it for research studies and also to help clinicians to decide if medical conditions, that patients develop at any stage, may be linked to their genes.”

Abel Ureta-Vidal, CEO of Eagle Genomics Ltd., added: “Thanks to funding from the Technology Strategy Board, this project is looking at ways in which genetic data can be securely and confidentially stored, accessed and analysed only by approved users.”

The project, which started in July 2011, is on target for completion of a fully functional system with an initial selection of analyses available by December 2012. 

Notes to editors:

Eagle Genomics Ltd. is an outsourced bioinformatics services and software company specialising in genome content management and the provision of open-source solutions. Eagle consistently delivers quality and value-for-money for customers across the biotech sector, combining cloud and NGS expertise with a track record in building scalable, efficient genomics analysis workflows.

The University of Manchester, a member of the Russell Group, is one of the most popular universities in the UK. According to the results of the 2008 Research Assessment Exercise, The University of Manchester is now one of the country’s major research universities, rated third in the UK in terms of ‘research power’.

The myGrid team produce and use a suite of tools designed to “help e-Scientists get on with science and get on with scientists”. The tools support the creation of e-laboratories and have been used in domains as diverse as systems biology, social science, music, astronomy, multimedia and chemistry.

Cytocell Ltd. is a leading European developer and manufacturer of FISH probes for use in both routine cytogenetics and in the analysis and classification of cancers. The Company’s products are well established in cytogenetics as the Company is celebrating its 20th year of supplying them to this market.

The NIHR Manchester Biomedical Research Centre was created by the National Institute for Health Research in 2008 to effectively move scientific breakthroughs from the laboratory. As a partnership between Central Manchester University Hospitals NHS Foundation Trust and The University of Manchester, the Biomedical Research Centre is designated as a specialist centre of excellence in genetics and developmental medicine.

Central Manchester University Hospitals NHS Foundation Trust is a leading provider of specialist healthcare services in Manchester, treating more than a million patients every year. Its five specialist hospitals (Manchester Royal Infirmary, Saint Mary’s Hospital, Royal Manchester Children’s Hospital, Manchester Royal Eye Hospital and the University Dental Hospital of Manchester) are home to hundreds of world class clinicians and academic staff committed to finding patients the best care and treatments.

NGRL provides dedicated support to UK genetic testing centres, focusing on health and bioinformatics, with the aim of bringing new technologies into diagnostic genetics services to the benefit of NHS patients.

The Technology Strategy Board is a business-led government body which works to create economic growth by ensuring that the UK is a global leader in innovation. Sponsored by the Department for Business, Innovation and Skills (BIS), the Technology Strategy Board brings together business, research and the public sector, supporting and accelerating the development of innovative products and services to meet market needs, tackle major societal challenges and help build the future economy.

For further information please contact:

Richard Holland, Operations and Delivery Director
Eagle Genomics Ltd., Babraham Research Campus, Cambridge
+44 (0)1223 654481 x3

Full survey results


As promised, and without further ado, here are the full results of the survey that Eagle Genomics ran prior to our 1st Annual Symposium on 5th April 2011 at Babraham, on the subject of "Provisioning Bioinformatics for the Next Decade: Are we prepared?".

Let's go through the questions one by one to see how the responses panned out. There were 118 respondents in total.

Question One


No surprises there – the majority of respondents were academics and non-profits. This may have skewed some of the subsequent responses, but when we broke down by academic vs. commercial we in fact found very little difference in responses, except in one area which we have detailed below.

Why were most of the respondents academics? This could have been because we heavily promoted the survey to the London BioGeeks network, whilst commercial outfits are generally more reticent in offering their opinion.

Question Two


Given that the majority of respondents were academic, you would expect to see a greater number of bioinformaticians in the organisation (light blue = >10). However, those organisations relying on only one bioinformatician were only about a quarter – most either had none, or at least 5. Do bioinformaticians only do things by extremes?

Question Three


Good to see that the majority of respondents were experienced bioinformaticians with at least 5 years' experience, many with 10 or more. This suggests that the responses are based on real experience of the real world as opposed to a perception of it.

Question Four


Most respondents were sole operators, not surprising if tallied with the earlier response that most of them had no dedicated bioinformaticians in their organisation – this suggests that most of the respondents were postdocs or similar having had bioinformatics tasks delegated to them in addition to their normal duties. Of those that do manage people, most only had 2-4 people under them, suggestive of small academic groups rather than larger commercial hierarchies.

Question Five


What's hot and what's not? Gene expression, genomic variation, and other genomics activities are the current area of focus. In future there may be a shift towards proteomics, metagenomics, systems biology and pathways. Metabolomics is not a popular field and comparative genomics appears to be in decline even though it is currently very popular.

Question Six


People are currently most concerned about integrating disparate data sources, followed by genome assembly, resequencing, RNA-seq and comparative genomics. Microarrays are already on their way out and future use of related technologies, including proteomics and mass spec, looks to be heading for a serious decline.

Question Seven


This one question is the only area where academic/non-profit and commercial respondents significantly differed. Overall, in-house computing, development and analysis are the status quo. Open-source software is wildly popular with not many people seeing any increase in the use of commercial solutions. All this looks like it is unlikely to change, with the exception of cloud computing. Almost a third of respondents said they would be using cloud computing in future – a big leap in terms of potential market share for cloud computing vendors?


When it came to academics vs. commercial respondents, the attitudes show a small but appreciable increase in outsourcing in the commercial sector, and a much bigger usage of commercial software solutions amongst the same people. 

Question Eight


Well, surprise surprise, everyone owns a big cluster and lots of servers! Although interestingly a quarter of respondents say they do their bioinformatics on their desktop PCs. Are PCs more powerful than before, or are datasets getting smaller and more manageable? Only a tiny proportion presently run their analyses on the cloud.

[Apologies for the missing purple legend, this should say 'On the cloud']

Question Nine


Of all the positive terms given as options, not even close to a majority considered open-source bioinformatics tools to be worthy of any of these accolades. In fact, the majority were ambivalent – suggestive of an audience who realise that the tools are not great but still have the technical skills to overcome those shortcomings. The worst score of all went to ease of integration where most people believe open-source bioinformatics tools are hard to integrate with each other. This is true – think of the hundreds of poorly documented formats and data IO methods there are out there. Maybe some serious thought should be put into making tools work with each other nicely (is that why there are so many workflow tools on the market?).

Question Ten


We were not very surprised by this last question. The biggest concern of bioinformaticians is that their tools are scientifically validated and won't fall over halfway through a big analysis. Ease of use, integration, and security were all way down the priority list. Why is that? Probably because the respondents were back-end users who are technical experts and capable of working around sticky issues like usability. If the respondents had been front-end users who saw nothing except the interface, then the responses may have been very different. Note that the majority of people thought training was good to have, but not essential (most tools can be self-taught?), and in support of the previous question the biggest request was to have tools better integrated.

Take-home message

Overall, this survey shows no surprises, but the takeaway messages for the open-source bioinformatics community of developers are:

  1. Integrate your tools better.
  2. Make them stable.
  3. Scientifically validate them by publishing the algorithms.
  4. Offer training.
  5. Make them server/cluster/cloud aware by default – most people don't run stuff on desktop PCs.
  6. Genomics is the major growth area.
  7. Most bioinformatics teams are one-man bands with no dedicated resources – so make it easy on them to install your stuff.

Hope you enjoyed this review of the survey results! Full (anonymised) raw data is available on request.

Eagle Genomics sponsors BOSC 2011

CAMBRIDGE, 20th April 2011: Open Source software has flourished in the bioinformatics community for well over a decade. When the first BOSC (Bioinformatics Open Source Conference) was held in 2000, there were already a number of popular open source bioinformatics packages, and the number and range of these projects has increased dramatically since then.

One of the hallmarks of BOSC is the coming together of the open source developer community in one location to meet face-to-face. This creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done and attendees can learn about new or ongoing developments that they can directly apply to their own work.

This year, BOSC has an exciting lineup of topics and speakers, including keynote speakers Matt Wood (the Technology Evangelist for Amazon Web Services) and Lawrence Hunter (director of the Computational Bioscience Program at the University of Colorado, and one of the founders of ISMB). "BOSC is one of my favourite bioinformatics conferences," said Matt Wood. "I'm planning to discuss the role cloud computing plays in increasing the impact of open source tools, and I look forward to engaging in a lively conversation with other BOSC attendees."

"I've been involved in BOSC organization since the first BOSC in 2000," said Co-Chair Nomi Harris, "and I think this year's is going to be one of the best yet! I'm particularly looking forward to the panel discussion about meeting the challenges of inter-institutional collaborations."

Thanks to generous support from Eagle Genomics and another sponsor, BOSC will offer three US$250 Student Travel Awards to the best student abstracts submitted by the April 18th deadline.

"Eagle Genomics are proud to support BOSC for the second year running. BOSC's evangelism of open-source bioinformatics fits perfectly with Eagle's ideology and complements our innovative partnerships built with key open-source projects in the bioinformatics domain," said Richard Holland, Operations and Delivery Director at Eagle Genomics. "It is important that the open-source community is supported and encouraged to keep on producing the invaluable free software and data resources that have become such a cornerstone of modern biological research."

About Eagle Genomics

Eagle Genomics is a bioinformatics services and software company specialising in genome content management and the provision of open-source solutions in and out of the cloud. Eagle's team of high calibre developers and researchers is drawn from both industry and academia with extensive experience in bioinformatics and software development. Based in Cambridge, UK, Eagle is situated at the heart of Europe's biggest biotech cluster.

About BOSC

The Bioinformatics Open Source Conference (BOSC), a satellite of the ISMB (Intelligent Systems for Molecular Biology) conference, is now in its twelfth year as the leading annual meeting for those who develop or use open source bioinformatics software. BOSC 2011 will take place July 15-16, 2011, in Vienna, Austria.

Cognizant and Eagle Genomics to Work with Pistoia Alliance to Develop a Cloud-based Platform for Streamlining Sequence Services

LONDON, April 12, 2011 /PRNewswire/ — Cognizant (NASDAQ: CTSH), a leading provider of consulting, technology, and business process outsourcing services, and Eagle Genomics Ltd., a bioinformatics software company specializing in genomic data management and integration, today announced they are working with the Pistoia Alliance, Inc., a nonprofit, precompetitive alliance of life science companies and vendors, as one of the groups engaged to develop a conceptual cloud-based platform to facilitate access to public and proprietary sources of gene sequence data.

The Pistoia Alliance's sequence services working group aims to define and document an externally hosted service for securely storing and mining both proprietary derived gene/sequence information and public domain gene databases. This conceptual platform developed by Cognizant and Eagle Genomics, as part of this piloting stage, will enable working group companies to securely share their bioinformatics resources among simultaneous, registered users in a secure, encrypted environment, while leveraging the flexibility, scalability, and cost-efficiencies of a cloud-based Software as a Service (SaaS) platform. The future of collaboration and externalization within the life sciences industry will increasingly utilize hosted information services, and the Pistoia Alliance expects to run future pilots to further explore this business model involving a range of participants.

"This engagement supports the Pistoia Alliance's goal to inspire different ways of thinking in the life sciences industry and effect real change to benefit all our organizations," said Nick Lynch, President at Pistoia Alliance. "With the combined strengths of Cognizant and Eagle Genomics and the broader Pistoia community, we will build a platform to define standards in sequence services, while overcoming the challenges of disparate data and tools."

Cognizant and Eagle Genomics will combine the best of their consulting, domain, technology, and business process expertise to effectively deliver the business solution. While Eagle Genomics will contribute specialized bioinformatics knowledge, Cognizant will manage the development of the platform, oversee testing and security validation, and help strengthen the initiative by managing relationships with existing and potential member organizations. The platform will deploy a secure and scalable installation of Ensembl, a software system and supporting database developed jointly by the Wellcome Trust Sanger Institute and the European Bioinformatics Institute to produce and maintain automatic annotation on selected eukaryotic genomes. The platform will deliver a Plasmapper and a gene alias service as part of the initial functional services.

"Eagle and Cognizant have demonstrated a deep understanding of the open-source bioinformatics world and how best to adapt and support these publicly available resources to meet the high standards required by the leading pharmaceutical companies that are members of the Pistoia Alliance," said Richard Holland, co-founder and Operations and Delivery Director at Eagle Genomics Ltd.

"We look forward to partnering with Eagle Genomics in helping Pistoia Alliance address the challenges of fast evolving sequence services while streamlining communication and workflows," said Peter Sheppard, Assistant Vice President, Life Sciences practice at Cognizant. "We are committed to leveraging our domain-intensive global resources, deep understanding of drug development processes and cloud computing models, and global program management and delivery capabilities to build and manage a platform that supports the Pistoia Alliance's aim to lower barriers to innovation by improving the interoperability of business processes, data, and technology interfaces in the life sciences research industry."

About the Pistoia Alliance

The Pistoia Alliance is a global, not-for-profit, precompetitive alliance of life science companies, vendors, publishers, and academics that aims to lower barriers to innovation by improving the interoperability of R&D business processes. Initially conceived in 2007 by informatics experts at four "Top Five" pharma companies, the Pistoia Alliance now includes over 45 member companies. By assembling and aggregating common use cases, identifying specific, high-value areas of opportunity, and exploiting contemporary technologies and service delivery models, the Pistoia Alliance serves as a hub for envisioning information-based solutions that will drive innovation and productivity in the precompetitive domains of life science R&D.

About Eagle Genomics Ltd

Eagle Genomics Ltd. is a bioinformatics software company specialising in the provision of open-source solutions for genomic data management and integration. Based in Cambridge, UK at the heart of Europe's largest biotech cluster, the company has rapidly become one of the leading providers of open-source bioinformatics technical support to customers around the world. Eagle can build bespoke solutions based around open-source platforms that fit exact requirements and run them anywhere from desktops to grids, clusters to clouds.

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 104,000 employees as of December 31, 2010, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 1000 and is ranked among the top performing and fastest growing companies in the world.

Survey preliminary results

At the Eagle Genomics Symposium on Tuesday, our Technical Director Will Spooner presented preliminary results from the survey we ran in parallel with the registration process. Will is going to post a full analysis on this blog soon, but for now here are the key points he picked out as important during his summary at the symposium.

Data integration is the biggest current technology concern. 60% of respondents said it was already something they were working on. The next biggest area was NGS, with an average of 40% saying they were already working on it, and a further 25% saying they would be in the near future. Barely 25% of people said they were currently working on microarray data, and only about 15% on proteomics and mass spec, with both of these fields showing not much anticipated future work either (when compared to the scale of interest in NGS).

When it comes to delivering these analyses, 90% do it on in-house hardware, with 25% including some element of cloud – but a whopping additional 40% intend to use the cloud in the future, meaning that cloud is becoming established as a serious competitor to in-house data centres. Most people do not use and do not plan to use the third option of outsourced computing, e.g. renting time on a third-party cluster.

75% prefer to do their software development work in-house and the stats show that this is not expected to change significantly. The same goes for doing data analysis in-house vs. outsourcing it, although there was a slight preference for outsourcing development over analysis.

The survey did prove one thing we already knew – everyone loves open-source bioinformatics! 75% currently use it, with only 35% currently using commercial solutions. Users of both groups of software didn't perceive any likely change, which suggests that open-source bioinformatics software could account for up to 66% of the bioinformatics software market by user base.

All in all, the strongest message is that people are into the cloud, data integration, and NGS, which just happen to be Eagle's core skills. We see a clear niche here for helping people working in these areas and the survey confirms that whilst the old-fashioned model of in-house hardware and developers is still very much entrenched, change is on the horizon.

Exciting times! I'm sure Will will have plenty more to say and interesting observations to make when he publishes the full results. 

Symposium Abstract #7 – Mick Watson

This is the seventh of the abstracts for talks being given at the 1st Eagle Genomics Symposium, "Provisioning Bioinformatics for the Next Decade – are we prepared?". Here we will hear from Mick Watson, Director at ARK Genomics. Mick will talk on the subject of "Meeting the global challenge of food security: bioinformatics in the animal health and welfare sector."

Here is Mick's abstract:

"Research in the livestock and veterinary health sector is arguably more important than research into human health, yet receives a fraction of the funding from both government and the private sector.  We need to feed the World, and at the same time mitigate the effects of increased livestock production on the environment.  Sir John Beddington estimates that the global demand for food will increase by 50% by 2030; an FAO report estimates global production of meat will double by 2050.  The livestock sector employs an estimated 1.3 billion people, creates livelihoods for one billion of the World's poor and provides approximately one-third of humanity's protein intake.  Growing populations and incomes, along with changing food preferences, are rapidly increasing demand for livestock products.  How are we going to meet this demand?  Science can contribute by producing better vaccines and drugs, and by driving genetic improvement programmes.  However, the profit margins of animal health companies are dwarfed by those in human health, therefore there is an increased burden on academic funding sources.  Even so, government investment in the sector is less than in human health.  All of this comes in the context that veterinary health researchers deal with multiple hosts and multiple pathogens.  The genomes of the World's farmed species are in a mixed state, and many are now benefitting from the revolution in next-generation sequencing.  However, those technologies bring their own challenges.  As in many areas of science, there is a large, growing demand for high quality bioinformatics support and research.  Datasets are getting bigger and biologists are now incapable of interpreting them without expert help.  
Sourcing the necessary resources and skills to deliver the predicted growth in animal production, against the background of large cuts to University funding and an economic recession, is a huge challenge which will require a mixed model of funding, large scale collaboration and an increased reliance on an open-source software model."

If you're interested in what Mick and our other speakers have to say, why not sign up and come along! 

Eagle and Manchester sign KTA for Taverna

Eagle Genomics Ltd. has signed a collaboration agreement with The University of Manchester to provide commercial support for Taverna, the open-source Workflow Management System.

The partnership will see Eagle Genomics take on responsibility for providing commercial support for Taverna. Users of Taverna will benefit from an increased range and quality of support options, whilst Taverna’s developers will be able to focus on developing innovative new features for the Taverna product line.

Taverna, part of the myGrid family of projects, is a computer program for designing and executing workflows. Workflows represent scientific experiments and form a key part of biological research as scientists connect together a series of steps to transform their research data into knowledge.

Taverna’s open-source nature allows it to be deployed universally without restriction and at no financial cost, providing scientists with easy access to a high-quality and well-regarded workflow tool.

The Taverna team’s focus on bioinformatics has helped Taverna become the workflow system of choice for scientists in life science labs around the world.

The Taverna team at the University has been awarded Manchester EPSRC Knowledge Transfer Account funding to support their interaction with Eagle Genomics.

The aim of the engagement is to aid Eagle Genomics in the creation of commercial grade support offerings based around the Taverna products.

As a result the company will be able to confidently and expertly provide technical support, customised additions and user training focused on industrial needs. The ongoing collaboration between Eagle Genomics and the University will ensure that they work together on the Taverna products and do not duplicate effort.

Professor Carole Goble, of The University of Manchester’s School of Computer Science, said “The partnership with Eagle is an exciting opportunity to make Taverna tools a viable and supported part of the commercial software ecosystem. Researchers will also benefit from the commercial grading of Taverna by it being able to efficiently handle larger amounts of data”.

Abel Ureta-Vidal, CEO of Eagle Genomics Ltd., said “This is a great opportunity to demonstrate Eagle Genomics’ continued engagement with academic institutions to help spread the impact of the world-class bioinformatics research done in the UK. At the same time, customers can invest in the use of the Taverna workflow management system with peace of mind, knowing that they can get professional support from a company that was set up to solve their specific issues.”

What do Eagle pipelines do?

Eagle Genomics can produce a vast array of genomic data analysis pipelines, each tailored to your specific needs. Our pipeline design and construction service is as hands-on or as hands-off as you require – if you have a specific workflow in mind, we can code it, but if you have a general question you just don’t know how to answer, we can help you come up with a plan to answer it.

Our definition of a genomic data analysis pipeline is a pipeline that handles genomic data – i.e. at some stage in the process (although not necessarily exclusively throughout the process) it will be looking at sequence, genome annotations, or genome coordinates. Our area of expertise lies firmly in this field at the moment and we do not believe in working on projects that we are less than expert in.
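A minimal sketch of that definition, in Python, might look like the following. It is an illustration only, not an actual Eagle pipeline: the `Step` class, the `handles_genomic_data` flag and the three toy steps are all hypothetical names invented here. The point it shows is that a pipeline counts as "genomic" if at least one step touches sequence, annotations or coordinates, even when other steps (such as report formatting) do not.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One stage of a pipeline (hypothetical structure, for illustration)."""
    name: str
    run: Callable[[Any], Any]
    handles_genomic_data: bool = False  # assumption: each step is tagged by hand


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # id is the first word of the header
            records[current] = ""
        elif current is not None:
            records[current] += line.upper()
    return records


def gc_content(records: Dict[str, str]) -> Dict[str, float]:
    """Fraction of G and C bases per record; empty sequences are skipped."""
    return {
        rid: (seq.count("G") + seq.count("C")) / len(seq)
        for rid, seq in records.items()
        if seq
    }


def format_report(gc: Dict[str, float]) -> str:
    """Tab-separated report; this step never looks at sequence itself."""
    return "\n".join(f"{rid}\t{frac:.2f}" for rid, frac in sorted(gc.items()))


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step.run(data)
    return data


def is_genomic_pipeline(steps: List[Step]) -> bool:
    """Genomic if any stage (not necessarily every stage) handles genomic data."""
    return any(step.handles_genomic_data for step in steps)


EXAMPLE_FASTA = ">seq1 demo\nACGT\nGGCC\n>seq2\nATAT\n"

STEPS = [
    Step("parse FASTA", parse_fasta, handles_genomic_data=True),
    Step("GC content", gc_content, handles_genomic_data=True),
    Step("report", format_report),
]

if __name__ == "__main__":
    print(is_genomic_pipeline(STEPS))
    print(run_pipeline(STEPS, EXAMPLE_FASTA))
```

Tagging steps by hand is the simplest possible design; a real pipeline would more likely infer this from the data types each step consumes and produces.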

We don’t make a habit of categorising our pipelines but sometimes it is useful to list out a range of possible areas in which we could make ourselves useful:

  • Sequence assembly services
  • miRNA discovery services
  • Biomarker discovery and detection services
  • Plant and animal trait detection services
  • Comparative genomics services
  • Microarray services
  • Custom genome building services
  • Genome annotation services
  • Multiple sequence alignment services
  • Sequence search services
  • Public and private data integration services
  • and many others

Our pipelines are all constructed using open-source components, making them entirely configurable and customisable to meet your exact requirements. Having said that, if there is a commercial component available that does the job better and is appropriately licensed and priced, then we would of course include it in the pipeline – the choice of tools is determined by functionality, not ideology.

Changing understandings

Reading an article on the BBC website ('Computers show how wind could have parted Red Sea') this morning reminded me how our understanding of the world can sometimes change. Here is a case of what many people would have thought of as a colourful but unlikely story gaining some new evidence, theory or insight, and moving toward being perceived as an actual event that we might be able to explain scientifically.

Often in science our understanding of what's really happening can change completely. The case of microRNAs is an excellent example. In the early 1990s only a couple of miRNAs were known, and these were found in worms. And so in the early years of miRNAs' 'life in science' they were thought of as an oddity or an artifact that surely had no significant function. Now, less than 20 years later, the number of known miRNAs submitted to miRBase stands at 15172 and is still increasing rapidly, together with the number of species they have been found in. And of course it turns out they also have some important roles in gene regulation, which makes them such a hot topic to study today.

And changing understandings is also very much something we face working in the world of open-source bioinformatics at Eagle Genomics. Here are a couple of example misconceptions:

– 'It would be cheaper to employ a new internal person than outsource the work'. Factor in the time for a new person to get their head round a project and all the other overheads involved, and in reality it can often be cheaper to outsource bioinformatics, even for an academic group (we have an academic rate).

– 'Open-source software means badly written and unsupported'. Sometimes yes! But there is so much great open-source software out there that is well written and supported that it would be a mistake to overlook it because of the bad stuff. We spend time identifying the software worth using and also provide support where needed.