<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Eagle Genomics</title>
	<atom:link href="http://www.eaglegenomics.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.eaglegenomics.com</link>
	<description>Putting science into the cloud</description>
	<lastBuildDate>Wed, 22 Feb 2012 10:29:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Native Amazon workflows for bioinformatics</title>
		<link>http://www.eaglegenomics.com/2012/02/native-amazon-workflows-for-bioinformatics/</link>
		<comments>http://www.eaglegenomics.com/2012/02/native-amazon-workflows-for-bioinformatics/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 10:29:11 +0000</pubDate>
		<dc:creator>Richard Holland</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Shell scripting]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2947</guid>
		<description><![CDATA[Today, Amazon&#39;s Simple Workflow Service (SWF) was launched in beta mode and marketed as &#34;a workflow service for building scalable, resilient applications&#34;. What does this mean for bioinformatics? Quite a lot, probably. One of the biggest headaches of any major bioinformatics task is orchestrating a workflow (or pipeline) to carry out batches of data analysis...]]></description>
			<content:encoded><![CDATA[<p>Today, Amazon&#39;s <a href="http://aws.amazon.com/jp/swf/">Simple Workflow Service</a> (SWF) was launched in beta mode and marketed as &quot;a workflow service for building scalable, resilient applications&quot;.</p>
<p>What does this mean for bioinformatics? Quite a lot, probably. One of the biggest headaches of any major bioinformatics task is orchestrating a workflow (or pipeline) to carry out batches of data analysis in a reproducible and consistent manner. For instance, you might have a LIMS (laboratory information management system) to manage the flow of samples through your lab from test tube to sequencing machine, but you&#39;d also need a workflow to process the output of the sequencing machine into usable information. SWF is designed in such a way that it could easily take on the job of both LIMS and workflow as it allows for human interaction as well as fully automated processes.</p>
<p>Existing workflow tools such as eHive, Taverna, KNIME, Pipeline Pilot and Galaxy were all designed long before the days of the cloud. At that time, data was analysed either on standalone machines (because, before NGS, most datasets were relatively small) or on a large in-house compute cluster running job management software (e.g. SGE, LSF, Condor) for the workflow software to interact with. With the advent of the cloud, all these systems (and their competitors) sprouted cloud-compatible versions, but in most cases that compatibility is tenuous at best (eHive being one good exception to the rule) &#8211; many simply wrapped existing tools and packaged them as cloud images, with exactly the same limitations and restrictions as the originals.</p>
<p>Amazon&#39;s SWF is the first workflow management system designed specifically for the cloud by the very people who built the most popular cloud in the first place, who know exactly how it works and how best to exploit it for this type of task.</p>
<p>Like most other Amazon APIs it is HTTP-based, but client libraries are also available in Java, .NET, PHP and Ruby, plus a richly featured development SDK in Java for those wishing to get their hands really dirty. Whilst the workflow itself is managed from Amazon&#39;s servers, the clients that carry out the work can run in the cloud, on local hardware or on mobile phones, or can even be human beings interacting through a web interface. Importantly for life science companies dealing with commercially confidential or sensitive data, this means they can use SWF to co-ordinate their workflows without uploading their private data into the cloud, which puts SWF into direct competition with existing workflow technology from other providers (Taverna, Galaxy, etc.).</p>
<p>For bioinformatics specifically there is much to be excited about. A truly cloud-compatible workflow environment means the cloud can be used more efficiently than ever before, which should help speed up the more complex analyses that biologists need on a regular basis. Access costs are very low (on the order of a small fraction of a cent per workflow execution), which should encourage experimentation with this new technology for scientific use. The lack of a Perl API may make adapting older scripts a little harder, as Perl is very common in bioinformatics, but it is not insurmountable &#8211; it is easy to write an SWF-compliant wrapper script in another language that simply calls out to an existing Perl script to carry out the required tasks.</p>
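<p>As a rough illustration of that wrapper approach, here is a minimal sketch in Python. It assumes Python 3.7+ and the method names of the boto3 SWF client (boto3 appeared after this post was written and is used here only for illustration); the domain, task list and Perl script names are hypothetical. Only the activity-worker side is shown; a decider that schedules the activity is still needed.</p>

```python
import subprocess
import sys

# Hypothetical names for illustration: substitute your own SWF domain,
# activity task list and existing Perl script.
DOMAIN = "bioinformatics"
TASK_LIST = "legacy-perl"
LEGACY_CMD = ["perl", "annotate_variants.pl"]
MAX_FIELD = 32768  # SWF caps result/details strings at 32,768 characters


def run_legacy_script(cmd, payload):
    """Run an existing script, passing the task input on stdin.

    Returns (succeeded, text): stdout on success, stderr on failure.
    """
    proc = subprocess.run(cmd, input=payload, capture_output=True, text=True)
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def poll_once(swf, cmd=LEGACY_CMD):
    """Long-poll SWF for one activity task, run the script and report back.

    `swf` is anything exposing the boto3 SWF client's method names.
    Returns None if the poll timed out with no work, else True/False.
    """
    task = swf.poll_for_activity_task(
        domain=DOMAIN, taskList={"name": TASK_LIST}, identity="perl-wrapper"
    )
    token = task.get("taskToken")
    if not token:  # SWF returns an empty token when no task arrived in time
        return None
    ok, text = run_legacy_script(cmd, task.get("input", ""))
    if ok:
        swf.respond_activity_task_completed(taskToken=token, result=text[:MAX_FIELD])
    else:
        print(text, file=sys.stderr)  # keep a local copy of the script's errors
        swf.respond_activity_task_failed(
            taskToken=token, reason="legacy script failed", details=text[-MAX_FIELD:]
        )
    return ok


def run_forever(swf, cmd=LEGACY_CMD):
    """Worker loop: keep taking tasks from the task list."""
    while True:
        poll_once(swf, cmd)
```

<p>With AWS credentials configured, a worker could be started with <code>run_forever(boto3.client("swf"))</code>. The Perl script itself stays untouched; it is only assumed to read its input on standard input and write its result to standard output.</p>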
<p>All in all SWF looks great, and we can&#39;t wait to see the first useful bioinformatics workflow implemented on it. In fact I&#39;m pretty sure a few of us here at Eagle have already started work&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/native-amazon-workflows-for-bioinformatics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oxford Nanopore and the promise of pay-as-you-go sequencing</title>
		<link>http://www.eaglegenomics.com/2012/02/oxford-nanopore-and-the-promise-of-pay-as-you-go-sequencing/</link>
		<comments>http://www.eaglegenomics.com/2012/02/oxford-nanopore-and-the-promise-of-pay-as-you-go-sequencing/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 12:13:37 +0000</pubDate>
		<dc:creator>Will Spooner</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Cloud]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2930</guid>
		<description><![CDATA[It&#39;s always exciting to hear of a new sequencing technology approaching fruition, and Oxford Nanopore&#39;s emergence from &#34;stealth mode&#34; at the AGBT meeting in Florida last week especially so (good coverage here). The technology is appealing as it measures a&#160;single DNA molecule, thus simplifying sample preparation, using integrated electrical sensors, substantially reducing instrument size and...]]></description>
			<content:encoded><![CDATA[<p>It&#39;s always exciting to hear of a new sequencing technology approaching fruition, and <a href="http://www.nanoporetech.com/">Oxford Nanopore</a>&#39;s emergence from &quot;stealth mode&quot; at the <a href="http://agbt.org/">AGBT</a> meeting in Florida last week was especially so (good coverage <a href="http://www.genomesunzipped.org/2012/02/making-sequencing-simpler-with-nanopores.php">here</a>). The technology is appealing for two reasons: it measures a single DNA molecule, which simplifies sample preparation, and it uses integrated electrical sensors, which substantially reduce instrument size and complexity compared with optical sensors. I would argue that these two attributes are the hallmarks of true &#39;third generation&#39; (3G) sequencing.</p>
<p>Assuming that the technology lives up to the hype, how will 3G sequencing drive bioinformatics? We&#39;ve already had to adapt to various high-throughput sequencing platforms spewing out data with different read-length and error models during the transition from Sanger to 2G (next-gen) sequencing. Alongside advances in bioinformatics algorithms and workflows, there has been a cascade of capability, with genomics core facilities now able to provide services that were previously the exclusive domain of genome institutes. Going by the &quot;USB drive&quot;-sized prototype sequencers exhibited last week, with an expected price tag of $900, one can only assume that the cascade will continue with 3G, from core facility to lab, and soon to the researcher&#39;s desktop.</p>
<p>What will the bioinformatics needs of &quot;desktop sequencing&quot; be? A new breed of super-efficient, GPU-exploiting desktop sequence analysis software? Or (and those familiar with this blog will be unsurprised by the suggestion) a Microsoft-esque cloud service for sequence management and analysis? The latter has a certain resonance given Oxford Nanopore CTO Clive Brown&#39;s description of their technology as &quot;sequencing on demand&quot; (<a href="http://www.nytimes.com/2012/02/18/health/oxford-nanopore-unveils-tiny-dna-sequencing-device.html">New York Times</a>). What better partner for &quot;sequencing on demand&quot; than &quot;computing on demand&quot;?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/oxford-nanopore-and-the-promise-of-pay-as-you-go-sequencing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eagle Genomics second annual survey of trends in bioinformatics</title>
		<link>http://www.eaglegenomics.com/2012/02/eagle-genomics-second-annual-survey-of-trends-in-bioinformatics/</link>
		<comments>http://www.eaglegenomics.com/2012/02/eagle-genomics-second-annual-survey-of-trends-in-bioinformatics/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 17:25:21 +0000</pubDate>
		<dc:creator>Ivan Karabaliev</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[blog]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2887</guid>
		<description><![CDATA[Eagle Genomics are conducting a second survey of current issues in operational bioinformatics. We build on the success of last year&#39;s survey which polled over 100 worldwide responses from both industry and academia. A report detailing the results of our first survey can be viewed here.&#160; See also Genomeweb&#39;s coverage. One of the highlights was...]]></description>
			<content:encoded><![CDATA[<p><img align="left" alt="Image by albertogp123 under CC BY 2.0 licence" height="160" src="http://www.eaglegenomics.com/wp-content/uploads/survey.jpg" width="240" /></p>
<p>Eagle Genomics are conducting a second survey of current issues in operational bioinformatics.</p>
<p>We build on the success of last year&#39;s survey, which drew over 100 responses worldwide from both industry and academia. A report detailing the results of our first survey can be <a href="http://www.eaglegenomics.com/2011/04/full-survey-results/" target="_blank">viewed here.</a> See also <a href="http://www.genomeweb.com/informatics/qa-eagle-survey-bioinformaticians-finds-data-integration-priority-interest-cloud" target="_blank">Genomeweb&#39;s coverage</a>. One of the highlights was that data integration was among the largest technology concerns for the community, and that while both academic and commercial groups relied largely on in-house data centers, they were considering shifting to the cloud in the future.</p>
<p>We are excited this year to see how the field has shifted over the past 12 months!</p>
<p>In appreciation of your support we will enter all participants in a draw for a $100 Amazon voucher.</p>
<p><a href="http://www.surveymonkey.com/s/F8S7DDW"><img alt="" height="40" src="http://www.eaglegenomics.com/wp-content/uploads/button.png" width="199" /></a></p>
<p>This should take no longer than 5 minutes.</p>
<p>After the survey closes on 28th March, the answers will be summarized and presented in a report freely available from Eagle Genomics&#39; website. This report will be highly relevant to both policy makers and software developers as they plan for the coming decade.</p>
<p>The answers will also be discussed at our symposium, &quot;<a href="http://www.eaglegenomics.com/symposium2012">The Next 10 Years of Genome Content Management</a>&quot;, on 29th March.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/eagle-genomics-second-annual-survey-of-trends-in-bioinformatics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Magellan Report for Cloud Computing in Science</title>
		<link>http://www.eaglegenomics.com/2012/02/the-magellan-report-for-cloud-computing-in-science/</link>
		<comments>http://www.eaglegenomics.com/2012/02/the-magellan-report-for-cloud-computing-in-science/#comments</comments>
		<pubDate>Fri, 10 Feb 2012 11:32:32 +0000</pubDate>
		<dc:creator>Richard Holland</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Cloud]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2870</guid>
		<description><![CDATA[In December, the US Department of Energy (DOE) published its Magellan Report on Cloud Computing in Science, comparing public cloud services to two in-house high-performance computing (HPC) data centres at&#160;Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). Although the title is very general, the report specifically focuses on the...]]></description>
			<content:encoded><![CDATA[<p>In December, the US Department of Energy (DOE) published its <a href="http://www.nersc.gov/assets/StaffPublications/2012/MagellanFinalReport.pdf">Magellan Report on Cloud Computing in Science</a>, comparing public cloud services to two in-house high-performance computing (HPC) data centres at Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). Although the title is very general, the report specifically focuses on the needs and research requirements of the DOE&#39;s own scientists and appears not to have taken the wider community into account. Still, it makes some interesting points &#8211; so here are my reactions to its key findings:</p>
<h2>Finding 1. Scientic applications have special requirements that require solutions that are tailored to these needs.</h2>
<p>The claim is made that scientific applications are special because they &quot;rely on access to large&nbsp;legacy data sets and pre-tuned application software libraries&quot; which are currently addressed by HPC setups that have&nbsp;&quot;low-latency interconnects and rely on parallel file systems&quot;, giving a set of &quot;unique software and specialized hardware requirements&quot;. Science needs to stop thinking of itself as special &#8211; the kinds of data processing problems faced by science are no different to many of those faced in finance or logistics in terms of their scale, complexity, and structure of data. Whilst it is true that the DOE may be dealing with particularly complex datasets related to environmental and biological research, this does not make their situation unique.</p>
<p>The perceived obstacles to moving from HPC to cloud are really about adapting to a change of paradigm, yet the report treats them as the outcome of simple like-for-like comparisons. The report authors suggest that the cloud&#39;s inability to replicate HPC architectures and performance exactly is a bigger roadblock to its use than the reluctance of scientific software developers to optimise their algorithms for cloud environments, whereas I would suggest the opposite is true.</p>
<p>Blended into this finding is a claim that the business models are incompatible &#8211; clouds work on a pay-per-use basis, whilst scientists have an &quot;open-ended need for resources&quot; and, by implication, are reluctant to account for the resources they use. That&#39;s not an obstacle; it&#39;s just a change needed in the way IT budgets are allocated to science, and a change that is probably long overdue. If scientists paid the true cost of existing HPC resources in direct proportion to their usage, and grant providers stopped preferring the purchase of expensive (and often partially redundant) dedicated hardware over more efficient use of shared or external resources, this argument would no longer stand. The report says that &quot;the cost model for scientific users is based on account allocations&quot;, but there is no reason why this couldn&#39;t be provided on the cloud as well, for example through dedicated institutional accounts managed by the IT department.</p>
<h2>Finding 2. Scientic applications with minimal communication and I/O are best suited for clouds.</h2>
<p>&quot;Performance of tightly coupled applications running on virtualized clouds using commodity networks can be significantly lower than on clusters optimized for these workloads&quot;. Yes, true. But the cloud can scale to a much larger number of nodes than most HPC installations have, and certainly more than most smaller research departments can access, so although each individual node is slower, the total number of nodes available can help make up for this. Plus, as you only pay for nodes whilst they are in use, they can be cheaper than keeping a fixed set of nodes up and running waiting for work to appear. So although the report is technically correct in pointing out that the benchmark ran &quot;7x slower at 1024 cores on Amazon Cluster Compute instances&quot; than it did at the DOE&#39;s HPC centres, that is not necessarily the whole story. The earlier comment about reluctance to optimise algorithms specifically for the cloud applies equally well here.</p>
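<p>To make the scale-out argument concrete, here is a back-of-envelope sketch in Python. It assumes a batch of independent (embarrassingly parallel) jobs of equal length and uses hypothetical node counts; the 7x per-job slowdown merely echoes the report&#39;s figure, which was measured on a tightly coupled code where adding slower nodes does not simply compensate.</p>

```python
import math


def batch_makespan_hours(n_jobs, n_nodes, hours_per_job):
    """Wall-clock hours to finish n_jobs independent jobs on n_nodes nodes.

    Assumes one job per node at a time, identical job lengths and no
    inter-node communication (i.e. not a tightly coupled MPI code).
    """
    waves = math.ceil(n_jobs / n_nodes)  # rounds of jobs needed
    return waves * hours_per_job


# Hypothetical scenario: 2,000 independent analyses.
# In-house: 100 nodes at 1 hour per job.
# Cloud: 1,000 nodes, each job 7x slower.
in_house = batch_makespan_hours(2000, 100, 1.0)
cloud = batch_makespan_hours(2000, 1000, 7.0)
print(in_house, cloud)  # 20.0 14.0
```

<p>Under these assumptions the larger, slower cloud cluster finishes first; for tightly coupled codes the comparison can easily go the other way.</p>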
<h2>Finding 3. Clouds require signicant programming and system administration support.</h2>
<p>It seems as though the authors are comparing direct access to cloud resources against sysadmin-mediated access to HPC resources. It is odd to suggest that a move from HPC to cloud would require scientists to do all the technical work themselves. HPC centres already need huge amounts of programming and sysadmin support to operate, so a move to the cloud would simply see the HPC staff working on cloud resources instead of in-house ones. Scientists would not see any difference if the move were managed properly. This finding is nonsense!</p>
<h2>Finding 4. Signicant gaps and challenges exist in current open-source virtualized cloud software stacks for production science use.</h2>
<p>This finding is true, but it applies only to deploying private in-house clouds, i.e. installing cloud software in existing data centres. It does not apply to public cloud services, although the report omits to mention this.</p>
<h2>Finding 5. Clouds expose a different risk model requiring different security practices and&nbsp;policies.</h2>
<p>True. But&#8230; only if you permit users to create their own images. If you offer managed services on a private cloud, using cloud technology behind the scenes to coordinate work whilst still presenting traditional HPC-style interfaces that restrict end users&#39; ability to run arbitrary code, then the issues raised in this finding become much less severe.</p>
<h2>Finding 6. MapReduce shows promise in addressing scientific needs, but current implementations have gaps and challenges.</h2>
<p>Absolutely. The issue here is that science relies a lot on scripted/interpreted languages such as Perl, Python, and Ruby, whilst cloud models (being from enterprise computing backgrounds) use more complex (semi-)compiled languages for increased efficiency, such as Java or C++. Unfortunately, ne&#39;er the twain shall meet. Scientists and scientific software developers who wish to make use of advanced technologies such as MapReduce will have to learn the relevant programming languages, and staff at IT and HPC centres providing cloud-related services to scientists will have to develop and deliver the appropriate training courses.</p>
<p>Still, it is true that current implementations of MapReduce do not support the complex, interlinked, referential and hierarchical dataset structures that are common in science. This is one area where the technology needs to improve before it can fully realise its potential.</p>
<p>My earlier point comes back yet again &#8211; scientific software developers need to optimise their code for the cloud, rather than simply port existing paradigms and blame the cloud when they do not perform so well. This is easier said than done, as scientists like to use tools that are well-referenced and well-established. Given the choice between old-fashioned-but-functional, cloud-ignorant Tool A, which is the industry standard, and brand-new, cloud-optimised Tool B, which has been shown to produce the same results but is too new to be widely cited in journals, Tool A will win every time. Anyone who uses Tool B and mentions it in a subsequent paper will almost certainly be pulled up on it by reviewers, who have strong opinions of their own about the most appropriate tool based on citations in prior publications, thus perpetuating the reign of Tool A even though Tool B may be more advanced.</p>
<h2>Finding 7. Public clouds can be more expensive than in-house large systems.</h2>
<p>Yes, but usually only if you attempt to replicate in-house systems like-for-like, i.e. you set up a large number of machines that run constantly regardless of workload. It is widely accepted that the cloud cost of a machine running 24x7 all year round is similar to, if not greater than, the total cost of ownership (TCO) of an equivalent machine in an in-house data centre. But the point of the cloud is that you shouldn&#39;t need to keep machines running permanently in anticipation of workload &#8211; rather, you should create them just-in-time when work peaks, and tear them down again as soon as they fall idle. Because you pay only for the time a machine is running, this approach can soon bring costs below those of provisioning and maintaining equivalent in-house hardware. The report does not appear to address this possibility at all.</p>
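<p>The just-in-time argument can be sketched as a simple break-even calculation. This is a minimal illustration in Python, assuming a single machine, a flat hourly cloud rate and a fixed annual in-house TCO; all figures are hypothetical, and storage, data-transfer and staff costs are ignored.</p>

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year


def cloud_annual_cost(hourly_rate, utilisation):
    """Yearly pay-per-use cost of one instance that only runs while busy.

    utilisation is the fraction of the year (0..1) the instance is up.
    """
    return hourly_rate * utilisation * HOURS_PER_YEAR


def breakeven_utilisation(annual_tco, hourly_rate):
    """Fraction of the year above which the in-house machine works out cheaper.

    Below this utilisation, just-in-time cloud instances cost less than the
    machine's fixed annual total cost of ownership.
    """
    return annual_tco / (hourly_rate * HOURS_PER_YEAR)


# Hypothetical figures, not taken from the report: an in-house machine with an
# annual TCO of 3,000 (any currency) versus a cloud instance at 0.50 per hour.
tco, rate = 3000.0, 0.50
print(f"cloud, running 24x7: {cloud_annual_cost(rate, 1.0):.0f}")  # 4380
print(f"cloud, busy 30% of the year: {cloud_annual_cost(rate, 0.3):.0f}")  # 1314
print(f"break-even utilisation: {breakeven_utilisation(tco, rate):.0%}")  # 68%
```

<p>With these made-up numbers an always-on instance costs more than the in-house machine, but one that is up only 30% of the year costs under half as much, which is the point made in the paragraph above.</p>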
<p>Interestingly, it is in this finding that the second sign of the authors trying to justify the existence of their own HPC centres comes to light (the first being the earlier complaint that clouds require extra programming and sysadmin knowledge amongst scientists), suggesting that the report overall may be biased towards the authors&#39; interests. They state that the costs they use for comparing cloud with HPC &quot;do not take into consideration the additional services such as user support and training that are provided at supercomputing centers today&quot;, which are &quot;essential for scientific users who deal with complex software stacks and dependencies and require help with optimizing their codes to achieve high performance and scalability&quot;. A move from HPC to cloud does not necessarily mean that all that support goes away &#8211; it is highly likely that the staff who used to run the HPC would manage access to the cloud resources instead, providing the same additional services to scientists as they did before. Does this paragraph of the report imply that the DOE is seriously considering moving its HPC function to the cloud, and that the HPC centres that authored this report are trying to prevent it? Who knows.</p>
<h2>Finding 8. DOE supercomputing centers already approach energy efficiency levels achieved in commercial cloud centers.</h2>
<p>Great. That might be the case for the huge HPC resources at the DOE, in which case congratulations as that&#39;s quite an achievement, but for the rest of us this point is highly unlikely to be true of our own data centres. For those IT managers needing to comply with green agendas, the cloud is almost always going to be more energy-efficient than in-house operations.</p>
<h2>Finding 9. Cloud is a business model and can be applied at DOE supercomputing centers.</h2>
<p>This is an interesting finding and is very true &#8211; the cloud is indeed a business model as well as a technical innovation. The whole way in which people interact with resources changes under the cloud, with instant access to dedicated virtual resources on demand rather than queued access to shared and traffic-managed HPC resources.</p>
<p>The finding says that &quot;Rapid elasticity and on-demand self-service environments essentially require different resource allocation and scheduling policies that could also be provided through current HPC centers, albeit with an impact on resource utilization&quot;. This is really a question of private versus public cloud: even if converted to private cloud technology, HPC centres will still face resource utilization issues, because their compute capacity is limited relative to the number of people who need to use it. Public clouds are generally larger than most in-house data centres, able to grow faster through commercial investment, quicker to respond to and manage demand, better able to share capacity fairly across a large number of diverse users and requirements, and more highly utilized. For all these reasons, private cloud is likely to be useful only as a halfway house until systems can be established that make use of the public cloud.</p>
<p>Overall the report makes some good points, provided it is read with a pinch of salt regarding its limited scope (DOE requirements only) and its authorship (the authors&#39; interests are linked with preserving the role of existing HPC centres). For an organisation of a similar size to the DOE that has made similar investments in HPC, it is a very relevant and salient report. For smaller organisations, and those that do not already have their own HPC resources, remember that one size does not fit all.</p>
<p><em>Richard Holland</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/the-magellan-report-for-cloud-computing-in-science/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eagle Genomics&#8217; symposium 6th abstract</title>
		<link>http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-6th-abstract/</link>
		<comments>http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-6th-abstract/#comments</comments>
		<pubDate>Fri, 10 Feb 2012 11:19:41 +0000</pubDate>
		<dc:creator>Ivan Karabaliev</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Automated analysis of genomics data]]></category>
		<category><![CDATA[Babraham Research Campus]]></category>
		<category><![CDATA[Bioinformatics events]]></category>
		<category><![CDATA[Eagle Genomics symposium]]></category>
		<category><![CDATA[Health Protection Agency]]></category>
		<category><![CDATA[Jonathan Green]]></category>
		<category><![CDATA[March science event]]></category>
		<category><![CDATA[The next 10 years Genome Content Management]]></category>
		<category><![CDATA[whole genome sequencing]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2873</guid>
		<description><![CDATA[The sixth abstract of one of the talks which will be held at our 2nd Symposium: &#34;The Next 10 Years of Genome Content Management&#34; on 29th March held at Cambridge, Babraham Research Campus. The title is: Exploiting whole genome sequencing for public health microbiology Presented by: Jonathan Green, Head of Bioinformatics, Health Protection Agency &#160;...]]></description>
			<content:encoded><![CDATA[<p>Here is the sixth abstract from the talks to be given at our 2nd Symposium, &quot;The Next 10 Years of Genome Content Management&quot;, on 29th March at the Babraham Research Campus, Cambridge.</p>
<p style="text-align: justify;"><span style="font-size:11px;">The title is:</span> <span style="font-weight: bold;">Exploiting whole genome sequencing for public health microbiology</span></p>
<p><span style="font-size:11px;">Presented by:</span><span style="font-weight: bold;"> Jonathan Green</span><strong>, Head of Bioinformatics, Health Protection Agency</strong></p>
<p>&quot;Next Generation Sequencing (NGS) represents a revolution in DNA sequencing technologies, making routine rapid analysis of the complete DNA sequence of viral and bacterial genomes a reality. It is already the case that microbiological investigations of any significant outbreak of infectious disease include whole genome sequencing (WGS) of a putative causative agent, for example pandemic influenza strains, MRSA and <em>Acinetobacter baumannii</em>.</p>
<p>The potential impact of these technologies on organisations such as the HPA is far-reaching. The bioinformatics challenges of NGS and other &#39;omic technologies, in terms of IT infrastructure for storing and managing the large amounts of data generated, are well known but not resolved. A key challenge is how the HPA can ensure it has the appropriate workforce capability to analyse these data and make best use of the extracted information for public health purposes. This requires meaningful integration of &#39;omics data with other laboratory information pipelines, clinical datasets and surveillance systems, which is a significant challenge. Public health microbiology is increasingly &#39;data-rich&#39;: laboratory scientists are being required to spend more of their time on data analysis, using desktop bioinformatics tools or working with bioinformaticians to deliver these analyses, and this will increase as automation replaces much of the current manual approach to generating data. This is likely to create a significant shift in the skill mix required to deliver reference microbiology, epidemiology and surveillance. The need to enhance microbiology and to integrate molecular typing into international as well as national surveillance is recognised as paramount and needs to be a component of the current strategy. The presentation will describe the relevance and likely impact of these technologies on public health microbiology, and the challenges and work being undertaken to provide a platform for further development.&quot;</p>
<p>See the titles of the other talks <a href="http://www.eaglegenomics.com/symposium2012">here</a>.</p>
<p><a href="http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-5th-abstract/" target="_blank">&lt;&lt; Previous abstract </a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-6th-abstract/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Pistoia Alliance Selects Eagle Genomics and Cycle Computing to Bolster Pharmaceutical R&amp;D Practices</title>
		<link>http://www.eaglegenomics.com/2012/02/the-pistoia-alliance-selects-eagle-genomics-and-cycle-computing-to-bolster-pharmaceutical-rd-practices/</link>
		<comments>http://www.eaglegenomics.com/2012/02/the-pistoia-alliance-selects-eagle-genomics-and-cycle-computing-to-bolster-pharmaceutical-rd-practices/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 13:00:23 +0000</pubDate>
		<dc:creator>Richard Holland</dc:creator>
				<category><![CDATA[news]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2860</guid>
		<description><![CDATA[New York &#8211; February 8, 2012 &#8211; Eagle Genomics and Cycle Computing today announced that they have jointly won a competitive bid by the Pistoia Alliance to support the development of a proof-of-concept (PoC) system to meet the future needs of pharmaceutical R&#38;D IT. A four-month project, the Eagle Genomics &#38; Cycle Computing proof-of-concept system...]]></description>
			<content:encoded><![CDATA[<p><strong>New York &ndash; February 8, 2012 &ndash;</strong> Eagle Genomics and Cycle Computing today announced that they have jointly won a competitive bid by the Pistoia Alliance to support the development of a proof-of-concept (PoC) system to meet the future needs of pharmaceutical R&amp;D IT. A four-month project, the Eagle Genomics &amp; Cycle Computing proof-of-concept system is one of three accepted proposals out of a field of eleven total RFP submissions. A commercial release of the Sequencing Analysis Platform is expected by mid-2012.</p>
<p>The Pistoia Alliance, a global, not-for-profit, pre-competitive alliance of life science companies, vendors, publishers, and academic groups, with a vision for the future of managing and sharing pre-competitive pharmaceutical genomics R&amp;D data, has created detailed requirements under the banner of Sequence Services Phase 2. The system, with a number of add-ons for analyzing this data through the use of standard bioinformatics tools and custom workflows, must ensure that each customer using it is completely confident that their data will remain private and confidential.</p>
<p>Building on Cycle Computing&#39;s extensive experience in securing and scaling large-data applications in the cloud, combined with Eagle Genomics&#39; enviable track record in delivering bioinformatics applications and workflows to 8 out of 10 top global pharmaceutical companies, the joint Eagle/Cycle project is on track to be a robust, scalable, and highly adaptable solution that meets the vast majority of current Pistoia member needs, plus many of their future ones.</p>
<p>David Flanders, CEO of Eagle Genomics, said &ldquo;The Pistoia Alliance&rsquo;s vision in recognizing the significance of disruptive technologies and associated new business models presents a superb showcase for the talents of Cycle Computing and Eagle Genomics to provide open innovation solutions to customers in the pharmaceutical sector &ndash; and beyond.&rdquo;</p>
<p>&ldquo;Cycle is committed to supporting the standards that help spark innovation, unify silos and streamline data digestion,&rdquo; said Jason Stowe, CEO of Cycle Computing. &ldquo;We look forward to collaborating with the alliance members, combining the talent of Eagle Genomics in bioinformatics and Cycle Computing in Cloud HPC workflow management to meet the challenges facing researchers both today and into the next generation.&rdquo;</p>
<p>The proof-of-concept system for the Sequencing Analysis Platform will be delivered and demonstrated at the annual Pistoia Alliance Conference, to be held in April 2012 in Boston, MA. The award represents 25% ($50K) of the overall $200K allocated by Pistoia.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/the-pistoia-alliance-selects-eagle-genomics-and-cycle-computing-to-bolster-pharmaceutical-rd-practices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Heat is on at Sequence Squeeze</title>
		<link>http://www.eaglegenomics.com/2012/02/heat-is-on-at-sequence-squeeze/</link>
		<comments>http://www.eaglegenomics.com/2012/02/heat-is-on-at-sequence-squeeze/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 15:18:20 +0000</pubDate>
		<dc:creator>Richard Holland</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Open source]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2816</guid>
		<description><![CDATA[Over at the Pistoia Alliance Sequence Squeeze contest, which is being administered by Eagle on behalf of the Alliance, the number of entries is rapidly approaching the 40 mark &#8211; an impressive number considering the complexity of the task. Of particular interest is the number of repeat entries from individuals trying to better their previous...]]></description>
			<content:encoded><![CDATA[<p>Over at the <a href="http://www.sequencesqueeze.org">Pistoia Alliance Sequence Squeeze contest</a>, which is being administered by Eagle on behalf of the Alliance, the number of entries is rapidly approaching the 40 mark &#8211; an impressive number considering the complexity of the task. Of particular interest is the number of repeat entries from individuals trying to better their previous attempt. Each one shaves just a fraction of a second from the runtime or a few extra points off the compression ratio, pushing them above the competition and back to the top of the leaderboard. I don&#39;t think it was anticipated that entrants would end up directly competing in this way, but it certainly isn&#39;t doing the quality of the work any harm &#8211; if anything, quite the contrary!</p>
<p>Many questions have been asked about the subjectivity of the way in which the entries will be judged. Given the diversity of possible input formats and variations between platforms it is not possible to provide a test dataset that represents all of them. Indeed, many compression tricks work only if the dataset is known to be of a particular type or relate to a particular organism. </p>
<p>The contest judging script takes a simplistic approach of running each entry using default settings only on a fairly limited test dataset that is at least internally consistent (i.e. it all comes from the same organism and from the same sequencing platform). This provides an easy way to rank generally better entries under a number of useful categories (ratio, speed, etc.) and identify the top few in each category for further scrutiny by the human judges. The leaderboard shows the results of this automated process.</p>
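<p>The shortlisting step can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the contest&#39;s actual judging script: the <code>Result</code> fields, the definition of ratio as original size divided by compressed size, the <code>shortlist</code> helper and the toy figures are all hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One entry's default-settings run on the test dataset (hypothetical fields)."""
    name: str
    original_bytes: int
    compressed_bytes: int
    seconds: float  # assumed: compression plus decompression wall-clock time

    @property
    def ratio(self):
        # Assumed definition: higher means a smaller compressed output.
        return self.original_bytes / self.compressed_bytes


def shortlist(results, top_n=3):
    """Return the top_n entry names in each category for the human judges."""
    by_ratio = sorted(results, key=lambda r: r.ratio, reverse=True)
    by_speed = sorted(results, key=lambda r: r.seconds)
    return {
        "ratio": [r.name for r in by_ratio[:top_n]],
        "speed": [r.name for r in by_speed[:top_n]],
    }


# Invented toy figures, for illustration only.
results = [
    Result("alpha", 1000, 250, 12.0),
    Result("beta", 1000, 200, 30.0),
    Result("gamma", 1000, 400, 5.0),
]
print(shortlist(results, top_n=2))
# {'ratio': ['beta', 'alpha'], 'speed': ['gamma', 'alpha']}
```

<p>Keeping a separate ranking per category, rather than folding everything into one weighted score, leaves the question of whether ratio or speed matters more to the judges.</p>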
<p>Whether the ratio or the speed is considered more important is largely subjective, and so the judging panel will consider performance in all categories as well as looking into the effectiveness of any optional flags/optimisations that may be included in the code, plus the quality and robustness of the code itself (there&#39;s no benefit in having an open-source algorithm contest if the source code of the winner is indecipherable or does things that would cause concern in a production environment). Admittedly the subjectivity of this process may cause concern, which is why the judging panel consists of leading bioinformaticians from each of the three main sequencing centres of the world &#8211; the Broad, BGI, and Sanger. As a team they stand the greatest chance of identifying what would best suit the needs of NGS data managers.</p>
<p>One frequently asked question concerns mismatches: just how important is it to reproduce the input data exactly when decompressing, and can low-information data be discarded? To avoid complicating the judging further, it was decided that input data should be fully reconstructed on decompression. The sequences need not be in the same order, or even in the same files, but they must all be present. Entries are scored on the number of sequence headers, lines of bases or lines of quality scores that do not match. A few mismatches might indicate a minor issue with the code that could be fixed with a bit of investigation &#8211; and the human judges will take this into account &#8211; but a large number of mismatches would be a definite problem.</p>
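<p>A minimal Python sketch of order-insensitive mismatch counting follows. It assumes uncompressed four-line FASTQ records held as lists of stripped lines; the helper names and the per-field multiset comparison are illustrative assumptions, not the contest&#39;s scoring code.</p>

```python
from collections import Counter


def read_fastq(lines):
    """Split 4-line FASTQ records into header, bases and quality lists."""
    headers, bases, quals = [], [], []
    for i in range(0, len(lines), 4):
        headers.append(lines[i])
        bases.append(lines[i + 1])
        quals.append(lines[i + 3])  # line i + 2 is the '+' separator
    return headers, bases, quals


def count_mismatches(original, restored):
    """Count original lines with no counterpart in the restored output.

    Each field is compared as a multiset, so record order (and which
    file a record came from) does not matter, only that it is present.
    """
    total = 0
    for orig_field, rest_field in zip(read_fastq(original), read_fastq(restored)):
        missing = Counter(orig_field) - Counter(rest_field)
        total += sum(missing.values())
    return total
```

<p>This simplification does not check that each header stays paired with its own bases and qualities, and it ignores extra lines in the restored output; a stricter checker would need to handle both.</p>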
<p>The contest closes to new entries in just over 6 weeks on March 15th, and winners will be announced at the Pistoia Alliance Annual Conference in Boston MA (USA) on April 23rd. We eagerly await the results!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/heat-is-on-at-sequence-squeeze/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Eagle Genomics&#8217; symposium 5th abstract</title>
		<link>http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-5th-abstract/</link>
		<comments>http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-5th-abstract/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 14:21:49 +0000</pubDate>
		<dc:creator>Ivan Karabaliev</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[ARK Genomics]]></category>
		<category><![CDATA[Babraham Research Campus]]></category>
		<category><![CDATA[Bioinformatics events]]></category>
		<category><![CDATA[Eagle Genomics symposium]]></category>
		<category><![CDATA[March science event]]></category>
		<category><![CDATA[Mick Watson]]></category>
		<category><![CDATA[New wave of genomics research]]></category>
		<category><![CDATA[Roslin Institute]]></category>
		<category><![CDATA[The Age of Bioinformatics]]></category>
		<category><![CDATA[The next 10 years Genome Content Management]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2811</guid>
		<description><![CDATA[The fifth abstract of one of the talks which will be held at our 2nd Symposium: &#34;The Next 10 Years of Genome Content Management&#34; on 29th March held at Cambridge, Babraham Research Campus. The title is: Out of the shadows &#8211; the future of bioinformatics Presented by: Mick Watson, Director of ARK Genomics (Roslin Institute)...]]></description>
			<content:encoded><![CDATA[<p>The fifth abstract of one of the talks which will be held at our 2nd Symposium: &quot;The Next 10 Years of Genome Content Management&quot; on 29th March held at Cambridge, Babraham Research Campus.</p>
<p style="text-align: justify;"><span style="font-size:11px;">The title is:</span> <span style="font-weight: bold;">Out of the shadows &#8211; the future of bioinformatics</span></p>
<p style="text-align: justify;"><span style="font-size:11px;">Presented by:</span><span style="font-weight: bold;"> Mick Watson</span><strong>, Director of ARK Genomics (Roslin Institute)<br />
	</strong></p>
<p style="text-align: justify;">&nbsp; &quot;With bench scientists increasingly incapable of handling the volumes and types of sequence data, bioinformatics is now the most important aspect of genomics. It is impossible to carry out genomics research without sophisticated tools and intelligent, driven bioinformaticians. Often, bioinformaticians are best placed to design experiments and to advise on how to get the best results from genomics projects. BBSRC describe the current period of research as &quot;The Age of Bioscience&quot; when perhaps it should be &quot;The Age of Bioinformatics&quot;. It is now time for bioinformatics to mature as a science, to increase the emphasis on &quot;bio&quot; as well as &quot;informatics&quot;, and for bioinformaticians to lead the new wave of genomics research. Rather than a single genome per species, we must now recognise that every individual consists of a collection of genomes that are structurally variant; in addition to which, we can now measure epigenetic effects, such as methylation, at single-base accuracy. The paradigm is one individual, many genomes, many epigenomes. In addition to microbial metagenomics and the challenges faced therein, we are rapidly approaching large, eukaryotic metagenomics. All of these, combined with modern ways for communicating scientific research, combine to demand a new paradign for genomics research and an increased emphasis on the importance of bioinformatics.&quot;</p>
<p style="text-align: justify;">See the titles of the other talks <a href="http://www.eaglegenomics.com/symposium2012">here</a>.</p>
<p style="text-align: justify;">&nbsp;</p>
<p style="text-align: justify;"><a href="http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-4th-abstract/" target="_blank">&lt;&lt; Previous abstract </a>&nbsp;&nbsp;&nbsp;&nbsp; <a href="http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-6th-abstract/">Next abstract &gt;&gt;</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-5th-abstract/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eagle Genomics&#8217; symposium 4th abstract</title>
		<link>http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-4th-abstract/</link>
		<comments>http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-4th-abstract/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 11:07:28 +0000</pubDate>
		<dc:creator>Ivan Karabaliev</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Almac Diagnostics]]></category>
		<category><![CDATA[Babraham Research Campus]]></category>
		<category><![CDATA[Bioinformatics events]]></category>
		<category><![CDATA[Eagle Genomics symposium]]></category>
		<category><![CDATA[March science event]]></category>
		<category><![CDATA[The next 10 years Genome Content Management]]></category>
		<category><![CDATA[Vitali Proutski]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2802</guid>
		<description><![CDATA[The fourth abstract of one of the talks which will be held at our 2nd Symposium: &#34;The Next 10 Years of Genome Content Management&#34; on 29th March held at Cambridge, Babraham Research Campus. The title is: A toolbox for high throughput in-depth analysis of omics data.&#160; Presented by: Vitali Proutski, Head of Bioinformatics, Almac Diagnostics...]]></description>
			<content:encoded><![CDATA[<p>The fourth abstract of one of the talks which will be held at our 2nd Symposium: &quot;The Next 10 Years of Genome Content Management&quot; on 29th March held at Cambridge, Babraham Research Campus.</p>
<p style="text-align: justify;"><span style="font-size:11px;">The title is:</span> <span style="font-weight: bold;">A toolbox for high-throughput, in-depth analysis of omics data</span></p>
<p style="text-align: justify;"><span style="font-size:11px;">Presented by:</span> <strong>Vitali Proutski, Head of Bioinformatics, Almac Diagnostics<br />
	</strong></p>
<p style="text-align: justify;">&nbsp; &quot;Almac Diagnostics is a premier provider of Bioinformatics and Biostatistics consultancy services focusing on the analysis and interpretation of complex &#39;omics&#39; datasets. In order to satisfy the requirements of internal and external customer projects Almac have developed an extensive and highly customisable set of tools covering the entire continuum of a typical data analysis project, from data QC to in-depth mechanistic analysis or development, evaluation and selection of optimal predictive biomarker models. The tool box, some components of which will be presented and discussed, ensures consistency and high efficiency of analysis without compromising the quality and depth of it.&quot;</p>
<p style="text-align: justify;">See the titles of the other talks <a href="http://www.eaglegenomics.com/symposium2012">here</a>.</p>
<p style="text-align: justify;">&nbsp;</p>
<p style="text-align: justify;"><a href="http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-3rd-abstract/" target="_self">&lt;&lt; Previous abstract</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href="http://www.eaglegenomics.com/2012/02/eagle-genomics-symposium-5th-abstract/">Next abstract &gt;&gt;</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-4th-abstract/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eagle Genomics&#8217; symposium 3rd abstract</title>
		<link>http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-3rd-abstract/</link>
		<comments>http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-3rd-abstract/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 10:52:19 +0000</pubDate>
		<dc:creator>Ivan Karabaliev</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[Babraham Research Campus]]></category>
		<category><![CDATA[Bioinformatics events]]></category>
		<category><![CDATA[Dan MacLean]]></category>
		<category><![CDATA[Eagle Genomics symposium]]></category>
		<category><![CDATA[March science event]]></category>
		<category><![CDATA[Sainsbury Laboratory]]></category>
		<category><![CDATA[The next 10 years Genome Content Management]]></category>

		<guid isPermaLink="false">http://www.eaglegenomics.com/?p=2795</guid>
		<description><![CDATA[The third abstract of one of the talks which will be held at our 2nd Symposium: &#34;The Next 10 Years of Genome Content Management&#34; on 29th March held at Cambridge, Babraham Research Campus. The title is: Just Enough Developed Infrastructure Presented by: Dan MacLean, Head of Bioinformatics, Sainsbury Laboratory &#160; &#34;The Sainsbury Laboratory is a...]]></description>
			<content:encoded><![CDATA[<p>The third abstract of one of the talks which will be held at our 2nd Symposium: &quot;The Next 10 Years of Genome Content Management&quot; on 29th March held at Cambridge, Babraham Research Campus.</p>
<p style="text-align: justify;"><span style="font-size:11px;">The title is:</span> <span style="font-weight: bold;">Just Enough Developed Infrastructure</span></p>
<p style="text-align: justify;"><span style="font-size:11px;">Presented by:</span> <strong>Dan MacLean, Head of Bioinformatics, Sainsbury Laboratory<br />
	</strong></p>
<p style="text-align: justify;">&nbsp; &quot;The Sainsbury Laboratory is a small independent research lab of approximately 80 researchers, we concentrate on Plant/Pathogen interactions with a focus on the genetics and genomics of a wide and frequently shifting line-up of organisms under study. A large proportion of our projects are focussed on a relatively modest goal, for example identifying Resistance genes in a novel model, rather than producing a gold-standard annotated genome. Recently we have made the decision to abandon our own compute cluster in favour of merging with larger users ie TGAC in an &#39;Access is more important than ownership model&#39;. Similarly, we have found that many existing tools for genome sequence and feature annotation management, while great in their own way, can be too large and more difficult to manage than is useful for our projects and personnel profile. I&#39;ll describe the sequencing pipelines and hardware that feed into our data store as background for the core of our management structure, our lightweight sequence and feature versioning tool Gee Fu. Gee Fu interacts with our assembly and annotation pipelines from the command-line and from our Galaxy installation. It acts simultaneously as a store and annotation editor that has appropriate interfaces for users with a wide range of computing skills and can be used to share data externally via its RESTful interface and a web based front end.&quot;</p>
<p style="text-align: justify;">See the titles of the other talks <a href="http://www.eaglegenomics.com/symposium2012">here</a>.</p>
<p style="text-align: justify;"><a href="http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-2nd-abstract/">&lt;&lt; Previous abstract</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href="http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-4th-abstract/">Next abstract &gt;&gt;</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.eaglegenomics.com/2012/01/eagle-genomics-symposium-3rd-abstract/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

