Accessible data, UX and the philosophy of pipeline building: Q&A with Mauro Saporita
Mauro is a bioinformatician and scrum master at Eagle Genomics. In this article he discusses why user experience is fundamental to building effective pipelines and how data accessibility will direct the future of bioinformatics.
Q: What is your role at Eagle Genomics?
I have two roles: bioinformatician and scrum master. As a bioinformatician I spend most of my time coding! I work in the data science team developing pipelines and tools that allow lab scientists to upload, neatly catalogue and analyse their data. Essentially, our pipelines power our software platform, which in turn enables researchers to move from manual to automated management of their experimental data.
My background is in biomedical engineering and, in the past, I was more focused on engineering than on biology and science. Part of my master's degree focused on bioinformatics, and that's how I began this career path.
One of the projects I work on is Eagle's Fast Start package, which focuses on pipelines for curation and analysis of microbiome data. Building a pipeline involves many steps, several of which require support from the development and operations (DevOps) side of the team. The development team at Eagle Genomics is very diverse and multi-skilled. My part of the team specialises in the biological aspects of pipeline building, but we are also supported by the DevOps specialists and their expert knowledge of services, the cloud and storage: all the aspects that sit behind the systems and pipelines we develop and build.
Working in the data science team is great because, although we each specialise in particular areas of development, we're still constantly learning from each other. For example, part of my job is to carry out code review, making sure code is accurate and consistent across all members of the team. As scrum master I ensure the team follows best-practice principles for Scrum and Agile methodologies.
Q: The way we think about scientific research is changing and discoveries can now be made digitally as well as in the lab. How do researchers uncover scientific breakthroughs just by using data?
We are moving away from the traditional concept of science in the lab towards science that can be done at a desk in front of a computer. This is all because of data. A lot of data comes out of labs, and in bioinformatics we generate even more of it by working with the data provided by lab scientists! That's why there's a vital need for bioinformatics, and why there is such a variety of roles within the field, from software developers to DevOps specialists and biocurators. There are many aspects and branches of bioinformatics that need to be taken care of: annotation, analysis, security, storage and making data shareable.
This doesn't mean that lab work has become less important; bioinformatics couldn't happen without lab research. But we are now at a point where bioinformatics can assist and enhance lab work by using data to enable novel discoveries. Pipelines and algorithms can spot connections between results and data entities that the human eye alone simply can't.
Q: What is the role of bioinformatics in tackling the huge volumes of data produced by biological research?
Data should be clean, digitised and securely held. Some companies still keep records of DNA analysis on printed pages, which means it is not only difficult to physically store large quantities of data but also very time-consuming to find specific pieces of information held in an archive. Bioinformatics is vital for enabling the transfer and storage of huge volumes of data and for making information easy to find, instead of having rooms stacked full of printed records!
Bioinformatics data is often digitally stored in data centres
But bioinformatics isn't just a matter of space and storage. It's also about the quality of data and making it shareable among the right people and organisations; otherwise scientists and researchers will lose vital opportunities for both diagnosis and discovery in the context of human health.
Q: Why is consistent, high quality data so important for effective analysis of big data?
For data to be shareable and of practical use to multiple researchers and teams, it's essential that data entities are of high quality and categorised using a universal standard. Without standardisation and accessibility, data becomes siloed and cannot be used effectively to support research and discovery.
Bioinformatics builds on lab data by carefully categorising and annotating metadata (contextual data), including pathways, associated metabolites, drug-disease associations and so on. To do this effectively, bioinformaticians must use uniform concepts, ontologies and annotation. Only then can all available research data be fully accessible and easily searchable, enabling the analyses that can uncover new scientific discoveries.
Q: For you, what is the most important factor in building a successful analysis pipeline?
You always need to be thinking about the end user. As developers we need to build pipelines that are easy to use, even if what those pipelines are doing in the background is very complex. You can build a high-quality pipeline or system that would let a user, in this case a scientist, find the answers they are looking for in a matter of minutes. But if it's not easy to navigate, they will spend hours combing through data manually, or using a series of less efficient tools simply because they are familiar with them.
The e[datascientist] platform is designed for use by scientists who are non-data-experts
So yes, to my mind, keeping the user and their experience of the pipeline tools at the forefront is a vital part of the process!
Q: What are your hopes for the future of bioinformatics and life sciences research?
Everything in bioinformatics and genomics is changing so fast. In the last 20 years the whole landscape of bioinformatics has completely changed! With the field moving at such a pace it's difficult to say what might happen in the future, but improved data quality is definitely something we can look forward to. Bioinformaticians will probably gain better infrastructure and techniques for data storage, as well as for data collection and sharing. Establishing a universal standard is going to be really important and will affect the capacity of bioinformatics to reveal new discoveries.
At Eagle Genomics we want our e[datascientist] software platform to enable automated data science discovery. We want researchers to use the platform to collect and analyse their data in a way that lets them spot previously undiscovered connections between entities, generating new discoveries and insights that result in game-changing research outcomes. Without systems like our platform, the complexity and volume of life science data can seem overwhelming and can prevent researchers from identifying connections that could ultimately change the way we think about human and environmental health.
Q: Tell us something about yourself we couldn’t find out from Google.
Google certainly couldn't tell you how much I hate pineapple pizza! Never ask for a pizza with pineapple in Italy; Italians don't like it and don't even consider it a pizza. I think it should be illegal!