Making the most of the metadata
Fast Moving Consumer Goods
A large global life sciences innovation company had accumulated vast amounts of data in a format that was impossible to sort, search and share.
A large global life sciences innovation company had accumulated vast amounts of data from hundreds of separate microarray experiments, held in a variety of locations and file formats, including raw microarray .cel files, Excel spreadsheets, text files and Word documents. This was an amazing resource with almost endless potential, but in this form it was impossible to sort, search and share. Simply managing the terabytes of existing data was proving a challenge, let alone dealing with the new data arriving at around a terabyte a year from thousands of microarray experiments.
The company wanted a more structured and systematic database and catalogue with a focus on metadata (data describing the data). This would allow researchers to upload their own data, cross-reference it with existing information, search across the database to find potential avenues for further research and share their data with colleagues. However, building it in-house would take up precious resources, taking IT technicians and developers away from support work on research and development projects.
Eagle Genomics worked with the company to further develop e[datascientist] Catalog. By working closely with Eagle Genomics, the company was able to provide real-life examples of the problems that it and similar organisations faced, and so shape the development of the platform. The end result was a central structured catalogue that allowed all of the company’s data to be tagged with metadata and linked together securely.
The user-friendly system has allowed the metadata and data to be structured in a logical, hierarchical manner using an established open standards model. Experiments can be easily searched, shared with colleagues and exported for further analysis. As a result, the company will be able to get the best out of its wealth of data, including seeing how existing information could relate to current and new experiments. It will also be able to see where there are potential gaps in its knowledge, and therefore plan the next generation of experiments. This will support further development of products currently in the pipeline, as well as inform new areas of research.
- Data and metadata structured in a logical hierarchical manner
- Significantly faster and simpler searching and sharing of data
- Data easily exported for further analysis
- Flexible and scalable platform for the future
- Cross-references and connections between previously unrelated sets of data
- Easier visualisation of gaps in existing data