Data management plans...why?
What is a data management plan and why is it important?
A data management plan (DMP) is a formal document that sets out how the information lifecycle needs of an enterprise will be managed effectively, and it leads to the implementation of good data governance practices. DMPs are a valuable asset for almost all scientific studies, including Eagle consultancy projects. Many funding agencies require a DMP to be submitted alongside a project proposal. DMPs should not be seen as an additional burden, as they bring benefits to every project member on several levels. It is good to become familiar with what is required in a DMP, and this blog aims to provide some guidance that we hope will be useful.
A well-considered data management plan is an effective document throughout the project lifecycle. Its benefits include:
- a known location for all project data;
- defined roles and responsibilities for the generation and maintenance of the data;
- clear ownership, so that members can put questions to the right person and gain understanding;
- project continuity as staff members come and go;
- no unnecessary duplication of work in generating and analysing data;
- maintenance of the data underlying peer-reviewed publications;
- data sharing that enhances collaborations and advances research.
Taking academic grant applications as an example, DMPs come in several flavours, depending on the funding agency, but they share core principles that need to be respected. Key features of a plan are that it:
Outlines the core data requirements;
Is an integral, supporting document to a grant application;
Will develop into a living document from project initiation to completion and beyond.
The Digital Curation Centre (DCC) provides a lot of unbiased advice to help you develop and maintain a data management plan.
Typically these application-level plans are concise, limited to 3-4 pages. To provide all the required information, you will need details from the project members, namely their expected data flows from samples to analysis, to gain an understanding of:
- data file inputs and outputs;
- use of standards and methodologies during data collection;
- expected file content and format;
- estimated volume of files to be generated;
- supporting material to gain correct interpretation of the data;
- existing data resources available at the beginning of the project, and how data generated during the project will add value.
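The details gathered from project members can be kept in a simple structured inventory, which makes the volume estimate for the plan easy to recompute as expectations change. The following is a minimal sketch only: the `DataFlowEntry` schema, its field names and the example figures are all hypothetical illustrations, not a format required by any funding agency.

```python
from dataclasses import dataclass


@dataclass
class DataFlowEntry:
    """One expected data flow reported by a project member (hypothetical schema)."""
    description: str          # data file input or output
    file_format: str          # expected file content and format
    standard: str             # standard or methodology used during collection
    expected_files: int       # estimated number of files
    avg_file_size_gb: float   # estimated average size per file, in GB
    supporting_material: str  # what is needed to interpret the data correctly

    def estimated_volume_gb(self) -> float:
        # Volume estimate is simply file count times average file size.
        return self.expected_files * self.avg_file_size_gb


def total_volume_gb(entries):
    """Sum the estimated volume across all declared data flows."""
    return sum(entry.estimated_volume_gb() for entry in entries)


# Illustrative figures only; real numbers come from the project members.
inventory = [
    DataFlowEntry("raw sequencing reads", "FASTQ", "lab SOP (placeholder)",
                  200, 2.5, "sample sheet"),
    DataFlowEntry("aligned reads", "BAM", "pipeline config (placeholder)",
                  200, 4.0, "analysis README"),
]

print(f"Estimated total volume: {total_volume_gb(inventory)} GB")
```

Keeping one entry per data flow means the same record answers several of the questions above at once: format, standards, volume and supporting material.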
Data security is an important part of the DMP: implementing best practice ensures that the right people have the correct level of access to the files needed for their planned activities. Data confidentiality, integrity and availability are important principles that provide adequate protection against data corruption or loss. Partners with information security certification, for example ISO 27001, which Eagle Genomics holds, will be prepared to mitigate potential data security risks.
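One common way to support the integrity principle is to record a checksum for every file and compare against it later. The sketch below assumes project data sits under a single directory and uses SHA-256 from the Python standard library; the function names `build_manifest` and `changed_files` are hypothetical, and this illustrates the general technique rather than any procedure prescribed by ISO 27001 or a funder.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        str(path.relative_to(data_dir)): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """List files from the manifest that are now missing or have different content."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest built when data is deposited can be stored alongside the plan and re-checked after transfers or restores. Note that this only detects corruption or loss; it does not address access control or confidentiality.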
If your project is funded (congratulations!), the clear and achievable aims detailed in the DMP will need to be developed further, and the plan becomes an extended living document enhanced by collaborative data governance. Enterprise data management extends to cover six core themes, including:
- Data Types, Formats, Standards and Capture Methods
- Ethics and Intellectual Property
- Access, Data Sharing and Reuse
- Short-Term Storage and Data Management
- Deposit and Long-Term Preservation
The eagle-eyed amongst you will notice that these themes overlap with those in the initial document (depending on the funding agency requirements). Now is the time to develop and implement policies and procedures as guidelines for your research group, department or institution.
The living DMP will change as processes and techniques evolve over time, and your project will need to respond to new opportunities or changes in research to succeed. Do you know the active projects in your area that you may want to contribute to or watch with interest? One example is the development of the “Automatable Discovery and Access Matrix (ADA-M)”, which aims to ensure that patient consent and other conditions of use can be represented in a standardised, computer-readable manner. Do you want to use the Broad Institute “Data Use Ontology” (DUO), which is registered at the OBO Foundry and can be searched via the European Bioinformatics Institute (EMBL-EBI) Ontology Lookup Service (OLS)? It is always good to be aware of the activities of the Global Alliance for Genomics and Health (GA4GH), which is working to create interoperable approaches to catalyse projects that will help unlock the great potential of genomic data.