Select research data
When you have finished your research publications, you will need to consider how you will store your data in the long term.
What data needs to be preserved?
Not all the data you have collected will be suitable for long-term preservation; as a guideline, you should only preserve the data that underpins your research publications. For example, if you have taken 200 photos during your research but only 80 have been used in your research publications, only these 80 images should be preserved.
Why can’t I just keep everything?
- Massive amounts of data make it harder to find and access the truly useful information.
- Storage costs money and requires effort and staff time.
- Freedom of Information laws mean that what you keep on file may have to be disclosed on request.
Criteria to appraise data
1. Are there relevant institutional or legal requirements?
- Does your funding body have a data policy with a specific retention period for your data?
- Does legislation such as the Data Protection or Freedom of Information Acts affect your data?
2. Do the data have particular value?
- Do the data have historic, scientific or subject-specific value?
3. Is the dataset unique?
- Do other copies exist elsewhere? Will they be preserved?
4. Can the data be reused by others?
- Have issues relating to human subjects been addressed?
- Was informed consent obtained from the research subjects for archiving and re-use of personal data?
- Was approval by an Ethics Committee required to collect the data?
5. Has the total cost of retaining the data been considered?
- Is part of your research grant available for preserving the data?
6. Are the data well documented?
- Is there documentation to support sharing, access and re-use of the data such as the context of the project, methodology, copyright/IPR and ethical issues?
What data does my funding body expect me to keep and make available?
This is often a tricky question, as the answer depends on your individual research project. In summary, UKRI gives the following guidance:
You are expected to publish:
- the data that underpins publications
- the data that validates research findings
- the data that is worth keeping
The important principle is that the validity of the published research findings must be testable. The minimum expectation is that you provide the information someone would need to validate your published work. This is also critical to maximising the impact of publicly funded research and to maintaining public trust in science and research.
Every research project is different
In some cases one criterion may outweigh another. For instance, a dataset including health data may be impossible to anonymise but have very high scientific significance. In such cases the data should be retained, but the issues that hinder re-use must also be addressed (for example, by depositing the data with a repository that has a proven mechanism for granting controlled access).
Do I need to share software or code I have written?
This depends on the research being carried out. Consider whether the code is necessary to validate your research findings, such as those published in a journal paper. As a rule of thumb, if your journal paper does not include enough detail for others to replicate your work unambiguously, you should share your code as part of your research data.
Even if you don't need to preserve software, it is good practice to make it available, with adequate documentation, so that others can more easily validate your research findings and access and reuse your research data. Normally the PI of the research project and/or the Head of Department/Research Centre will decide which research data should be preserved and made available.
For more information and examples, please see the Software Sustainability Institute's useful article on this topic. It refers to EPSRC's data policy, but the lessons apply to all major funders.
Are you depositing in a data centre?
Where data is destined for deposit in a subject-based data centre, subject-specific evaluation criteria may apply. In that case, researchers are advised to follow the guidance provided by the data centre in question.
Please also have a look at our data preservation archive webpage.
Data selection guide
Look at our simple Data selection guide (pdf).