Preserve and Share
Tab Content: Preserve and Share
Learn how to select what to keep and how to preserve it properly. When you share your data you have to think about an appropriate license. Pure, the University’s Research Information System, can help you make your data visible.
What is digital preservation?
"The set of processes, activities and management of digital information over time to ensure its long-term accessibility. Because of the relatively short lifecycle of digital information, preservation is an ongoing process." — University of Bristol.
Preservation involves actions and procedures to hold data for some period of time and/or to set data aside for future use. This may include data archiving and submission to a data repository.
Lancaster University recognises that it is good practice for researchers to manage and retain their research data. Sometimes they are legally required to do so for many years after project funding has ceased.
Why preserve and share?
- The idea behind increasing access to research data at the end of a project, when legally, ethically and commercially appropriate, is that publicly funded research should be a public good and be able to be accessed and reused by the widest possible audience.
- Sharing data also shows your scientific integrity. It allows others to replicate, validate, or correct your results, thereby improving the scientific record.
- Lancaster University supports access to data by other researchers who could re-use the data, thus maximising the effectiveness of our research funding.
Many funders specify how long research data must be preserved. Look at funders' requirements to find out more.
There might be reasons not to share your data
- If your data has financial value or is the basis for potentially valuable patents that could be exploited by the University, it may be unwise to share it.
- If the data contains sensitive, personal information about human subjects, it may violate the Data Protection Act, ethics codes, or your own written consent forms to share it, even with other researchers. However, there might be ways of anonymising the data to make it sharable.
Please note that if you think you cannot share your data you may need to provide a statement to your funder justifying why data should be restricted as part of the application process.
A lot of data containing personal or confidential information can be anonymised without compromising the value of the data, and can then be shared. For example, you should consider the following:
- Remove direct personal identifiers
- Aggregate or reduce the precision of variables that might be identifiable (such as postcodes)
- Generalise text variables to reduce identifiability
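The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete anonymisation procedure: the field names (`name`, `email`, `postcode`, `job_title`), the category table and the function names are all hypothetical, and it assumes UK-style postcodes whose final three characters form the inward code.

```python
# Hypothetical field names; adapt them to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email"}

# Hypothetical mapping from specific job titles to broader categories.
JOB_CATEGORIES = {
    "consultant cardiologist": "healthcare professional",
    "primary school teacher": "education professional",
}


def outward_code(postcode: str) -> str:
    """Reduce a UK postcode to its outward part, e.g. 'LA1 4YW' becomes 'LA1'."""
    compact = postcode.replace(" ", "").upper()
    # The inward code is the last three characters; drop it if present.
    return compact[:-3] if len(compact) > 3 else compact


def anonymise(record: dict) -> dict:
    """Return a copy of the record with the three steps applied."""
    # 1. Remove direct personal identifiers.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # 2. Reduce the precision of a potentially identifying variable.
    if "postcode" in cleaned:
        cleaned["postcode"] = outward_code(cleaned["postcode"])
    # 3. Generalise a free-text variable to a broader category.
    if "job_title" in cleaned:
        key = cleaned["job_title"].strip().lower()
        cleaned["job_title"] = JOB_CATEGORIES.get(key, "other")
    return cleaned


if __name__ == "__main__":
    example = {
        "name": "A. Person",
        "email": "a.person@example.org",
        "postcode": "LA1 4YW",
        "job_title": "Primary School Teacher",
        "score": 7,
    }
    print(anonymise(example))
```

Whether this level of generalisation is sufficient depends on the dataset; combinations of seemingly harmless variables can still identify a person, so the result should be checked against the guidance referenced below.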
Note that re-users of data have the same legal and ethical obligation to NOT disclose confidential information as primary users.
A guide to data anonymisation can be found on the website of the UK Data Archive.
Data Access Restrictions
If necessary, you need to restrict access according to the data's level of detail, sensitivity and confidentiality. Some data centres like the UK Data Archive allow you to do that. You will also be able to do this when you use Pure to publish your data.
The University recommends restricting the dataset's visibility at the electronic document file level (not the dataset record level) to control access to it. If you deposit your data into Pure, you have the following options:
- Unrestricted public access to data files if you choose Public - No restriction in Visibility of your data file in Pure. Data files will be freely available on the Research Directory.
- No public access if you choose Backend - Restricted to Pure users. In this case data files will not be visible on the Research Registry (but the dataset record describing the data will be).
- Access on request. This works like the previous option, but in your description you add that the data is available on request and summarise the conditions under which it can be accessed. For example: "Data can be made available on request to bona fide researchers who provide information regarding proposed use".
You can also note any legal or ethical constraints when creating a dataset entry on Pure. You will need to provide details of why your data is subject to these restraints.
Please note that funders expect that most datasets will be publicly available (or Open) and you will have to justify restrictions (it is best to do that in your Data Management Plan).
All research data you want to publish need to conform to Lancaster University's Code of Practice (pdf).
Find out more about Ethics from the Research Support Office.
What if my consent forms do not allow my data to be shared?
If the consent forms explicitly prohibit data publication and sharing, your funding body will not expect you to share your data. You are required to have a data access statement in your research publication that links to a Web page where you explain why data cannot be shared. This can be done in Pure.
However, for future research projects funders like EPSRC and ESRC expect: "when gaining informed consent, include consent for data sharing" (ESRC Data Policy). Get in touch with RDM Support if you have any questions: email@example.com.
Consent form templates
The UK Data Archive has a useful help site including example information sheets and consent forms.
When and for how long do I need to preserve research data?
Data must be kept securely even once the research has ended. Your funder might specify when you need to deposit the data with a data archive, for example "as soon after the end of data collection as is possible" (NERC, DCC summary), "within three months of the end of the award" (ESRC, DCC summary), or "normally within 12 months of the data being generated" (EPSRC, DCC summary).
Many research funders specify which data need preserving and for how long. This can range from "a minimum of three years" (AHRC) to ten years (BBSRC (pdf)). EPSRC (Data Principle VII) expects data to be preserved for 10 years after the last "privileged access" to the data, so this data might have to be kept in perpetuity.
Please consult your funders' data policy expectations for specific details.
Lancaster University expectations
If your research is not funded by an external funder, you will need to comply with Lancaster University's Research Data Management Policy (Word doc) which states "all research data will be stored in either electronic or paper form for a minimum of 10 years after the end of a project, unless ethical considerations, participant confidentiality, FOI requirements or external agencies e.g. NHS, specifically require otherwise."
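The 10-year minimum translates into a simple date calculation. The sketch below assumes that the retention period runs from the project end date and ends on the same calendar day 10 years later; the function name and the handling of 29 February are illustrative choices, not part of the policy.

```python
from datetime import date

MIN_RETENTION_YEARS = 10  # minimum stated in the University policy quoted above


def earliest_disposal_date(project_end: date, years: int = MIN_RETENTION_YEARS) -> date:
    """Return the first date on which the minimum retention period has elapsed."""
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        # The project ended on 29 February and the target year is not a
        # leap year: fall forward to 1 March (an assumption of this sketch).
        return date(project_end.year + years, 3, 1)


if __name__ == "__main__":
    print(earliest_disposal_date(date(2015, 9, 30)))
```

A funder's retention period can be passed via the `years` argument. Exceptions in the policy (ethical considerations, participant confidentiality and so on) are not modelled here.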
Data access statements
Data access statements are used in publications to describe where supporting data can be found and under what conditions they can be accessed. They are a requirement of many funders' data policies and are a requirement of the UKRI (formerly RCUK) Policy on Open Access (pdf, Section 3.3 ii).
Tab Content: Select Data
When you have finished your research publications, you will need to consider how you will store your data in the long term.
What data needs to be preserved?
Not all the data you have collected will be suitable for long term preservation; as a guideline you should only preserve the data that underpins your research publications. For example, if you have taken 200 photos during your research but only eighty have been used in your research publications, only these eighty images should be preserved.
Why can’t I just keep everything?
- Massive amounts of data make it harder to find and access the information that is truly useful.
- Storage costs money and requires effort and staff hours.
- Freedom of Information laws mean that what you keep on file may have to be disclosed on request.
Criteria to appraise data
1. Are there relevant institutional or legal requirements?
- Does your funding body have a data policy with a specific retention period for your data?
- Does legislation such as the Data Protection or Freedom of Information Acts affect your data?
2. Do the data have particular value?
- Do the data have historic, scientific or subject-specific value?
3. Is the dataset unique?
- Do other copies exist elsewhere? Will they be preserved?
4. Can the data be reused by others?
- Are human subjects issues addressed?
- Was informed consent obtained from the research subjects for archiving and re-use of personal data?
- Was approval by an Ethics Committee required to collect the data?
5. Has the total cost of retaining the data been considered?
- Is there a part of your research grant available to preserve the data?
6. Is the data well documented?
- Is there documentation to support sharing, access and re-use of the data such as the context of the project, methodology, copyright/IPR and ethical issues?
What data does my funding body expect me to keep and make available?
This is often a tricky question as it depends on your individual research project. In summary UKRI give the following guidelines:
You are expected to publish:
- the data that underpins publications
- the data that validates research findings
- the data that is worth keeping
The important principle is that the validity of the published research findings is testable. The minimum expectation is that you should provide the information that someone would need to be able to validate published work — this is also critical to maximise the impact of publicly funded research and to maintain public trust in science and research.
Every research project is different
In some cases one criterion may outweigh another. For instance, a dataset including health data may be impossible to anonymise but have a very high scientific significance. In such cases data should be retained but the issues which challenge re-use must also be addressed (for example by depositing the data with a repository which has a proven mechanism for granting controlled access).
Do I need to share software or code I have written?
This depends on the research which is being carried out. Consider whether it is necessary to validate research findings, such as those published in a journal paper. As a rule of thumb, if your journal paper does not include sufficient detail for others to unambiguously replicate your work, you should share your code as part of your research data.
Even if you don’t need to preserve software, it is good practice to make available the software and adequate documentation to enable others to more easily validate your research findings, and to access and reuse your research data. Normally it will be the PI of the research project and/or Head of Department/Research Centre who will make the decision about what research data should be preserved and made available.
For more information and examples please have a look at the useful article by the Software Sustainability Institute. It refers to EPSRC's data policy but the lessons are applicable to all major funders.
Are you depositing in a data centre?
Where data is destined for deposit into a subject-based data centre, subject-specific evaluation criteria may apply. Where this is the case researchers are advised to follow the guidance provided by the data centre in question.
Please also have a look at our data preservation archive webpage.
Data selection guide
Look at our simple Data selection guide (pdf).
Tab Content: Choose Archive
There are different ways to share data and the route you take may be dependent on your discipline, or what your funder expects.
What is preservation?
In the context of data management, data preservation refers to the process of maintaining access to data so that it can still be found, understood and used in the future. This usually does not mean that data need to be kept forever but for a certain period of time similar to the arrangements in a physical archive.
When do I preserve my data?
Generally, after the publication has been accepted the underpinning data will be submitted to a repository for immediate publication. However, embargoes of up to 12 months are accepted by some funders. Once these have been approved, granted, or accepted, then the data will be made available for reuse within its licensing guidelines.
Where do I preserve my data?
There are two things you need to do first:
- Check your funder's Terms and Conditions to see whether you are required to deposit your data in a specific data archive. For a quick overview of the main funding bodies, consult the DCC's table. If you have a Data Management Plan for your research project you should have stated where you intend to deposit your data.
- If your funder does not require a specific archive, you will be able to deposit your data for preservation with Lancaster University using Pure. See our Data and Pure webpage for details.
If you do deposit your data in a data centre that is run or recommended by your funder, you do not need to deposit your data with Lancaster University. It is not a requirement that all research data must be held within the University. However, you do need to have your metadata about your research data in Pure, just like you would do with research publications. Please consult our Data and Pure page for details.
See below for a decision tree about where to deposit your data:
How long do I need to preserve my data?
Lancaster University's Research Data Management Policy (Word doc) states "that all research data will be stored in either electronic or paper form for a minimum of 10 years after the end of a project, unless ethical considerations, participant confidentiality, FOI requirements or external agencies eg NHS, specifically require otherwise."
However, your funding body might have issued certain requirements as part of the grant conditions. Generally, if your funder's data policy covers long-term curation, the expected periods for preservation range from 3 years to 10 or more years.
Suitable Data Centres
Please also consult the excellent guide by the Digital Curation Centre: Where to keep research data: DCC checklist for evaluating data repositories.
UK data centres run by funding bodies
Depending on your funder's Terms and Conditions, you may be required to deposit your data in a specific archive or repository. Your funding body might run its own data centre:
- ESRC funds the UK Data Archive for social science data. Data being collected or produced in all ESRC projects need to be offered to the UK Data Archive using ReShare;
- The Archaeology Data Service (ADS);
- The Natural Environment Research Council (NERC) Data Centres, such as the British Atmospheric Data Centre (BADC); and
- Science and Technology Facilities Council (STFC) has several data centres available but not all curate data.
These and other data centres run or supported by major funding bodies are very likely to be supported and updated in the future. We encourage Lancaster researchers to deposit their research data with these services.
Other data repositories
There are many services available, depending on your research area, and the list is growing. Two examples are:
- Figshare, a general purpose repository including data sets; and
- Zenodo, another general purpose repository for all fields of science that accepts closed access uploads.
While these services are currently a good resource for finding and accessing other researchers' data, we would encourage you not to deposit your data with them. Their business models are very new and still in development, so it is not yet clear whether these services will be long-lasting. We will keep evaluating these data repositories and update our recommendations if necessary.
You are free to publish your data with these services in addition to depositing your data in a data centre recommended by your funder or by the University. Make sure you are happy with the Terms and Conditions of the data centre, that persistent identifiers (such as DOIs) are provided and that any access restrictions you might require are in place. The data repository registry re3data can help you identify suitable data centres.
Tab Content: Licence Data
Plan licensing early
As a data creator, you should clarify ownership of, and rights relating to, research data before a project starts. If you collaborate on a project there might be multiple rights holders and all have to give permission for data sharing.
Types of licenses
There is a spectrum of permissions that can be assigned to licensing data for use, re-use, or distribution. The least restrictive license states that anyone can use, reuse, share and re-distribute the data for any purpose and without attribution. In essence, you waive your claim to the copyright. Restrictions that can be added to a license include:
- Attribution — If you allow use or re-use of the data, you will need to decide if the license will require the inclusion of attribution or acknowledgement in resulting output.
- Derivative Works — Will you allow re-use of your data in the creation of derivative works, and if so with or without attribution? You can stipulate that any derivative work(s) must be licensed under the same parameters as your data; this is called ShareAlike (or SA).
- Commercial/Non-Commercial Use — You might like to allow re-use of your data for commercial use, or for non-commercial (NC) use exclusively.
Various forms of license exist, ranging from standard licenses like Creative Commons to bespoke restrictive data licenses. Standard licenses are available which will be suitable for most research projects but you need to make yourself familiar with the options available. Here are some examples:
- Creative Commons provides a range of licenses that are used widely but were not designed for data specifically. The CC BY-NC-SA licence (Creative Commons Attribution-NonCommercial-ShareAlike) lets others share, remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
- Open Data Commons (ODC) offer two licenses which are better suited to research data than Creative Commons.
- Open Government Licence (pdf): Suitable for UK public sector databases and datasets.
The majority of these licences have been designed to enable open sharing of data. If you are working with a dataset that cannot be made available under an open licence (eg it contains 3rd party rights that cannot be made freely available), you will need to produce a Data Transfer Agreement tailored to the specific needs of your work.
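To illustrate how the restrictions listed above combine into a Creative Commons licence name, here is a minimal Python sketch. The function name is hypothetical, and only the three options discussed above (attribution, NonCommercial, ShareAlike) are modelled; Creative Commons also offers NoDerivatives variants, which are left out. The sketch relies on the fact that every Creative Commons licence with NC or SA terms also requires attribution, and that waiving all rights corresponds to CC0.

```python
def cc_licence_code(attribution: bool,
                    non_commercial: bool = False,
                    share_alike: bool = False) -> str:
    """Return the short name of the Creative Commons option matching the choices."""
    if not attribution:
        if non_commercial or share_alike:
            # There is no CC licence that restricts use without requiring credit.
            raise ValueError("NC and SA terms are only available together with attribution (BY)")
        # CC0 is a public domain dedication: rights are waived, no credit required.
        return "CC0"
    parts = ["BY"]
    if non_commercial:
        parts.append("NC")
    if share_alike:
        parts.append("SA")
    return "CC " + "-".join(parts)


if __name__ == "__main__":
    print(cc_licence_code(attribution=True, non_commercial=True, share_alike=True))
```

The helper only names a licence; it does not replace reading the licence terms or checking that all rights holders agree to them.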
Open Source Software Licenses
All open source licences allow other people to take the source code for your software and modify it, as long as they give credit to you as an author. Popular and widely used open source software licenses are:
- Apache License 2.0
- BSD 3-Clause "New" or "Revised" license
- BSD 2-Clause "Simplified" or "FreeBSD" license
- GNU General Public License (GPL)
- GNU Library or "Lesser" General Public License (LGPL)
- MIT license
- Mozilla Public License 2.0
- Common Development and Distribution License
- Eclipse Public License
Please note that all open source licenses will allow the commercial use of your code. If you want to maintain control over commercial use of the software you can consider multiple licensing (DCC guide).
The Digital Curation Centre (DCC) has published an extensive guide on how to license data:
Ball, A. (2012). ‘How to License Research Data’. DCC How-to Guides. Edinburgh: Digital Curation Centre.
Another good overview including a table of licence types:
Korn, N., Oppenheim, C. (2011). ‘Licensing Open Data: A Practical Guide’ (pdf).
Tab Content: Data Access Statements
Why do you need a data access statement?
Data access statements are required for all publications arising from publicly-funded research. They are a requirement of many funders' data policies and are a requirement of the UKRI (Formerly RCUK) Policy on Open Access. Sometimes they are called data availability statements.
Some funders have indicated that they now check for the inclusion of data access statements in publications that acknowledge their support. The requirement applies to all papers that acknowledge EPSRC funding with a publication date after 1 May 2015 (see our EPSRC guide).
Does this mean all data needs to be published?
The aim of the data access statement is discoverability — the data referenced by the statement do not have to be openly available. There are many reasons why access to data should be restricted and if you are unsure about whether you should publish your data openly please contact firstname.lastname@example.org for advice.
Where to provide the data access statement?
We recommend one of the following options:
- Some journals (for example PLOS) now provide a separate section in articles for the data access statement.
- You can include the data access statement with the acknowledgement of funder support.
If these options are not available you can include a data statement in your main reference section.
What to include in the data access statement?
Depending on whether your data is openly available or not, one of the following options will apply:
- If data are openly available the name(s) of the data repositories should be provided, as well as any persistent identifiers or accession numbers for the dataset. The data repository could be Pure or an external data preservation archive.
- If there are justifiable legal or ethical reasons why your data cannot be made openly available, these should be included in the data access statement. In this case, the data access statement must direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted. You can do this in Pure.
- If you did not collect the research data yourself but instead used existing data obtained from another source, this source should be credited.
Please note that a simple direction to interested parties to contact the author would not normally be considered sufficient.
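As a rough illustration of how the elements above fit together, the following Python sketch assembles a statement from a repository name, a persistent identifier and an optional reason for restriction. The function and parameter names are hypothetical, the wording is modelled on the example statements further down this page, and the DOI used in the usage line is a placeholder, not a genuine identifier.

```python
from typing import Optional


def data_access_statement(repository: str,
                          identifier: str,
                          restriction_reason: Optional[str] = None) -> str:
    """Build a data access statement for open or restricted data."""
    if not repository or not identifier:
        # A statement without a findable record defeats its purpose.
        raise ValueError("both a repository name and a persistent identifier are needed")
    if restriction_reason is None:
        return (f"All data created during this research are openly available "
                f"from {repository} at {identifier}.")
    return (f"Due to {restriction_reason}, supporting data cannot be made openly available. "
            f"Further information about the data and conditions for access "
            f"are available from {repository} at {identifier}.")


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(data_access_statement(
        "the Lancaster University data archive",
        "http://dx.doi.org/10.17635/lancaster/researchdata/xxx",
        restriction_reason="ethical concerns",
    ))
```

Requiring both a repository and an identifier reflects the point made above: a statement that merely tells readers to contact the author would not normally be sufficient.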
When should I deposit data in order to get a DOI for my Data Access Statement?
The problem is that the paper has to reference the DOI of the dataset, but once a dataset has been given a DOI you cannot change the dataset. Therefore we recommend the following steps:
- In your draft paper's Acknowledgements, include a sentence like "The underlying data in this paper is available from http://dx.doi.org/10.17635/lancaster/researchdata/xxx." as it goes through the journal reviewing process.
- When the paper is accepted, generate and deposit the dataset. You will need to edit the paper's text to give the dataset DOI that you will receive from the Library when the dataset is validated. This version of your paper is the AAM (author's accepted manuscript), which you will also need to deposit in Pure to satisfy HEFCE's open access requirements.
- When the paper finally appears in print or online, update the dataset metadata to give the full details of the paper, including its DOI.
Example data access statements
Please note that the URLs/DOIs in these examples are not genuine.
Openly available data
"All data created during this research are openly available from Lancaster University data archive at http://dx.doi.org/10.17635/lancaster/researchdata/15."
"All data are provided in full in the results section / the supplementary section of this paper."
"Crystal structures are available from the Cambridge Crystallographic Data Centre (Identifier BATHRS) at http://dx.doi.org/10.15125/010203. Microscopy images are openly available from Dryad at http://dx.doi.org/10.17635/lancaster/researchdata/1."
"The 1962 birth cohort data can be accessed via the UK Data Service (http://ukdataservice.ac.uk/)."
Secondary analysis of existing data
"This study was a re-analysis of existing data that are publicly available from EMBL at http://dx.doi.org/10.15125/12345. Further documentation about data processing is available from the Lancaster University data archive at http://dx.doi.org/10.17635/lancaster/researchdata/3."
"The study brought together existing data obtained upon request and subject to licence restrictions from a number of different sources. Full details of how these data were obtained are available in the documentation available at http://dx.doi.org/10.17635/lancaster/researchdata/28."
"Due to the (commercially, politically, ethically) sensitive nature of the research, no participants consented to their data being retained or shared. Additional details relating to the data are available from the Lancaster University data archive at http://dx.doi.org/10.17635/lancaster/researchdata/22."
"Anonymised interview transcripts from participants who consented to data sharing, plus other supporting information, are available from the UK Data Service, subject to registration, at http://dx.doi.org/10.17635/lancaster/researchdata/24."
Data available on request only
"Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available at the Lancaster University data archive: http://dx.doi.org/10.17635/lancaster/researchdata/123."
"Due to the (commercially, politically, ethically) sensitive nature of the research, no interviewees consented to their data being retained or shared. Additional details relating to other aspects of the data are available from the Lancaster University data archive at http://dx.doi.org/10.17635/lancaster/researchdata/222."
"Supporting data are available to bona fide researchers, subject to registration, from the UK Data Service at http://dx.doi.org/10.15125/12345."
"Supporting data will be available from Lancaster University research portal at http://dx.doi.org/10.17635/lancaster/researchdata/17 after a 6-month embargo from the date of publication to allow for commercialisation of research findings."
"Due to confidentiality agreements with research collaborators, supporting data can only be made available to bona fide researchers subject to a non-disclosure agreement. Details of the data and how to request access are available at Lancaster University research portal: http://dx.doi.org/10.15125/12345."
Citation of multiple datasets
"This publication is supported by multiple datasets, which are openly available at locations cited in the reference section."
Physical data (samples, specimens, paper collections etc.)
"Non-digital data supporting this study are stored by the corresponding author at Lancaster University. Details of how to request access to these data are provided in the documentation available from the Lancaster University data archive at http://dx.doi.org/10.17635/lancaster/researchdata/42."
No new data created
"No new data were created during this study."
"This research did not produce new data; other data sources are referenced throughout the paper."
All data are included in paper
"All relevant data are within the paper and its Supporting Information files."
Examples from Journals
Data Availability statement example 2 from a PLOS article (below Figures just under Copyright notice)
Examples from Springer Nature.
We acknowledge the work of the University of Bath in the development of this guidance.