Legal and Ethical Issues within Research Data Management
Navigating the legalities and ethics of research data can be a daunting task, but the information and resources below will help you to get started. Some of the advice applies only to personal data, but much of it is relevant to every dataset, so these issues need to be considered for all research data.
-
Lawful Basis for Processing Data
If you are working with people's personal data, that is, data about living people from which they can be identified, and either you or the research participant is in the EU, then the General Data Protection Regulation (GDPR) will apply to your research. The GDPR states that there must be a valid reason (a lawful basis) for processing personal data, and it recognises six such reasons. Most likely your choice will be one of: a task carried out in the public interest or in the exercise of official authority (public task), consent of the data subject, or legitimate interests pursued by the controller.
Alongside establishing a lawful basis, you must also ensure that the personal data you collect is the minimum required to undertake your research, and consider secure storage, accuracy and transparency when handling this data.
More information is available at the links below:
Lancaster University - GDPR, what researchers need to know
UK Data Service - Applying GDPR in Research
Information Commissioner's Office - Lawful Basis for Processing
-
Informed Consent
Informing participants of the details of the research process is an important and well-established consideration. For research data management, this should include your plans to collect, publish and share data, as it is much more difficult to go back and get consent for these activities at a later time. If consent is not sought, this will restrict what you can do when publishing and sharing data. It is also not advisable to offer complete deletion or restriction of all data at this stage.
More information is available at the links below:
-
Disclosure Assessment
Assessing your data for the risk of disclosure of an individual can be thought of in a systematic way. Techniques for accomplishing this mainly include checking for direct and indirect identifiers and determining outliers, as well as analysing the environment and potential risks outside of the dataset, for example, when combined with other external data.
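One part of this check can be sketched in code. The example below is a minimal illustration, not a complete disclosure assessment: it counts how many records share each combination of indirect identifiers and flags combinations that occur fewer than `k` times, since these are the outliers most likely to single someone out. The function name `risky_combinations`, the field names, the threshold `k=3` and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=3):
    """Return combinations of indirect-identifier values shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical survey records; all names and values are invented.
records = [
    {"age_band": "30-39", "postcode_area": "LA1", "occupation": "nurse"},
    {"age_band": "30-39", "postcode_area": "LA1", "occupation": "nurse"},
    {"age_band": "30-39", "postcode_area": "LA1", "occupation": "nurse"},
    {"age_band": "60-69", "postcode_area": "LA2", "occupation": "lighthouse keeper"},
]

flagged = risky_combinations(
    records, ["age_band", "postcode_area", "occupation"], k=3
)
for combo, n in flagged.items():
    print(combo, "appears", n, "time(s)")
```

A check like this only looks inside the dataset. It does not cover the risk of re-identification when the data is combined with external sources, which still needs to be assessed separately.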
More information is available at the links below:
UK Data Service - Disclosure Assessment
Information Commissioner's Office - Data Protection Impact Assessments
Office for National Statistics - Best Practice for Applying Disclosure Control
-
Anonymisation Techniques
Many sharing issues can be overcome with suitable anonymisation techniques, although not every technique is appropriate for every type of data or every discipline. Sometimes a balance must be found between disclosure risk and retaining the value and usability of the dataset.
Some widely used examples are:
- removing direct or indirect identifiers
- removing links between identifiers
- swapping or shuffling identifiers
- using aggregated ranges rather than specific values
- removing outliers
- perturbing data
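Three of the techniques above (removing direct identifiers, using aggregated ranges, and perturbing data) can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended procedure: the field names, the ten-year band width and the noise range of 500 are all hypothetical, and applying these steps does not by itself guarantee that a dataset is anonymous.

```python
import random

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_to_band(age, width=10):
    """Replace a specific age with an aggregated range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(record, rng, noise=500):
    """Drop direct identifiers, band the age, and perturb the income."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["age_band"] = age_to_band(cleaned.pop("age"))
    # Perturbation: add a small random offset so the exact value is hidden.
    cleaned["income"] = cleaned["income"] + rng.randint(-noise, noise)
    return cleaned


rng = random.Random(42)  # seeded only so the illustration is repeatable
participant = {
    "name": "A. Example",
    "email": "a@example.org",
    "age": 37,
    "income": 28000,
}
result = anonymise(participant, rng)
print(result)
```

Wider bands and larger noise reduce disclosure risk but also reduce the analytical value of the data, which is the balance described above.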
More information is available at the links below:
UK Data Service - Anonymisation
CESSDA Data Management Expert Guide - Anonymisation
Information Commissioner's Office - Anonymisation Code of Practice
-
Security and Storage
ISS provide advice and support with File Storage, Sharing and Collaboration Services, to help you:
- select appropriate university systems
- request additional storage
ISS also provide advice and support with Security of Data and Information, for:
- access and storage technical details
- sharing securely with external partners
- secure disposal
The systems are not ISO 27001 certified, but do adhere to its principles.
-
Access Control
Making clear how archived datasets can be accessed and under what conditions is an important part of good research data management.
There are some robust and widely understood reasons why research data may need to be restricted or closed, which are provided for within open data and open science practices, and these include:
- personal, sensitive and confidential data that can't be anonymised
- commercially sensitive data or data with commercial potential
- data that has the potential to compromise safety and security
- data that you are not permitted to share because of licensing or intellectual property restrictions
Generally three levels of access are available:
- open data: data that is publicly available
- safeguarded, mediated, or embargoed data: data that is shared under certain conditions
- closed, restricted or controlled data: data that is only shared within a research team and/or with stakeholders
If the middle option is appropriate, there are several procedures that could be put in place:
- requests go directly to the author and are decided case by case
- a data repository may place automated restrictions, for example, requiring registration
- datasets may be embargoed, which means they are unavailable for a certain amount of time
- requesters are required to sign a non-disclosure agreement or data sharing agreement prior to the dataset being released
UK Data Service - Access Control
Many publishers will require that you write a data access statement, which is a short description of where the data that accompanies your research paper is held and criteria for gaining access to it. There are three main mechanisms for providing access:
- access to data is provided on direct request
- data is deposited in a data repository
- data is published within the journal paper
-
Licensing
Licensing concerns communicating how datasets can be used once they have been accessed, and there are options for standard, prepared and bespoke licensing.
Generally, research data are released under a Creative Commons license. These are a set of six standard licenses with different conditions. The most permissive, and the default choice, is the CC-BY license, which permits any re-use as long as the author is credited.
Lancaster University - Choosing a License
-
Intellectual Property
Intellectual Property rights protect many types of work, including datasets, and encompass both copyright and patents. While facts themselves aren't protected, facts presented in a particular form, as datasets or databases, are.
In most circumstances, Lancaster University owns all research data, but allows Principal Investigators to act as data stewards. In addition, funders don't own the research data, but can specify what happens to it as a condition of releasing funds.
Further detailed support is available from the teams below:
Lancaster University - Copyright
Lancaster University - Intellectual Property
And further information on how IP and Copyright applies to datasets can be found at the links below: