
Ethics Framework
Find out how we developed our ethical framework and how we look after participant data from forums, surveys and interviews.
There are very important ethical issues in analysing forum posts. People often share details about the things that are causing them distress, in the hope that other people who have faced similar problems can help them. It is vital that the forum feels like a safe space in which to do this. We do not want this research to jeopardise this feeling of safety in any way. Therefore, we have developed a comprehensive ethical framework for this study. This has been developed with input from legal, clinical, academic and lived expertise, and approved by the Health Research Authority (IRAS 314029).
As the project progresses we may need to make changes to how the study is conducted. Any changes will be approved by the study sponsor and the ethics committee and will be updated here for information.
We hope that sharing this framework is helpful. We appreciate that some individuals may have further concerns or questions and we are happy to address these. Please email ipof@lancaster.ac.uk.
-
Framework Development Process
We have developed a data processing strategy based on the following principles and activities.
- We have adopted a user-centred approach, taking the perspective of a person using a forum, and considering their likely expectations and concerns. This is consistent with recommendations made by Perez Vallejos et al. (2019).
- We have consulted widely with multiple mental health forum hosts, moderators and users across 5 mental health forums, and have hosted an online focus group (N = 21) to explore the benefits and risks of this research and how best to mitigate these.
- We have examined key guidance relevant to research using online data including:
- We have collaborated with the legal experts at the host NHS Trust (Berkshire Healthcare NHS Foundation Trust) and the information officer at Lancaster University.
- We have drafted our strategy to be consistent with: a) the common law duty of confidentiality, a broad principle of law that a person who receives information from another party in confidence cannot take advantage of it; and b) GDPR. The data is personal de-identified data. It will be processed by Lancaster University under “task in the public interest” (Article 6(1)(e)). As the data pertains to health, it is special category data, and the processing can occur under “necessary for scientific research” in accordance with safeguards (Article 9(2)(j)).
- We have read widely to learn how other researchers in the field have approached their ethical strategy and had face-to-face meetings with key leaders in this field.
Finally, and most importantly, we will continue to design our study and all recruitment materials with hosts, moderators and members of the communities, to ensure full transparency and a user-centric approach.

Forum data
Click on the points below to find out more about how we manage data from forums.
-
Collection
- We will look at the policies of the forum website, including all Terms and Conditions that members are asked to sign up to. If research is prohibited, then we will not use data from this forum.
- If the forum posts can be accessed or viewed without requiring an account or sign-in (e.g. Reddit), then we can assume that users do not expect posts to be private and that they use the site knowing the posts are available to everyone. This data is considered public and therefore does not come under GDPR and individual consent for its use in research is not required (BPS guidelines). This kind of data has been used extensively in research to date. However, consistent with good practice guidance, we will only use data from forums where the forum host agrees to this and is willing to collaborate with the research.
- In forums where users give consent for posts to be used in research at the point of sign up AND this consent is given independently of agreeing to the terms and conditions of use of the site i.e. “freely given” such that the person can still use the site if they do not consent to research, then only posts from users who gave consent will be used.
- If consent to research is embedded in accepting the terms and conditions of the site, then although participants have technically approved their posts to be used in research, they are not able to use the site without giving this consent. We will use these posts but only where it is possible to ensure all users have been made aware of the option to opt out.
- If the forum posts are hosted by an NHS or social care organisation, and there are any links between the users in the forum and their medical or social care records, then we will seek the individual consent of each person. This more stringent approach is used in line with “Common law duty of confidentiality” because posts are linked to confidential healthcare data and were collected for personal healthcare rather than research. The consent requested will be for sharing of de-identified posts.
- If there is no consent for research given during the sign-up process for the forum then we will seek individual informed consent for use of the forum posts. The consent will be for sharing of de-identified data only.
- Consistent with current practice, archived posts from forums in which people consented at sign-up to their posts being used for research will be used without individual consent, provided they can be fully anonymised, i.e. there are no longer any links to personally identifiable information, and usernames can be replaced by a PIN with no record retained to allow reidentification (e.g. archived Reddits).
- In all forums we will adhere to the principles of data minimisation. We will only collect the minimum data needed to answer our research questions, and access to the data will be restricted to the minimum number of identified researchers required. They will be fully trained in Information Governance and Good Clinical Practice and will hold honorary contracts or research passports with any participating NHS or social care sites.
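The collection rules above can be read as a decision procedure. The following is a minimal illustrative sketch in Python, not the project's actual tooling. The `Forum` fields, the `Pathway` labels and the function name are hypothetical names introduced here, and the order in which the checks are applied is an assumption about how the rules combine. The archived-posts case is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class SignupConsent(Enum):
    NONE = "no research consent at sign-up"
    SEPARATE = "freely given, independent of the terms and conditions"
    EMBEDDED = "bundled into the terms and conditions"


class Pathway(Enum):
    EXCLUDE = "do not use this forum"
    PUBLIC = "public posts; individual consent not required"
    CONSENTING_USERS_ONLY = "use posts only from users who consented"
    USE_WITH_OPT_OUT = "use posts once all users know they can opt out"
    INDIVIDUAL_CONSENT = "seek individual informed consent"


@dataclass
class Forum:
    # All field names are hypothetical, chosen for this illustration.
    research_prohibited: bool      # terms and conditions forbid research
    host_agrees: bool              # host is willing to collaborate
    public_without_login: bool     # posts readable without an account
    linked_to_care_records: bool   # NHS/social care forum linked to records
    signup_consent: SignupConsent


def consent_pathway(forum: Forum) -> Pathway:
    """Return the consent pathway for a forum (check order is an assumption)."""
    if forum.research_prohibited or not forum.host_agrees:
        return Pathway.EXCLUDE
    if forum.linked_to_care_records:
        # Most stringent rule, so it is checked before the others.
        return Pathway.INDIVIDUAL_CONSENT
    if forum.public_without_login:
        return Pathway.PUBLIC
    if forum.signup_consent is SignupConsent.SEPARATE:
        return Pathway.CONSENTING_USERS_ONLY
    if forum.signup_consent is SignupConsent.EMBEDDED:
        return Pathway.USE_WITH_OPT_OUT
    return Pathway.INDIVIDUAL_CONSENT
```

In this sketch, host agreement is treated as a precondition for every forum. The framework states it explicitly for public forums, and it is assumed here for the others because the host carries out the de-identification.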
-
De-Identification
- Where forum posts are publicly visible without restriction, or the data is fully anonymised, posts will be shared with Lancaster University directly. Prior to analysis, we will then replace all usernames with a PIN and automatically remove any person and place names.
- For all other forums, posts will be de-identified by the host organisation before being shared. Usernames will be replaced by a PIN and the codesheet linking these will be retained by the host. Data will be transferred to Lancaster University, where a further screen will be done to check and remove any person or place names from the posts.
- For forum users who consent to take part in the online survey/interview AND consent for us to link their survey or interview responses with their forum posts, we will ask them to provide their username/email. We will then ask the host to provide us with the PIN for this username so we can make the link.
- For anyone who contacts us wishing to withdraw from any part of the study after the data has been transferred, we will ask for their username and request the PIN via the host. Alternatively, if they approach the host to request a withdrawal, the host will provide us with the PIN to withdraw their data.
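The de-identification and withdrawal steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the function names, the dictionary keys `username`, `text` and `pin`, and the `[REMOVED]` placeholder are all hypothetical. Name removal here uses a supplied list of names and a regular expression; the framework does not say which technique is used, and a real screen would probably need something more robust, such as named-entity recognition plus manual checks.

```python
import re
import secrets


def assign_pins(usernames):
    """Map each distinct username to a random PIN. The mapping is the codesheet."""
    codesheet = {}
    used = set()
    for name in usernames:
        if name in codesheet:
            continue
        pin = secrets.token_hex(4)
        while pin in used:  # regenerate on the unlikely event of a collision
            pin = secrets.token_hex(4)
        used.add(pin)
        codesheet[name] = pin
    return codesheet


def scrub_names(text, names, placeholder="[REMOVED]"):
    """Replace whole-word, case-insensitive matches of the listed names."""
    if not names:
        return text
    # Longest names first, so a full name is matched before a part of it.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


def deidentify(posts, names):
    """Return (shareable posts keyed by PIN, codesheet to be kept by the host)."""
    codesheet = assign_pins(p["username"] for p in posts)
    shared = [
        {"pin": codesheet[p["username"]], "text": scrub_names(p["text"], names)}
        for p in posts
    ]
    return shared, codesheet


def withdraw(shared, pin):
    """Drop every post carrying the PIN that the host supplied for a withdrawal."""
    return [p for p in shared if p["pin"] != pin]
```

In this sketch only `shared` would leave the host. The codesheet stays behind, so a link or a withdrawal is only possible by asking the host for the PIN that matches a username.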
-
Transfer
- During the project, we will arrange secure and encrypted data transfer (e.g. secure FTP and HTTPS as recommended by Lancaster’s Information Governance Manager & Data Protection Officer) from the online communities. In addition, we will abide by any host requirements (such as remote login to their host server for analysis) as requested and set out in individual data-sharing agreements.
-
Storage
- All data will be collected and stored securely using Lancaster University's approved IT systems and services in accordance with our Information Security Policy which is aligned with good industry practice and controls as defined in the ISO27000 family of standards. All files saved within university-approved secure cloud storage are held on Lancaster University’s secure servers. Access will be restricted to individual research team members who require it. External members of the research team involved in data collection and analysis will be provided access to university-approved secure cloud storage on Lancaster University’s secure server, accessed via a password-protected account.
- Any online data that includes potentially identifiable data, including usernames or consent forms, will be stored separately from the de-identified data. Any personal contact data will be destroyed at the end of the study period.
- De-identified research data will be archived for 10 years following the end of the project. Audio and video recordings will be kept until findings from the research project have been published. At the end of the default retention period (10 years), all data will be confidentially destroyed by a secure method.
-
Analysis
- All analysis of forum data will use de-identified data.
- Data analysis will be conducted by members of the research team with methodological expertise who have completed Information Governance and Good Clinical Practice training. The methods of corpus linguistics and Natural Language Processing are described in the detailed research plan.
-
Deletion
All participants have the right to withdraw and request that their data be deleted.
This will be clearly stated on the Participant Information Sheet (PIS) along with contact details for the research team and forum hosts.
We will make clear that this is only possible up to the point at which the analysis has started (at least one week following the data collection). After this time, participants can request that none of their data is used in paraphrased quotes, but it may already have contributed to the aggregated overall analysis.
-
Publication
To further prevent the identification of any participants, we will not use direct quotes from forum posts. All forum sites requiring a login are encrypted so that direct quotes cannot be traced back to the user through online searching.
However, it may be that someone could recognise their own quote, and some open-access forums that do not require a login are not encrypted. Therefore, we will paraphrase quotes. We will ensure that the meaning and relevant linguistic characteristics of the original quote are maintained (agreed by interrater consensus).
We will further anonymise conversations by adding and removing filler words or rephrasing content with less unique word choices or phrases, with two researchers repeating this process independently.
-
Archiving & Access
All papers will be published open access. We are big supporters of open access and data-sharing principles. However, given the nature of the data, we will not share the forum datasets openly.
Survey and Interview Data
Click on the points below to find out more about how we manage data from surveys and interviews.

-
Collection
- All new data collected for the study, i.e. survey and interview data, will be collected with individual informed consent. Active forums collaborating in the study will invite users to take part in the online survey and/or interview to explore their experiences of using online forums. The recruitment strategy and all materials will be co-designed with the forum hosts, moderators and community members through our Expert Groups.
- Survey data will be collected using Qualtrics. In order to describe the sample and ensure we can manage any risk issues that may arise, we need to ask people to provide demographic details, including their name and contact details, when they consent.
- After completing the measures/interviews, participants will be thanked, rewarded, and directed to mental health support resources (see “Resources” sheet). Consistent with NIHR guidance, participants will be offered a financial honorarium for providing new data for the study.
-
De-Identification
To facilitate linkage between the survey and/or interview and forum data, participants will also have the option to provide their chosen username from the forum, and/or email used in the survey.
However, this will be a separate consent item so they have the option to take part in the survey and/or interview without identifying their username if they wish.
-
Transfer
- Survey data will be collected via a link directly into Lancaster University's system, storing data to the University's secure servers.
- Interviews will be conducted remotely via Microsoft Teams and recorded using an encrypted audio recorder. Audio files will be uploaded to a secure server as soon as possible and deleted from the recorder. Transcription will be done by a University-approved and contracted transcriber, and transcripts will be de-identified.
-
Storage
- All data will be collected and stored securely using Lancaster University's approved IT systems and services in accordance with our Information Security Policy which is aligned with good industry practice and controls as defined in the ISO27000 family of standards. All files saved within university-approved secure cloud storage are held on Lancaster University’s secure servers. Access will be restricted to individual research team members who require it. External members of the research team involved in data collection and analysis will be provided access to university-approved secure cloud storage on Lancaster University’s secure server, accessed via a password-protected account.
- Any online data that includes potentially identifiable data, including usernames or consent forms, will be stored separately from the de-identified data. Any personal contact data will be destroyed at the end of the study period.
- De-identified research data will be archived for 10 years following the end of the project. Audio and video recordings will be kept until findings from the research project have been published. At the end of the default retention period (10 years), all data will be confidentially destroyed by a secure method.
-
Analysis
- Analysis of survey data within cases will include a detailed description of the community sample and summary statistics on mental health outcomes. Analysis of data across cases will treat the data as multilevel, using generalized mixed models and generalized structural equation models in the Mplus (v8.6) software package.
- Analysis of interview data will be managed in NVivo. Consistent with the realist approach, analysis will be retroductive, i.e. it will seek to identify the hidden causal forces underlying people’s descriptions of their experiences in the forums. These will be coded into our hypothesised context-mechanism-outcome (CMO) configurations from WorkStream 1. They will add to, elaborate, refine, and refute CMOs as appropriate to develop our programme theory. Interviews and initial coding will be done by our researchers, but the analysis will be developed through regular discussion with the co-applicant team and our Expert Groups.
-
Deletion
Survey participants have up to 1 week to remove their data if they wish. Interviews are with identified participants who can withdraw data. We will make clear that this is only possible up to the point at which the analysis has started (at least one week following the data collection). After this time, participants can request that none of their data is used in paraphrased quotes, but it may already have contributed to the aggregated overall analysis.
-
Publication
For survey data, we will adopt minimum cell sizes for published results, in line with best practices (such as those from the Office for National Statistics). Direct quotes will be used from interviews where consent is given for this, though any potentially identifying information, including reference to names or places, will be removed.
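As an illustration of what a minimum cell size means in practice, the sketch below counts responses per category and withholds any count below a threshold. It is an assumption-laden example, not the study's analysis code: the function name is hypothetical, and the threshold of 10 is a placeholder chosen for the example, not a value taken from this framework or from Office for National Statistics guidance.

```python
from collections import Counter

# Hypothetical threshold for illustration only; the framework does not state one.
MIN_CELL_SIZE = 10


def suppressed_counts(values, min_cell_size=MIN_CELL_SIZE):
    """Count each category, replacing counts below the threshold with None."""
    counts = Counter(values)
    return {
        category: (n if n >= min_cell_size else None)
        for category, n in counts.items()
    }
```

For example, with 12 responses in one age band and 3 in another, the first count would be reported and the second withheld. This covers primary suppression only: if a table also publishes totals, a single withheld cell could be recovered by subtraction, so further cells may need to be withheld as well.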
-
Archiving & Access
De-identified survey data will be openly shared on PURE. Interview data will be restricted access and available by request to legitimate research parties, assessed on a case-by-case basis, if the purpose is consistent with the consent given for this research. Best practice tools will be disseminated by the host Trust in accordance with the implementation plan developed in workstream 3. They will not refer to any identifiable data.