Opening up Science in the 21st Century – the role of virtual labs.


Image of a cloud with two leaves inside.  'DataLabs' written underneath.

In a world where collaborative and transparent science is taking centre stage, there has been a drive to use digital technologies to break down barriers and enable experts from different domains to work together. In the field of Environmental Data Science, there is an ever-increasing volume, variety and velocity of environmental datasets from which scientists and decision makers wish to gain useful information. Statisticians and data scientists specialise in methods that can harvest the required information from these datasets; however, barriers often exist to sharing these methods between the respective groups. Such barriers range from limited access to high-powered computing to run the analysis through to the lack of a common workspace in which to develop the methods and share results with decision makers. Furthermore, national funders of research projects increasingly require the digital process of analysis to be more open, sharing all the assumptions and methodology used to reach a particular result.

Figure 1: Overview of DataLabs (Source: Hollaway et al. 2020).

In response to this, through collaboration between Lancaster University and UKCEH, the concept of a ‘data science lab’ has been published in a new paper (Hollaway et al., 2020) along with a description of the initial implementation known as ‘DataLabs’. In essence, a DataLab is a digital research platform that sits in the cloud and requires only a web browser to reach the high-powered computing and big data storage needed to access, develop and share complex analytical methods. As shown in Figure 1, DataLabs provide an area of collaboration that serves users with different levels of technical expertise. This can range from interacting with data and methods at a raw code level through to using dashboard applications to execute analysis and visualise the results without seeing a single line of code.

The concept behind a DataLab also specifies that it is dynamic and tailorable in nature. Should users discover a particular lab whose method and/or data they wish to use, it should be easily adaptable to the new challenge at hand. Under the hood, the DataLab environment is integrated with the underlying infrastructure so that it can evolve with changing user requirements. For example, if a user requires a high level of computational power for a particular part of the analysis, the cloud-based architecture can adapt to this requirement. As part of their support for method development, DataLabs take an iterative approach to end-to-end support for a particular workflow. This covers every process from ingestion of the data through to presentation of the final results, with experts interacting at each stage to produce the final result. Finally, DataLabs must act as a trusted broker for the analysis, including maintaining provenance of all decisions made, so that all users and stakeholders can fully understand the assumptions made in producing the end result.
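To make the idea of provenance across an end-to-end workflow a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not how DataLabs itself is implemented: the `ProvenanceLog` class, the `fingerprint` helper and the three toy stages (`ingest`, `analyse`, `present`) are all hypothetical names invented for this example. Each stage is run through the log, which records a short hash of the stage's input and output along with a note about the decision taken.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List


def fingerprint(value: Any) -> str:
    """Return a short, stable hash of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class ProvenanceLog:
    """Hypothetical record of every step applied in a workflow."""
    records: List[dict] = field(default_factory=list)

    def run(self, step_name: str, func: Callable[[Any], Any],
            data: Any, note: str = "") -> Any:
        """Apply one step and record what went in and what came out."""
        result = func(data)
        self.records.append({
            "step": step_name,
            "note": note,
            "input_hash": fingerprint(data),
            "output_hash": fingerprint(result),
        })
        return result


# Toy stages standing in for ingestion, analysis and presentation.
def ingest(raw):
    """Drop missing readings."""
    return [x for x in raw if x is not None]


def analyse(values):
    """Summarise the cleaned readings."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def present(summary):
    """Format the summary for a non-technical reader."""
    return f"mean of {summary['n']} readings: {summary['mean']:.2f}"


log = ProvenanceLog()
raw_readings = [12.0, None, 15.0, 18.0]  # made-up example data
clean = log.run("ingest", ingest, raw_readings, note="dropped missing values")
summary = log.run("analyse", analyse, clean, note="arithmetic mean")
report = log.run("present", present, summary)

print(report)
for record in log.records:
    print(record)
```

Because the output hash of each stage matches the input hash of the next, a reader of the log can check that no undocumented step was slipped in between ingestion and the final result. A real platform would need far more than this (users, timestamps, code versions, data sources), but the principle is the same.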

DataLabs are beginning to play a prominent role in the Data Science of the Natural Environment (DSNE) project, enabling environmental scientists, statisticians, data scientists and computer scientists to work together to tackle some of environmental science's grand challenges. Work so far includes developing new methods for evaluating climate models, centred on changepoint methods and fuzzy logic, and understanding air quality in the UK using Gaussian process models. It is hoped that these experiences, together with the detailed research roadmap set out in the DataLabs paper, will promote discussion in the international community on how to take the concept forward and enable DataLabs to support an ever-increasing need for open and collaborative science.

The DataLabs paper is published in the journal Patterns, by Cell Press, and is available online from Thursday 17th September.

References

Hollaway et al., Tackling the Challenges of 21st-Century Open Science and Beyond: A Data Science Lab Approach, Patterns (2020), https://doi.org/10.1016/j.patter.2020.100103


