Statistical Challenges in Data Fusion

Data fusion is the process of synthesising data at multiple levels and from different sources to obtain the most accurate and complete information possible, and it is especially relevant to environmental monitoring. Professor Claire Miller, from the University of Glasgow, highlighted her recent projects on applications to water quality and catchment management, and discussed data fusion in the context of environmental statistics. This blog summarises Claire’s seminar delivered to DSNE on Thursday 7th October.
To tackle lake and river water pollution and problems with water abstraction, multiple data sources are used to aid the monitoring of water quality and the management of catchments. These include data from long-term monitoring programmes, automatic sensors, hyperspectral drones and processed satellite retrievals. However, information gaps still exist, making the global challenges associated with water pollution and water abstraction difficult to address. Where data do exist, the challenge lies in appropriately combining the available data streams to fill the knowledge gaps by providing improved estimation and prediction of, for example, water quality or quantity. Claire’s presentation gave examples of the statistical challenges that arise in this context, drawn from specific applications: the NERC GloboLakes project, the EPSRC GCRF Ramganga Water Data Fusion project and the NERC Digital Environment programme project on creating a 'digital infrastructure to support management of water resources'.
The NERC GloboLakes project (Globolakes Overview, stir.ac.uk) is a research programme investigating the state of lakes and their response to climatic and other environmental drivers of change at a global scale using long-term satellite observations. The main aim of the project was to look at lake colour and to transform that into information on water quality for 1000 lakes across the world. The aim of fusing in situ and satellite data was to increase the information available on spatio-temporal trends in lake water quality determinants. Lake Balaton (Wilkie et al. 2019) was studied as an example using nonparametric statistical downscaling, a method that enables the fusion of data of different spatiotemporal support by treating the data at each location as observations of smooth functions over time. This is incorporated within a Bayesian hierarchical model with smoothly spatially varying coefficients, which provides predictions at any location or time, with associated estimates of uncertainty. The method is motivated by the fusion of in situ and satellite remote sensing log(chlorophyll-a) data from Lake Balaton, in order to improve the understanding of water quality patterns over space and time.
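The downscaling idea described above can be illustrated with a heavily simplified sketch. It is not the model of Wilkie et al. (2019): the Bayesian hierarchical model is replaced here by ordinary least squares, the spatially varying coefficients by constants shared across locations, and all data are synthetic. The Fourier basis, the noise levels, the satellite bias and every function and variable name are assumptions made purely for illustration. Each series is represented by the coefficients of a smooth curve, and the in situ coefficients are then regressed on the satellite coefficients so that a location with satellite data only can be predicted on the in situ scale.

```python
import numpy as np

def fourier_basis(t, n_harmonics=1):
    """Design matrix with columns 1, sin(2*pi*k*t), cos(2*pi*k*t) for t in [0, 1]."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.column_stack(cols)

def smooth_coefficients(t, y):
    """Least-squares basis coefficients that represent one series as a smooth curve."""
    coef, *_ = np.linalg.lstsq(fourier_basis(t), y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n_loc = 10                                  # hypothetical number of locations
t_sat = np.linspace(0.0, 1.0, 50)           # dense satellite observation times
t_grid = np.linspace(0.0, 1.0, 200)         # grid used to evaluate the curves

# Synthetic "true" water quality curve at each location (assumed form).
level = np.linspace(0.5, 1.5, n_loc)
amplitude = np.linspace(0.4, 1.0, n_loc)

def true_curve(s, t):
    return level[s] + amplitude[s] * np.sin(2 * np.pi * t)

sat_coef = np.empty((n_loc, 3))
insitu_coef = np.empty((n_loc, 3))
for s in range(n_loc):
    t_in = np.sort(rng.uniform(0.0, 1.0, 15))   # sparse in situ sampling times
    # Satellite data: dense but biased and noisy. In situ data: sparse but accurate.
    sat = 0.3 + 1.2 * true_curve(s, t_sat) + rng.normal(0.0, 0.2, t_sat.size)
    insitu = true_curve(s, t_in) + rng.normal(0.0, 0.05, t_in.size)
    sat_coef[s] = smooth_coefficients(t_sat, sat)
    insitu_coef[s] = smooth_coefficients(t_in, insitu)

# Hold out one location and pretend it has satellite data only.
held_out = n_loc // 2
train = np.arange(n_loc) != held_out

# For each basis function, regress in situ coefficients on satellite coefficients
# across the training locations, then predict the held-out in situ coefficient.
pred_coef = np.empty(3)
for k in range(3):
    slope, intercept = np.polyfit(sat_coef[train, k], insitu_coef[train, k], 1)
    pred_coef[k] = intercept + slope * sat_coef[held_out, k]

B = fourier_basis(t_grid)
truth = true_curve(held_out, t_grid)
rmse_fused = np.sqrt(np.mean((B @ pred_coef - truth) ** 2))
rmse_sat = np.sqrt(np.mean((B @ sat_coef[held_out] - truth) ** 2))
print(f"RMSE, smoothed satellite only: {rmse_sat:.3f}")
print(f"RMSE, calibrated prediction:   {rmse_fused:.3f}")
```

In this toy setting the calibrated curve should track the truth more closely than the smoothed satellite curve, because the regression removes the systematic satellite bias. A full treatment would also propagate uncertainty and let the regression coefficients vary smoothly over space, which is what the Bayesian hierarchical formulation provides.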
The Ramganga Water Data Fusion project is funded by the UK Global Challenges Research Fund with the aim of informing work such as risk-based modelling and the design of future monitoring to improve mitigation efforts. The Ramganga, a tributary of the river Ganga in northern India, is a vital water resource for millions of people. A variety of data are available for investigating the quality of freshwater ecosystems and catchments. New sources of data such as satellites, drones and sensors provide better spatial and temporal coverage of the river network.
The NERC Digital Environment programme project aims to develop and use a network of smart sensors within a river catchment to improve the understanding and regulation of water abstraction. The goal of the project was to instrument a rural catchment in Scotland with new, small sensors for river level and soil moisture, and to provide near-real-time information on water quantity in a small river network to stakeholders, including farmers.
To sum up, statistical challenges for data fusion include the quality of data streams, missing or sparse data, levels of uncertainty that vary with how the data were produced, the variety of spatio-temporal scales, and computational efficiency. Novel, efficient approaches from across statistics and data analytics can be used to combine all of this information. In particular, an approach based on smoothing, functional data analysis and Bayesian inference could provide the most informative overall picture of water quality, with associated uncertainty.
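As a small illustration of why varying levels of uncertainty matter when combining data streams, the sketch below applies inverse-variance weighting, a standard textbook way to combine estimates of the same quantity. It was not presented in the seminar. It assumes the estimates are unbiased with independent errors and known variances, and the numbers and the function name are hypothetical.

```python
import numpy as np

def inverse_variance_fusion(estimates, variances):
    """Combine independent, unbiased estimates of one quantity.

    Each estimate is weighted by the reciprocal of its variance, so more
    precise data streams contribute more to the fused value.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if np.any(variances <= 0):
        raise ValueError("variances must be strictly positive")
    weights = 1.0 / variances
    fused = np.sum(weights * estimates) / np.sum(weights)
    fused_variance = 1.0 / np.sum(weights)
    return fused, fused_variance

# Hypothetical example: an in situ reading (variance 1.0) and a satellite
# retrieval (variance 4.0) of the same water quality determinant.
fused, fused_variance = inverse_variance_fusion([10.0, 12.0], [1.0, 4.0])
print(f"fused estimate = {fused:.2f}, variance = {fused_variance:.2f}")
```

The fused variance is smaller than the variance of either input, which is the basic payoff of fusion. When the streams are biased or their errors are correlated, as is often the case with satellite retrievals, this simple rule no longer applies and a model-based approach is needed.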
References:
Gong, M., Miller, C., Scott, M. et al. State space functional principal component analysis to identify spatiotemporal patterns in remote sensing lake water quality. Stoch Environ Res Risk Assess (2021). https://doi.org/10.1007/s00477-021-02017-w
Maberly, S.C., O’Donnell, R.A., Woolway, R.I. et al. Global lake thermal regions shift under climate change. Nat Commun 11, 1232 (2020). https://doi.org/10.1038/s41467-020-15108-z
Spyrakos, E., O'Donnell, R., Hunter, P.D., Miller, C., Scott, M., Simis, S.G.H., Neil, C., Barbosa, C.C.F., Binding, C.E., Bradt, S., Bresciani, M., Dall'Olmo, G., Giardino, C., Gitelson, A.A., Kutser, T., Li, L., Matsushita, B., Martinez-Vicente, V., Matthews, M.W., Ogashawara, I., Ruiz-Verdú, A., Schalles, J.F., Tebbs, E., Zhang, Y. and Tyler, A.N. (2018), Optical types of inland and coastal waters. Limnol. Oceanogr., 63: 846-870. https://doi.org/10.1002/lno.10674
Wilkie, CJ, Miller, CA, Scott, EM, et al. Nonparametric statistical downscaling for the fusion of data of different spatiotemporal support. Environmetrics. 2019; 30:e2549. https://doi.org/10.1002/env.2549
Header image credit: Wilkie et al. (2019)
Professor Claire Miller is the Head of Statistics at the University of Glasgow. Information on her research group is available at https://www.gla.ac.uk/schools/mathematicsstatistics/research/stats/ai3/analytics/environmentalai/digitalenvironment/
Disclaimer
The opinions expressed by our bloggers and those providing comments are personal, and may not necessarily reflect the opinions of Lancaster University. Responsibility for the accuracy of any of the information contained within blog posts belongs to the blogger.