Evaluation of multi-variate time series clustering for imputation of air pollution data

Alahamade, Wedad, Lake, Iain ORCID: https://orcid.org/0000-0003-4407-5357, Reeves, Claire ORCID: https://orcid.org/0000-0003-4071-1926 and De La Iglesia, Beatriz ORCID: https://orcid.org/0000-0003-2675-5826 (2021) Evaluation of multi-variate time series clustering for imputation of air pollution data. Geoscientific Instrumentation, Methods and Data Systems, 10. 265–285. ISSN 2193-0856

PDF (Accepted_Manuscript) - Accepted Version
Available under License Creative Commons Attribution.

PDF (Published_Version) - Published Version
Available under License Creative Commons Attribution.

Abstract

Air pollution is one of the world's leading risk factors for death, with 6.5 million deaths per year worldwide attributed to air-pollution-related diseases. Understanding the behaviour of certain pollutants through air quality assessment can improve air quality management, which in turn translates into health and economic benefits. However, missing data and uncertainty hinder that assessment. We are motivated by the need to enhance the air pollution data available. We focus on the problem of missing air pollutant concentration data, which arises either because only a limited set of pollutants is measured at a monitoring site or because an instrument is not operating, so that a particular pollutant is not measured for a period of time. In our previous work, we proposed models that can impute a whole missing time series to enhance air quality monitoring. Some of these models are based on a Multivariate Time Series (MVTS) clustering method. Here, we apply our method to real data and show how different graphical and statistical model evaluation functions enable us to select the imputation model that produces the most plausible imputations. We then compare the Daily Air Quality Index (DAQI) values obtained after imputation with those calculated from the observed data, which contain missing values. Our results show that an ensemble model that aggregates the spatial similarity, obtained from the geographical correlation between monitoring stations, and the fused temporal similarity between pollutant concentrations produces very good imputation results. Furthermore, the analysis enhances understanding of the different pollutants' behaviours and of the characteristics of different stations according to their environmental type.

Item Type: Article
Faculty \ School: Faculty of Science > School of Computing Sciences
Faculty of Science > School of Environmental Sciences
University of East Anglia Research Groups/Centres > Theme - ClimateUEA
UEA Research Groups: Faculty of Science > Research Groups > Environmental Social Sciences
University of East Anglia Schools > Faculty of Science > Tyndall Centre for Climate Change Research
Faculty of Science > Research Centres > Tyndall Centre for Climate Change Research
Faculty of Science > Research Groups > Centre for Ocean and Atmospheric Sciences
Faculty of Medicine and Health Sciences > Research Centres > Norwich Institute for Healthy Aging
Faculty of Medicine and Health Sciences > Research Centres > Business and Local Government Data Research Centre (former - to 2023)
Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: LivePure Connector
Date Deposited: 08 Oct 2021 00:56
Last Modified: 10 May 2023 00:05
URI: https://ueaeprints.uea.ac.uk/id/eprint/81606
DOI: 10.5194/gi-2021-11
