Alahamade, Wedad, Lake, Iain ORCID: https://orcid.org/0000-0003-4407-5357, Reeves, Claire E. ORCID: https://orcid.org/0000-0003-4071-1926 and De La Iglesia, Beatriz ORCID: https://orcid.org/0000-0003-2675-5826 (2020) Clustering Imputation for Air Pollution Data. In: Hybrid Artificial Intelligent Systems. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, Cham, pp. 585-597. ISBN 978-3-030-61705-9
Abstract
Air pollution is a global problem. The assessment of air pollution concentration data is important for evaluating human exposure and the associated risk to health. Unfortunately, air pollution monitoring stations often have periods of missing data or do not measure all pollutants. In this study, we experiment with different approaches to estimate the whole time series for a missing pollutant at a monitoring station, as well as missing values within a time series. The main goal is to reduce the uncertainty in air quality assessment. To develop our approach, we combine single and multiple imputation, nearest-neighbour geographical distance methods and a clustering algorithm for time series. For each station that measures ozone, we produce various imputations for this pollutant and measure the similarity/error between the imputed and the real values. Our results show that imputation by average based on clustering results, combined with multiple imputation for missing values, is the most reliable and is associated with lower average error and standard deviation.
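As a rough illustration of the "imputation by average based on clustering results" idea, the sketch below estimates one station's whole series as the per-time-step mean of the other stations in the same cluster, then scores the estimate against the observed values. It is a minimal sketch under stated assumptions, not the authors' implementation: the cluster labels are taken as given (the time series clustering algorithm and the multiple imputation step are not reproduced), RMSE is used as one possible error measure, and all station names, values and function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def cluster_average_imputation(series_by_station, cluster_of, target):
    """Estimate the target's series as the mean of its cluster peers at each time step.

    Assumption: cluster_of maps each station to a precomputed cluster label.
    Missing observations are represented as None and are skipped.
    """
    peers = [
        station
        for station, label in cluster_of.items()
        if label == cluster_of[target] and station != target
    ]
    imputed = []
    for t in range(len(series_by_station[target])):
        values = [
            series_by_station[p][t]
            for p in peers
            if series_by_station[p][t] is not None
        ]
        # No peer observation at this time step: leave the value missing.
        imputed.append(mean(values) if values else None)
    return imputed


def rmse(real, estimate):
    """Root mean squared error over time steps where both values are present."""
    pairs = [(r, e) for r, e in zip(real, estimate) if r is not None and e is not None]
    return sqrt(mean((r - e) ** 2 for r, e in pairs))


# Hypothetical ozone readings; station "A" has one missing value.
series = {
    "A": [40.0, 42.0, None, 38.0],
    "B": [41.0, 44.0, 39.0, 36.0],
    "C": [43.0, 40.0, 41.0, 40.0],
    "D": [10.0, 12.0, 11.0, 9.0],
}
clusters = {"A": 0, "B": 0, "C": 0, "D": 1}  # assumed clustering result

estimate = cluster_average_imputation(series, clusters, "A")
error = rmse(series["A"], estimate)
print(estimate, round(error, 4))
```

Station "D" sits in a different cluster and so does not influence the estimate for "A", which is the point of averaging within clusters rather than over all stations.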