Cognitive development Respiratory Tract Illness and Effects of eXposure (CORTEX) project: Data processing challenges in combining high spatial resolution pollution level data with individual level health and education data

Lyons, Jane, Mizen, Amy, Rodgers, Sarah, Berridge, Damon, Akbari, Ashley, Wilkinson, Paul, Milojevic, Ai, Doherty, Ruth, Dearden, Lorraine, Lake, Iain ORCID: https://orcid.org/0000-0003-4407-5357, Carruthers, David, Strickland, Sarah, Mavrogianni, Anna and Davies, Gwyneth (2018) Cognitive development Respiratory Tract Illness and Effects of eXposure (CORTEX) project: Data processing challenges in combining high spatial resolution pollution level data with individual level health and education data. International Journal of Population Data Science, 3 (2). ISSN 2399-4908

Full text not available from this repository.

Abstract

Background and Objectives: There is little evidence on the adverse effects of air pollution and pollen on cognition in people with air quality-related health conditions. The CORTEX project combined routinely collected health and education data, high spatial resolution air pollution modelling, and daily pollen measurements for 18,241 pupils living in Cardiff, UK, between 2009 and 2015, to investigate the acute effects of air quality and respiratory conditions on educational attainment.

Datasets: Levels of the air pollutants PM2.5, PM10, NO2, and ozone were modelled for 157,361 home and school locations, anonymised into the Secure Anonymised Information Linkage (SAIL) Databank, and summarised into minimum, average, and maximum readings for 4 daily time periods reflecting pupil home/school exposure. Adding a unique Residential Anonymised Linking Field (RALF) allowed linkage of pollution estimates to individual level data. Annual pollution datasets contained 369 columns and 472,083 rows, with one column per location, pollutant, daily time period and day of year. Dataset transformation produced a 5-column, 3,446,205,900-row matrix per year.

Methods and Conclusions: An algorithm using Structured Query Language (SQL) to manage data held within a relational database management system was designed to reduce dimensionality from 24 billion to 18,241 rows of data. The algorithm calculated the mean level of each pollutant (PM2.5, PM10, NO2, and ozone) over the revision and examination periods and summarised the data into one row per pupil. It adjusted for weekends, school holidays, and bank holidays, calculated daily pollutant exposure for each pupil, and successfully linked 95% of pupil pollution exposures to their health and education data.
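The kind of SQL reduction described in the abstract can be illustrated with a small sketch. This is not the CORTEX algorithm itself, which the abstract does not specify. It assumes a hypothetical 5-column long table (`exposure_long`), a hypothetical `pupil` table holding home and school RALFs plus a revision/examination window, and a hypothetical `non_school_day` calendar of weekends and holidays. All table names, column names, period labels, and values are invented for illustration, only two of the four pollutants are shown, and period readings are averaged with equal weight rather than weighted by duration.

```python
import sqlite3

# Hypothetical schema: names and columns are assumptions, not the SAIL layout.
SCHEMA = """
CREATE TABLE pupil (pupil_id INTEGER PRIMARY KEY, home_ralf TEXT,
                    school_ralf TEXT, window_start TEXT, window_end TEXT);
CREATE TABLE exposure_long (ralf TEXT, pollutant TEXT, period TEXT,
                            day TEXT, value REAL);
CREATE TABLE non_school_day (day TEXT PRIMARY KEY);
"""

# One row per pupil. School-hours readings come from the school location on
# school days and from the home location on weekends and holidays; all other
# periods always use the home location.
SUMMARY_SQL = """
SELECT p.pupil_id,
       AVG(CASE WHEN e.pollutant = 'NO2'   THEN e.value END) AS no2_mean,
       AVG(CASE WHEN e.pollutant = 'PM2.5' THEN e.value END) AS pm25_mean
FROM pupil AS p
LEFT JOIN exposure_long AS e
  ON e.day BETWEEN p.window_start AND p.window_end
 AND e.ralf = CASE
       WHEN e.period = 'school_hours'
        AND e.day NOT IN (SELECT day FROM non_school_day)
       THEN p.school_ralf
       ELSE p.home_ralf
     END
GROUP BY p.pupil_id
ORDER BY p.pupil_id
"""


def build_demo_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO pupil VALUES (?, ?, ?, ?, ?)",
        [
            (1, "H1", "S1", "2015-05-01", "2015-05-02"),
            (2, "H9", "S9", "2015-05-01", "2015-05-02"),  # no exposure rows
        ],
    )
    # 2015-05-02 is a Saturday, so it is listed as a non-school day.
    conn.execute("INSERT INTO non_school_day VALUES ('2015-05-02')")
    conn.executemany(
        "INSERT INTO exposure_long VALUES (?, ?, ?, ?, ?)",
        [
            ("H1", "NO2", "home_hours", "2015-05-01", 10.0),
            ("S1", "NO2", "school_hours", "2015-05-01", 30.0),
            ("H1", "NO2", "school_hours", "2015-05-01", 12.0),  # not used
            ("S1", "NO2", "home_hours", "2015-05-01", 28.0),    # not used
            ("H1", "NO2", "home_hours", "2015-05-02", 20.0),
            ("H1", "NO2", "school_hours", "2015-05-02", 40.0),  # weekend: home
            ("S1", "NO2", "school_hours", "2015-05-02", 99.0),  # not used
            ("H1", "PM2.5", "home_hours", "2015-05-01", 8.0),
            ("S1", "PM2.5", "school_hours", "2015-05-01", 12.0),
        ],
    )
    return conn


def summarise_exposure(conn):
    """Return (pupil_id, no2_mean, pm25_mean) tuples, one per pupil."""
    return conn.execute(SUMMARY_SQL).fetchall()


if __name__ == "__main__":
    for row in summarise_exposure(build_demo_db()):
        print(row)
```

The `LEFT JOIN` keeps pupils whose locations have no pollution rows, so unlinked pupils show up with empty means instead of disappearing from the output. That makes a linkage rate like the 95% reported in the abstract easy to count, although how the project actually measured it is not stated.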

Item Type: Article
Faculty \ School: Faculty of Science > School of Environmental Sciences
University of East Anglia Research Groups/Centres > Theme - ClimateUEA
UEA Research Groups: University of East Anglia Schools > Faculty of Science > Tyndall Centre for Climate Change Research
Faculty of Science > Research Centres > Tyndall Centre for Climate Change Research
Faculty of Science > Research Groups > Environmental Social Sciences
Depositing User: LivePure Connector
Date Deposited: 12 Sep 2018 15:30
Last Modified: 20 Mar 2023 14:44
URI: https://ueaeprints.uea.ac.uk/id/eprint/68261
DOI: 10.23889/ijpds.v3i2.534
