Machine learning–based asthma attack prediction models from routinely collected electronic health records: Systematic scoping review

Budiarto, Arif, Tsang, Kevin C. H., Wilson, Andrew M., Sheikh, Aziz and Shah, Syed Ahmar (2023) Machine learning–based asthma attack prediction models from routinely collected electronic health records: Systematic scoping review. JMIR AI, 2. ISSN 2817-1705

PDF (ai-2023-1-e46717) - Published Version
Available under License Creative Commons Attribution.

Abstract

Background: An early warning tool to predict attacks could enhance asthma management and reduce the likelihood of serious consequences. Electronic health records (EHRs) providing access to historical data about patients with asthma, coupled with machine learning (ML), provide an opportunity to develop such a tool. Several studies have developed ML-based tools to predict asthma attacks.

Objective: This study aims to critically evaluate ML-based models derived using EHRs for the prediction of asthma attacks.

Methods: We systematically searched PubMed and Scopus (the search period was between January 1, 2012, and January 31, 2023) for papers meeting the following inclusion criteria: (1) used EHR data as the main data source, (2) used asthma attack as the outcome, and (3) compared ML-based prediction models’ performance. We excluded non-English papers and nonresearch papers, such as commentary and systematic review papers. We also excluded papers that did not provide any details about the respective ML approach and its result, including protocol papers. The selected studies were then summarized across multiple dimensions including data preprocessing methods, ML algorithms, model validation, model explainability, and model implementation.

Results: Overall, 17 papers were included at the end of the selection process. There was considerable heterogeneity in how asthma attacks were defined. Of the 17 studies, 8 (47%) used routinely collected data from both primary care and secondary care practices. Extremely imbalanced data was a notable issue in most studies (13/17, 76%), but only 38% (5/13) of them explicitly dealt with it in their data preprocessing pipeline. The gradient boosting–based method was the best ML method in 59% (10/17) of the studies. Of the 17 studies, 14 (82%) used a model explanation method to identify the most important predictors. None of the studies followed the standard reporting guidelines, and none were prospectively validated.

Conclusions: Our review indicates that this research field is still underdeveloped, given the limited body of evidence, heterogeneity of methods, lack of external validation, and suboptimally reported models. We highlighted several technical challenges (class imbalance, external validation, model explanation, and adherence to reporting guidelines to aid reproducibility) that need to be addressed to make progress toward clinical adoption.

Item Type: Article
Faculty \ School: Faculty of Medicine and Health Sciences > Norwich Medical School
UEA Research Groups: Faculty of Medicine and Health Sciences > Research Groups > Respiratory and Airways Group
Faculty of Medicine and Health Sciences > Research Groups > Cardiovascular and Metabolic Health
Faculty of Medicine and Health Sciences > Research Centres > Metabolic Health
Depositing User: LivePure Connector
Date Deposited: 18 Dec 2024 01:39
Last Modified: 16 Jan 2025 01:08
URI: https://ueaeprints.uea.ac.uk/id/eprint/98020
DOI: 10.2196/46717
