Al Ghamdi, Mostafa (2022) Heterogeneous machine learning ensembles for predicting train delays. Doctoral thesis, University of East Anglia.
Abstract
Train delays are a serious problem in the UK and other countries. Much research has gone into developing methods for predicting them. Most of these methods use only single models or homogeneous ensembles, and their accuracy and consistency are generally unsatisfactory. We have therefore developed heterogeneous ensembles that combine different types of regression models, with the aim of improving prediction performance.
We first looked at a wide range of base-learner models, including the state-of-the-art methods Random Forest and XGBoost. Overall, our ensembles were more accurate than any of these single models.
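To illustrate what a heterogeneous regression ensemble looks like in practice, here is a minimal sketch that averages the predictions of three structurally different base learners. It is not the thesis's implementation: the three toy learners (`fit_mean`, `fit_linear`, `fit_nearest`), the unweighted averaging rule, and the toy data are all assumptions chosen to keep the example dependency-free. Real base learners such as Random Forest or XGBoost would take their place.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline learner: always predicts the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """1-nearest-neighbour regressor: returns the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def ensemble_predict(models, x):
    """Combine base-learner outputs with an unweighted average (an assumed rule)."""
    return mean(m(x) for m in models)

# Hypothetical toy data: one input feature and a delay in minutes.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# "Heterogeneous" here means the ensemble mixes different model types.
models = [fit(xs, ys) for fit in (fit_mean, fit_linear, fit_nearest)]
prediction = ensemble_predict(models, 3.0)
print(prediction)
```

A homogeneous ensemble would instead hold many instances of one model type (for example, many trees); the only structural change here is that the list of fitting functions contains different algorithms.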
We developed two methods for model selection when building the ensemble: the first uses accuracy alone, and the second uses both accuracy and diversity. We found that selecting on accuracy resulted in the most accurate ensembles. We adapted the Coincident Failure Diversity measure for regression and compared its effectiveness with other diversity measures. While it proved the best of the measures compared, overall we found no relationship between ensemble accuracy and diversity in the regression context. We also investigated the effect of ensemble size.
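The accuracy-based selection idea can be sketched as follows: rank candidate base learners by their error on a validation set, keep the best `k`, and average their predictions. This is an illustrative assumption about how such a selection step might look, not the procedure used in the thesis. The function names, the use of RMSE as the accuracy criterion, the model names, and the numbers are all hypothetical, and the diversity-based variant is not shown.

```python
from math import sqrt

def rmse(preds, targets):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))

def select_by_accuracy(val_preds, targets, k):
    """Return the names of the k base learners with the lowest validation RMSE."""
    ranked = sorted(val_preds, key=lambda name: rmse(val_preds[name], targets))
    return ranked[:k]

def average_predictions(val_preds, names):
    """Unweighted average of the selected learners' predictions, per sample."""
    columns = [val_preds[n] for n in names]
    return [sum(vals) / len(vals) for vals in zip(*columns)]

# Hypothetical validation targets (delay in minutes) and per-model predictions.
targets = [3.0, 5.0, 7.0]
val_preds = {
    "model_a": [3.5, 5.5, 7.5],  # over-predicts by 0.5
    "model_b": [1.0, 3.0, 5.0],  # under-predicts by 2.0
    "model_c": [2.8, 4.8, 6.8],  # under-predicts by 0.2
}

selected = select_by_accuracy(val_preds, targets, k=2)
ensemble_rmse = rmse(average_predictions(val_preds, selected), targets)
print(selected, ensemble_rmse)
```

In this toy case the two selected learners err in opposite directions, so their average has a lower RMSE than either learner alone. Varying `k` is one simple way to explore the effect of ensemble size.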
We compared the performance of our ensembles with the deep learning methods CNN and TabNet and found that our ensembles were more accurate. However, ensembles of deep learning models proved to be more accurate than those of single machine learning models.
Item Type: | Thesis (Doctoral) |
---|---|
Faculty \ School: | Faculty of Science > School of Computing Sciences |
Depositing User: | Chris White |
Date Deposited: | 11 Jul 2023 10:06 |
Last Modified: | 11 Jul 2023 10:06 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/92583 |