Dynamic ensemble selection methods for heterogeneous data mining

Ballard, Chris and Wang, Wenjia (2016) Dynamic ensemble selection methods for heterogeneous data mining. In: 12th World Congress on Intelligent Control and Automation (WCICA), 2016. The Institute of Electrical and Electronics Engineers (IEEE), CHN. ISBN 978-1-4673-8415-5

PDF (Accepted manuscript) - Accepted Version

Abstract

Big data is often collected from multiple sources with possibly different features, representations and granularity, and hence is defined as heterogeneous data. Such multiple datasets need to be fused together in some way for further analysis. Data fusion at feature level requires domain knowledge and can be time-consuming and ineffective, but it could be avoided if decision-level fusion is applied properly. Ensemble methods appear to be an appropriate paradigm to do just that, as each subset of heterogeneous data sources can be used separately to induce models independently, and their decisions are then aggregated by a decision fusion function in an ensemble. This study investigates how heterogeneous data can be used to generate more diverse classifiers to build more accurate ensembles. A Dynamic Ensemble Selection Optimisation (DESO) framework is proposed, using the local feature space of heterogeneous data to increase diversity among classifiers and Simulated Annealing for optimisation. An implementation example of DESO, BaggingDES, is provided, with Bagging as the base platform of DESO, to test its performance and also to explore the relationship between diversity and accuracy. Experiments were carried out with some heterogeneous datasets derived from real-world benchmark datasets. The statistical analyses of the results show that BaggingDES performed significantly better than the baseline method, a decision tree, and reasonably better than the classic Bagging.

Item Type: Book Section
Uncontrolled Keywords: bagging,classification algorithms,clustering algorithms,data integration,training,simulated annealing
Faculty \ School: Faculty of Science > School of Computing Sciences

UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: Pure Connector
Date Deposited: 26 Oct 2016 10:01
Last Modified: 21 Oct 2022 07:30
URI: https://ueaeprints.uea.ac.uk/id/eprint/61062
DOI: 10.1109/WCICA.2016.7578244
