Multiple Imputation Ensembles (MIE) for dealing with missing data

Aleryani, Aliya; Wang, Wenjia; De La Iglesia, Beatriz

doi:10.1007/s42979-020-00131-0

Aleryani, Aliya, Wang, Wenjia and De La Iglesia, Beatriz ORCID: https://orcid.org/0000-0003-2675-5826 (2020) Multiple Imputation Ensembles (MIE) for dealing with missing data. SN Computer Science, 1 (3). ISSN 2661-8907

PDF (Published Version), available under a Creative Commons Attribution licence (2MB)
PDF (Accepted Version) (469kB)

Abstract

Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and we compare two types of ensemble: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random (MCAR). We first use a number of single and multiple imputation methods to recover the missing values, and then build ensembles of different classifiers on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate the performance of MIE by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, which combines multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.
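The pipeline described in the abstract (inject MCAR missingness, impute several times, train one classifier per imputed copy, combine the predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: the random hot-deck draw, the nearest-centroid classifier, the majority vote, the toy two-cluster data and all function names (`make_mcar`, `impute_once`, `fit_mie`, `predict_mie`) are assumptions chosen to keep the example dependency-free, standing in for the imputation methods and the bagging and stacking ensembles used in the paper.

```python
import random
from collections import Counter

def make_mcar(rows, rate, rng):
    """Copy rows, setting each value to None independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def impute_once(rows, rng):
    """One stochastic imputation: fill each gap with a random observed value from the
    same column (a simple hot-deck stand-in, not a method taken from the paper)."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [[rng.choice(observed[j] or [0.0]) if v is None else v
             for j, v in enumerate(r)] for r in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid 'classifier': the mean feature vector of each class."""
    sums, counts = {}, Counter(labels)
    for r, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(r))
        for j, v in enumerate(r):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict_centroid(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], row)))

def fit_mie(rows_missing, labels, m, rng):
    """Impute the data m times and fit one model per imputed copy."""
    return [fit_centroids(impute_once(rows_missing, rng), labels) for _ in range(m)]

def predict_mie(models, row):
    """Combine the m models by majority vote."""
    votes = Counter(predict_centroid(c, row) for c in models)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated Gaussian clusters.
rng = random.Random(0)
complete = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
            + [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(40)])
labels = [0] * 40 + [1] * 40

with_gaps = make_mcar(complete, 0.2, rng)          # 20% of values removed at random
models = fit_mie(with_gaps, labels, m=5, rng=rng)  # 5 imputations, 5 models
print(predict_mie(models, [0.0, 0.0]), predict_mie(models, [10.0, 10.0]))
```

Because each imputed copy fills the gaps differently, the individual models disagree slightly, and the vote averages out some of the imputation noise. Raising `rate` in `make_mcar` mimics the increasing levels of missingness described in the abstract.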

Item Type:	Article
Additional Information:	Funding Information: We acknowledge support from Grant Number ES/L011859/1, from The Business and Local Government Data Research Centre, funded by the Economic and Social Research Council to provide economic, scientific and social researchers and business analysts with secure data services.
Uncontrolled Keywords:	classification algorithms, dissimilarity measures, ensemble techniques, missing data, multiple imputation, computational theory and mathematics, computer networks and communications, computer science applications, general computer science, artificial intelligence, computer graphics and computer-aided design
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Medicine and Health Sciences > Research Centres > Business and Local Government Data Research Centre (former - to 2023); Faculty of Science > Research Groups > Data Science and AI; Faculty of Medicine and Health Sciences > Research Centres > Norwich Institute for Healthy Aging; Faculty of Science > Research Groups > Norwich Epidemiology Centre; Faculty of Medicine and Health Sciences > Research Groups > Norwich Epidemiology Centre; Faculty of Science > Research Groups > Health Computing
Related URLs:	https://www.scopus.com/pages/publication...
Depositing User:	LivePure Connector
Date Deposited:	01 May 2020 00:06
Last Modified:	18 Jun 2026 18:32
URI:	https://ueaeprints.uea.ac.uk/id/eprint/74916
DOI:	10.1007/s42979-020-00131-0
