Dealing with Missing Data and Uncertainty in the Context of Data Mining

Aleryani, Aliya, Wang, Wenjia and De La Iglesia, Beatriz ORCID: https://orcid.org/0000-0003-2675-5826 (2018) Dealing with Missing Data and Uncertainty in the Context of Data Mining. In: Hybrid Artificial Intelligent Systems. Hybrid Artificial Intelligent Systems. Springer, pp. 289-301. ISBN 978-3-319-92638-4

Abstract

Missing data is an issue in many real-world datasets, yet robust methods for dealing with it appropriately still need development. In this paper we investigate how some methods for handling missing data perform as the uncertainty increases. Using benchmark datasets from the UCI Machine Learning repository, we generate datasets for our experimentation with increasing amounts of data Missing Completely At Random (MCAR), both at the attribute level and at the record level. We then apply four classification algorithms: C4.5, Random Forest, Naïve Bayes and Support Vector Machines (SVMs). We measure the performance of each classifier on the basis of complete case analysis and simple imputation, and then we study the performance of the algorithms that can handle missing data themselves. We find that complete case analysis has a detrimental effect because it renders many datasets infeasible as missing data increases, particularly for high-dimensional data. We also find that increasing missing data has a negative effect on the performance of all the algorithms tested, but that the different approaches, whether preprocessing with simple imputation or letting the algorithm handle the missing data, do not show a significant difference in performance.
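
As a rough illustration of the kind of setup the abstract describes, the sketch below injects MCAR missingness into a synthetic numeric table and then applies complete case analysis and mean imputation. It is not the authors' code: the function names (`inject_mcar`, `complete_cases`, `mean_impute`), the synthetic Gaussian data, the 10% missing rate and the independent cell-wise masking are all assumptions made for this example, and mean imputation is only one possible form of "simple imputation".

```python
import random
from statistics import mean


def inject_mcar(rows, rate, rng):
    """Return a copy of `rows` where each cell is independently replaced
    by None with probability `rate` (hypothetical cell-wise MCAR scheme)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing cell."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Simple imputation: replace each missing cell with the mean of the
    observed values in its column (0.0 if a column has no observed value)."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Synthetic stand-in for a benchmark dataset (assumed sizes and rate).
rng = random.Random(0)
n_rows, n_cols, rate = 1000, 20, 0.10
data = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]

masked = inject_mcar(data, rate, rng)
kept = complete_cases(masked)
imputed = mean_impute(masked)

# Under independent cell-wise MCAR, a row stays complete with
# probability (1 - rate) ** n_cols.
expected_fraction = (1 - rate) ** n_cols
print(f"rows kept by complete case analysis: {len(kept)} of {n_rows}")
print(f"expected fraction of complete rows: {expected_fraction:.3f}")
print(f"rows available after mean imputation: {len(imputed)}")
```

Under this assumed masking scheme, the expected fraction of complete rows is (1 - p)^d for missing rate p and d attributes, which shrinks quickly as d grows. That is consistent with the abstract's observation that complete case analysis becomes infeasible for high-dimensional data, while imputation keeps every record available to the classifier.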

Item Type: Book Section
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Faculty of Medicine and Health Sciences > Research Centres > Business and Local Government Data Research Centre (former - to 2023)
Faculty of Science > Research Groups > Norwich Epidemiology Centre
Faculty of Medicine and Health Sciences > Research Groups > Norwich Epidemiology Centre
Depositing User: LivePure Connector
Date Deposited: 24 Jul 2018 14:30
Last Modified: 21 Apr 2023 01:44
URI: https://ueaeprints.uea.ac.uk/id/eprint/67844
DOI: 10.1007/978-3-319-92639-1_24