Reliability and Effectiveness of Cross-validation in Feature Selection

Aldehim, Ghadah and Wang, Wenjia (2014) Reliability and Effectiveness of Cross-validation in Feature Selection. In: Thirty-fourth SGAI International Conference on Artificial Intelligence, 2014-12-09 - 2014-12-11, Peterhouse College.

Full text not available from this repository.


Feature selection (FS) is increasingly important in data analysis and machine learning in the big data era. However, how the data should be used in feature selection has become a serious issue: the conventional practice of using ALL the data in FS may lead to selection bias, and some researchers suggest using only PART of the data instead. This paper investigates the reliability and effectiveness of a PART approach, implemented by a cross-validation mechanism in feature selection filters, and compares it with the ALL approach. Reliability is measured by an Inter-system Average Tanimoto Index, and the effectiveness of the selected features is measured by the mean generalisation accuracy of classification. The experiments are carried out using synthetic datasets generated with a fixed number of relevant features, varied numbers of irrelevant features and instances, and different levels of noise, to mimic some possible real-world environments. The results indicate that the PART approach is more effective in reducing the bias when the dataset is small but starts to lose its advantage as the dataset size increases.
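As an illustration of the kind of reliability measure the abstract refers to, the sketch below computes a mean pairwise Tanimoto (Jaccard) similarity between feature subsets, such as the subsets a filter might select on different cross-validation folds. The exact definition of the paper's Inter-system Average Tanimoto Index is not given in this record, so the function names `tanimoto` and `average_tanimoto`, the pairwise averaging, and the example subsets are all assumptions made for illustration only.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty subsets are treated as identical (an assumption).
        return 1.0
    return len(a & b) / len(a | b)


def average_tanimoto(subsets):
    """Mean pairwise Tanimoto similarity over a list of feature subsets.

    This is a generic stability score, not necessarily the paper's index.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices selected on three cross-validation folds.
fold_subsets = [{0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 3, 4}]
score = average_tanimoto(fold_subsets)
print(f"average Tanimoto similarity: {score:.2f}")
```

A score near 1 would mean the folds agree closely on which features to keep, while a score near 0 would mean the selections barely overlap.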

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: feature selection, cross-validation, filters, reliability measure
Faculty \ School: Faculty of Science > School of Computing Sciences

UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: Pure Connector
Date Deposited: 08 Sep 2014 12:48
Last Modified: 07 Jul 2021 23:42
