Towards Feature Selection for Disk-Based Multirelational Learners: A Case Study with a Boosting Algorithm

Hoche, S. and Wrobel, S. (2003) Towards Feature Selection for Disk-Based Multirelational Learners: A Case Study with a Boosting Algorithm. In: 2nd Workshop on Multi-Relational Data Mining, 2003-08-24 - 2003-08-27.

Full text not available from this repository.

Abstract

Feature selection is an important issue for any learning algorithm, since reduced feature sets lead to shorter learning times, lower model complexity and, in many cases, a reduced risk of overfitting. When performing feature selection for RAM-based learning algorithms, we typically assume that the cost of accessing each feature is uniform. In multirelational data mining, especially when data are held in a relational database management system (RDBMS), this is no longer the case. The dominant cost in such a setting is the scan of a relation, so the cost of using a feature from a relation that must be scanned anyway is comparatively small, whereas the cost of adding a feature from a relation that has not been used before is high. This means that existing work on feature selection under the uniform cost assumption may not be applicable in a disk-based setting. In this paper, we report the results of a case study that extends prior work on multirelational feature selection, specifically in the context of a boosting algorithm. As our study shows, using the previously developed strategies leads on average to larger numbers of relations that must be considered and loaded into memory, and thus to higher cost in a disk-based setting. Instead, a simple relation-oriented strategy can be used to minimize the cost of accessing additional relations. We present experimental results showing how this basic strategy interacts with the previously proposed feature selection variants, and show that significant gains are made even in a main-memory setting.
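The abstract states the cost argument informally and does not specify the authors' algorithm. The following Python sketch illustrates the argument under explicit assumptions. The relation names, gain scores, scan costs, per-feature access cost and the `penalty` weighting are all hypothetical. `relation_oriented_select` is one simple greedy instance of a relation-oriented strategy, not the method evaluated in the paper. Cost is modelled as one scan per distinct relation plus a small per-feature access cost.

```python
"""Hypothetical sketch: relation-aware cost model for feature selection."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    relation: str  # relation (table) the feature is read from
    gain: float    # hypothetical usefulness score, e.g. from a boosting round


def selection_cost(features, scan_cost, access_cost=0.1):
    """Assumed disk-based cost: each distinct relation is scanned once,
    plus a small per-feature access cost."""
    relations = {f.relation for f in features}
    return sum(scan_cost[r] for r in relations) + access_cost * len(features)


def uniform_select(candidates, k):
    """Baseline under the uniform cost assumption: top-k features by gain."""
    return sorted(candidates, key=lambda f: f.gain, reverse=True)[:k]


def relation_oriented_select(candidates, k, scan_cost, penalty=0.02):
    """Greedy sketch (assumption): a feature from a relation that is not yet
    loaded has its gain reduced by penalty * that relation's scan cost."""
    selected, loaded = [], set()
    remaining = list(candidates)

    def score(f):
        # Features from already loaded relations incur no extra scan penalty.
        extra = 0.0 if f.relation in loaded else penalty * scan_cost[f.relation]
        return f.gain - extra

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        loaded.add(best.relation)  # its relation is now "scanned"
    return selected


# Hypothetical example data: three relations with equal scan cost.
SCAN_COST = {"customer": 10.0, "orders": 10.0, "store": 10.0}
CANDIDATES = [
    Feature("age", "customer", 0.9),
    Feature("income", "customer", 0.5),
    Feature("n_orders", "orders", 0.8),
    Feature("region", "store", 0.65),
    Feature("avg_price", "orders", 0.4),
]

if __name__ == "__main__":
    for label, chosen in [
        ("uniform", uniform_select(CANDIDATES, 3)),
        ("relation-oriented", relation_oriented_select(CANDIDATES, 3, SCAN_COST)),
    ]:
        names = [f.name for f in chosen]
        print(label, names, round(selection_cost(chosen, SCAN_COST), 2))
```

With these made-up numbers, the uniform baseline picks `age`, `n_orders` and `region` and touches three relations (cost 30.3). The relation-oriented sketch swaps `region` for the lower-gain `income` from the already loaded `customer` relation, touches only two relations and costs 20.3. This illustrates the trade-off the abstract describes: a slightly weaker feature from a relation that is scanned anyway can be cheaper than a stronger one that forces an additional scan.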

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Vishal Gautam
Date Deposited: 23 Jul 2011 15:16
Last Modified: 07 Mar 2023 16:30
URI: https://ueaeprints.uea.ac.uk/id/eprint/21935
