Debuse, Justin C. W. and Rayward-Smith, Victor J. (1997) Feature subset selection within a simulated annealing data mining algorithm. Journal of Intelligent Information Systems, 9 (1). pp. 57-81. ISSN 0925-9902
Abstract
An overview of the principal feature subset selection methods is given. We investigate a number of measures of feature subset quality, using large commercial databases. We develop an entropic measure, based upon the information gain approach used within ID3 and C4.5 to build trees, which is shown to give the best performance over our databases. This measure is used within a simple feature subset selection algorithm and the technique is used to generate subsets of high quality features from the databases. A simulated annealing based data mining technique is presented and applied to the databases. The performance using all features is compared to that achieved using the subset selected by our algorithm. We show that a substantial reduction in the number of features may be achieved together with an improvement in the performance of our data mining system. We also present a modification of the data mining algorithm, which allows it to simultaneously search for promising feature subsets and high quality rules. The effect of varying the generality level of the desired pattern is also investigated.
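
For readers unfamiliar with the information gain criterion mentioned in the abstract, the following is a minimal sketch of the standard ID3-style calculation for a single categorical feature. It is not the entropic measure developed in the paper, whose exact form is not given in this record; the function names `entropy` and `information_gain` and the toy data are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Standard ID3-style information gain of one categorical feature.

    Gain = H(labels) - sum over feature values v of
           (fraction of records with value v) * H(labels of those records).
    """
    n = len(labels)
    # Group the class labels by the feature value of each record.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four records, two candidate features.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]    # splits the classes perfectly
uninformative = ["x", "y", "x", "y"]  # tells us nothing about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

A feature ranking of this kind could, in principle, feed a simple subset selection step by keeping the highest-scoring features; whether the paper's algorithm works this way is not stated in the abstract.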
Item Type: | Article |
---|---|
Faculty \ School: | Faculty of Science > School of Computing Sciences |
Depositing User: | EPrints Services |
Date Deposited: | 01 Oct 2010 13:41 |
Last Modified: | 24 Sep 2024 10:31 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/3063 |
DOI: | 10.1023/A:1008641220268 |