What influences the accuracy of decision tree ensembles?

Richards, Graeme and Wang, Wenjia (2012) What influences the accuracy of decision tree ensembles? Journal of Intelligent Information Systems, 39 (3). pp. 627-650. ISSN 0925-9902

Full text not available from this repository.


An ensemble in machine learning is defined as a set of models (such as classifiers or predictors) that are induced individually from data by using one or more machine learning algorithms for a given task and then work collectively in the hope of generating improved decisions. In this paper we investigate the factors that influence ensemble performance, which mainly include the accuracy of individual classifiers, the diversity between classifiers, the number of classifiers in an ensemble and the decision fusion strategy. Among them, diversity is believed to be a key factor but is more complex and difficult to measure quantitatively; it was therefore chosen as the focus of this study, together with the relationships between the other factors. A technique was devised to build ensembles with decision trees that are induced with randomly selected features. Three sets of experiments were performed using 12 benchmark datasets, and the results indicate that (i) a high level of diversity indeed makes an ensemble more accurate and robust compared with individual models; (ii) small ensembles can produce results as good as, or better than, large ensembles provided the appropriate (e.g. more diverse) models are selected for inclusion. This implies that, when scaling up to larger databases, the increased efficiency of smaller ensembles becomes more significant and beneficial. As a test case study, ensembles were built based on these findings for a real-world application, osteoporosis classification, and it was found that, for each of the three datasets used, the ensembles outperformed individual decision trees consistently and reliably.
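
The interplay between individual accuracy, diversity and decision fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: majority voting is assumed as the fusion strategy, mean pairwise disagreement is assumed as the diversity measure (one common choice among several), and the function names and toy predictions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Fuse per-classifier label lists into one label per sample by plurality vote."""
    fused = []
    for votes in zip(*predictions):
        # On a tie, Counter.most_common keeps first-seen order,
        # so the earliest classifier's label wins.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def mean_pairwise_disagreement(predictions):
    """Assumed diversity measure: average fraction of samples on which
    two classifiers give different labels, taken over all classifier pairs."""
    pairs = list(combinations(predictions, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) / len(a) for a, b in pairs)
    return total / len(pairs)


# Hypothetical toy data: three classifiers, each wrong on a different sample.
truth = [1, 0, 1, 1, 0, 1]
member_predictions = [
    [0, 0, 1, 1, 0, 1],  # wrong on sample 0
    [1, 1, 1, 1, 0, 1],  # wrong on sample 1
    [1, 0, 0, 1, 0, 1],  # wrong on sample 2
]

ensemble_prediction = majority_vote(member_predictions)
member_accuracies = [accuracy(p, truth) for p in member_predictions]
ensemble_accuracy = accuracy(ensemble_prediction, truth)
diversity = mean_pairwise_disagreement(member_predictions)
```

In this toy case the members err on different samples, so the vote corrects each error and the fused prediction matches the true labels even though no single member does. If all three members erred on the same sample, the disagreement would be zero and voting would bring no gain.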

Item Type: Article
Uncontrolled Keywords: machine learning, data mining, ensemble, diversity, decision tree
Faculty \ School: Faculty of Science > School of Computing Sciences

UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: Pure Connector
Date Deposited: 19 Aug 2014 08:26
Last Modified: 21 Oct 2022 00:05
URI: https://ueaeprints.uea.ac.uk/id/eprint/49925
DOI: 10.1007/s10844-012-0206-7
