Farrash, Majed and Wang, Wenjia (2015) An Algorithm for Identifying the Learning Patterns in Big Data. In: IEEE Conference on Big Data, 2015-08-20 - 2015-08-22.
Full text not available from this repository.
Abstract
Divide-and-Conquer is probably the most commonly used strategy for dealing with big data that is too large to be loaded into a computing system's memory as a whole for analysis. It partitions such a dataset into many smaller subsets, each of which can be loaded into memory separately to induce a model; the resulting models can then be combined by machine learning ensemble methods. However, it is not clear how the size of the subsets affects the learning performance of the individual models and of their ensemble. This paper proposes an ensemble-based algorithm that quickly detects the relational patterns between ensemble accuracy and the size of the partitioned data subsets. The algorithm is implemented in an ensemble framework and tested on 12 relatively big benchmark datasets. The experimental results indicate that it identifies these relational patterns accurately and efficiently, in fewer than 10 steps. The identified patterns show that in most cases it is not necessary to use the whole big dataset for analysis, as a few smaller subsets are already sufficiently representative of the underlying problem; this is useful knowledge for big data analysis.
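The abstract does not specify the base learner, the rule used to combine models, or how candidate subset sizes are chosen, so the Python sketch below is not the authors' algorithm. It only illustrates the divide-and-conquer setting under stated assumptions: a hypothetical nearest-centroid base learner (`train_nearest_centroid`), majority voting across the per-subset models, and a halving schedule of subset sizes (`accuracy_vs_subset_size`). The helper `smallest_sufficient_size` reports the smallest subset size whose ensemble accuracy stays within a chosen tolerance of a single model trained on all the data. All function names and the synthetic data generator `make_data` are hypothetical.

```python
# Illustrative sketch only: the base learner, voting rule and halving
# schedule are assumptions, not the method described in the paper.
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[List[float], int]          # (feature vector, class label)
Model = Callable[[List[float]], int]


def partition(data: List, subset_size: int) -> List[List]:
    """Split data into disjoint consecutive subsets; a trailing partial subset is dropped."""
    return [data[i:i + subset_size]
            for i in range(0, len(data) - subset_size + 1, subset_size)]


def train_nearest_centroid(subset: List[Example]) -> Model:
    """Hypothetical base learner: predict the class whose mean vector is closest."""
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    for x, y in subset:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x: List[float]) -> int:
        # Smallest squared Euclidean distance to a class centroid wins.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def ensemble_accuracy(train: List[Example], test: List[Example], subset_size: int) -> float:
    """Train one model per subset, combine them by majority vote, score on test."""
    models = [train_nearest_centroid(s) for s in partition(train, subset_size)]
    correct = 0
    for x, y in test:
        votes = Counter(m(x) for m in models)
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(test)


def accuracy_vs_subset_size(train: List[Example], test: List[Example],
                            min_size: int) -> List[Tuple[int, float]]:
    """Assumed schedule: start with the whole training set, then halve the subset size."""
    results = []
    size = len(train)
    while size >= min_size:
        results.append((size, ensemble_accuracy(train, test, size)))
        size //= 2
    return results


def smallest_sufficient_size(results: List[Tuple[int, float]], tolerance: float) -> int:
    """Smallest size whose accuracy is within tolerance of the first (whole-data) run."""
    full_accuracy = results[0][1]
    return min(size for size, acc in results if acc >= full_accuracy - tolerance)


def make_data(n: int, rng: random.Random) -> List[Example]:
    """Hypothetical synthetic data: two Gaussian blobs centred at (0, 0) and (3, 3)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = 3.0 * y
        data.append(([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)], y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    train, test = make_data(1024, rng), make_data(200, rng)
    results = accuracy_vs_subset_size(train, test, min_size=32)
    for size, acc in results:
        print(f"subset size {size:5d}: ensemble accuracy {acc:.3f}")
    print("smallest sufficient size:", smallest_sufficient_size(results, tolerance=0.05))
```

Halving keeps the number of evaluated subset sizes logarithmic in the training-set size (six evaluations when going from 1024 examples down to 32 here); whether the paper uses a similar search strategy is not stated in the abstract.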
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Uncontrolled Keywords: | big data, machine learning, data mining, ensemble methods |
Faculty \ School: | Faculty of Science > School of Computing Sciences; Faculty of Science |
UEA Research Groups: | Faculty of Science > Research Groups > Data Science and Statistics |
Depositing User: | Pure Connector |
Date Deposited: | 28 Sep 2015 14:50 |
Last Modified: | 21 Oct 2022 23:41 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/54245 |