An Algorithm for Identifying the Learning Patterns in Big Data

Farrash, Majed and Wang, Wenjia (2015) An Algorithm for Identifying the Learning Patterns in Big Data. In: IEEE Conference on Big Data, 2015-08-20 - 2015-08-22.

Full text not available from this repository.


Divide-and-conquer is probably the most commonly used strategy for dealing with a dataset that is too big to be loaded as a whole into a computing system's memory for analysis. It partitions the dataset into many smaller subsets, each of which can be loaded into memory separately to induce a model, and the resulting models can then be combined by machine learning ensemble methods. However, it is not clear how the size of the subsets affects the learning performance of the individual models and of their ensemble. This paper proposes an ensemble-based algorithm to quickly detect the relational pattern between ensemble accuracy and the size of the partitioned data subsets. An ensemble framework for the algorithm is implemented and tested on 12 relatively big benchmark datasets. The experimental results indicate that it is able to identify the relational patterns accurately and efficiently in fewer than 10 steps. The identified patterns show that in most cases it is not necessary to use the whole dataset for analysis, as a few smaller subsets are already sufficiently representative of the underlying problem, which is useful knowledge for big data analysis.
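The general divide-and-conquer idea described in the abstract can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details are not given on this page. It only shows the setup the abstract refers to: partition a training set into subsets of a chosen size, induce one model per subset, combine the models by majority vote, and compare ensemble accuracy across subset sizes. The synthetic data, the nearest-centroid base learner, the subset sizes, and all function names (`make_data`, `train_centroid`, `predict`, `ensemble_accuracy`) are assumptions made for illustration.

```python
import random
from collections import Counter

def make_data(n, seed=0):
    """Synthetic two-class data: 2-D Gaussian blobs (illustrative assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else -2.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def train_centroid(subset):
    """Fit a nearest-centroid model: one mean point per class in the subset."""
    sums, counts = {}, Counter()
    for (x, y), label in subset:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] += 1
    return {lab: (sx / counts[lab], sy / counts[lab])
            for lab, (sx, sy) in sums.items()}

def predict(model, point):
    """Return the class whose centroid is closest to the point."""
    x, y = point
    return min(model,
               key=lambda lab: (x - model[lab][0]) ** 2 + (y - model[lab][1]) ** 2)

def ensemble_accuracy(train, test, subset_size):
    """Partition train into disjoint subsets, fit one model per subset,
    and score the majority-vote ensemble on the test set."""
    subsets = [train[i:i + subset_size]
               for i in range(0, len(train), subset_size)]
    models = [train_centroid(s) for s in subsets]
    correct = 0
    for point, label in test:
        votes = Counter(predict(m, point) for m in models)
        correct += votes.most_common(1)[0][0] == label
    return correct / len(test)

train = make_data(2000, seed=1)
test = make_data(500, seed=2)

# Ensemble accuracy as a function of subset size (hypothetical sizes).
results = {size: ensemble_accuracy(train, test, size) for size in (50, 200, 1000)}
for size, acc in results.items():
    print(f"subset size {size:>4}: ensemble accuracy {acc:.3f}")
```

On an easy problem like this toy one, small subsets would be expected to score about as well as large ones, which is the kind of relationship between subset size and ensemble accuracy that the abstract says the proposed algorithm is designed to detect.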

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: big data, machine learning, data mining, ensemble methods
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: Pure Connector
Date Deposited: 28 Sep 2015 14:50
Last Modified: 21 Oct 2022 23:41
