Clustering ensemble method

Alqurashi, Tahani

Alqurashi, Tahani (2017) Clustering ensemble method. Doctoral thesis, University of East Anglia.

Abstract

Clustering is an unsupervised learning paradigm that partitions a given dataset into
clusters so that objects in the same cluster are more similar to each other than to the
objects in the other clusters. However, when clustering algorithms are used individually,
their results are often inconsistent and unreliable. This research applies the
ensemble learning approach, which combines multiple partitions using a consensus
function, to address these issues and improve clustering performance.
We present a clustering ensemble framework consisting of three phases: Ensemble
Member Generation, Consensus and Evaluation. This research focuses on
two points: the consensus function and ensemble diversity. For the first, we propose
three new consensus functions: the Object-Neighbourhood Clustering Ensemble
(ONCE), the Dual-Similarity Clustering Ensemble (DSCE), and the Adaptive
Clustering Ensemble (ACE). ONCE takes into account the neighbourhood relationship
between object pairs in the similarity matrix, while DSCE and ACE are based
on two similarity measures: cluster similarity and membership similarity.
The proposed ensemble methods were tested on benchmark real-world and artificial
datasets. The results demonstrate that ONCE outperforms other similar
methods and is more consistent and reliable than k-means. Furthermore, DSCE
and ACE were compared with ONCE and with the CO, MCLA and DICLENS clustering
ensemble methods. The results demonstrate that, on average, ACE outperforms the
state-of-the-art clustering ensemble methods CO, MCLA and DICLENS.
On diversity, we experimentally investigated all the existing measures to determine
their relationship with ensemble quality. The results indicate that none of them
reveals a clear relationship, for two reasons:
(1) none is appropriately defined to measure the useful difference between the
members, and (2) none has been used directly by any consensus function.
We therefore point out that these two issues need to be addressed in future research.
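
The generate-then-combine idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's ONCE, DSCE or ACE method. It assumes a generic co-association style consensus: count how often each pair of objects is clustered together across the members, then link pairs that co-occur in more than half of them. The function names, the 0.5 threshold and the toy members are hypothetical, and the diversity score (mean pairwise 1 minus the Rand index) is only one possible measure, chosen here for simplicity.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of ensemble members that put each pair of objects together."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Link pairs whose co-association exceeds the (assumed) threshold;
    the connected components are returned as the consensus clusters."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def rand_index(a, b):
    """Share of object pairs on which two partitions agree (together or apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def diversity(partitions):
    """Mean pairwise disagreement (1 - Rand index) between ensemble members."""
    pairs = list(combinations(partitions, 2))
    return sum(1 - rand_index(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical ensemble members over six objects. The second member
# is the first with its cluster labels swapped; the third splits more finely.
members = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
labels = consensus(members)
score = diversity(members)
```

Because the consensus works on pairwise co-occurrence rather than on label values, the relabelled second member agrees fully with the first, which is the usual reason for combining partitions through a similarity matrix.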

Item Type:	Thesis (Doctoral)
Faculty \ School:	Faculty of Science > School of Computing Sciences
Date Deposited:	22 Feb 2017 11:15
Last Modified:	22 Feb 2017 11:15
URI:	https://ueaeprints.uea.ac.uk/id/eprint/62679