Clustering ensemble method

Alqurashi, Tahani (2017) Clustering ensemble method. Doctoral thesis, University of East Anglia.

    Abstract

    Clustering is an unsupervised learning paradigm that partitions a given dataset into
    clusters so that objects in the same cluster are more similar to each other than to the
    objects in the other clusters. However, when clustering algorithms are used individually,
    their results are often inconsistent and unreliable. This research applies the
    philosophy of ensemble learning, combining multiple partitions through a consensus
    function, to address these issues and improve clustering performance.
    A clustering ensemble framework is presented consisting of three phases: Ensemble
    Member Generation, Consensus and Evaluation. This research focuses on
    two points: the consensus function and ensemble diversity. For the first, we proposed
    three new consensus functions: the Object-Neighbourhood Clustering Ensemble
    (ONCE), the Dual-Similarity Clustering Ensemble (DSCE), and the Adaptive
    Clustering Ensemble (ACE). ONCE takes into account the neighbourhood relationship
    between object pairs in the similarity matrix, while DSCE and ACE are based
    on two similarity measures: cluster similarity and membership similarity.
    The proposed ensemble methods were tested on benchmark real-world and artificial
    datasets. The results demonstrated that ONCE outperforms other similar
    methods and is more consistent and reliable than k-means. Furthermore, DSCE
    and ACE were compared with the ONCE, CO, MCLA and DICLENS clustering ensemble
    methods. The results demonstrated that on average ACE outperforms the
    state-of-the-art clustering ensemble methods, which are CO, MCLA and DICLENS.
    On diversity, we experimentally investigated all the existing measures to determine
    their relationship with ensemble quality. The results indicate that none of them
    reveals a clear relationship, for two reasons: (1) they are not appropriately defined
    to measure the useful difference between the members, and (2) none of them has been
    used directly by any consensus function. We therefore point out that these two issues
    need to be addressed in future research.
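
To make the idea of a consensus function concrete, here is a minimal sketch of a co-association style consensus, assuming that "CO" in the abstract refers to the co-association approach. It is not the thesis's ONCE, DSCE or ACE method. The function names, the toy partitions and the 0.5 threshold are all illustrative assumptions.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Link every pair whose co-association exceeds `threshold` (an assumed
    cut-off) and return the connected components as consensus labels."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three ensemble members over six objects (toy data, not from the thesis).
# The second uses swapped label names; the third misplaces object 2.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
result = consensus(partitions)
print(result)  # [0, 0, 0, 1, 1, 1]
```

Because the matrix only records whether two objects share a cluster, the arbitrary label names used by each member do not matter, and the single disagreement in the third member is outvoted by the other two.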

    Item Type: Thesis (Doctoral)
    Faculty \ School: Faculty of Science > School of Computing Sciences
    Depositing User: Katie Miller
    Date Deposited: 22 Feb 2017 11:15
    Last Modified: 22 Feb 2017 11:15
    URI: https://ueaeprints.uea.ac.uk/id/eprint/62679