Book Section #60852

Alshaqsi, Jamil and Wang, Wenjia (2013) UNSPECIFIED In: UNSPECIFIED IEEE Press. ISBN 978-1-4673-4651-1

Full text not available from this repository.


In cluster analysis, determining the number of clusters, K, for a given dataset is an important yet tricky task. For non-trivial real-world problems there is often no universally accepted right or wrong answer, and the answer also depends on the context and purpose of the cluster study. This paper presents a new hybrid method for automatically estimating the predominant number of clusters. It employs a new similarity measure, calculates the length L of constant similarity intervals, and takes the longest consistent intervals as representing the most probable numbers of clusters in the given context. An error function is defined to measure and evaluate the goodness of the estimates. The proposed method has been tested on 3 synthetic datasets and 8 real-world benchmark datasets and compared with some other popular methods. The experimental results show that the proposed method determines the desired number of clusters for all the synthetic datasets and for most of the benchmark datasets, and statistical tests indicate that our method performs significantly better than the methods compared.
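The abstract does not define the new similarity measure, how L is computed, or the error function, so the following is only a minimal sketch of the general "longest constant interval" idea under stated assumptions. It is not the authors' algorithm. A Gaussian (RBF) similarity stands in for the paper's measure. Clusters are assumed to be the connected components of a graph linking every pair whose similarity meets a threshold. The threshold is swept over (0, 1), and the K that stays unchanged over the longest run of consecutive thresholds is reported. All function names and parameters (`gaussian_similarity`, `count_clusters`, `predominant_k`, `sigma`, `steps`) are hypothetical.

```python
import math
from itertools import combinations, groupby


def gaussian_similarity(a, b, sigma=1.0):
    # ASSUMPTION: RBF kernel used as a stand-in; the paper's own similarity measure is not given.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def count_clusters(points, threshold, sim=gaussian_similarity):
    # Count connected components of the graph that links pairs with similarity >= threshold,
    # using a simple union-find with path halving.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if sim(points[i], points[j]) >= threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def predominant_k(points, steps=100, sim=gaussian_similarity):
    # Sweep `steps` evenly spaced thresholds in (0, 1) and record K at each one.
    thresholds = [(s + 0.5) / steps for s in range(steps)]
    ks = [count_clusters(points, t, sim) for t in thresholds]
    # Length of each maximal run of identical K, ignoring the trivial K = 1 and K = n.
    runs = [(k, len(list(g))) for k, g in groupby(ks) if 1 < k < len(points)]
    if not runs:
        return None
    # Return (K, run length) for the longest constant interval; ties go to the first run.
    return max(runs, key=lambda r: r[1])
```

For two tight, well-separated groups such as `[(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5)]`, `predominant_k` returns `(2, 100)`: K = 2 holds at every one of the 100 thresholds. Excluding K = 1 and K = n is a design choice made here so that the all-merged and all-separate extremes cannot win by default.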

Item Type: Book Section
Uncontrolled Keywords: clustering, similarity measure, k-means clustering algorithm
Faculty \ School: Faculty of Science > School of Computing Sciences
University of East Anglia > Faculty of Science > Research Groups > Computational Biology > Machine learning in computational biology
Depositing User: Pure Connector
Date Deposited: 11 Oct 2016 13:00
Last Modified: 25 Jul 2018 01:09
DOI: 10.1109/ICMLA.2012.146
