Multi-objective evolutionary algorithms for data clustering

Kirkland, Oliver (2014) Multi-objective evolutionary algorithms for data clustering. Doctoral thesis, University of East Anglia.

[thumbnail of 2014KirklandOAPhD.pdf]
Preview
PDF
Download (1MB) | Preview

Abstract

In this work we investigate the use of Multi-Objective metaheuristics for the data-mining task of clustering. We �first investigate methods of evaluating the quality of
clustering solutions, we then propose a new Multi-Objective clustering algorithm driven by multiple measures of cluster quality and then perform investigations into the performance of different Multi-Objective clustering algorithms.
In the context of clustering, a robust measure for evaluating clustering solutions is an important component of an algorithm. These Cluster Quality Measures (CQMs)
should rely solely on the structure of the clustering solution. A robust CQM should have three properties: it should be able to reward a \good" clustering solution; it
should decrease in value monotonically as the solution quality deteriorates and, it should be able to evaluate clustering solutions with varying numbers of clusters. We
review existing CQMs and present an experimental evaluation of their robustness. We find that measures based on connectivity are more robust than other measures
for cluster evaluation.
We then introduce a new Multi-Objective Clustering algorithm (MOCA). The use of Multi-Objective optimisation in clustering is desirable because it permits the
incorporation of multiple measures of cluster quality. Since the definition of what constitutes a good clustering is far from clear, it is beneficial to develop algorithms that allow for multiple CQMs to be accommodated. The selection of the clustering quality measures to use as objectives for MOCA is informed by our previous work with internal evaluation measures. We explain the implementation details and perform experimental work to establish its worth. We compare MOCA with k-means and find some promising results. We�find that MOCA can generate a pool of clustering solutions that is more likely to contain the optimal clustering solution than the pool of solutions generated by k-means.
We also perform an investigation into the performance of different implementations of MOEA algorithms for clustering. We�find that representations of clustering
based around centroids and medoids produce more desirable clustering solutions and Pareto fronts. We also �find that mutation operators that greatly disrupt the
clustering solutions lead to better exploration of the Pareto front whereas mutation operators that modify the clustering solutions in a more moderate way lead to higher quality clustering solutions.
We then perform more specific investigations into the performance of mutation operators focussing on operators that promote clustering solution quality, exploration of the Pareto front and a hybrid combination. We use a number of techniques to assess the performance of the mutation operators as the algorithms execute. We
confirm that a disruptive mutation operator leads to better exploration of the Pareto front and mutation operators that modify the clustering solutions lead to the discovery of higher quality clustering solutions. We find that our implementation of a hybrid mutation operator does not lead to a good improvement with respect to the other mutation operators but does show promise for future work.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Users 2593 not found.
Date Deposited: 26 Nov 2014 17:01
Last Modified: 26 Nov 2014 17:01
URI: https://ueaeprints.uea.ac.uk/id/eprint/51331
DOI:

Actions (login required)

View Item View Item