Applying Clustering Analysis to Heterogeneous Data Using Similarity Matrix Fusion (SMF)

Mojahed, Aalaa, Bettencourt-Silva, Joao H., Wang, Wenjia and de la Iglesia, Beatriz ORCID: https://orcid.org/0000-0003-2675-5826 (2015) Applying Clustering Analysis to Heterogeneous Data Using Similarity Matrix Fusion (SMF). In: Machine Learning and Data Mining in Pattern Recognition. Lecture Notes in Computer Science, 9166. Springer, pp. 251-265. ISBN 978-3-319-21023-0

Full text not available from this repository.

Abstract

We define a heterogeneous dataset as a set of complex objects, that is, those defined by several data types including structured data, images, free text or time series. We envisage this could be extensible to other data types. There are currently research gaps in how to deal with such complex data. In our previous work, we have proposed an intermediary fusion approach called SMF which produces a pairwise matrix of distances between heterogeneous objects by fusing the distances between the individual data types. More precisely, SMF aggregates partial distances that we compute separately from each data type, taking into consideration uncertainty. Consequently, a single fused distance matrix is produced that can be used to produce a clustering using a standard clustering algorithm. In this paper we extend the practical work by evaluating SMF using the k-means algorithm to cluster heterogeneous data. We used a dataset of prostate cancer patients where objects are described by two basic data types, namely structured and time-series data. We assess the results of clustering using external validation on multiple possible classifications of our patients. The results show that the SMF approach can improve the clustering configuration when compared with clustering on an individual data type.
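The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SMF formula, so everything below is an assumption made for illustration: each per-type distance matrix is scaled by its largest entry, the fused distance is a weighted mean of the scaled partial distances, and the spread of those partial distances around the fused value stands in for uncertainty. The function names (`normalise`, `fuse`), the weights and the toy matrices are hypothetical and are not taken from the paper.

```python
from math import sqrt


def normalise(matrix):
    """Scale a distance matrix into [0, 1] by dividing by its largest entry."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [row[:] for row in matrix]
    return [[d / peak for d in row] for row in matrix]


def fuse(matrices, weights=None):
    """Fuse per-type distance matrices into one matrix (illustrative only).

    Returns (fused, spread). fused[i][j] is the weighted mean of the
    normalised partial distances for the pair (i, j). spread[i][j] is the
    root mean squared deviation of those partial distances from the fused
    value, a crude stand-in for uncertainty (an assumption, not the
    paper's definition).
    """
    n = len(matrices[0])
    if weights is None:
        weights = [1.0] * len(matrices)
    total = sum(weights)
    scaled = [normalise(m) for m in matrices]
    fused = [[0.0] * n for _ in range(n)]
    spread = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            parts = [m[i][j] for m in scaled]
            mean = sum(w * p for w, p in zip(weights, parts)) / total
            fused[i][j] = mean
            spread[i][j] = sqrt(sum((p - mean) ** 2 for p in parts) / len(parts))
    return fused, spread


# Toy distances between four objects, one matrix per data type (made up).
structured = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
series = [[0, 2, 10, 8], [2, 0, 10, 10], [10, 10, 0, 2], [8, 10, 2, 0]]

fused, spread = fuse([structured, series])
for row in fused:
    print([round(d, 3) for d in row])
```

The fused matrix could then be passed to any clustering routine that accepts precomputed pairwise distances. How the paper itself weights the data types, quantifies uncertainty and feeds the result to k-means is not stated in this record.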

Item Type: Book Section
Uncontrolled Keywords: heterogeneous data, big data, distance measure, intermediate data fusion, clustering, uncertainty, sdg 3 - good health and well-being
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Faculty of Medicine and Health Sciences > Research Centres > Business and Local Government Data Research Centre (former - to 2023)
Faculty of Science > Research Groups > Norwich Epidemiology Centre
Faculty of Medicine and Health Sciences > Research Groups > Norwich Epidemiology Centre
Depositing User: Pure Connector
Date Deposited: 28 Sep 2015 10:02
Last Modified: 19 Apr 2023 01:26
URI: https://ueaeprints.uea.ac.uk/id/eprint/54458
DOI: 10.1007/978-3-319-21024-7_17
