A Comparison of Two Document Clustering Approaches for Clustering Medical Documents

Saad, F.H., de la Iglesia, B. ORCID: https://orcid.org/0000-0003-2675-5826 and Bell, G. D. (2006) A Comparison of Two Document Clustering Approaches for Clustering Medical Documents. In: 2006 International Conference on Data Mining, 2006-06-26 - 2006-06-29.

Full text not available from this repository.


Medical data is often presented as free text in the form of medical reports. Such documents contain important information about patients, disease progression and management, but are difficult to analyse with conventional data mining techniques due to their unstructured nature. Clustering the medical documents into a small number of meaningful clusters may facilitate discovering patterns by allowing us to extract a number of relevant features from each cluster, thus introducing structure into the data and facilitating the application of conventional data mining techniques. For this approach to work, it is essential to produce high-quality clustering. Thus, the main goals of this paper are (1) to experimentally evaluate the performance of six criterion functions in the context of the partitional clustering approach, (2) to compare the clustering results of the agglomerative and partitional approaches for each of the criterion functions using real-world medical documents, and (3) to establish the right clustering algorithm to produce high-quality clustering of real-world medical documents in order to discover hidden knowledge by analyzing the produced clusters. Our experimental results show that the clustering solutions produced by the agglomerative approach are consistently better than those produced by the partitional approach for all the criterion functions. Moreover, the results show that different criterion functions lead to substantially different results. In addition, we examine the quality of the features produced for each cluster for a classification task. The task involves discriminating between successful and unsuccessful procedures. The features extracted are used to produce an accurate classification of the data.
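
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not name the six criterion functions, so the `criterion` function below (the sum of composite-vector norms per cluster, a common cohesion measure for unit-length document vectors) is an assumed stand-in, and the four toy documents are invented for illustration. The sketch runs a simple partitional method (spherical k-means with fixed seeds) and a criterion-driven agglomerative method on the same vectors and scores both solutions with the same criterion.

```python
import math
from collections import Counter

def tf_vectors(docs):
    """Unit-length term-frequency vectors (as dicts) for whitespace-tokenised text."""
    vecs = []
    for text in docs:
        counts = Counter(text.lower().split())
        length = math.sqrt(sum(c * c for c in counts.values()))
        vecs.append({term: c / length for term, c in counts.items()})
    return vecs

def composite(vecs, members):
    """Sum of the document vectors belonging to one cluster."""
    total = {}
    for i in members:
        for term, w in vecs[i].items():
            total[term] = total.get(term, 0.0) + w
    return total

def norm(v):
    return math.sqrt(sum(w * w for w in v.values()))

def dot(a, b):
    return sum(w * b.get(term, 0.0) for term, w in a.items())

def criterion(vecs, clusters):
    """Assumed criterion: sum of composite-vector norms (higher is more cohesive)."""
    return sum(norm(composite(vecs, c)) for c in clusters)

def partitional(vecs, seeds, iters=10):
    """Spherical k-means: assign each document to the most similar centroid, then update."""
    centroids = [dict(vecs[s]) for s in seeds]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in seeds]
        for i, v in enumerate(vecs):
            best = max(range(len(seeds)), key=lambda k: dot(v, centroids[k]))
            clusters[best].append(i)
        for k, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                comp = composite(vecs, members)
                length = norm(comp)
                centroids[k] = {t: w / length for t, w in comp.items()}
    return [c for c in clusters if c]

def agglomerative(vecs, k):
    """Start from singletons; repeatedly apply the merge that maximises the criterion."""
    clusters = [[i] for i in range(len(vecs))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = (clusters[:a] + clusters[a + 1:b] + clusters[b + 1:]
                          + [clusters[a] + clusters[b]])
                score = criterion(vecs, merged)
                if best is None or score > best[0]:
                    best = (score, merged)
        clusters = best[1]
    return clusters

# Invented toy documents, not taken from the paper's data set.
docs = [
    "chest pain ecg normal",
    "chest pain ecg abnormal",
    "knee pain xray fracture",
    "knee swelling xray fracture",
]
vecs = tf_vectors(docs)
part = partitional(vecs, seeds=[0, 2])
agg = agglomerative(vecs, 2)
print("partitional  :", part, round(criterion(vecs, part), 4))
print("agglomerative:", agg, round(criterion(vecs, agg), 4))
```

On data this small both methods recover the same two groups, so the sketch only shows how the two approaches can be scored against a shared criterion; it says nothing about which performs better on real medical documents.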

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Medicine and Health Sciences > Research Centres > Business and Local Government Data Research Centre
Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: Vishal Gautam
Date Deposited: 14 Jun 2011 08:03
Last Modified: 16 Feb 2023 10:31
URI: https://ueaeprints.uea.ac.uk/id/eprint/22768
