Effect of Document Representation on the Performance of Medical Document Classification

Saad, F. H., de la Iglesia, B. and Bell, G. D. (2006) Effect of Document Representation on the Performance of Medical Document Classification. In: Proceedings of the 2006 International Conference on Data Mining (DMIN-06), 2006-06-26 - 2006-06-29.

Full text not available from this repository.


Text classification in the medical domain is a real-world problem with wide applicability. This paper investigates extensively the effect of text representation approaches on the performance of medical document classification. To accomplish this objective, we evaluated seven different approaches to representing real-world medical documents. The text representation approaches investigated in this paper are basic word representation (bag-of-words), key-phrases, collocations extracted from preprocessed text, collocations extracted from postprocessed text, single-word nouns, a combination of single-word nouns and adjectives, and a combination of single-word nouns, adjectives and verbs. A set of experiments was conducted to evaluate comprehensively the effects of these representation approaches on classification performance using real-world medical documents. We measured classification performance with information retrieval metrics: precision, recall, F-measure and accuracy. Our experimental results show that the bag-of-words representation outperforms all other representation approaches. In addition, careful use of selected features improves the classification performance.
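
As a rough illustration of two ideas named in the abstract, the sketch below builds a simple bag-of-words representation and computes precision, recall, F-measure and accuracy for a binary task. It is not the authors' implementation. The function names `bag_of_words` and `binary_metrics`, the whitespace tokenisation, the sample sentence, and the choice of the balanced F-measure (F1) are all assumptions made for this example; the abstract does not specify any of them.

```python
from collections import Counter


def bag_of_words(text):
    """Return token counts for a document (assumed: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical sample data, invented purely for illustration.
doc_vector = bag_of_words("Chest pain and chest tightness")
scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Here `doc_vector` maps each token to its frequency in the document, and `scores` holds the four metrics for the hypothetical labels. The other representations listed in the abstract (key-phrases, collocations, part-of-speech filtered words) would replace only the tokenisation step, leaving the evaluation unchanged.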

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Vishal Gautam
Date Deposited: 14 Jun 2011 08:00
Last Modified: 12 Jul 2021 23:39
URI: https://ueaeprints.uea.ac.uk/id/eprint/22767
