A Graph based Methodology for Web Structure Mining: with a Case Study on the Webs of UK Universities

Alqurashi, Tahani and Wang, Wenjia (2014) A Graph based Methodology for Web Structure Mining: with a Case Study on the Webs of UK Universities. In: International Conference on Web Intelligence, Mining and Semantics, 2014-06-02 - 2014-06-05.

Full text not available from this repository.

Abstract

Web structure mining extracts knowledge from the hyperlink structure of websites, with the aim of improving web design so that content is presented clearly and navigation is easy. This paper presents a graph-based methodology for web structure mining. The structure of a website is first mapped onto a graph, with nodes representing web pages and links representing hyperlinks between pages and to other websites. The characteristics of the web graph, such as the degree of each node, density, connectivity, closeness centralisation, and node clusters, can then be analysed quantitatively. The methodology is tested on web structural data collected from the websites of 110 UK universities. After the data were cleansed and pre-processed, the graphs were constructed and analysed to obtain the aforementioned properties for each web, along with other useful information, such as page size and the length of the optimal path, both of which affect navigability. Based on an evaluation of these properties, guidelines and criteria are devised for grading the structural quality of the webs into five categories, from very poor to very good. The average degree and the percentage of strongly connected component (SCC) pages, together with the average distance, were found to be the most important properties in determining the structural quality of a web.
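To make the properties named in the abstract concrete, the following is a minimal sketch of how average degree, density, the share of pages in an SCC, and average distance might be computed for a small site. It is not the authors' implementation: the toy `links` data, the function names, and the choice to measure the SCC share as the fraction of pages in the largest SCC are all assumptions made for illustration, and closeness centralisation and clustering are omitted.

```python
from collections import deque

def build_graph(links):
    """Map {page: [linked pages]} to adjacency sets; every target becomes a node."""
    graph = {page: set() for page in links}
    for page, targets in links.items():
        for target in targets:
            if target != page:  # ignore self-links
                graph[page].add(target)
                graph.setdefault(target, set())
    return graph

def edge_count(graph):
    return sum(len(targets) for targets in graph.values())

def average_degree(graph):
    # Each directed edge adds one out-degree and one in-degree.
    return 2 * edge_count(graph) / len(graph)

def density(graph):
    # Fraction of the n * (n - 1) possible directed edges that exist.
    n = len(graph)
    return edge_count(graph) / (n * (n - 1)) if n > 1 else 0.0

def bfs_distances(graph, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_distance(graph):
    # Mean shortest-path length over ordered pairs that are reachable.
    total = pairs = 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def largest_scc_fraction(graph):
    # Assumption: "percentage of SCC pages" is read as the share of pages
    # in the largest strongly connected component.
    reverse = {node: set() for node in graph}
    for node, targets in graph.items():
        for target in targets:
            reverse[target].add(node)
    seen, best = set(), 0
    for root in graph:
        if root in seen:
            continue
        # Nodes reachable from root AND able to reach root share its SCC.
        scc = set(bfs_distances(graph, root)) & set(bfs_distances(reverse, root))
        seen |= scc
        best = max(best, len(scc))
    return best / len(graph)

# Hypothetical four-page site, invented for this example.
links = {
    "home": ["about", "courses"],
    "about": ["home"],
    "courses": ["home", "apply"],
    "apply": [],
}
graph = build_graph(links)
print(average_degree(graph), density(graph))
print(largest_scc_fraction(graph), average_distance(graph))
```

In this toy site the "apply" page has no outgoing links, so it falls outside the strongly connected core and users who reach it cannot navigate back by following links. The simple per-node search used here is quadratic in the worst case; for real crawls a linear-time SCC algorithm (for example Tarjan's) would be the more usual choice.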

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: web mining, graph theory, web structure
Faculty \ School: Faculty of Science > School of Computing Sciences


UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: Pure Connector
Date Deposited: 08 Sep 2014 12:46
Last Modified: 20 Oct 2022 23:41
URI: https://ueaeprints.uea.ac.uk/id/eprint/49933
