Detection of nuisance call centres using improved hybrid time series classification algorithms

Middlehurst, Matthew (2023) Detection of nuisance call centres using improved hybrid time series classification algorithms. Doctoral thesis, University of East Anglia.

Restricted to Repository staff only until 30 June 2026.

Abstract

Nuisance calling has become a major problem in the last couple of decades with the advent of digital VoIP and increasing automation. Millions of these nuisance calls are made every month, wasting the time of those who receive them or aiming to scam vulnerable consumers. To take action against the call centres making these calls, it is key to reliably detect that a Calling Line Identification (CLI) number is carrying nuisance traffic and to identify the source using the CLI. This must be done carefully, however: phone calls and call records contain sensitive information, and consumer privacy must be taken into account and protected.

Time Series Classification (TSC) is a field of research focusing on classifying data in the format of ordered series. We further develop and investigate the use of these algorithms for detecting nuisance call centres. A high-accuracy TSC algorithm, HIVE-COTE, is a hybrid heterogeneous ensemble of TSC approaches, with each ensemble member extracting a different type of feature from a time series. While HIVE-COTE has been shown to perform well in general on the UCR archive of time series datasets, it is slow, and many algorithms have caught up to its performance as the field has developed.

In this thesis, we answer two research questions. The first is whether we can detect nuisance traffic from call centres with minimal information using time series data and TSC algorithms. The second is whether we can improve HIVE-COTE by updating the members of the ensemble which have fallen behind in performance and scalability.

We investigate improvements to HIVE-COTE by introducing more accurate and scalable alternatives to its phase-independent interval and bag-of-words dictionary-based constituent classifiers, and by altering the way it generates accuracy estimates. This results in the HIVE-COTE 2.0 (HC2) algorithm. We demonstrate that HC2 is much more usable than the original HIVE-COTE for a wide variety of datasets, and once again performs as a state-of-the-art classifier in terms of accuracy on the UCR and UEA time series archives.
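
To illustrate why per-member accuracy estimates matter in a heterogeneous ensemble, the following is a minimal sketch of one common way to combine members: each member's class probabilities are weighted by its estimated accuracy raised to a power. The function name `weighted_ensemble_predict`, the exponent `alpha` and its default value are assumptions made for this example. The abstract does not specify the weighting scheme used by HIVE-COTE or HC2.

```python
def weighted_ensemble_predict(member_probas, accuracy_estimates, alpha=4.0):
    """Combine class-probability vectors from several ensemble members.

    member_probas: one probability vector per member (all the same length).
    accuracy_estimates: one estimated accuracy in [0, 1] per member.
    alpha: hypothetical exponent; larger values favour stronger members.
    """
    if len(member_probas) != len(accuracy_estimates):
        raise ValueError("need one accuracy estimate per ensemble member")

    # Raising the accuracy estimate to a power exaggerates differences
    # between members, so weak members contribute very little.
    weights = [acc ** alpha for acc in accuracy_estimates]

    n_classes = len(member_probas[0])
    combined = [0.0] * n_classes
    for weight, probas in zip(weights, member_probas):
        for label, p in enumerate(probas):
            combined[label] += weight * p

    total = sum(combined)
    if total == 0:
        # Every member had zero weight: fall back to a uniform distribution.
        return [1.0 / n_classes] * n_classes
    return [value / total for value in combined]
```

Under this scheme, a member with a poor accuracy estimate has little influence on the final prediction, which is why the quality of the estimates affects the ensemble as a whole.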

With our improved TSC algorithms, we classify time series data from nuisance and legitimate call centres. Given a CLI number, we format the volume and duration of the calls made from that CLI into a time series. This approach requires no identifying information from any consumer or call centre using the CLI, while providing a profile of how calls are being made by that number. We show that our algorithms can perfectly separate legitimate and nuisance call centres using data provided by British Telecom (BT). An investigation into classifying time series with as few calls as possible shows it is still possible to achieve 100% accuracy with only a portion of the day’s traffic, allowing action to be taken in real time.
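
The formatting step described above can be sketched as follows. This is a minimal illustration, not the thesis's actual preprocessing. The hourly bin width, the use of mean duration per bin, and the name `cli_daily_series` are all assumptions, since the abstract does not give these details. Only call start times and durations are used, so no consumer-identifying fields enter the series.

```python
from datetime import datetime


def cli_daily_series(calls, bins_per_day=24):
    """Build two aligned series for one CLI over one day.

    calls: iterable of (start: datetime, duration_seconds: float) pairs.
    bins_per_day: hypothetical resolution; 24 gives hourly bins.
    Returns (volume, mean_duration), each a list of length bins_per_day.
    """
    seconds_per_bin = 86400 // bins_per_day
    volume = [0] * bins_per_day
    total_duration = [0.0] * bins_per_day

    for start, duration in calls:
        second_of_day = start.hour * 3600 + start.minute * 60 + start.second
        # Clamp so a bin count that does not divide the day evenly stays in range.
        index = min(second_of_day // seconds_per_bin, bins_per_day - 1)
        volume[index] += 1
        total_duration[index] += duration

    # Bins with no calls get a mean duration of 0.0 (an assumed convention).
    mean_duration = [
        total / count if count else 0.0
        for total, count in zip(total_duration, volume)
    ]
    return volume, mean_duration
```

Truncating the input to the calls seen so far in the day yields a partial series, which is the kind of input the early-classification investigation refers to.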

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Chris White
Date Deposited: 02 Aug 2023 13:08
Last Modified: 02 Aug 2023 13:08
URI: https://ueaeprints.uea.ac.uk/id/eprint/92762