Treating class imbalance in non-technical loss detection: An exploratory analysis of a real dataset

Ghori, Khawaja Moyeezullah, Awais, Muhammad ORCID: https://orcid.org/0000-0001-6421-9245, Khattak, Akmal Saeed, Imran, Muhammad, Fazal-E-Amin and Szathmary, Laszlo (2021) Treating class imbalance in non-technical loss detection: An exploratory analysis of a real dataset. IEEE Access, 9. pp. 98928-98938. ISSN 2169-3536

PDF (Treating_Class_Imbalance_in_Non-Technical_Loss_Detection_An_Exploratory_Analysis_of_a_Real_Dataset) - Published Version
Available under License Creative Commons Attribution.


Abstract

Non-Technical Loss (NTL) is a significant concern for many electric supply companies because of the financial impact of suspect consumption activities. A range of machine learning classifiers has been tested across multiple synthesized and real datasets to combat NTL. An important characteristic of these datasets is the imbalanced distribution of the classes. When the focus is on predicting the minority class of suspect activities, the classifiers' sensitivity to the class imbalance becomes more important. In this paper, we evaluate the performance of a range of classifiers with under-sampling and over-sampling techniques. The results are compared with those obtained on the untreated imbalanced dataset. In addition, we compare the performance of the classifiers using a penalized classification model. Lastly, the paper presents an exploratory analysis of using different sampling techniques for NTL detection in a real dataset and identifies the best-performing classifiers. We conclude that logistic regression is the most sensitive to the sampling techniques, as its recall changes by around 50% for all sampling techniques. Random forest is the least sensitive to the sampling technique: its precision differs by between 1% and 6% across all sampling techniques.
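The three imbalance treatments named in the abstract (under-sampling, over-sampling, and a penalized model) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract does not say which sampling variants were used, so the purely random variants below, the toy data, the function names, and the "balanced" weighting heuristic n / (k * count) are all assumptions made for illustration.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_under_sample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    pairs = [(f, label) for label, rows in by_class.items()
             for f in rng.sample(rows, target)]
    rng.shuffle(pairs)
    return [f for f, _ in pairs], [label for _, label in pairs]


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    pairs = []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        pairs.extend((f, label) for f in rows + extra)
    rng.shuffle(pairs)
    return [f for f, _ in pairs], [label for _, label in pairs]


def balanced_class_weights(y):
    """Per-class misclassification penalties, inversely proportional to class frequency."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical toy data: 95 normal consumers (0) and 5 suspect consumers (1).
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)
weights = balanced_class_weights(y)
```

Under-sampling discards majority-class rows, over-sampling duplicates minority-class rows, and the penalized approach leaves the data untouched while making errors on the minority class more costly during training. In each case the resampling (or weighting) would be applied to the training split only, so that evaluation still reflects the original class distribution.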

Item Type: Article
Additional Information: Funding Information: This work was supported in part by the European Union and the European Social Fund under Project EFOP-3.6.1-16-2016-00022, and in part by the Deanship of Scientific Research at King Saud University through Research Group under Project RG-1441-490.
Uncontrolled Keywords: class imbalance, cost-sensitive learning, non-technical loss detection, over-sampling, sampling techniques, under-sampling, computer science(all), materials science(all), engineering(all)
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Data Science and AI
Depositing User: LivePure Connector
Date Deposited: 17 Oct 2023 00:44
Last Modified: 15 Dec 2024 01:27
URI: https://ueaeprints.uea.ac.uk/id/eprint/93306
DOI: 10.1109/access.2021.3095145
