Classifying dangerous species of mosquito using machine learning

Flynn, Michael (2022) Classifying dangerous species of mosquito using machine learning. Doctoral thesis, University of East Anglia.



This thesis begins by presenting the performance of modern Time Series Classification (TSC) approaches, including HIVE-COTEv2 and InceptionTime, on 4 new insect wingbeat datasets. The experiments throughout this thesis explore whether it is possible to classify flying insects into their respective species and into groups based on their sex. Furthermore, it is hypothesised that a hierarchical approach to classifying flying insects is possible by filtering “easy” cases using cheap-to-obtain features, reducing the number of times processing-intensive approaches are used. Experiments are undertaken on 3 representations of the data: Harmonic Spectral Product (HSP), the raw data and spectral data. HSP is a method of extracting the fundamental frequency of a signal. It represents a logical benchmark for comparison and is easy and quick to extract. In one dataset, InsectSounds, species are separated by sex. Evaluation of the results achieved with the HSP representation showed that, despite a relatively poor overall accuracy, this feature produces a low type II error with respect to female mosquitoes. It is shown that female mosquito classes were more likely to be misclassified as other female mosquito classes and that, where fly classes are misclassified as mosquito classes, they are typically classified as male mosquitoes. Previous work had shown that transformation into the frequency domain has a positive effect on performance. Audio data is typically recorded at a high sample rate, which results in high spectral resolution. As a result, approaches from the literature have truncated high and low frequency data to reduce runtime. It is hypothesised that inclusion of low frequency data will aid classification. This is because low frequency data is likely caused by the body of the mosquito, and morphological differences, such as size, are strongly correlated with sex.
The results show that the performance of all approaches was improved by the use of spectral data. The results also show that spectral data that included low frequency information gave a higher overall accuracy than transformations that discarded it.
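
As an illustration of the HSP feature described above, the following is a minimal sketch of a harmonic-product estimate of the fundamental frequency. It assumes HSP follows the usual construction of multiplying the magnitude spectrum at integer multiples of each candidate frequency; the thesis may differ in detail. The function names, the number of harmonics and the synthetic test tone are all illustrative assumptions, and a naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..n/2 (an FFT would be used in practice)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def hsp_fundamental(signal, sample_rate, n_harmonics=3):
    """Estimate the fundamental frequency (Hz) from a harmonic product.

    n_harmonics is an assumed parameter, not a value from the thesis.
    """
    n = len(signal)
    spectrum = magnitude_spectrum(signal)
    # Only bins whose highest harmonic stays inside the spectrum are valid.
    limit = (len(spectrum) - 1) // n_harmonics + 1
    best_bin, best_score = 1, -1.0
    for k in range(1, limit):  # skip the DC bin
        score = 1.0
        for h in range(1, n_harmonics + 1):
            score *= spectrum[k * h]
        if score > best_score:
            best_bin, best_score = k, score
    return best_bin * sample_rate / n


# Synthetic tone (not insect data): 400 Hz fundamental plus two harmonics.
sample_rate = 2560
n_samples = 256
tone = [
    math.sin(2 * math.pi * 400 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 800 * t / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 1200 * t / sample_rate)
    for t in range(n_samples)
]
f0 = hsp_fundamental(tone, sample_rate)
```

The result is a single number per recording, which is why such a feature is cheap to compute and store compared with the full spectrum.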

Formative experiments showed that HIVE-COTEv1 was the most accurate approach at classifying flying insects. HIVE-COTEv1 is a heterogeneous approach that consists of 4 modules: Random Interval Spectral Ensemble (RISE), Bag Of SFA Symbols (BOSS), Shapelet Transform Classifier (STC) and Time Series Forest (TSF). The predictive power of these modules is combined via the Cross-validation Accuracy Weighted Probabilistic Ensemble (CAWPE). The RISE approach was chosen as the spectral component as it was “best in class” at the inception of HIVE-COTEv1. It is suggested that a significant improvement to the usability and accuracy of RISE would translate into an improvement in the performance of HIVE-COTEv1. The introduction of contracting provided a method through which the training time of RISE could be effectively controlled, improving its usability. A review of the interval selection procedure led to improvements that had a significant positive effect on accuracy. A review of spectral transforms and the method of combining them led to a further improvement in accuracy, and to an architecture in which multiple transformations are applied.
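
The way an accuracy-weighted probabilistic ensemble merges module outputs can be sketched as follows. This is a simplified illustration rather than the HIVE-COTEv1 implementation: each module's class probabilities are weighted by its cross-validation accuracy raised to a power and the result is renormalised. The exponent `alpha=4.0`, the function name and all numbers below are assumptions made for the example and do not come from the abstract.

```python
def cawpe_combine(module_probs, cv_accuracies, alpha=4.0):
    """Combine per-module class probabilities, weighted by CV accuracy.

    module_probs:  one probability list per module, all over the same classes.
    cv_accuracies: estimated cross-validation accuracy of each module.
    alpha:         assumed exponent that sharpens the gap between modules.
    """
    weights = [acc ** alpha for acc in cv_accuracies]
    n_classes = len(module_probs[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, module_probs))
        for c in range(n_classes)
    ]
    total = sum(combined)
    return [p / total for p in combined]


# Made-up outputs for four modules (for example RISE, BOSS, STC, TSF)
# on a two-class problem, with made-up cross-validation accuracies.
probs = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45], [0.5, 0.5]]
accuracies = [0.9, 0.6, 0.8, 0.7]
ensemble = cawpe_combine(probs, accuracies)
predicted_class = ensemble.index(max(ensemble))
```

Under this weighting, a more accurate spectral module has a larger say in the final prediction, which is consistent with the suggestion that improving RISE should improve the ensemble as a whole.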

For smart traps to be effective, they must work for extended periods in rural locations. Implementations of hierarchical approaches show that two expert features, HSP and time of flight (TOF), are effective in reducing test time and therefore the amount of processing required. This is achieved by first classifying the test case using simple approaches, such as BayesNet, and using a more powerful approach only if the confidence in the prediction does not meet a parameterised threshold. In an evaluation of several methods of combination, the most efficient of these is shown to increase classification accuracy by 0.6%, increase the TPR of female mosquitoes by 48/10,000, decrease the FNR of female mosquitoes by 83/15,000 and reduce test time by 1.5 hours over 25,000 instances, when compared to the single best approach, InceptionTime. Furthermore, a cumulative approach to combining the expert features with the InceptionTime approach resulted in a 4.14% increase in accuracy, an increase in the TPR of female mosquitoes of 139/10,000 and a decrease in the FNR of female mosquitoes of 45/15,000.
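
The thresholded hierarchy described above can be sketched as a two-stage cascade. This is a minimal illustration under stated assumptions, not the thesis implementation: both models are placeholder functions returning fixed probabilities, and the class labels, the `hsp_hz` feature name and the default threshold of 0.9 are hypothetical.

```python
def cascade_predict(instance, cheap_model, expensive_model, threshold=0.9):
    """Return (label, stage), calling expensive_model only when needed.

    Each model maps an instance to a dict of class label -> probability.
    """
    probs = cheap_model(instance)
    label = max(probs, key=probs.get)
    if probs[label] >= threshold:
        return label, "cheap"
    # Confidence below the threshold: fall back to the powerful model.
    probs = expensive_model(instance)
    return max(probs, key=probs.get), "expensive"


def cheap_stub(instance):
    """Placeholder for a simple classifier over expert features."""
    if instance["hsp_hz"] < 500:
        return {"class_a": 0.95, "class_b": 0.05}
    return {"class_a": 0.55, "class_b": 0.45}


def expensive_stub(instance):
    """Placeholder for a processing-intensive classifier."""
    return {"class_a": 0.2, "class_b": 0.8}


easy = cascade_predict({"hsp_hz": 420.0}, cheap_stub, expensive_stub)
hard = cascade_predict({"hsp_hz": 610.0}, cheap_stub, expensive_stub)
```

Raising the threshold sends more cases to the second stage, trading test time for the accuracy of the more powerful model.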

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Chris White
Date Deposited: 09 Aug 2022 11:12
Last Modified: 09 Aug 2022 11:12
