Mining time-series data using discriminative subsequences

Hills, Jonathan F. F. (2014) Mining time-series data using discriminative subsequences. Doctoral thesis, University of East Anglia.

[thumbnail of J_Hills_Thesis_Approved.pdf]
Preview
PDF
Download (5MB) | Preview

Abstract

Time-series data is abundant, and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, and a
comprehensible method of understanding both data and models.
For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative
of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than the previous best shapelet-based classifiers.
We use two methods to increase interpretability: First, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length,
reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular
shapelets, a representation that is straightforward to interpret and understand.
We supplement the ensemble classifier with partial classifocation. We generate rule sets on the binary-shapelet data, improving performance on certain classes, and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce
the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model.
Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their performance in terms of
speed and accuracy on synthetic data, and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically
and in an unsupervised manner using a local-shape-based approach.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Jackie Webb
Date Deposited: 26 Jun 2015 13:17
Last Modified: 26 Jun 2015 13:17
URI: https://ueaeprints.uea.ac.uk/id/eprint/53397
DOI:

Downloads

Downloads per month over past year

Actions (login required)

View Item View Item