Bagnall, Anthony J. and Janacek, Gareth J. (2005) Clustering time series with clipped data. Machine Learning, 58 (2). pp. 151-178. ISSN 0885-6125
Abstract
Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. Time series data are often large and may contain outliers. We show that the simple procedure of clipping the time series (discretising to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy. We also demonstrate that clipping increases clustering accuracy when there are outliers in the data, thus serving as a means of outlier detection and a method of identifying model misspecification. We consider simulated data from polynomial, autoregressive moving average and hidden Markov models and show that the estimated parameters of the clipped data used in clustering tend, asymptotically, to those of the unclipped data. We also demonstrate experimentally that, if the series are long enough, the accuracy on clipped data is not significantly less than the accuracy on unclipped data, and if the series contain outliers then clipping results in significantly better clusterings. We then illustrate how using clipped series can be of practical benefit in detecting model misspecification and outliers on two real world data sets: an electricity generation bid data set and an ECG data set.
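As a rough illustration of the clipping step described in the abstract, the sketch below discretises a series to 1 where a value lies above the series median and 0 otherwise, then packs the result into bits to show where the memory saving comes from. The function names `clip_series` and `pack_bits`, the sample values, and the choice to map values equal to the median to 0 are assumptions made for this example and are not taken from the paper.

```python
from statistics import median


def clip_series(series):
    """Return a 0/1 list: 1 where a value is above the series median, else 0.

    Assumption: values equal to the median are mapped to 0.
    """
    m = median(series)
    return [1 if x > m else 0 for x in series]


def pack_bits(bits):
    """Pack a 0/1 list into bytes, eight observations per byte."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


# Hypothetical series with one extreme value (250.0).
series = [0.2, 1.5, -0.7, 250.0, 0.9, -1.1, 0.4, 0.0]
clipped = clip_series(series)
packed = pack_bits(clipped)

print(clipped)
print(len(packed), "byte(s) for", len(series), "observations")
```

In this sketch the extreme value 250.0 is encoded as a 1, exactly like any other above-median observation, which gives some intuition for why clipped series may be less sensitive to outliers.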
Item Type: | Article |
---|---|
Faculty \ School: | Faculty of Science > School of Computing Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Data Science and Statistics |
Depositing User: | Vishal Gautam |
Date Deposited: | 07 Mar 2011 15:01 |
Last Modified: | 06 Aug 2023 00:22 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/22353 |
DOI: | 10.1007/s10994-005-5825-6 |