Bagnall, A. J. and Janacek, G. J. (2004) Clustering time series from ARMA models with clipped data. In: 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004-08-22 - 2004-08-25.
Full text not available from this repository.
Abstract
Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. In this paper we focus on clustering data derived from Autoregressive Moving Average (ARMA) models using k-means and k-medoids algorithms with the Euclidean distance between estimated model parameters. We justify our choice of clustering technique and distance metric by reproducing results obtained in related research. Our research aim is to assess the effects of discretising data into binary sequences of above and below the median, a process known as clipping, on the clustering of time series. It is known that the fitted AR parameters of clipped data tend asymptotically to the parameters for unclipped data. We exploit this result to demonstrate that for long series the clustering accuracy when using clipped data from the class of ARMA models is not significantly different to that achieved with unclipped data. Next we show that if the data contains outliers then using clipped data produces significantly better clusterings. We then demonstrate that using clipped series requires much less memory and that operations such as distance calculations can be much faster. Finally, we demonstrate these advantages on three real world data sets.
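The steps named in the abstract (clipping about the median, fitting AR parameters, and comparing series by the Euclidean distance between those parameters) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the Yule-Walker estimator, the AR order of 2, the simulated AR(1) series, the bit packing via `np.packbits`, and all function names are assumptions made for the example.

```python
import numpy as np


def clip_series(x):
    """Discretise a series into 1 (above the median) and 0 (otherwise)."""
    x = np.asarray(x, dtype=float)
    return (x > np.median(x)).astype(np.uint8)


def fit_ar_yule_walker(x, order):
    """Estimate AR(order) coefficients by solving the Yule-Walker equations.

    Assumption: the record does not say which estimator the paper uses.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariances at lags 0..order.
    acov = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz matrix whose (i, j) entry is the autocovariance at lag |i - j|.
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(acov[lags], acov[1 : order + 1])


def parameter_distance(a, b):
    """Euclidean distance between two fitted parameter vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


def simulate_ar1(phi, n, rng):
    """Generate a toy AR(1) series (hypothetical test data, not from the paper)."""
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


rng = np.random.default_rng(0)
series = simulate_ar1(0.5, 5000, rng)

clipped = clip_series(series)   # one 0/1 value per observation
packed = np.packbits(clipped)   # eight observations stored per byte

raw_params = fit_ar_yule_walker(series, order=2)
clipped_params = fit_ar_yule_walker(clipped, order=2)

print("bytes, raw float64 series:", series.nbytes)
print("bytes, packed clipped series:", packed.nbytes)
print("AR parameters, raw:", raw_params)
print("AR parameters, clipped:", clipped_params)
print("distance between them:", parameter_distance(raw_params, clipped_params))
```

In a clustering setting, each series would be reduced to its fitted parameter vector, and `parameter_distance` would play the role of the metric passed to k-means or k-medoids. The byte counts printed above illustrate the memory argument: a clipped series needs one bit per observation rather than one floating-point value.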
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Faculty \ School: | Faculty of Science > School of Computing Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Data Science and Statistics |
Depositing User: | Vishal Gautam |
Date Deposited: | 14 Jun 2011 15:48 |
Last Modified: | 19 Mar 2024 12:30 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/22354 |
DOI: | 10.1145/1014052.1014061 |