Consistent Algorithms For Clustering Time Series

The assumption that a given time series is stationary ergodic is one of the most general assumptions used in statistics; in particular, it allows for arbitrary long-range serial dependence, and subsumes most of the nonparametric as well as modelling assumptions used in the literature on clustering time series, such as i.i.d., (hidden) Markov, or mixing time series.

This allows us to define the following clustering objective: group a pair of time series into the same cluster if and only if the distribution that generates them is the same.
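The paper's own algorithms are built on a specific distributional distance and come with consistency guarantees; the snippet below is only a minimal Python sketch of the idea, assuming a crude quantized-pattern estimate of the distance between empirical distributions and a farthest-point assignment with the number of clusters k known. All function names, the quantization scheme, and the AR(1) test data are illustrative, not taken from the paper.

```python
import numpy as np

def pattern_freqs(x, m, n_bins=4):
    """Empirical frequencies of quantized patterns of length m.

    The series is quantized into n_bins uniform cells of [min, max];
    each length-m window is mapped to a tuple of cell indices and
    counted. (The grid and n_bins are illustrative choices.)
    """
    lo, hi = x.min(), x.max()
    cells = np.minimum(((x - lo) / (hi - lo + 1e-12) * n_bins).astype(int),
                       n_bins - 1)
    counts = {}
    for i in range(len(cells) - m + 1):
        key = tuple(cells[i:i + m])
        counts[key] = counts.get(key, 0) + 1
    total = max(len(cells) - m + 1, 1)
    return {k: v / total for k, v in counts.items()}

def empirical_distance(x, y, max_m=3):
    """Weighted sum, over pattern lengths, of the gaps between empirical
    pattern frequencies -- a rough stand-in for a distributional distance
    between the processes generating x and y."""
    d = 0.0
    for m in range(1, max_m + 1):
        fx, fy = pattern_freqs(x, m), pattern_freqs(y, m)
        keys = set(fx) | set(fy)
        d += 2.0 ** -m * sum(abs(fx.get(k, 0) - fy.get(k, 0)) for k in keys)
    return d

def cluster_with_known_k(series, k):
    """Farthest-point initialization plus nearest-center assignment,
    using the empirical distance above."""
    n = len(series)
    dist = np.array([[empirical_distance(series[i], series[j])
                      for j in range(n)] for i in range(n)])
    centers = [0]
    while len(centers) < k:
        # pick the series farthest from all current centers
        centers.append(int(np.argmax(dist[centers].min(axis=0))))
    return [int(np.argmin(dist[i, centers])) for i in range(n)]

# Two AR(1)-style sources with different dependence; series generated by
# the same source should land in the same cluster.
rng = np.random.default_rng(0)
def ar1(phi, n=2000):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

series = [ar1(0.9), ar1(0.9), ar1(-0.5), ar1(-0.5)]
print(cluster_with_known_k(series, k=2))   # e.g. [0, 0, 1, 1]
```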

It is assumed that the data are generated by a mixture of κ different distributions that have a particular known form (such as Gaussian, hidden Markov models, or graphical models). Thus, each of the N samples is independently generated according to one of these κ distributions (with some fixed probability). Since the model of the data is specified quite well, one can use likelihood-based distances (and then, for example, the k-means algorithm), or Bayesian inference, to cluster the data. Another typical objective is to estimate the parameters of the distributions in the mixture (e.g., Gaussians), rather than actually clustering the data points. Clearly, the main difference between this setting and ours is that we do not assume any known model for the data; we do not even require independence between the samples.
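For contrast with the paper's model-free setting, here is a minimal sketch of that model-based setting, assuming the mixture components are Gaussian and using scikit-learn's GaussianMixture; the data and parameters are illustrative, not from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# N samples, each drawn independently from one of two Gaussian components.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
               rng.normal(loc=4.0, scale=1.0, size=(100, 2))])

# Fitting the parametric mixture yields both estimates of the component
# parameters and a likelihood-based cluster assignment for each sample.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)   # assign each sample to its most likely component
print(gmm.means_)         # estimated component means
```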

http://www.jmlr.org/papers/volume17/khaleghi16a/khaleghi16a.pdf
