I have been looking at a lot of recent changepoint detection algorithms ( *-PELT, NEWMA, …), but they all seem to work on one or more variables that each have a single value per date.
My problem is a bit different: I have a variable number of values per "date" (each date could even be represented by a CDF or KDE), and I'd like to detect changes in the behavior of those values (for example, changes in mean, standard deviation, shape, etc.).
So instead of having a series of single values, for example:
x0 = 0.1
x1 = 0.5
x2 = 0.3
x3 = 0.4
x4 = 2.5
x5 = 2.1
x6 = 2.3
I instead have a series in which each "date" holds multiple values (the count per "date" can vary), for example:
x0 = (0.1,0.11,0.45,0.26,...)
x1 = (0.5,0.3,0.4,0.43,...)
x2 = (0.3,0.2)
x3 = (0.4,0.21,0.32,0.54)
x4 = (2.5,2.1,2.65,2.57,...)
x5 = (2.1,2.15,2.6,2.33, 2.41)
x6 = (2.3, 2.12, 2.39, 2.54, 2.16)
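To make the CDF representation mentioned above concrete, here is a minimal sketch that stores the ragged data as a list of tuples and evaluates each date's empirical CDF on a shared grid, so that every date becomes a fixed-length vector. The names `samples`, `grid` and `ecdf_on_grid` are made up for illustration, the grid points are an arbitrary assumption, and the values elided with "..." in the example are simply left out.

```python
from bisect import bisect_right

# Example values from above; entries elided with "..." are left out.
samples = [
    (0.1, 0.11, 0.45, 0.26),
    (0.5, 0.3, 0.4, 0.43),
    (0.3, 0.2),
    (0.4, 0.21, 0.32, 0.54),
    (2.5, 2.1, 2.65, 2.57),
    (2.1, 2.15, 2.6, 2.33, 2.41),
    (2.3, 2.12, 2.39, 2.54, 2.16),
]


def ecdf_on_grid(values, grid):
    """Fraction of `values` that are <= g, for each grid point g."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, g) / n for g in grid]


# Hypothetical common grid (an assumption; pick it to cover the data range).
grid = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# One fixed-length vector per date, regardless of how many values it has.
curves = [ecdf_on_grid(s, grid) for s in samples]

for i, curve in enumerate(curves):
    print(f"x{i}: {curve}")
```

Each date then has the same dimensionality, which is what multivariate changepoint methods generally expect, although the choice of grid is itself a modelling decision.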
I had a few ideas, but I don't like any of them very much:
- Computing a descriptive statistic (mean, median, standard deviation) for each date and applying changepoint detection to those
  - This can get quite expensive
  - This doesn't seem reliable
- Assigning each value of a "date" to multiple fake "dates"
  - Can and will skew the results
  - There is a big loss of information
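For reference, the first idea can be sketched roughly as follows. This is only an illustration under my own assumptions: it uses the per-date mean as the summary and a brute-force least-squares search for a single changepoint, not PELT or NEWMA, and the helper name `best_single_split` is made up. The values elided with "..." in the example are left out.

```python
from statistics import fmean

# Example values from above; entries elided with "..." are left out.
samples = [
    (0.1, 0.11, 0.45, 0.26),
    (0.5, 0.3, 0.4, 0.43),
    (0.3, 0.2),
    (0.4, 0.21, 0.32, 0.54),
    (2.5, 2.1, 2.65, 2.57),
    (2.1, 2.15, 2.6, 2.33, 2.41),
    (2.3, 2.12, 2.39, 2.54, 2.16),
]


def sse(segment):
    """Sum of squared deviations from the segment mean."""
    m = fmean(segment)
    return sum((v - m) ** 2 for v in segment)


def best_single_split(series):
    """Index k (1 <= k < len(series)) minimising SSE of series[:k] plus series[k:]."""
    return min(
        range(1, len(series)),
        key=lambda k: sse(series[:k]) + sse(series[k:]),
    )


# Collapse each date to one number, then look for a single mean shift.
means = [fmean(s) for s in samples]
split = best_single_split(means)
print(f"change detected before x{split}")
```

On the example values the split lands before x4. Since each date is reduced to its mean, a change in spread or shape with an unchanged mean would go unnoticed, which is the kind of weakness that makes me distrust this approach.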
Is there an algorithm that could work with such data?
Edit:
http://www.jmlr.org/papers/v20/16-155.html might answer this question; I still have to read it.