#StackBounty: #python #scikit-learn #time-series TimeSeriesSplit – how to aggregate (or un-silo) splits?

Bounty: 50

There are lots of examples online that show how to use TimeSeriesSplit to create multiple training/test sets. However, they don’t show how to actually aggregate these in practice.

For example, this snippet comes from the scikit-learn documentation:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 5, 6])
tscv = TimeSeriesSplit(n_splits=5)
# Each split trains on an expanding window and tests on the next sample
for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Which produces the results:

TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]

However, it is not clear how to actually use these multiple splits in a training regime. I can train on each split individually, but then the model trained on a later split does not benefit from the models trained on earlier ones. Right now my best guess is that combining all the splits together would work. That would leave me with:

TRAIN: [0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4] TEST: [1, 2, 3, 4, 5]

Or is there something else I’m missing?
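
For reference, here is a minimal sketch of the other way these splits are commonly used: as cross-validation folds rather than as pieces of one merged training set. A fresh model is fitted on each TRAIN window, scored on the matching TEST window, and only the per-split scores are aggregated. The choice of LinearRegression and mean absolute error is an assumption made purely for illustration, and this is not taken from the scikit-learn example above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Same toy data as in the documentation example
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 5, 6])

tscv = TimeSeriesSplit(n_splits=5)
fold_errors = []
for train_index, test_index in tscv.split(X):
    # Illustrative model choice (assumption); a new model is fitted per split
    model = LinearRegression()
    model.fit(X[train_index], y[train_index])
    predictions = model.predict(X[test_index])
    # Score only on the held-out, later-in-time sample(s)
    fold_errors.append(mean_absolute_error(y[test_index], predictions))

# Aggregate the scores, not the training indices
mean_error = float(np.mean(fold_errors))
print("per-split MAE:", fold_errors)
print("mean MAE:", mean_error)
```

Under this reading, the later splits already contain all the earlier training samples (split 5 trains on [0 1 2 3 4]), so concatenating the index lists would mostly duplicate early samples. The averaged score would then serve as an estimate of how the modelling approach performs on unseen future data, and a final model could be refitted on all available data afterwards.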

