#StackBounty: #machine-learning #time-series #neural-networks #data-leakage data leakage when scaling time series

Bounty: 150

Suppose I want to forecast future values of $y$ using past values of features $x$.
In this example I am using:

  • the training set goes from $t_0$ to $t_{15}$
  • values from $x_{t_0}$ to $x_{t_{10}}$ to forecast $y_{t_{11}}$
  • values from $x_{t_1}$ to $x_{t_{11}}$ to forecast $y_{t_{12}}$
  • and so on until I use $x_{t_5}$ to $x_{t_{15}}$ to forecast $y_{t_{16}}$
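The windowing above can be sketched as follows. This is only an illustration, assuming every window has the same length of 11 consecutive points as in the first two bullets; the helper name `make_windows` and its parameters are hypothetical and not part of my actual code.

```python
def make_windows(last_train_index=15, window=11):
    """Return (first_x_index, last_x_index, target_index) for each training sample.

    Assumes a fixed window of `window` consecutive x values and a target
    one step after the end of the window.
    """
    samples = []
    start = 0
    # Slide the window until its last x value reaches the end of the training set.
    while start + window - 1 <= last_train_index:
        last = start + window - 1
        samples.append((start, last, last + 1))
        start += 1
    return samples


samples = make_windows()
for first, last, target in samples:
    print(f"x[t_{first}] .. x[t_{last}] -> y[t_{target}]")
```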

I scale my feature $x$ using only data in the training set (up to $t_{15}$).

Nevertheless, when I try to predict $y_{t_{17}}$, you can see from the picture below that I use some data points that have also been used for scaling.
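To make the setup concrete, here is a minimal sketch of what I mean. It assumes standardization (subtract the mean, divide by the standard deviation) as the scaling method and an input window of 11 points ($x_{t_6}$ to $x_{t_{16}}$) for predicting $y_{t_{17}}$; the toy values in `x` are made up purely for illustration.

```python
from statistics import mean, pstdev

# Hypothetical toy series: x[i] stands in for x at time t_i (values are made up).
x = [float(i) for i in range(17)]  # t_0 .. t_16

# Scaling statistics are computed on the training set only (t_0 .. t_15).
train_idx = list(range(0, 16))
train_values = [x[i] for i in train_idx]
mu = mean(train_values)
sigma = pstdev(train_values)


def scale(value):
    """Standardize a value with the training-set mean and standard deviation."""
    return (value - mu) / sigma


# Assumed input window for forecasting y at t_17: t_6 .. t_16.
pred_idx = list(range(6, 17))
window_scaled = [scale(x[i]) for i in pred_idx]

# Time indices that appear both in the scaling data and in the prediction window.
shared = sorted(set(train_idx) & set(pred_idx))
# Time indices in the prediction window that the scaler never saw.
new_only = sorted(set(pred_idx) - set(train_idx))

print("used for scaling and for prediction:", shared)
print("used for prediction only:", new_only)
```

The `shared` list contains the points my question is about: they are inputs to the prediction and also contributed to `mu` and `sigma`.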

Is this leakage?

[Image: not available]
