#StackBounty: #time-series #cross-validation Cross-validation with time series data, when dealing with one data sample at a time?

Bounty: 50

Background:

  • I’m training a Neural Network for a classification task.
  • I have a dataset of 1M samples (100 features per sample) that was collected over a period of 5 days. The data for each feature always comes from the same sensor. For example, one feature is the reading of a temperature sensor.
  • My training and validation sets come from the data of the first 4 days, split without shuffling using 8-fold cross-validation. In other words, each fold uses 3.5 days of data for training and a contiguous half-day block of samples for validation. To be clear, this half-day block can come from any of the first four days.
  • My test set is the data from the 5th day.
  • My model takes one data sample at a time as input and outputs a prediction of the correct class. No historical measurements are included, and the model does not predict future states. A data sample only ever contains the current sensor readings.
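The split described in the list above can be sketched as follows. This is a minimal illustration, not the asker’s actual code: it assumes the 1M samples are stored in chronological order and spread evenly over the 5 days (200,000 per day), and `contiguous_kfold` is a hypothetical helper. Under that assumption it should produce the same folds as scikit-learn’s `KFold(n_splits=8, shuffle=False)` on the first 800,000 samples, since unshuffled `KFold` also yields consecutive blocks.

```python
# Sketch: unshuffled 8-fold CV on the first 4 days, with day 5 held out as test.
# ASSUMPTION: samples are in chronological order and evenly spread over 5 days.

N_TOTAL = 1_000_000
N_DAYS = 5
PER_DAY = N_TOTAL // N_DAYS      # 200,000 samples per day (assumed uniform)
DEV_END = 4 * PER_DAY            # first 4 days: indices 0 .. 799,999


def contiguous_kfold(n_samples, n_folds):
    """Hypothetical helper: yield (train, val) where val is one contiguous
    block and train is the pair of ranges before and after that block."""
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        # The last fold absorbs any remainder from the integer division.
        stop = n_samples if k == n_folds - 1 else start + fold_size
        train = (range(0, start), range(stop, n_samples))
        val = range(start, stop)
        yield train, val


test_idx = range(DEV_END, N_TOTAL)   # day 5 is never used for training or validation
folds = list(contiguous_kfold(DEV_END, n_folds=8))

for k, ((before, after), val) in enumerate(folds):
    # Each fold: 3.5 days of training data, 0.5 days of validation data.
    print(f"fold {k}: train={len(before) + len(after)}, val={len(val)}, "
          f"train samples after val block={len(after)}")
```

Note that seven of the eight folds train on samples recorded after their validation block; only the last fold is trained purely on earlier data. That is exactly where the look-ahead question arises.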

My concern:
From what I understand, cross-validation on time series data is usually done slightly differently from what I described above (see for example this post), e.g. with a forward-chaining / rolling method, both to avoid look-ahead bias and because the samples cannot be expected to be i.i.d.

My data could be considered a kind of time series, even though I do not model or treat it as one. For example, I only ever feed one data sample at a time to my network, without including any historical measurements. Because of this, my gut tells me that ordinary k-fold cross-validation should be fine in this particular case, and that I would only need to change the approach if the task were actually modelled as a time series problem (for example, by feeding several historical samples at a time to the model to estimate the current state).

Is my gut right or wrong about this? If it is wrong, why?
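For comparison, a forward-chaining scheme over the same eight half-day blocks only ever validates on data recorded after everything in its training window. This is a minimal sketch, again assuming chronologically ordered, evenly spaced samples; `forward_chaining` is a hypothetical helper. With 800,000 samples and 8 blocks it should match scikit-learn’s `TimeSeriesSplit(n_splits=7)` with its defaults (`gap=0`, no `max_train_size`).

```python
# Sketch: forward-chaining (expanding-window) splits over the first 4 days.
# ASSUMPTION: samples are in chronological order and evenly spread over time.

PER_DAY = 200_000
DEV_END = 4 * PER_DAY  # first 4 days


def forward_chaining(n_samples, n_blocks):
    """Hypothetical helper: for k = 1 .. n_blocks - 1, train on all blocks
    before block k and validate on block k, so validation is always later."""
    block = n_samples // n_blocks
    for k in range(1, n_blocks):
        start = k * block
        stop = n_samples if k == n_blocks - 1 else start + block
        yield range(0, start), range(start, stop)


splits = list(forward_chaining(DEV_END, n_blocks=8))

for train, val in splits:
    print(f"train on first {len(train) / PER_DAY:.1f} days, "
          f"validate on next {len(val) / PER_DAY:.1f} days")
```

This gives seven splits whose training window grows from 0.5 to 3.5 days, each validated on the following half day. The trade-off is that the early splits train on much less data than the k-fold folds do.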
