
I am familiar with “regular” cross-validation, but now I want to make time series predictions with a simple linear regression model and evaluate it with cross-validation.

I have written down a simple example to clarify my two questions: one about the train/test split, and one about how to train and test a model when the aim is to predict n steps ahead, for different values of n.

(1) **The data**

Assume I have data for timepoints 1,…,10 as follows:

```
timeseries = [0.5,0.3,10,4,5,6,1,0.4,0.1,0.9]
```

(2) **Transforming the data into a format useful for supervised learning**

As far as I understand, we can use “lags”, i.e. shifts in the data to create a dataset suited for supervised learning:

```
input = [NaN,0.5,0.3,10,4,5,6,1,0.4,0.1]
output/response = [0.5,0.3,10,4,5,6,1,0.4,0.1,0.9]
```

Here the input is simply the time series shifted by one step relative to the output, so each input value is the observation that precedes the corresponding output value.

As far as I understand, I could now use input as the input for a linear regression model, and output as the response (the NaN could be approximated or replaced with a random value).
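A minimal sketch of this lagging step in plain Python, to make the setup concrete. The helper name `make_lagged` is hypothetical, and the sketch assumes the row with the NaN input is dropped rather than filled in, which is one possible choice and not the only one.

```python
timeseries = [0.5, 0.3, 10, 4, 5, 6, 1, 0.4, 0.1, 0.9]


def make_lagged(series, lag=1):
    """Pair each value with the value `lag` steps later.

    Returns (inputs, outputs) with inputs[i] = series[i] and
    outputs[i] = series[i + lag]. Rows whose input would be
    unknown (the NaN row in the example) are dropped.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return list(series[:-lag]), list(series[lag:])


inputs, outputs = make_lagged(timeseries, lag=1)
print(inputs)
print(outputs)
```

Dropping the first row costs one observation but avoids feeding an arbitrary value into the regression.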

(3) Question 1: **Cross-validation** (“backtesting”)

Say I now want to use 2 splits: do I have to shift the training set as well as the test set?

I.e. something like:

Train-set:

Independent variable: [NaN,0.5,0.3,10,4,5]

Output/response variable: [0.5,0.3,10,4,5,6]

Test-set:

Independent variable: [1,0.4,0.1]

Output/response variable: [0.4,0.1,0.9]
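One way to set this up, sketched below under stated assumptions, is to build the lag-1 pairs once on the full series and then split the pairs in time order, so that every test pair lies strictly after the training pairs. The helpers `expanding_window_splits` and `fit_line` are hypothetical names written for this illustration, the NaN row is dropped, and the expanding-window scheme is only one possible backtesting layout (scikit-learn's `TimeSeriesSplit` follows a similar idea).

```python
timeseries = [0.5, 0.3, 10, 4, 5, 6, 1, 0.4, 0.1, 0.9]

# Lag-1 pairs built once on the full series; the first point has no
# known input, so that row is dropped (an assumption of this sketch).
inputs, outputs = timeseries[:-1], timeseries[1:]


def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); training always precedes testing."""
    test_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = n_samples - (n_splits - k + 1) * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


for train_idx, test_idx in expanding_window_splits(len(inputs), n_splits=2):
    x_train = [inputs[i] for i in train_idx]
    y_train = [outputs[i] for i in train_idx]
    intercept, slope = fit_line(x_train, y_train)
    # Evaluate only on pairs that come after the training window.
    preds = [intercept + slope * inputs[i] for i in test_idx]
    mse = sum((p - outputs[i]) ** 2 for p, i in zip(preds, test_idx)) / len(test_idx)
    print("train:", train_idx, "test:", test_idx, "mse:", mse)
```

With 9 lagged pairs and 2 splits, the second fold trains on pairs 0 to 5 and tests on pairs 6 to 8, which are exactly the test inputs [1, 0.4, 0.1] and responses [0.4, 0.1, 0.9] listed above.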

(4) Question 2: **Predicting different lags in advance**:

As shown above, I have shifted the independent variable by 1 relative to the dependent variable. Suppose I now want to train a model that can predict 5 time steps ahead. Can I keep this lag of one and still use the model to predict n+1,…,n+5,…, or do I have to change the shift between the independent and dependent variable to 5? What exactly is the difference?
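To make the two options concrete, here is a minimal sketch of both, assuming a one-feature least-squares fit. The names `fit_line`, `recursive_forecast`, and `direct_forecast` are hypothetical and exist only for this illustration. The first option keeps the lag of one and feeds each prediction back in as the next input; the second changes the shift to the horizon and predicts that step in a single jump.

```python
timeseries = [0.5, 0.3, 10, 4, 5, 6, 1, 0.4, 0.1, 0.9]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recursive_forecast(series, steps):
    """Keep the lag of one; reuse each prediction as the next input."""
    intercept, slope = fit_line(series[:-1], series[1:])
    last = series[-1]
    forecasts = []
    for _ in range(steps):
        last = intercept + slope * last  # prediction becomes the next input
        forecasts.append(last)
    return forecasts


def direct_forecast(series, horizon):
    """Shift by `horizon`; predict the value `horizon` steps ahead directly."""
    intercept, slope = fit_line(series[:-horizon], series[horizon:])
    return intercept + slope * series[-1]


print(recursive_forecast(timeseries, steps=5))  # forecasts for n+1, ..., n+5
print(direct_forecast(timeseries, horizon=5))   # forecast for n+5 only
```

In the recursive variant, errors in early predictions carry over into later ones because the model consumes its own output. In the direct variant, each horizon needs its own model and a larger shift leaves fewer training pairs (5 pairs instead of 9 for this series at horizon 5).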
