*Bounty: 50*

I have $N$ time series, each of length $2048$, and each corresponds to a different target output. However, I know that only a small part of each sequence, say a sub-sequence of length $128$, is actually needed to predict its target.

I could split each sequence into $16$ non-overlapping partitions of length $128$, giving $16N$ training samples. However, I could drastically increase the number of training samples by using a sliding window instead: with a stride of $1$ there are $2048-128+1 = 1921$ distinct contiguous sub-sequences of length $128$ per sequence. That means I could in fact generate $1921N$ training samples, even though most of their inputs overlap.
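As a rough illustration of the two options, here is a minimal NumPy sketch (it needs NumPy 1.20 or later for `sliding_window_view`). The helper name `make_windows` and the toy data are hypothetical; it assumes the sequences are stored as an `(N, 2048)` array with one scalar target per sequence, and that every window simply inherits the target of its parent sequence.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(X: np.ndarray, y: np.ndarray, win_len: int, stride: int):
    """Cut each row of X (shape (N, seq_len)) into windows of length win_len.

    Hypothetical helper: stride == win_len gives the non-overlapping partitions,
    stride == 1 gives every contiguous sub-sequence.
    """
    # All stride-1 windows as a read-only view of shape (N, seq_len - win_len + 1, win_len),
    # then keep every stride-th window along the window axis.
    windows = sliding_window_view(X, win_len, axis=1)[:, ::stride]
    n_seq, n_win, _ = windows.shape
    # Flatten to one training sample per row (reshape copies here, since the view is not contiguous).
    X_win = windows.reshape(n_seq * n_win, win_len)
    # Assumption: each window is labelled with the target of the sequence it came from.
    y_win = np.repeat(y, n_win)
    return X_win, y_win


# Toy data standing in for the real sequences (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2048))
y = np.array([0.0, 1.0, 2.0])

X_part, y_part = make_windows(X, y, win_len=128, stride=128)  # 16 windows per sequence
X_slide, y_slide = make_windows(X, y, win_len=128, stride=1)  # 1921 windows per sequence
print(X_part.shape, X_slide.shape)
```

With three toy sequences this yields $3 \cdot 16$ partitioned samples versus $3 \cdot 1921$ sliding-window samples, all drawn from the same $3 \cdot 2048$ observations.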

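I could also use a larger increment (stride) between consecutive windows. This would reduce the number of sub-sequences, but it would also reduce the overlap, and hence the autocorrelation, between them; with a stride of at least $128$ the windows no longer overlap at all.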
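The trade-off controlled by the stride can be tabulated directly. This sketch (`window_stats` is a hypothetical helper) counts the windows per sequence and the fraction of samples shared by two consecutive windows for a few example strides; note that overlap is only a proxy for the correlation between training samples, not a measure of it.

```python
def window_stats(seq_len: int, win_len: int, stride: int) -> tuple[int, float]:
    """Return (windows per sequence, fraction of samples shared by consecutive windows)."""
    # Start indices run 0, stride, 2*stride, ... up to seq_len - win_len inclusive.
    n_windows = (seq_len - win_len) // stride + 1
    # Consecutive windows share win_len - stride samples, or none once stride >= win_len.
    overlap = max(0, win_len - stride) / win_len
    return n_windows, overlap


for stride in (1, 8, 32, 64, 128):
    n, ov = window_stats(2048, 128, stride)
    print(f"stride={stride:4d}  windows per sequence={n:5d}  overlap={ov:.3f}")
```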

**Is it better to split my data into $16N$ non-overlapping sub-sequences or into the far larger set of partially overlapping sub-sequences produced by a stride-1 sliding window?**