Is it better to split sequences into overlapping or non-overlapping training samples?

I have $$N$$ time sequences, each of length $$2048$$. Each of these sequences corresponds to a different target output. However, I know that only a small part of the sequence is needed to actually predict this target output, say a sub-sequence of length $$128$$.

I could split each of the sequences into $$16$$ partitions of length $$128$$, so that I end up with $$16N$$ training samples. However, I could drastically increase the number of training samples if I use a sliding window instead: there are $$2048-128+1 = 1921$$ distinct sub-sequences of length $$128$$ that preserve the time ordering. That means I could in fact generate roughly $$1920N$$ training samples, even though most of the input is overlapping.
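To make the two options concrete, here is a minimal sketch in plain Python. The helper `make_windows` and its `stride` parameter are hypothetical names introduced only for illustration, and the sequence is placeholder data rather than my real data. It assumes every sub-sequence inherits the target of the sequence it was cut from.

```python
def make_windows(sequence, target, window=128, stride=1):
    """Cut one sequence into (sub_sequence, target) pairs.

    `stride` is the step between the start positions of successive windows:
    stride == window gives non-overlapping partitions, stride == 1 gives
    the full sliding window.
    """
    last_start = len(sequence) - window
    return [
        (sequence[start:start + window], target)
        for start in range(0, last_start + 1, stride)
    ]


sequence = list(range(2048))  # placeholder for one real sequence
target = 1                    # placeholder target for that sequence

non_overlapping = make_windows(sequence, target, stride=128)
sliding = make_windows(sequence, target, stride=1)

print(len(non_overlapping), len(sliding))  # 16 1921
```

Neighbouring sliding windows share 127 of their 128 values, so the larger sample count does not correspond to that many independent samples. For array data, `numpy.lib.stride_tricks.sliding_window_view` would produce the same windows without copying.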

I could also use a larger increment (stride) between successive windows, which would reduce the number of sub-sequences but would also reduce the overlap, and hence the correlation, between them.
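As a rough way to quantify this trade-off (the symbol $$s$$ for the increment is my own notation), the number of windows per sequence and the number of values shared by two consecutive windows are

$$
n(s) = \left\lfloor \frac{2048 - 128}{s} \right\rfloor + 1, \qquad \text{overlap}(s) = \max(0,\ 128 - s).
$$

For example, $$s = 1$$ gives $$1921$$ windows sharing $$127$$ values, $$s = 64$$ gives $$31$$ windows sharing $$64$$ values, and $$s = 128$$ gives $$16$$ windows sharing none. Note that zero overlap only removes the shared values; if the underlying series is itself autocorrelated, adjacent windows may still be correlated.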

Is it better to split my data into $$16N$$ non-overlapping sub-sequences or $$1920N$$ partially overlapping sub-sequences?

