My question is mostly about general intuition: when using an RNN (LSTM) to predict a time series, with the goal of, for example, predicting 100 steps ahead in a series with one single feature, what kinds of configurations would make sense for the layer sizes of a simple input | hidden | output RNN, and for the window size (assuming you want to look at more than one point at once, so you pick a “window” / “interval”)?
- use only 1 input neuron and look at one point at a time, or pick a window/interval? Obviously, by looking at only 1 point you lean entirely on the memory aspect of the network to detect useful patterns. Would this be a good thing?
- when using a window, does it make sense to use overlapping windows (e.g., sliding a window of 10 points one point at a time) or non-overlapping windows?
- should the number of units in the hidden layer roughly equal the number of steps you aim to predict ahead? (Similarly, should the window size be related to the number of steps to predict ahead?)
Or, if answering this is too hard or time-consuming, where could one get the intuitions for answering such questions?
(Because obviously the space of possible configurations is huge, and unless you have a ton of time or resources, you need some intuitions so you can start from a configuration that “roughly works”…)
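To make the windowing options concrete, here is a minimal sketch of what I mean by overlapping and non-overlapping windows. The helper name `make_windows` and its parameters `window`, `horizon` and `stride` are made up for illustration, and the toy series is just a ramp. It is only one possible way to frame the data, with a single target value `horizon` steps after the end of each window.

```python
import numpy as np

def make_windows(series, window, horizon, stride=1):
    """Build (input window, target) pairs from a 1-D series.

    stride=1 gives overlapping windows; stride=window gives non-overlapping ones.
    The target is the single value `horizon` steps after the window's last point.
    """
    series = np.asarray(series, dtype=float)
    last_start = len(series) - window - horizon
    if last_start < 0:
        raise ValueError("series is too short for this window and horizon")
    starts = range(0, last_start + 1, stride)
    X = np.stack([series[s:s + window] for s in starts])
    y = np.array([series[s + window - 1 + horizon] for s in starts])
    # Add a trailing feature axis: (samples, time steps, 1 feature).
    return X[..., np.newaxis], y

series = np.arange(200)  # toy data (assumption): a simple ramp

# Overlapping: slide a window of 10 points one point at a time.
X_over, y_over = make_windows(series, window=10, horizon=100, stride=1)
print(X_over.shape)  # (91, 10, 1)

# Non-overlapping: jump a whole window each time.
X_non, y_non = make_windows(series, window=10, horizon=100, stride=10)
print(X_non.shape)   # (10, 10, 1)
```

With 200 points, the overlapping scheme yields 91 training samples and the non-overlapping one only 10, which is part of why I am unsure which one makes more sense. Setting `window=1` would correspond to the "one point at a time" option from the first bullet.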