I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.
For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.
The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.
I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.
The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.
My main concern is with understanding why this happens. Any ideas?
EDIT: The sequence appears to be non-stationary.