## #StackBounty: #python #forecasting #model-selection #ensemble #gradient How to calculate gradient for custom objective function in xgbo…

### Bounty: 100

I’m trying to build an implementation of the Feature-based Forecast Model Averaging approach in Python (https://robjhyndman.com/papers/fforma.pdf). However, I’m sort of stuck on computing the gradient and hessian for my custom objective function.

The idea in the paper is as follows: there is an array `contribution_to_error` that contains for each of the time series and each of the models that you use, the average prediction error of that model (the Mean Absolute Percentage Error). That is, element $$x_{i,j}$$ contains the average error of model $$j$$ for time series $$i$$. An example of the input is shown below, the `contribution_to_error` contains the `values` from the dataframe.

Then it uses a softmax transform to map the errors to model weights. The loss function then is the weights times the original errors (the weighted average of the errors).

``````def fforma_objective(self, predt: np.ndarray, dtrain) -> (np.ndarray, np.ndarray):
'''
Compute...
'''
#labels of the elements in the training set
y = dtrain.get_label().astype(int)
n_train = len(y)
self.y_obj = y

preds = np.reshape(predt,
self.contribution_to_error[y, :].shape,
order='F')

preds_transformed = softmax(preds, axis=1)

weighted_avg_loss_func = (preds_transformed*self.contribution_to_error[y, :]).sum(axis=1).reshape((n_train, 1))

grad = preds_transformed*(self.contribution_to_error[y, :] - weighted_avg_loss_func)

``````

My question is the following. This paper looks at average prediction errors for the models (considering for instance a 12 period ahead horizon). I would like to change it to looking at the forecasting errors of the individual periods, and optimizing over that. This way, you could benefit from the information that some models underestimate a specific forecast and other models overestimate a forecast, which could perhaps ‘cancel’ out. So the input would then be (ds runs from 1 to 12, the individual periods):

Now how do I need to change the gradient and Hessian if I use these individual errors? Including the fact that some errors are actually negative instead of only looking at absolute values.

My idea is the following:

``````# Objective function for lgb
def fforma_objective(self, predt: np.ndarray, dtrain) -> (np.ndarray, np.ndarray):
'''
Compute...
'''
#labels of the elements in the training set
y = dtrain.get_label().astype(int)
n_train = len(y)
self.y_obj = y

preds = np.reshape(predt,
self.contribution_to_error[y, :].shape,
order='F')

preds_transformed = softmax(preds, axis=1)

#Changed to use all individual errors.
preds_transformed_new = np.repeat(preds_transformed, 12, axis = 0)

#The np.abs here makes sure that after weighting for the individual periods, you do look at absolute errors. Otherwise grouping might show incorrect mean errors.
weighted_avg_loss_func = np.abs((preds_transformed_new*self.errors_full.loc[y].reindex(y, level = 0)).sum(axis = 1)).groupby('unique_id').mean().values.reshape((n_train, 1))

weighted_avg_loss_func_ungrouped = np.abs((preds_transformed_new*self.errors_full.loc[y].reindex(y, level = 0)).sum(axis = 1))

grad = preds_transformed*((np.abs(self.errors_full.loc[y].reindex(y, level = 0)) - np.array([weighted_avg_loss_func_ungrouped.values]).T).groupby('unique_id')[self.errors_full.columns].mean().values)

hess = (np.abs(self.errors_full.loc[y].reindex(y, level=0))*preds_transformed_new*(1-preds_transformed_new)).groupby('unique_id')[self.errors_full.columns].mean().values - grad*preds_transformed

``````

Any feedback would be much appreciated

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!

## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence (> 100 million) of symbols and several models predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

For the inputs the network uses the predictions of the models so the total inputs are nmodels * nsymbols. As outputs there is one node per symbol. The network is trained on each new symbol.

The network doesn’t stabilize, that is, if it stops being trained the predictions become worse and the weights are constantly changing. But the predictions are better than any single model and better than any simple combination such as weighted majority, exponential moving average etc.

I’ve also tried using as inputs the last N predictions(N * nmodels * nsymbols), the last N predictions + the last N symbols of the sequence, the last N predictions + the last N symbols + the last N probabilities the network assigned for the actual symbol. In all cases the predictions improve but still no stabilization.

The sequence has some times long sub-sequences where the predictions of the models are very accurate but most of the time this doesn’t happen.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

Get this bounty!!!