*Bounty: 100*

I’m trying to build a Python implementation of the Feature-based Forecast Model Averaging (FFORMA) approach (https://robjhyndman.com/papers/fforma.pdf). However, I’m stuck on computing the gradient and Hessian for my custom objective function.

The idea in the paper is as follows: there is an array `contribution_to_error` that contains, for each time series and each candidate model, the average prediction error of that model (the Mean Absolute Percentage Error). That is, element $x_{i,j}$ contains the average error of model $j$ for time series $i$. An example of the input is shown below; `contribution_to_error` contains the `values` from the dataframe.

A softmax transform then maps the raw meta-learner outputs to model weights. The loss function is the weights times the original errors, i.e. the weighted average of the errors.
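In toy form, the weighting step looks like this (a minimal standalone sketch, not the paper's code; the numbers are made up):

```python
import numpy as np
from scipy.special import softmax

# Toy raw scores from the meta-learner for 2 series and 3 candidate models
preds = np.array([[0.2, 1.0, -0.5],
                  [0.0, 0.0,  0.0]])
# Average error (e.g. MAPE) of each model on each series
errors = np.array([[0.10, 0.30, 0.20],
                   [0.25, 0.05, 0.15]])

weights = softmax(preds, axis=1)            # each row sums to 1
loss_per_series = (weights * errors).sum(axis=1)
print(weights.sum(axis=1))                  # [1. 1.]
print(loss_per_series)
```

For the second series the scores are all equal, so each model gets weight 1/3 and the loss is simply the unweighted mean error, 0.15.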

```
# softmax comes from scipy.special
def fforma_objective(self, predt: np.ndarray, dtrain) -> (np.ndarray, np.ndarray):
    '''
    Compute the gradient and Hessian of the softmax-weighted average error.
    '''
    # Labels of the elements in the training set
    y = dtrain.get_label().astype(int)
    n_train = len(y)
    self.y_obj = y
    preds = np.reshape(predt,
                       self.contribution_to_error[y, :].shape,
                       order='F')
    preds_transformed = softmax(preds, axis=1)
    weighted_avg_loss_func = (preds_transformed
                              * self.contribution_to_error[y, :]).sum(axis=1).reshape((n_train, 1))
    grad = preds_transformed * (self.contribution_to_error[y, :] - weighted_avg_loss_func)
    hess = (self.contribution_to_error[y, :] * preds_transformed * (1.0 - preds_transformed)
            - grad * preds_transformed)
    return grad.flatten('F'), hess.flatten('F')
```
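As a sanity check on this objective, the gradient formula `w * (e - L)` can be verified numerically against finite differences of the per-series loss $L_i = \sum_j w_{ij} e_{ij}$ (a standalone sketch; `E` stands in for `contribution_to_error[y, :]`, and only the gradient is checked, since the `hess` above is an approximation rather than the exact second derivative):

```python
import numpy as np
from scipy.special import softmax

def loss(preds, E):
    """Softmax-weighted average error per series, summed over series."""
    w = softmax(preds, axis=1)
    return (w * E).sum()

rng = np.random.default_rng(0)
E = rng.uniform(0.01, 0.5, size=(5, 3))    # toy per-series, per-model errors
preds = rng.normal(size=(5, 3))

w = softmax(preds, axis=1)
L = (w * E).sum(axis=1, keepdims=True)
grad = w * (E - L)                          # analytic gradient used in the objective

# Finite-difference check of the gradient
eps = 1e-6
num = np.zeros_like(preds)
for i in range(preds.shape[0]):
    for j in range(preds.shape[1]):
        p = preds.copy(); p[i, j] += eps
        m = preds.copy(); m[i, j] -= eps
        num[i, j] = (loss(p, E) - loss(m, E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-6))    # → True
```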

My question is the following. The paper looks at the models' average prediction errors (over, for instance, a 12-period-ahead horizon). I would like to change this to use the forecast errors of the individual periods and optimize over those. That way you could exploit the fact that some models underestimate a given period while others overestimate it, so the errors could partially cancel out. The input would then be as follows (`ds` runs from 1 to 12, the individual periods):

**Now how do I need to change the gradient and Hessian if I use these individual errors, taking into account that some errors are negative rather than only looking at absolute values?**

My idea is the following:

```
# Objective function for lgb
def fforma_objective(self, predt: np.ndarray, dtrain) -> (np.ndarray, np.ndarray):
    '''
    Compute the gradient and Hessian using the individual per-period errors.
    '''
    # Labels of the elements in the training set
    y = dtrain.get_label().astype(int)
    n_train = len(y)
    self.y_obj = y
    preds = np.reshape(predt,
                       self.contribution_to_error[y, :].shape,
                       order='F')
    preds_transformed = softmax(preds, axis=1)
    # Changed to use all individual errors: repeat each series' weights for the 12 periods.
    preds_transformed_new = np.repeat(preds_transformed, 12, axis=0)
    errors = self.errors_full.loc[y].reindex(y, level=0)
    # The np.abs here makes sure that, after weighting the individual periods, you
    # do look at absolute errors. Otherwise grouping might show incorrect mean errors.
    weighted_avg_loss_func = (np.abs((preds_transformed_new * errors).sum(axis=1))
                              .groupby('unique_id').mean().values.reshape((n_train, 1)))
    weighted_avg_loss_func_ungrouped = np.abs((preds_transformed_new * errors).sum(axis=1))
    grad = preds_transformed * ((np.abs(errors)
                                 - np.array([weighted_avg_loss_func_ungrouped.values]).T)
                                .groupby('unique_id')[self.errors_full.columns].mean().values)
    hess = ((np.abs(errors) * preds_transformed_new * (1 - preds_transformed_new))
            .groupby('unique_id')[self.errors_full.columns].mean().values
            - grad * preds_transformed)
    return grad.flatten('F'), hess.flatten('F')
```
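One possible way to handle the signed errors, rather than taking `np.abs` of the errors themselves, is to keep the absolute value outside the weighted sum and differentiate through it. With per-period weighted error $s_{i,t} = \sum_k w_{ik} e_{i,t,k}$ and series loss $L_i = \frac{1}{T}\sum_t |s_{i,t}|$, the chain rule gives $\partial L_i / \partial p_{ij} = \frac{1}{T}\sum_t \operatorname{sign}(s_{i,t})\, w_{ij}(e_{i,t,j} - s_{i,t})$. A standalone numpy sketch of that derivation, checked against finite differences (names like `signed_errors` are illustrative, not from the original code):

```python
import numpy as np
from scipy.special import softmax

def loss(preds, signed_errors):
    """Mean over periods of |softmax-weighted signed error|, summed over series.

    signed_errors has shape (n_series, n_periods, n_models)."""
    w = softmax(preds, axis=1)                       # (n_series, n_models)
    s = (w[:, None, :] * signed_errors).sum(axis=2)  # (n_series, n_periods)
    return np.abs(s).mean(axis=1).sum()

def gradient(preds, signed_errors):
    """Analytic gradient of loss() w.r.t. preds."""
    w = softmax(preds, axis=1)
    s = (w[:, None, :] * signed_errors).sum(axis=2)
    # d|s_t|/dp_j = sign(s_t) * w_j * (e_tj - s_t); averaged over periods t
    inner = np.sign(s)[:, :, None] * (signed_errors - s[:, :, None])
    return w * inner.mean(axis=1)

rng = np.random.default_rng(1)
E = rng.normal(size=(4, 12, 3))   # toy signed errors: 4 series, 12 periods, 3 models
preds = rng.normal(size=(4, 3))

grad = gradient(preds, E)

# Finite-difference check of the gradient
eps = 1e-6
num = np.zeros_like(preds)
for i in range(preds.shape[0]):
    for j in range(preds.shape[1]):
        p = preds.copy(); p[i, j] += eps
        m = preds.copy(); m[i, j] -= eps
        num[i, j] = (loss(p, E) - loss(m, E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-6))
```

This is only a sketch under the assumption that the loss you want is the mean absolute value of the weighted error per period; the absolute-value loss is non-differentiable where $s_{i,t} = 0$, and a second-order term would still need an approximation analogous to the one in the original objective.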

Any feedback would be much appreciated.