## #StackBounty: #classification #cross-validation #scikit-learn #hyperparameter #ensemble Should I perform nested CV with Grid Search to …

### Bounty: 50

I’m classifying 8 types of hand gestures with stacked models. I first split the data into training and test sets, then used `GridSearchCV` to tune the hyper-parameters.

Here’s the code:

```python
import numpy as np
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# One grid per model, in the same order as models_to_train
param_grid = [
    {
        # Random forest
        'bootstrap': [True, False],
        'max_depth': [40, 50, 60, 70, 80],
        # 'max_features': [2, 3],
        'min_samples_leaf': [3, 4, 5],
        'min_samples_split': [8, 10, 12],
        'n_estimators': [10, 15, 20, 25],
        'criterion': ['gini', 'entropy'],
        'random_state': [45],
    },
    {
        # K Nearest Neighbours
        'n_neighbors': [5, 6, 7, 9, 11],
        'leaf_size': [1, 3, 5, 7],
        'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
        'metric': ['euclidean', 'manhattan'],
    },
    {
        # SVM
        'C': list(np.arange(1, 5, 0.01)),
        'gamma': ['scale', 'auto'],
        'kernel': ['rbf', 'poly', 'sigmoid', 'linear'],
        'decision_function_shape': ['ovo', 'ovr'],
        'random_state': [45],
    },
]

models_to_train = [RandomForestClassifier(), KNeighborsClassifier(), svm.SVC()]

final_models = []
for i, model in enumerate(models_to_train):
    params = param_grid[i]
    # Tune each model separately with 20-fold CV on the training split only
    clf = GridSearchCV(estimator=model, param_grid=params, cv=20, scoring='accuracy').fit(data_train, label_train)
    final_models.append(clf.best_estimator_)
```

Next, I stacked the best models returned by `GridSearchCV`, trained the stack on the training data and evaluated it on the test data:

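```python
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import accuracy_score

# Tuned random forest and KNN as base learners, tuned SVM as the final estimator
estimators = [
    ('rf', final_models[0]),
    ('knn', final_models[1]),
]
clf = StackingClassifier(estimators=estimators, final_estimator=final_models[2])

# Fit the stack on the training split, then predict the held-out test split
category_predicted = clf.fit(data_train, label_train).predict(data_test)
acc = accuracy_score(label_test, category_predicted) * 100  # accuracy in percent
```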

My question is:

I split the data into training and test sets at the start and did not use nested CV, because I expected it to increase the running time considerably with an ensemble model. The model reached very good accuracy, more than 95%. Is it likely that the accuracy would drop substantially with a different train-test split? If so, should I drop the initial train-test split and instead run nested CV with Grid Search on the entire data (like what is described here)?
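For reference, nested CV amounts to wrapping the grid search itself in an outer cross-validation loop, so every outer fold repeats the tuning on its own training part. The following is only a minimal sketch: the synthetic data from `make_classification`, the tiny grid and the fold counts are placeholders chosen to keep it fast, not the gesture data or the grids from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Placeholder data: 8 classes, standing in for the gesture features
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10,
    n_classes=8, n_clusters_per_class=1, random_state=45,
)

# Inner loop: hyper-parameter search, run separately inside each outer fold
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=45)
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=45),
    param_grid={'n_estimators': [10, 20], 'min_samples_leaf': [3, 5]},
    cv=inner_cv,
    scoring='accuracy',
)

# Outer loop: each fold is scored by a model tuned without seeing that fold
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=45)
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring='accuracy')

print('accuracy per outer fold:', outer_scores)
print('mean and std:', outer_scores.mean(), outer_scores.std())
```

The spread of `outer_scores` gives a rough idea of how much the accuracy depends on the particular split. The same wrapping should work with a stacking estimator in place of the random forest, at a correspondingly higher cost.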


## #StackBounty: #python #forecasting #model-selection #ensemble #gradient How to calculate gradient for custom objective function in xgbo…

### Bounty: 100

I’m trying to build an implementation of the Feature-based Forecast Model Averaging approach in Python (https://robjhyndman.com/papers/fforma.pdf). However, I’m sort of stuck on computing the gradient and hessian for my custom objective function.

The idea in the paper is as follows: there is an array `contribution_to_error` that contains, for each time series and each model, that model’s average prediction error (the Mean Absolute Percentage Error). That is, element $$x_{i,j}$$ contains the average error of model $$j$$ for time series $$i$$. An example of the input is shown below; `contribution_to_error` contains the `values` of the dataframe.

A softmax transform then maps the raw predictions of the boosted model to model weights. The loss is the weighted average of the errors, that is, the weights times the original errors, summed over the models.

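```python
import numpy as np
from scipy.special import softmax  # assumed source of softmax()


def fforma_objective(self, predt: np.ndarray, dtrain) -> (np.ndarray, np.ndarray):
    '''
    Compute...
    '''
    # labels of the elements in the training set
    y = dtrain.get_label().astype(int)
    n_train = len(y)
    self.y_obj = y

    # raw scores arrive flattened; reshape them to (n_train, n_models)
    preds = np.reshape(predt,
                       self.contribution_to_error[y, :].shape,
                       order='F')

    # softmax turns each row of scores into model weights that sum to 1
    preds_transformed = softmax(preds, axis=1)

    # per-series loss: weighted average of the model errors
    weighted_avg_loss_func = (preds_transformed * self.contribution_to_error[y, :]).sum(axis=1).reshape((n_train, 1))

    # gradient of that loss with respect to the raw scores
    grad = preds_transformed * (self.contribution_to_error[y, :] - weighted_avg_loss_func)
```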
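One way to gain confidence in a hand-derived gradient is to compare it against finite differences on small random inputs. The sketch below does this for the softmax-weighted average loss, using the same gradient expression as `grad` in the snippet above, plus the exact diagonal second derivative obtained by differentiating that expression once more. The helper names (`softmax_rows`, `weighted_loss`, `analytic_grad_hess`, `finite_differences`) and the random inputs are illustrative assumptions, not part of the paper or of xgboost.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def weighted_loss(scores, errors):
    """Per-series loss: softmax-weighted average of the model errors."""
    return (softmax_rows(scores) * errors).sum(axis=1)


def analytic_grad_hess(scores, errors):
    weights = softmax_rows(scores)
    loss = (weights * errors).sum(axis=1, keepdims=True)
    grad = weights * (errors - loss)          # w_ik * (e_ik - loss_i)
    hess_diag = grad * (1.0 - 2.0 * weights)  # derivative of grad_ik w.r.t. score_ik
    return grad, hess_diag


def finite_differences(scores, errors, eps=1e-4):
    """Central differences for the first and second derivative, entry by entry."""
    grad = np.zeros_like(scores)
    hess_diag = np.zeros_like(scores)
    base = weighted_loss(scores, errors)
    for i in range(scores.shape[0]):
        for k in range(scores.shape[1]):
            up = scores.copy()
            up[i, k] += eps
            down = scores.copy()
            down[i, k] -= eps
            loss_up = weighted_loss(up, errors)[i]
            loss_down = weighted_loss(down, errors)[i]
            grad[i, k] = (loss_up - loss_down) / (2 * eps)
            hess_diag[i, k] = (loss_up - 2 * base[i] + loss_down) / eps ** 2
    return grad, hess_diag


rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 4))              # 5 series, 4 models
errors = rng.uniform(0.05, 0.5, size=(5, 4))  # stand-in for contribution_to_error[y, :]

grad, hess_diag = analytic_grad_hess(scores, errors)
num_grad, num_hess = finite_differences(scores, errors)
grad_gap = np.abs(grad - num_grad).max()
hess_gap = np.abs(hess_diag - num_hess).max()
print('max gradient gap:', grad_gap)
print('max second-derivative gap:', hess_gap)
```

Note that this exact diagonal second derivative can be negative, because the loss is not convex in the scores, so it may not be usable as-is in a booster that expects positive Hessian values.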

My question is the following. The paper looks at the average prediction errors of the models (over, for instance, a 12-period-ahead horizon). I would like to look at the forecast errors of the individual periods instead and optimize over those. That way you could benefit from the fact that some models underestimate a specific forecast while other models overestimate it, so the errors could perhaps cancel out. The input would then be (ds runs from 1 to 12, the individual periods):

How do I need to change the gradient and Hessian if I use these individual errors, taking into account that some errors are now negative rather than absolute values?

My idea is the following:

```python
import numpy as np
from scipy.special import softmax  # assumed source of softmax()


# Objective function for lgb
def fforma_objective(self, predt: np.ndarray, dtrain) -> (np.ndarray, np.ndarray):
    '''
    Compute...
    '''
    # labels of the elements in the training set
    y = dtrain.get_label().astype(int)
    n_train = len(y)
    self.y_obj = y

    preds = np.reshape(predt,
                       self.contribution_to_error[y, :].shape,
                       order='F')

    preds_transformed = softmax(preds, axis=1)

    # Changed to use all individual errors: repeat each weight row once per period.
    preds_transformed_new = np.repeat(preds_transformed, 12, axis=0)

    # Signed per-period errors for the series in this batch
    errors = self.errors_full.loc[y].reindex(y, level=0)

    # The np.abs here makes sure that after weighting for the individual periods, you do look
    # at absolute errors. Otherwise grouping might show incorrect mean errors.
    weighted_avg_loss_func_ungrouped = np.abs((preds_transformed_new * errors).sum(axis=1))

    weighted_avg_loss_func = weighted_avg_loss_func_ungrouped.groupby('unique_id').mean().values.reshape((n_train, 1))

    grad = preds_transformed * (
        (np.abs(errors) - np.array([weighted_avg_loss_func_ungrouped.values]).T)
        .groupby('unique_id')[self.errors_full.columns].mean().values
    )

    hess = (
        (np.abs(errors) * preds_transformed_new * (1 - preds_transformed_new))
        .groupby('unique_id')[self.errors_full.columns].mean().values
        - grad * preds_transformed
    )
```
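For comparison, here is what the chain rule gives when the loss of series $$i$$ is taken to be the mean over periods of the absolute combined error, $$\frac{1}{H}\sum_h \left|\sum_j w_{i,j}\, e_{i,h,j}\right|$$, with signed errors $$e_{i,h,j}$$ and softmax weights $$w_{i,j}$$. This is a sketch under that assumed loss definition, with plain NumPy arrays instead of the dataframe and hypothetical helper names; the absolute value is not differentiable where a combined error is exactly zero, and the sketch ignores that case.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def per_period_loss(scores, errors):
    """scores: (n_series, n_models); errors: (n_series, n_periods, n_models), signed."""
    weights = softmax_rows(scores)
    combined = np.einsum('ij,ihj->ih', weights, errors)  # combined error per period
    return np.abs(combined).mean(axis=1)


def per_period_grad(scores, errors):
    weights = softmax_rows(scores)
    combined = np.einsum('ij,ihj->ih', weights, errors)
    # d combined_ih / d score_ik = w_ik * (e_ihk - combined_ih)
    inner = errors - combined[:, :, None]
    # d |c| / d c = sign(c), then average over the periods
    signed = np.sign(combined)[:, :, None] * inner
    return weights * signed.mean(axis=1)


def numeric_grad(scores, errors, eps=1e-6):
    """Central finite differences, entry by entry."""
    grad = np.zeros_like(scores)
    for i in range(scores.shape[0]):
        for k in range(scores.shape[1]):
            up = scores.copy()
            up[i, k] += eps
            down = scores.copy()
            down[i, k] -= eps
            grad[i, k] = (per_period_loss(up, errors)[i]
                          - per_period_loss(down, errors)[i]) / (2 * eps)
    return grad


rng = np.random.default_rng(1)
scores = rng.normal(size=(3, 4))      # 3 series, 4 models
errors = rng.normal(size=(3, 12, 4))  # 12 periods of signed errors per series

grad = per_period_grad(scores, errors)
max_gap = np.abs(grad - numeric_grad(scores, errors)).max()
print('max gap to finite differences:', max_gap)
```

In this form the sign of each combined per-period error enters the gradient, which is how errors of opposite sign from different models are allowed to offset each other.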

Any feedback would be much appreciated.


## #StackBounty: #neural-networks #ensemble #sgd Neural network doesn't converge but has good performance

### Bounty: 100

I have a sequence of more than 100 million symbols, and several models each predict the next symbol. To combine these predictions I’m using stacked generalization with a multilayer perceptron trained with online gradient descent.

The network takes the models’ predictions as inputs, so there are nmodels * nsymbols inputs in total. There is one output node per symbol. The network is trained on each new symbol.
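The setup can be sketched as follows. To keep it short, this uses a single softmax layer rather than a multilayer perceptron, which is a deliberate simplification; the class name `OnlineSoftmaxStacker`, the learning rate and the toy stream are all hypothetical.

```python
import numpy as np


class OnlineSoftmaxStacker:
    """Single-layer stand-in for the combiner: nmodels * nsymbols inputs, nsymbols outputs."""

    def __init__(self, n_models, n_symbols, learning_rate=0.5):
        self.weights = np.zeros((n_symbols, n_models * n_symbols))
        self.bias = np.zeros(n_symbols)
        self.learning_rate = learning_rate

    def predict_proba(self, model_probs):
        # Flatten the per-model distributions into one input vector
        x = np.asarray(model_probs, dtype=float).ravel()
        logits = self.weights @ x + self.bias
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    def update(self, model_probs, true_symbol):
        # One gradient step on the cross-entropy loss for the symbol just observed
        x = np.asarray(model_probs, dtype=float).ravel()
        error = self.predict_proba(model_probs)
        error[true_symbol] -= 1.0
        self.weights -= self.learning_rate * np.outer(error, x)
        self.bias -= self.learning_rate * error


n_models, n_symbols = 2, 3
uninformative = np.full(n_symbols, 1.0 / n_symbols)
stacker = OnlineSoftmaxStacker(n_models, n_symbols)

# Toy stream: model 0 always predicts the true symbol, model 1 is always uniform
for step in range(300):
    symbol = step % n_symbols
    model_probs = np.stack([np.eye(n_symbols)[symbol], uninformative])
    stacker.update(model_probs, symbol)

predicted = [
    int(np.argmax(stacker.predict_proba(np.stack([np.eye(n_symbols)[s], uninformative]))))
    for s in range(n_symbols)
]
print(predicted)
```

On a real stream, `predict_proba` would be called before `update` for each new symbol, so that every prediction is made before the network has seen the answer.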

The network doesn’t stabilize: the weights keep changing, and if training stops the predictions get worse. Still, its predictions are better than those of any single model and better than any simple combination such as weighted majority, exponential moving average, etc.

I’ve also tried other inputs: the last N predictions (N * nmodels * nsymbols); the last N predictions plus the last N symbols of the sequence; and the last N predictions plus the last N symbols plus the last N probabilities the network assigned to the actual symbol. In all cases the predictions improve, but the network still does not stabilize.

The sequence sometimes contains long sub-sequences where the models’ predictions are very accurate, but most of the time this is not the case.

My main concern is with understanding why this happens. Any ideas?

EDIT: The sequence appears to be non-stationary.

