*Bounty: 50*

I’m classifying 8 types of hand gestures with a stacked model. I first split the data into training and test sets, then used `GridSearchCV` to tune the hyper-parameters.

Here’s the code:

```
import numpy as np
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# One grid per model, in the same order as models_to_train below.
param_grid = [
    {
        # Random forest
        'bootstrap': [True, False],
        'max_depth': [40, 50, 60, 70, 80],
        # 'max_features': [2, 3],
        'min_samples_leaf': [3, 4, 5],
        'min_samples_split': [8, 10, 12],
        'n_estimators': [10, 15, 20, 25],
        'criterion': ['gini', 'entropy'],
        'random_state': [45]
    },
    {
        # K Nearest Neighbours
        'n_neighbors': [5, 6, 7, 9, 11],
        'leaf_size': [1, 3, 5, 7],
        'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
        'metric': ['euclidean', 'manhattan']
    },
    {
        # SVM
        'C': list(np.arange(1, 5, 0.01)),
        'gamma': ['scale', 'auto'],
        'kernel': ['rbf', 'poly', 'sigmoid', 'linear'],
        'decision_function_shape': ['ovo', 'ovr'],
        'random_state': [45]
    }
]

models_to_train = [RandomForestClassifier(), KNeighborsClassifier(), svm.SVC()]
final_models = []

# data_train / label_train come from the initial train-test split.
for i, model in enumerate(models_to_train):
    params = param_grid[i]
    clf = GridSearchCV(
        estimator=model, param_grid=params, cv=20, scoring='accuracy'
    ).fit(data_train, label_train)
    # Keep the best estimator found for this model type.
    final_models.append(clf.best_estimator_)
```

Next, I combined the best models returned by `GridSearchCV` into a stacking classifier, trained it on the training data, and evaluated it on the test data:

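```
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import accuracy_score

# Base learners: tuned random forest and tuned KNN.
estimators = [
    ('rf', final_models[0]),
    ('knn', final_models[1])
]

# The tuned SVM acts as the final (meta) estimator.
clf = StackingClassifier(
    estimators=estimators, final_estimator=final_models[2]
)

category_predicted = clf.fit(data_train, label_train).predict(data_test)
acc = accuracy_score(label_test, category_predicted) * 100  # accuracy in percent
```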

My question is:

I did the train-test split at the beginning and didn’t use nested CV, because I thought it would increase the computation time a lot given that I’m using an ensemble model. The model reached very good accuracy, more than 95%. **Is there a high chance that the model would give much lower accuracy if the train-test split changed? If so, should I drop the initial train-test split and instead perform nested CV with grid search on the entire data (like what is described here)?**
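For concreteness, here is a minimal sketch of the nested CV I have in mind. It is not my actual setup: the data is a synthetic stand-in generated with `make_classification`, the grid is a deliberately tiny Random Forest grid, and the fold counts (3 inner, 5 outer) are arbitrary choices to keep it fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Assumption: synthetic 8-class data standing in for the gesture features.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10,
    n_classes=8, n_clusters_per_class=1, random_state=45,
)

# Inner loop: hyper-parameter tuning. Outer loop: performance estimate.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=45)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=45)

# Illustrative grid only; much smaller than the grids above.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=45),
    param_grid={'n_estimators': [10, 20], 'min_samples_leaf': [3, 5]},
    cv=inner_cv,
    scoring='accuracy',
)

# Each outer fold reruns the whole grid search on its own training part,
# so the held-out part never influences hyper-parameter selection.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring='accuracy')

print("per-fold accuracy:", nested_scores)
print("mean:", nested_scores.mean(), "std:", nested_scores.std())
```

The spread of `nested_scores` across the outer folds is what I would look at to judge how sensitive the accuracy is to the particular split.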