
I am missing some very basic distinction between cross-validation used for hyperparameter tuning and cross-validation used for estimating the performance of my algorithms (RMSE).

I have two functions: one performs grid search and the other calculates cross-validated RMSE.

```
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score

def grid_search(clf, param_grid, x_train, y_train, kf):
    grid_model = GridSearchCV(estimator=clf,
                              param_grid=param_grid,
                              cv=kf, verbose=2)
    grid_model.fit(x_train, y_train)
    return grid_model

def rmse_cv(clf, x_train, y_train, kf):
    rmses_cross = np.sqrt(-cross_val_score(clf, x_train, y_train,
                                           scoring="neg_mean_squared_error",
                                           cv=kf))
    return rmses_cross
```

The functions are called this way:

```
X_train, X_test, y_train, y_test = train_test_split(dataset, Y, test_size=0.2, random_state=26)
kf = KFold(10, shuffle=True, random_state=26)
grid_search(clf, param_grid, X_train, y_train, kf)
# adjust the parameters of the regressor
rmses_cross = rmse_cv(clf, X_train, y_train, kf)
```

As you can see, I use the same `KFold` object for parameter tuning and for calculating the cross-validated RMSE.

On the basis of the calculated cross-validated RMSEs I choose which algorithm performs better. BUT the RMSEs are calculated on exactly the same folds on which the hyperparameter tuning was performed.

Is it incorrect to do so? I feel that during tuning my model indirectly learns from the hold-out folds, so it would be incorrect to reuse those folds when calculating the RMSEs. Should I use a different `KFold` split for the RMSE calculation?
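For reference, the alternative I am considering is a nested setup: the grid search runs on its own inner folds inside each outer training fold, so the outer hold-out fold never influences the tuning. A minimal sketch on synthetic data (`SVR` and `make_regression` are just placeholders for my actual regressor and dataset):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in for my real data
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=26)

inner_kf = KFold(5, shuffle=True, random_state=1)   # folds for tuning
outer_kf = KFold(5, shuffle=True, random_state=2)   # folds for RMSE estimation

param_grid = {"C": [0.1, 1, 10]}
grid = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=inner_kf,
                    scoring="neg_mean_squared_error")

# Each outer training fold runs its own inner grid search,
# so the outer hold-out fold is never seen during tuning.
nested_rmse = np.sqrt(-cross_val_score(grid, X, y,
                                       scoring="neg_mean_squared_error",
                                       cv=outer_kf))
print(nested_rmse.mean())
```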

**EDIT**:

Why do these two pieces of code produce different results? I thought `cross_val_score` refits the given model on each fold, so applying `cross_val_score` to `grid_model` or to the already-parameterised model should give the same result.

```
kf = KFold(10, shuffle = True, random_state = 26)
```

First:

```
grid_model = grid_search(clf, param_grid, X_train, y_train, kf)
clf = SVR(kernel='rbf', C=grid_model.best_params_['C'])
rmses_cross = np.sqrt(-cross_val_score(clf, X_train, y_train,
                                       scoring="neg_mean_squared_error", cv=kf))
```

Second:

```
grid_model = grid_search(clf, param_grid, X_train, y_train, kf)
rmses_cross = np.sqrt(-cross_val_score(grid_model, X_train, y_train,
                                       scoring="neg_mean_squared_error", cv=kf))
```
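To make the comparison reproducible, here is a self-contained version of both snippets on synthetic data (again assuming `SVR` as the regressor; my real estimator and data differ):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=150, n_features=4, noise=5, random_state=26)
kf = KFold(10, shuffle=True, random_state=26)
param_grid = {"C": [0.1, 1, 10]}

grid_model = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=kf,
                          scoring="neg_mean_squared_error")
grid_model.fit(X, y)

# First: fix C once from the fitted search, then cross-validate the fixed model.
fixed = SVR(kernel="rbf", C=grid_model.best_params_["C"])
rmse_fixed = np.sqrt(-cross_val_score(fixed, X, y,
                                      scoring="neg_mean_squared_error", cv=kf))

# Second: pass the GridSearchCV object itself to cross_val_score.
rmse_nested = np.sqrt(-cross_val_score(grid_model, X, y,
                                       scoring="neg_mean_squared_error", cv=kf))
print(rmse_fixed.mean(), rmse_nested.mean())
```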
