I am trying to build a Gaussian process (GP) regression model for a problem in which each experiment is computationally expensive, and I want to validate it with cross-validation. Here is my procedure:
- Build the GP regressor on the full available dataset, with hyperparameter optimization (anisotropic Gaussian kernel)
- Perform 10-fold cross validation using the optimized hyperparameter set from the previous step
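The two steps above can be sketched in scikit-learn as follows (a minimal illustration on hypothetical toy data standing in for the expensive experiments; kernel choice and scoring metric are assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import cross_val_score, KFold

# Toy data standing in for the expensive experiments (assumption)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 2))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1]) + 0.05 * rng.randn(60)

# Step 1: fit on the full dataset with hyperparameter optimization.
# A length scale per input dimension gives an anisotropic Gaussian (RBF) kernel.
kernel = RBF(length_scale=[1.0, 1.0])
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5,
                              normalize_y=True)
gp.fit(X, y)

# Step 2: 10-fold CV with the optimized hyperparameters held fixed.
# gp.kernel_ carries the optimized hyperparameters; optimizer=None
# disables re-optimization inside each fold.
gp_fixed = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None,
                                    normalize_y=True)
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(gp_fixed, X, y, cv=cv, scoring="r2")
print(scores.mean(), scores.std())
```

Note that fixing the hyperparameters found on the full dataset means each fold's validation set has already influenced the kernel, which is relevant to the question below.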
Now, what model should I select as the output of my procedure?
- The model trained on the full dataset, on the grounds that its performance is validated by the cross-validation folds?
- An ensemble of the 10 models from cross-validation?
- The cross-validation model with the highest validation score?
I’m currently going for #1, but it was objected that this model is then not properly validated. I think it is implicitly validated, because I used the same hyperparameters in cross-validation as in the final model. #2 might perform better, but it does not feel right to me. #3, in my opinion, is not an option: it could select a model that scores well only because the few validation cases happen to suit it.
Is my procedure sound?