#StackBounty: #python #search #grid #knn #imputation grid search with own estimator in python

Bounty: 50

I am trying to build my own estimator (a regressor) and use it for imputation (KnnImputation).
For now I have written a very basic kNN imputation; once it works, I will change the algorithm a bit.

I'm having a problem using grid search with GridSearchCV.
I printed the shapes of the data to get a sense of the problem. With 10-fold cross-validation, I would expect these lines to be printed 10 times:

xTrain size : 2487
yTrain size : 2487
test size : 276

But instead I get these lines:

xTrain size : 2487
yTrain size : 2487
test size : 276

test size : 2487

more than 60 times.

Any idea what the problem is?

My Code:

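import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV, KFold

class KnnImputation(BaseEstimator, RegressorMixin):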

    def __init__(self, k=5, distance='euclidean'):
        self.k = k
        self.distance = distance

    def get_params(self, deep=False):
        return {'k': self.k, 'distance': self.distance}

    def set_params(self, **parameters):
        self.k = parameters['k']
        self.distance = parameters['distance']

    def fit(self, X, y):
        self.xTrain = X.values
        self.yTrain = y.values
        print("\nxTrain size : " + str(self.xTrain.shape[0]))
        print("yTrain size : " + str(self.yTrain.shape[0]))

        return self

    def predict(self, xTest):
        xTest = xTest.values
        num_test = xTest.shape[0]
        print("test size : " + str(num_test)+"\n")
        yPred = np.zeros(num_test, dtype=self.yTrain.dtype).reshape(-1, 1)

        for i in range(num_test):
            distances = np.sum(np.abs(self.xTrain - xTest[i, :]), axis=1)
            idx = np.argsort(distances)
            minIndexes = idx[:self.k]
            kClosest = self.yTrain[minIndexes[:]]
            yPred[i] = np.mean(kClosest)

        return yPred

kf = KFold(n_splits=10, shuffle=False, random_state=23)
NN = KnnImputation()
gridSearchNN = GridSearchCV(NN, param_grid=params, scoring="neg_mean_squared_error", n_jobs=1,
                                cv=kf.split(xTrain, yTrain), verbose=1)
gridSearchNN.fit(X=xTrain, y=yTrain)
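
For reference, the number of fit and predict calls that a setup like the one above triggers can be estimated without running it. The sketch below uses no libraries and rests on a few assumptions: `n_candidates=6` is a made-up grid size (the `params` dict is not shown), the helper names `kfold_sizes` and `expected_calls` are hypothetical, and whether the training folds are scored as well depends on `return_train_score`, which older scikit-learn releases enabled by default. If train scores are computed, each split prints the two fit lines, then one "test size" line for the held-out fold, then a second "test size" line for the training fold, which would match the output shown.

```python
def kfold_sizes(n_samples, n_splits):
    """Return (train_size, test_size) per fold, splitting as evenly as possible.

    Mirrors the documented KFold rule: the first n_samples % n_splits folds
    get one extra test sample.
    """
    base, extra = divmod(n_samples, n_splits)
    sizes = []
    for fold in range(n_splits):
        test = base + (1 if fold < extra else 0)
        sizes.append((n_samples - test, test))
    return sizes


def expected_calls(n_candidates, n_splits, score_train, refit=True):
    """Count fit() and predict() calls made by an exhaustive grid search."""
    # One fit per (candidate, fold), plus one final refit on all the data.
    fits = n_candidates * n_splits + (1 if refit else 0)
    # One predict on each test fold, and one more on each training fold
    # when train scores are requested.
    predicts = n_candidates * n_splits * (2 if score_train else 1)
    return fits, predicts


n_samples = 2487 + 276  # sizes taken from the printed output
sizes = kfold_sizes(n_samples, 10)
print(sizes[0], sizes[-1])   # (2486, 277) (2487, 276)
print(expected_calls(n_candidates=6, n_splits=10, score_train=True))  # (61, 120)
```

So with several parameter combinations, seeing the block repeated well over 10 times is expected rather than a sign of a bug.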

The Error:

  File "C:\Users\....\dataImputation.py", line 85, in knnImputationMethod
    gridSearchNN.fit(X=xTrain, y=yTrain)
  File "C:\Users\...\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 740, in fit
    self.best_estimator_.fit(X, y, **fit_params)
AttributeError: 'NoneType' object has no attribute 'fit'
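
The traceback is consistent with `set_params` returning `None`: scikit-learn chains calls on its return value when it builds `best_estimator_`, so a custom `set_params` has to end with `return self`. Below is a minimal sketch of one possible fix, assuming a reasonably recent scikit-learn and NumPy array inputs. It drops the hand-written `get_params`/`set_params` and relies on the versions inherited from `BaseEstimator`. The toy data and the `param_grid` values are invented for illustration, and the `distance` parameter is still unused, as in the code above.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV, KFold


class KnnImputation(RegressorMixin, BaseEstimator):
    """Basic k-nearest-neighbour regressor using absolute (L1) distances."""

    def __init__(self, k=5, distance='euclidean'):
        # Only store the arguments. BaseEstimator derives get_params and
        # set_params from this signature, and its set_params returns self.
        self.k = k
        self.distance = distance

    def fit(self, X, y):
        # np.asarray accepts both DataFrames and plain arrays.
        self.x_train_ = np.asarray(X, dtype=float)
        self.y_train_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        y_pred = np.empty(X.shape[0])
        for i, row in enumerate(X):
            distances = np.sum(np.abs(self.x_train_ - row), axis=1)
            nearest = np.argsort(distances)[:self.k]
            y_pred[i] = self.y_train_[nearest].mean()
        return y_pred


# Toy data, invented for this example.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo.sum(axis=1)

search = GridSearchCV(
    KnnImputation(),
    param_grid={'k': [1, 3, 5]},        # hypothetical grid
    scoring='neg_mean_squared_error',
    cv=KFold(n_splits=5),               # pass the splitter itself
)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

Passing the `KFold` object as `cv` instead of the generator from `kf.split(...)` is a design choice here: the splitter can be reused, whereas a generator is consumed after one pass. Note also that newer scikit-learn versions reject `random_state` on `KFold` when `shuffle=False`.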
