#StackBounty: #xgboost XGBoost gain vs. Kolmogorov-Smirnov

Bounty: 50

After training an XGBoost model with:

objective = 'binary:logistic'
eval_metric = 'logloss' 

A group of three variables has the highest gain values. I then took the 20 most important variables according to gain and, one at a time, replaced each with its mean and recomputed the Kolmogorov-Smirnov (KS) statistic. The variable whose replacement reduces the KS the most is not one of those three, but one with a relatively low gain.


Variable  Gain    Cover
v1        21.5%   2.5%
v2        12.9%   4.1%
v3        11.1%   1.8%
v4        3.5%    3.4%
v5        2.7%    1.7%
v6        2.4%    2.5%
v7        2.3%    2.2%
v8        2.2%    1.9%
v9        1.9%    4.0%
v10       1.9%    2.0%
v11       1.9%    0.9%
v12       1.6%    4.6%  *****

KS after replacing each variable with its mean (one at a time):

v12         39% *****
the rest    45%
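
For reference, the mean-replacement check described above can be sketched roughly as follows. This is only an illustration under assumptions: it takes KS to be the two-sample statistic between the score distributions of the positive and negative classes, and `ks_statistic`, `ks_after_mean_replacement`, and `toy_predict` are hypothetical names. The toy scoring function stands in for the trained XGBoost model, which is not reproduced here.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(scores, labels):
    """Two-sample KS statistic between the scores of positives (label 1)
    and negatives (label 0), returned as a fraction in [0, 1]."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # The largest gap between the two empirical CDFs occurs at an observed score.
    return max(
        abs(bisect_right(pos, t) / len(pos) - bisect_right(neg, t) / len(neg))
        for t in set(pos) | set(neg)
    )


def ks_after_mean_replacement(predict, rows, labels, feature):
    """KS of the model scores when `feature` is set to its mean in every row."""
    avg = mean(row[feature] for row in rows)
    replaced = [{**row, feature: avg} for row in rows]
    return ks_statistic([predict(r) for r in replaced], labels)


# Hypothetical stand-in for a trained model (not XGBoost).
def toy_predict(row):
    return 0.8 * row["a"] + 0.2 * row["b"]


rows = [
    {"a": 0.0, "b": 0.0},
    {"a": 0.0, "b": 1.0},
    {"a": 1.0, "b": 0.0},
    {"a": 1.0, "b": 1.0},
]
labels = [0, 0, 1, 1]

# KS with all features intact, then with each feature replaced by its mean.
baseline_ks = ks_statistic([toy_predict(r) for r in rows], labels)
ks_by_feature = {
    f: ks_after_mean_replacement(toy_predict, rows, labels, f) for f in ("a", "b")
}
```

Multiplying the returned value by 100 gives the percentage form used in the figures above. The same loop would apply to a real model by passing its prediction function as `predict`.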

How can this be explained? Thanks.
