There's a `sklearn` calibration curve example which shows curves for different classifiers. I changed it to reproduce an issue I am having on a real dataset by adding class imbalance (0.95, 0.05). I get the following curve.
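Roughly, the modification looks like this (a minimal sketch, not the exact example code; the `weights` argument to `make_classification` is what introduces the imbalance, and the sample sizes are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: ~95% negatives, ~5% positives.
X, y = make_classification(n_samples=100_000, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)[:, 1]

# Fraction of positives per bin vs. mean predicted probability per bin.
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)

plt.plot(prob_pred, prob_true, "s-", label="Logistic")
plt.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
plt.xlabel("Mean predicted value")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()
```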

I see that for the blue curve, "Mean predicted value" never goes above 0.8. `calibration_curve` takes a `normalize` argument:

`Whether y_prob needs to be normalized into the bin [0, 1], i.e. is not a proper probability. If True, the smallest value in y_prob is mapped onto 0 and the largest one onto 1.`

When I set `normalize=True`:
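If I read the docstring right, this is equivalent to min-max rescaling `y_prob` by hand before binning (continuing the sketch above):

```python
# My reading of the docstring, not verified against the source: map the
# smallest value of y_prob to 0 and the largest to 1, then bin as before.
y_prob_scaled = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min())
prob_true, prob_pred = calibration_curve(y_test, y_prob_scaled, n_bins=10)

# ...or simply:
prob_true, prob_pred = calibration_curve(y_test, y_prob,
                                         n_bins=10, normalize=True)
```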

This pulls the blue curve to the right, but I am confused, because logistic regression *does* output proper probabilities. Has it simply not seen enough positive cases, because of the class imbalance, to give confident probability estimates (there are 4,500, which is not small!)? If I were to report this, should I leave it as the truncated plot, to show that there is an upper limit to the predicted probabilities? In general, I have found that when I calibrate the classifiers on independent data, it sometimes changes how closely the curve follows the diagonal, but it does not always pull the curve over to the right.
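(For context, by "calibrate on independent data" I mean roughly the following, where `X_cal`, `y_cal` are placeholders for a held-out calibration set disjoint from both train and test:)

```python
from sklearn.calibration import CalibratedClassifierCV

# Recalibrate the already-fitted classifier on an independent
# calibration set (X_cal, y_cal are placeholders for that split).
calibrated = CalibratedClassifierCV(clf, method="sigmoid", cv="prefit")
calibrated.fit(X_cal, y_cal)
y_prob_cal = calibrated.predict_proba(X_test)[:, 1]
```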

Update:

The Figure 2 plots in Austin and Steyerberg (2013, doi:10.1002/sim.5941) are not normalized (note the left-hand side of the plots for low outcome prevalence):

So should the "truncated" curves be kept, as an indication of poor calibration?

It's also interesting that N=500 with prevalence=0.1 (50 cases and 450 controls) shows better calibration than N=10,000 with prevalence=0.01 (100 cases), suggesting that it is not the raw number of cases but the relative number (the prevalence) that drives poor calibration…
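One way to probe this outside the paper would be a quick simulation of both settings (a sketch; `make_classification`'s data-generating process is of course not the paper's logistic model, so this only gestures at the comparison):

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Compare similar raw case counts at very different prevalences.
for n, prev in [(500, 0.10), (10_000, 0.01)]:
    X, y = make_classification(n_samples=n, weights=[1 - prev, prev],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)
    y_prob = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    prob_true, prob_pred = calibration_curve(y_te, y_prob, n_bins=5)
    plt.plot(prob_pred, prob_true, "s-", label=f"N={n}, prevalence={prev}")

plt.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
plt.xlabel("Mean predicted value")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()
```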