I started from the sklearn calibration curve example, which shows calibration curves for several classifiers. I modified it to reproduce an issue I am having with a real dataset by adding class imbalance (0.95/0.05). I get the following curve.
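For context, a minimal sketch of the setup. The synthetic data via make_classification with weights=[0.95, 0.05] is only a stand-in for my real dataset, and the sizes are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

# Stand-in for the real data: ~95/5 class imbalance.
X, y = make_classification(n_samples=100_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)[:, 1]

# frac_pos = fraction of positives per bin, mean_pred = mean predicted
# probability per bin; these are the y and x of the reliability curve.
frac_pos, mean_pred = calibration_curve(y_test, y_prob, n_bins=10)
```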
I see that for the blue curve, "Mean predicted value" never goes above 0.8. calibration_curve has a normalize argument:
Whether y_prob needs to be normalized into the bin [0, 1], i.e. is not a proper probability. If True, the smallest value in y_prob is mapped onto 0 and the largest one onto 1.
When I normalize:
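For what it's worth, that mapping is just min-max scaling (and I believe newer sklearn versions have dropped the normalize argument entirely), so it can be done by hand. A sketch with stand-in scores; the random y_true/y_prob here are illustrative only:

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Stand-in data: scores that never reach 1.0, like the blue curve.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_prob = rng.uniform(0.0, 0.8, size=1000)

# What normalize=True did: map min(y_prob) -> 0 and max(y_prob) -> 1.
y_scaled = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min())

frac_pos, mean_pred = calibration_curve(y_true, y_scaled, n_bins=10)
# mean_pred now spans the full x-axis: the curve is stretched to the right.
```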
This pulls the blue curve to the right, but I am confused, because logistic regression does output proper probabilities. Has it simply not seen enough positive cases, because of the class imbalance, to give confident probability estimates (there are 4,500 positives, which is not small!)? If I were to report this, should I leave the plot truncated, to show that there is an upper limit on the predicted probabilities? In general, I have found that calibrating the classifiers on independent data sometimes changes how close the curve is to the diagonal, but it does not always pull the curve over to the right.
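By "calibrating on independent data" I mean something like the following sketch, using CalibratedClassifierCV with internal cross-validation (isotonic regression here; the data is again a synthetic stand-in):

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Isotonic recalibration: each fold's held-out data is used to fit
# the calibration map, so the base model never calibrates on its own
# training points.
cal = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                             method="isotonic", cv=3)
cal.fit(X_train, y_train)
y_prob_cal = cal.predict_proba(X_test)[:, 1]

frac_pos, mean_pred = calibration_curve(y_test, y_prob_cal, n_bins=10)
```

In my experience this moves the curve toward the diagonal but does not necessarily extend its range toward 1.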
In Austin and Steyerberg (2013), doi:10.1002/sim.5941, the Figure 2 plots are not normalized (note the left-hand side of the plots for low outcome prevalence):
So should the "truncated" curves be kept, as an indication of poor calibration?
Also, it is interesting that N=500 with prevalence 0.1 (50 cases, 450 controls) has better calibration than N=10,000 with prevalence 0.01 (100 cases), suggesting that it is not the raw number of cases but the relative number (the prevalence) that drives poor calibration.