*Bounty: 50*

I have been working on a multilabel classification problem. I want to classify whether each of 25 labels is present in a given sample. The labels are not mutually exclusive. Ultimately, I would like to rank the classifier's outputs to say something like "labels A, B, and D are most likely, with probabilities X, Y, and Z".

I have built a multioutput classifier using logistic regression as the base classifier in scikit-learn. It looks like each label classifier is an independent binary classifier. My question is: how can I compare the probabilities output by each classifier? As I said, I ultimately want to compare the likelihood of a given label with that of the other labels in order to rank how certain each label is to appear. I know logistic regression generally produces well-calibrated probabilities, but are the probabilities of the 25 binary classifiers directly comparable? Would calibrating these classifiers help to ensure that their output probabilities are comparable?
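
For concreteness, here is a minimal sketch of the setup described above. It is not my real pipeline: the data is synthetic (generated from a made-up linear model), and the variable names, the train/test split, and the choice of Platt scaling (`method="sigmoid"`) are all assumptions for illustration. It fits one logistic regression per label, ranks the labels for one sample by predicted probability, and compares per-label Brier scores with and without an explicit calibration step as one possible way to check whether the 25 outputs sit on a comparable scale.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in for the real data (assumption): 25 non-exclusive labels,
# each drawn independently from its own logistic model.
rng = np.random.default_rng(0)
n_samples, n_features, n_labels = 2000, 10, 25
X = rng.normal(size=(n_samples, n_features))
W = rng.normal(size=(n_features, n_labels))
b = rng.uniform(-1.0, 1.0, size=n_labels)
true_p = 1.0 / (1.0 + np.exp(-(X @ W + b)))
Y = (rng.uniform(size=true_p.shape) < true_p).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)


def positive_probs(model, X):
    """Stack P(label present) for every label into an (n_samples, n_labels) array."""
    # predict_proba on a MultiOutputClassifier returns one (n_samples, 2) array per label.
    return np.column_stack([p[:, 1] for p in model.predict_proba(X)])


# One independent binary logistic regression per label.
raw = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)
probs = positive_probs(raw, X_test)

# Rank labels for the first test sample by predicted probability.
label_names = [f"label_{j}" for j in range(n_labels)]  # hypothetical names
top3 = np.argsort(-probs[0])[:3]
for j in top3:
    print(label_names[j], round(float(probs[0, j]), 3))

# Same model, but each per-label classifier is wrapped in Platt scaling.
calibrated = MultiOutputClassifier(
    CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="sigmoid", cv=5)
).fit(X_train, Y_train)
probs_cal = positive_probs(calibrated, X_test)

# Per-label Brier score (lower is better) as a rough calibration check.
brier_raw = [brier_score_loss(Y_test[:, j], probs[:, j]) for j in range(n_labels)]
brier_cal = [brier_score_loss(Y_test[:, j], probs_cal[:, j]) for j in range(n_labels)]
print("mean Brier, raw:       ", np.mean(brier_raw))
print("mean Brier, calibrated:", np.mean(brier_cal))
```

The ranking step only makes sense if a value such as 0.8 means roughly the same thing for every label, which is why the sketch looks at calibration label by label rather than in aggregate. Reliability curves per label (for example via `sklearn.calibration.calibration_curve`) would be a more detailed version of the same check.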