#StackBounty: #probability #neural-networks #ensemble Model ensembling – averaging of probabilities

From the BatchNorm paper (https://arxiv.org/abs/1502.03167), section 4.2.3:

The ensemble prediction was based on the arithmetic average of class
probabilities predicted by the constituent networks.

Is there a theoretical basis for doing this? Is the value obtained by averaging the individual probabilities still a valid probability?
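
For the second part of the question, a quick numerical check may help. The sketch below is not taken from the paper: the logits, the 3-class setup, and the helper names `softmax` and `average_probabilities` are all made up for illustration. It assumes each network outputs a softmax distribution over the same classes. Since the arithmetic mean is a convex combination of the member distributions, every averaged entry should stay in [0, 1] and the entries should still sum to 1.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_probabilities(prob_vectors):
    """Arithmetic mean of several probability vectors, taken class by class."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]

# Hypothetical logits from three networks on a 3-class problem (illustrative only).
member_logits = [
    [2.0, 0.5, -1.0],
    [1.2, 1.0, 0.1],
    [0.3, 2.2, -0.5],
]
member_probs = [softmax(z) for z in member_logits]
ensemble = average_probabilities(member_probs)

# The averaged vector is still non-negative and sums to 1.
assert all(0.0 <= p <= 1.0 for p in ensemble)
assert math.isclose(sum(ensemble), 1.0)
print(ensemble)
```

One way to read the result: the average is the prediction of an equal-weight mixture of the networks, i.e. pick one network uniformly at random and use its output. That covers validity only; whether averaging is the best way to combine the networks is a separate question.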

