*Bounty: 50*

The beta distribution appears under two parametrizations, either

$$ f(x) \propto x^{\alpha} (1-x)^{\beta} \tag{1} $$

or the one that seems to be used more commonly

$$ f(x) \propto x^{\alpha-1} (1-x)^{\beta-1} \tag{2} $$

But why exactly is there a “$-1$” in the second formula?

The first formulation intuitively *seems* to correspond more directly to the binomial distribution

$$ g(k) \propto p^k (1-p)^{n-k} \tag{3} $$

but “seen” from $p$’s perspective. This is especially clear in the beta-binomial model, where $\alpha$ can be understood as a *prior* number of successes and $\beta$ as a *prior* number of failures.
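
To make the correspondence concrete, here is a minimal Python sketch of what I mean. The function names and the data (3 successes in 10 trials, evaluated at $p = 0.4$) are my own illustration, not taken from any library. It assumes a flat prior, which is $(0, 0)$ in form (1) but $(1, 1)$ in form (2), and checks that multiplying the prior kernel by the binomial kernel (3) gives the same posterior kernel either way; only the labels on the parameters differ.

```python
import math

def kernel_form1(x, alpha, beta):
    # Unnormalized density in form (1): x^alpha * (1 - x)^beta
    return x**alpha * (1 - x)**beta

def kernel_form2(x, alpha, beta):
    # Unnormalized density in form (2): x^(alpha - 1) * (1 - x)^(beta - 1)
    return x**(alpha - 1) * (1 - x)**(beta - 1)

def binomial_kernel(p, k, n):
    # Binomial likelihood (3) viewed as a function of p, constants dropped
    return p**k * (1 - p)**(n - k)

def update(alpha, beta, k, n):
    # Conjugate update: add successes to alpha and failures to beta.
    # The bookkeeping is identical in both forms.
    return alpha + k, beta + (n - k)

# Hypothetical data: 3 successes in 10 trials, evaluated at p = 0.4
k, n, p = 3, 10, 0.4

post1 = update(0, 0, k, n)  # flat prior in form (1) -> (3, 7)
post2 = update(1, 1, k, n)  # flat prior in form (2) -> (4, 8)

# Prior kernel times likelihood kernel, computed directly in each form
direct1 = kernel_form1(p, 0, 0) * binomial_kernel(p, k, n)
direct2 = kernel_form2(p, 1, 1) * binomial_kernel(p, k, n)

print(post1, post2)
print(math.isclose(kernel_form1(p, *post1), direct1))
print(math.isclose(kernel_form2(p, *post2), direct2))
print(math.isclose(kernel_form1(p, *post1), kernel_form2(p, *post2)))
```

In form (1) the posterior parameters after a flat prior are exactly the observed counts, while in form (2) they are the counts plus one, which is the asymmetry I am asking about.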

So why exactly did the second form gain popularity, and what is the *rationale* behind it? What are the *consequences* of using either parametrization (e.g. for the connection with the binomial distribution)?

It would be great if someone could additionally point to the origins of this choice and the initial arguments for it, but that is not a necessity for me.