*Bounty: 100*

To fix notation, let a set of possible data $X$ and a set of admissible parameter values $\Theta$ be given. Let $\mathscr P(X)$ be the set of probability distributions on $X$. A parametric statistical model over $X$ and $\Theta$ is a mapping $p:\Theta\to\mathscr P(X)$. If $p$ is a statistical model over $X$ and $\Theta$, we use the notation $p(\cdot\,|\,\theta)$ for the distribution that $\theta$ is mapped to by $p$.

Let $p_1$ be a statistical model over $X$ and $\Theta_1$, and let $p_2$ be a statistical model over $X$ and $\Theta_2$. I’m tempted to propose something like the following notions of equivalence for such models:

**Candidate 1.** $p_1$ and $p_2$ are *form-equivalent* provided they are equal up to reparameterization; that is, there exists a bijection $f:\Theta_1\to\Theta_2$ for which $p_1(x\,|\,\theta_1) = p_2(x\,|\,f(\theta_1))$ for all $x\in X$ and $\theta_1\in\Theta_1$.
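To make Candidate 1 concrete, here is a minimal sketch of one pair of models that I would expect to count as form-equivalent. The choice of the exponential family, the rate and scale parameterizations, and the names `p1`, `p2`, `f`, and `max_gap` are all my own assumptions for illustration, and the check only samples a finite grid, so it is a sanity check rather than a proof.

```python
import math

def p1(x, rate):
    """Model 1: exponential density on x >= 0, parameterized by the rate."""
    return rate * math.exp(-rate * x)

def p2(x, scale):
    """Model 2: the same family, parameterized by the scale (the mean)."""
    return math.exp(-x / scale) / scale

def f(rate):
    """Assumed reparameterization: a bijection from (0, inf) onto itself."""
    return 1.0 / rate

def max_gap(xs, rates):
    """Largest |p1(x | rate) - p2(x | f(rate))| over the sampled grid."""
    return max(abs(p1(x, r) - p2(x, f(r))) for x in xs for r in rates)

xs = [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]
rates = [0.25, 0.5, 1.0, 2.0, 4.0]
print(max_gap(xs, rates))  # expected to be zero up to floating-point rounding
```

Here $f(\theta_1) = 1/\theta_1$ plays the role of the bijection in the definition.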

**Candidate 2.** Let $x^{(N)} = (x_1, x_2, \dots, x_N)$ be a sequence of data (each $x_n\in X$). Let $\hat\theta_1(x^{(N)})$ and $\hat\theta_2(x^{(N)})$ be parameter estimates computed by fitting models 1 and 2 to this sequence of data according to a procedure that assumes the data points are independently generated, namely that the sequence is generated by the distributions

\begin{align}
p_1^{(N)}(x^{(N)}\,|\,\theta)
&= p_1(x_1\,|\,\theta)\,p_1(x_2\,|\,\theta)\cdots p_1(x_N\,|\,\theta) \\
p_2^{(N)}(x^{(N)}\,|\,\theta)
&= p_2(x_1\,|\,\theta)\,p_2(x_2\,|\,\theta)\cdots p_2(x_N\,|\,\theta).
\end{align}

We say that $p_1$ and $p_2$ are *asymptotically inference-equivalent* provided their fitted distributions agree as closely as desired once they are fitted to enough data. More precisely, given any $\epsilon > 0$, there exists an $N_*>0$ such that if $N>N_*$ then

\begin{align}
|p_1(x\,|\,\hat\theta_1(x^{(N)})) - p_2(x\,|\,\hat\theta_2(x^{(N)}))| < \epsilon
\end{align}

for all $x\in X$.
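As a rough illustration of what Candidate 2 asks for, the sketch below uses a toy case of my own choosing: $X = \{0, 1\}$, model 1 is Bernoulli in the success probability fitted by maximum likelihood, and model 2 is Bernoulli in the log-odds fitted by an add-one smoothed estimate. Both fitting procedures, and the names `fit1`, `fit2`, and `worst_gap`, are assumptions rather than anything fixed by the definition. I also take the worst case over every data sequence of length $N$, which is one possible reading of the quantifiers above.

```python
import math

def p1(x, theta):
    """Model 1: Bernoulli pmf on X = {0, 1}, with theta = P(x = 1)."""
    return theta if x == 1 else 1.0 - theta

def p2(x, eta):
    """Model 2: the same family, parameterized by the log-odds eta."""
    prob_one = 1.0 / (1.0 + math.exp(-eta))
    return prob_one if x == 1 else 1.0 - prob_one

def fit1(data):
    """Maximum-likelihood estimate of theta: the sample mean."""
    return sum(data) / len(data)

def fit2(data):
    """Add-one smoothed estimate of the log-odds (an assumed procedure)."""
    k, n = sum(data), len(data)
    return math.log((k + 1) / (n - k + 1))

def worst_gap(n):
    """Largest |p1 - p2| over x in X and over all data sequences of length n."""
    gaps = []
    for k in range(n + 1):  # both fits depend on the data only through k, the number of ones
        data = [1] * k + [0] * (n - k)
        t1, t2 = fit1(data), fit2(data)
        gaps.append(max(abs(p1(x, t1) - p2(x, t2)) for x in (0, 1)))
    return max(gaps)

for n in (1, 10, 100, 1000):
    print(n, worst_gap(n))
```

In this toy case the fitted probabilities are $k/N$ and $(k+1)/(N+2)$, so the worst-case gap works out to $1/(N+2)$, and any $N_* \ge 1/\epsilon$ suffices. The two fits differ for every finite $N$ at $k = 0$, so the pair would satisfy Candidate 2 without satisfying the stricter "any amount of data" reading.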

**Candidate 3.** $p_1$ and $p_2$ are *inference-equivalent* provided they agree when fitted to any amount of data, however large or small.

**Questions.**

Are definitions of this type adopted in the statistics literature? If so, are there any interesting, useful theorems proven about equivalent statistical models? Perhaps multiple types of equivalence like the candidates above are discussed in the literature; in that case, is there any discussion of which definitions of equivalence imply one another?