#StackBounty: #bayesian #information-theory #mutual-information Maximizing the information gain on a Gaussian RV with a noisy compariso…

The question

Let

• $X \sim \mathcal{N}(0,1)$ be a random variable denoting the location of a target on the real line.
• $Y_a$ be a binary random variable encoding the (noisy) answer to the question: “is $X > a$?”, with the following conditional distribution:
$$p(y_a = 1 \mid X = x) = \sigma(x - a),$$
where $\sigma(x)$ is a sigmoidal function, for example the logistic function $1 / [1 + \exp(-x)]$.

My question is as follows: which value of $a$ should I choose in order to maximize the information I gain about $X$ by observing $Y_a$?

More precisely, I am interested in maximizing the mutual information:

$$\max_a I(X, Y_a).$$

This can be understood as maximizing the expected reduction in the posterior’s entropy: $I(X, Y_a) = H(X) - \mathbf{E}_{Y_a} [ H(X \mid Y_a = y) ]$.

My attempt at a solution

I am convinced that one should pick $a = 0$, i.e.:
$$I(X, Y_0) \ge I(X, Y_a) \quad \forall a \in \mathbf{R},$$
but I have not yet been able to prove it formally.

I started by rewriting the mutual information as follows:
$$
\begin{align}
I(X, Y_a) &= H(Y_a) - H(Y_a \mid X) \\
&= H \left[\int_{-\infty}^{+\infty} \sigma(x - a) \mathcal{N}(x \mid 0, 1) \, dx \right]
- \int_{-\infty}^{+\infty} H[\sigma(x - a)] \mathcal{N}(x \mid 0, 1) \, dx
\end{align}
$$
and then I took the derivative w.r.t. $a$:
$$
\begin{align}
\frac{\partial}{\partial a} I(X, Y_a) =
&\, \mathrm{logit} \left[ \int_{-\infty}^{+\infty} \sigma(x - a) \mathcal{N}(x \mid 0, 1) \, dx \right]
\int_{-\infty}^{+\infty} \sigma'(x - a) \mathcal{N}(x \mid 0, 1) \, dx \\
&- \int_{-\infty}^{+\infty} \mathrm{logit}[\sigma(x - a)] \sigma'(x - a) \mathcal{N}(x \mid 0, 1) \, dx
\end{align}
$$
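
A sketch of the intermediate step behind this derivative, assuming $H[p] = -p \log p - (1 - p) \log(1 - p)$ is the binary entropy in nats and that differentiation under the integral sign is allowed. The shorthand $p(a)$ is introduced here for illustration only:

$$
\begin{align}
p(a) &= \int_{-\infty}^{+\infty} \sigma(x - a) \mathcal{N}(x \mid 0, 1) \, dx, \\
\frac{d}{dp} H[p] &= \log \frac{1 - p}{p} = -\mathrm{logit}(p), \\
\frac{\partial}{\partial a} \sigma(x - a) &= -\sigma'(x - a).
\end{align}
$$

By the chain rule the two minus signs cancel in each entropy term, so $\frac{\partial}{\partial a} H[p(a)] = \mathrm{logit}[p(a)] \int \sigma'(x - a) \mathcal{N}(x \mid 0, 1) \, dx$ and $\frac{\partial}{\partial a} H[\sigma(x - a)] = \mathrm{logit}[\sigma(x - a)] \, \sigma'(x - a)$. The second expression enters with the minus sign of $-H(Y_a \mid X)$.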

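It is then easy to verify that $a = 0$ is a stationary point, assuming $\sigma(-x) = 1 - \sigma(x)$ as for the logistic function: in the first term, the integral inside the logit evaluates to $1/2$, so the logit vanishes, and in the second term, we take the integral of an odd function. Stationarity alone does not settle whether this is a maximum, but I believe $a = 0$ is at least a local maximum.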

I haven’t managed to show that this is the only zero of the derivative. (Note that $I(X, Y_a)$ is clearly not concave in $a$, since it is nonnegative and tends to $0$ as $|a| \to \infty$.) Numerical simulations indicate that there is no other zero, however.
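
A numerical check of this kind can be sketched as follows. This is a minimal illustration rather than the simulation actually used: it assumes the logistic choice of $\sigma$, measures entropy in nats, and approximates the integrals with a trapezoidal rule on $[-10, 10]$. The function names and grid sizes are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mutual_information(a, lo=-10.0, hi=10.0, n=2001):
    """Approximate I(X, Y_a) = H(Y_a) - H(Y_a | X) by the trapezoidal rule."""
    h = (hi - lo) / (n - 1)
    p_y1 = 0.0          # approximates P(Y_a = 1)
    cond_entropy = 0.0  # approximates H(Y_a | X)
    for i in range(n):
        x = lo + i * h
        # Trapezoidal weight (half weight at the endpoints) times the prior density.
        w = h * (0.5 if i in (0, n - 1) else 1.0) * normal_pdf(x)
        s = logistic(x - a)
        p_y1 += w * s
        cond_entropy += w * binary_entropy(s)
    return binary_entropy(p_y1) - cond_entropy

def argmax_on_grid(grid):
    """Return the grid point with the largest approximate mutual information."""
    return max(grid, key=mutual_information)

if __name__ == "__main__":
    grid = [i / 10.0 for i in range(-40, 41)]
    print("grid maximizer:", argmax_on_grid(grid))
    for a in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"a = {a:+.1f}  I = {mutual_information(a):.5f} nats")
```

A grid scan like this only supports the conjecture on the sampled range; it is not a proof that the derivative has no other zero.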

Some other “facts” that I collected and that might be helpful:

• $I(X, Y_a) \to 0$ as $|a| \to \infty$
• If we pick $\sigma(x - a) = \epsilon + (1-2\epsilon)\mathbf{1}_{\{X>a\}}$, i.e., a “noisy” indicator function, the problem becomes much easier. I can show that
$$I(X, Y_a) = H[\epsilon + (1-2\epsilon)\Phi(a)] - H(\epsilon),$$
where $\Phi(a)$ is the Gaussian CDF. Then, $a=0$ is clearly the only maximizer.
• I know that $I(X, Y)$ is concave in $p_X$ (for fixed $p_{Y \mid X}$) and convex in $p_{Y \mid X}$ (for fixed $p_X$). But that doesn’t help here, I think, because my distributions are parametric (and don’t form a convex set).
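
The closed form for the noisy-indicator case in the list above is easy to evaluate directly. The following is a small sketch, assuming entropy in nats and $0 \le \epsilon < 1/2$. The function names are hypothetical, and $\Phi$ is computed from the standard library's `math.erf`.

```python
import math

def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def std_normal_cdf(a):
    """Gaussian CDF Phi(a), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def noisy_indicator_information(a, eps):
    """Closed form H[eps + (1 - 2 eps) Phi(a)] - H(eps)."""
    p = eps + (1.0 - 2.0 * eps) * std_normal_cdf(a)
    return binary_entropy(p) - binary_entropy(eps)

def best_threshold(eps, grid):
    """Grid point maximizing the closed-form mutual information."""
    return max(grid, key=lambda a: noisy_indicator_information(a, eps))

if __name__ == "__main__":
    grid = [i / 100.0 for i in range(-300, 301)]
    for eps in (0.0, 0.1, 0.3):
        print(f"eps = {eps:.1f}  best a on grid = {best_threshold(eps, grid)}")
```

At $a = 0$ the argument of the first entropy is exactly $1/2$, where the binary entropy peaks, which is why the maximizer is unique in this simplified model.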
