#StackBounty: #bayesian #information-theory #mutual-information Maximizing the information gain on a Gaussian RV with a noisy compariso…

Bounty: 50

The question

Let

  • $X \sim \mathcal{N}(0,1)$ be a random variable denoting the location of a target on the real line.
  • $Y_a$ be a binary random variable encoding the (noisy) answer to the question: “is $X > a$?”, with the following conditional distribution:
    $$p(y_a = 1 \mid X = x) = \sigma(x - a),$$
    where $\sigma(x)$ is a sigmoidal function, for example the logistic function $1 / [1 + \exp(-x)]$.

My question is as follows: which value of $a$ should I choose in order to maximize the information I gain about $X$ by observing $Y_a$?

More precisely, I am interested in maximizing the mutual information:

$$\max_a I(X, Y_a).$$

This can be understood as maximizing the expected reduction in the posterior’s entropy: $I(X, Y_a) = H(X) - \mathbf{E}_{Y_a} [ H(X \mid Y_a = y) ]$.


My attempt at a solution

I am convinced that one should pick $a = 0$, i.e.:
$$I(X, Y_0) \ge I(X, Y_a) \quad \forall a \in \mathbf{R},$$
but I have not yet been able to prove it formally.

I started by rewriting the mutual information as follows:
$$
\begin{align}
I(X, Y_a) &= H(Y_a) - H(Y_a \mid X) \\
&= H \left[\int_{-\infty}^{+\infty} \sigma(x - a) \mathcal{N}(x \mid 0, 1) dx \right]
- \int_{-\infty}^{+\infty} H[\sigma(x - a)] \mathcal{N}(x \mid 0, 1) dx
\end{align}
$$
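
The two-term expression above lends itself to a quick numerical evaluation. The following is a minimal sketch, assuming the logistic choice of $\sigma$, entropies measured in nats, and a plain trapezoidal rule on $[-10, 10]$; the helper names (`expect`, `mutual_information`) are illustrative and not part of the original question.

```python
import math

def sigmoid(z):
    """Logistic function, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expect(f, lo=-10.0, hi=10.0, n=4001):
    """Trapezoidal approximation of E[f(X)] for X ~ N(0, 1)."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * f(x) * normal_pdf(x)
    return total * h

def mutual_information(a):
    """I(X, Y_a) = H(E[sigma(X - a)]) - E[H(sigma(X - a))], in nats."""
    p_bar = expect(lambda x: sigmoid(x - a))
    cond = expect(lambda x: binary_entropy(sigmoid(x - a)))
    return binary_entropy(p_bar) - cond

# Print the mutual information for a few thresholds.
for a in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"a = {a:+.1f}  I = {mutual_information(a):.5f}")
```

The integration range and grid size are arbitrary choices; for a smooth integrand weighted by a Gaussian they should be more than adequate, but they can be tightened if needed.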
and then I took the derivative w.r.t $a$:
$$
\begin{align}
\frac{\partial}{\partial a}I(X, Y_a) =
&\mathrm{logit} \left[ \int_{-\infty}^{+\infty} \sigma(x - a) \mathcal{N}(x \mid 0, 1) dx \right]
\int_{-\infty}^{+\infty} \sigma'(x - a) \mathcal{N}(x \mid 0, 1) dx \\
&- \int_{-\infty}^{+\infty} \mathrm{logit}[\sigma(x - a)] \sigma'(x - a) \mathcal{N}(x \mid 0, 1) dx
\end{align}
$$
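
For readers retracing this step, here is a sketch of the intermediate calculation. It assumes that differentiation under the integral sign is permitted and that $H$ is the binary entropy in nats; the symbol $\bar{p}(a)$ for the marginal probability of $Y_a = 1$ is introduced here for convenience only.

$$
\begin{align}
\bar{p}(a) &= \int_{-\infty}^{+\infty} \sigma(x - a) \mathcal{N}(x \mid 0, 1) dx, &
\frac{d \bar{p}}{d a} &= -\int_{-\infty}^{+\infty} \sigma'(x - a) \mathcal{N}(x \mid 0, 1) dx, \\
H(p) &= -p \log p - (1 - p) \log(1 - p), &
H'(p) &= -\mathrm{logit}(p).
\end{align}
$$

By the chain rule, $\frac{\partial}{\partial a} H[\bar{p}(a)] = H'(\bar{p}) \, \bar{p}'(a)$ gives the first term, and $\frac{\partial}{\partial a} H[\sigma(x - a)] = \mathrm{logit}[\sigma(x - a)] \, \sigma'(x - a)$ under the integral gives the second. In both cases the minus sign from $H'$ cancels the minus sign from differentiating $\sigma(x - a)$ with respect to $a$.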

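It is then easy to verify that $a = 0$ is a stationary point, provided $\sigma$ is symmetric in the sense that $\sigma(-x) = 1 - \sigma(x)$, as the logistic function is (in the first term, the integral inside the logit evaluates to $1/2$, and in the second term, we take the integral of an odd function). So $a = 0$ is at least a candidate for a local maximum.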
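
The stationarity claim can be probed numerically by evaluating the derivative formula on a grid of thresholds. This is a minimal sketch, assuming the logistic $\sigma$, for which $\sigma'(z) = \sigma(z)[1 - \sigma(z)]$ and $\mathrm{logit}[\sigma(z)] = z$; the function name `dI_da` and the quadrature settings are illustrative choices, and a grid scan is evidence rather than a proof.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def normal_pdf(x):
    # Density of the standard normal distribution.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expect(f, lo=-12.0, hi=12.0, n=4801):
    # Trapezoidal approximation of E[f(X)] for X ~ N(0, 1).
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * f(x) * normal_pdf(x)
    return total * h

def dI_da(a):
    # Derivative of I(X, Y_a) with respect to a, specialised to the logistic sigma.
    def dsig(x):
        s = sigmoid(x - a)
        return s * (1.0 - s)          # sigma'(x - a)
    p_bar = expect(lambda x: sigmoid(x - a))
    logit_p_bar = math.log(p_bar / (1.0 - p_bar))
    first = logit_p_bar * expect(dsig)
    second = expect(lambda x: (x - a) * dsig(x))  # logit(sigma(z)) = z
    return first - second

# Scan a in [-5, 5] and print the sign pattern of the derivative.
grid = [0.25 * k for k in range(-20, 21)]
values = [dI_da(a) for a in grid]
for a, v in zip(grid, values):
    print(f"a = {a:+.2f}  dI/da = {v:+.6f}")
```

If the derivative is positive for every negative grid point and negative for every positive one, that is consistent with $a = 0$ being the only zero on the scanned interval.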

I haven’t managed to show that this is the only zero of the derivative (note that, clearly, $I(X, Y_a)$ is not concave in $a$). Numerical simulations indicate that there is no other zero, however.

Some other “facts” that I collected and that might be helpful:

  • $I(X, Y_a) \to 0$ as $|a| \to \infty$.
  • If we pick $\sigma(x - a) = \epsilon + (1-2\epsilon)\mathbf{1}_{\{x>a\}}$, i.e., a “noisy” indicator function, the problem becomes much easier. I can show that
    $$I(X, Y_a) = H[\epsilon + (1-2\epsilon)\Phi(a)] - H(\epsilon),$$
    where $\Phi(a)$ is the Gaussian CDF. Then, $a=0$ is clearly the only maximizer.
  • I know that $I(X, Y)$ is concave in $p_X$ (for fixed $p_{Y \mid X}$) and convex in $p_{Y \mid X}$ (for fixed $p_X$). But that doesn’t help here, I think, because my distributions are parametric (and don’t form a convex set).
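
The noisy-indicator case in the list above is simple enough to check directly. The binary entropy is maximized at $1/2$, and for $\epsilon \ne 1/2$ the argument $\epsilon + (1-2\epsilon)\Phi(a)$ equals $1/2$ only when $\Phi(a) = 1/2$, that is, at $a = 0$. The sketch below evaluates the closed form on a grid; the noise level `eps = 0.1` and the grid are hypothetical values chosen purely for illustration.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def normal_cdf(a):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def noisy_indicator_mi(a, eps):
    """Closed form H[eps + (1 - 2 eps) Phi(a)] - H(eps) for the noisy indicator."""
    return binary_entropy(eps + (1.0 - 2.0 * eps) * normal_cdf(a)) - binary_entropy(eps)

eps = 0.1                                        # hypothetical noise level
grid = [0.01 * k for k in range(-400, 401)]      # thresholds a in [-4, 4]
best_a = max(grid, key=lambda a: noisy_indicator_mi(a, eps))
print("maximizing a on the grid:", best_a)
print("I at the maximizer (nats):", noisy_indicator_mi(best_a, eps))
```

At $a = 0$ the value reduces to $\log 2 - H(\epsilon)$, which is the largest the closed form can reach for a given $\epsilon$.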

