*Bounty: 50*

## The question

Let

- $X \sim \mathcal{N}(0,1)$ be a random variable denoting the location of a target on the real line.
- $Y_a$ be a binary random variable encoding the (noisy) answer to the question "is $X > a$?", with the following conditional distribution:

$$p(y_a = 1 \mid X = x) = \sigma(x - a),$$

where $\sigma(x)$ is a sigmoidal function, for example the logistic function $1 / [1 + \exp(-x)]$.

My question is as follows: **which value of $a$ should I choose in order to maximize the information I gain about $X$ by observing $Y_a$?**

More precisely, I am interested in maximizing the mutual information:

$$\max_a I(X, Y_a).$$

This can be understood as maximizing the expected reduction in the posterior's entropy: $I(X, Y_a) = H(X) - \mathbf{E}_{Y_a} [ H(X \mid Y_a = y) ]$.
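For concreteness, here is a minimal numerical sketch of the quantity being maximized, using the equivalent decomposition $I(X, Y_a) = H(Y_a) - H(Y_a \mid X)$ (logistic $\sigma$ assumed; the function names are mine, and entropies are in nats):

```python
# Numerical sketch of I(X, Y_a) = H(Y_a) - E_X[ H(Y_a | X = x) ],
# assuming the logistic sigma and X ~ N(0, 1); entropies in nats.
import numpy as np
from scipy.integrate import quad
from scipy.special import expit  # numerically stable logistic function
from scipy.stats import norm

def binary_entropy(p):
    """Entropy of a Bernoulli(p), with the convention 0 log 0 = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def mutual_information(a):
    # Marginal p(Y_a = 1) = integral of sigma(x - a) N(x | 0, 1) dx
    p1, _ = quad(lambda x: expit(x - a) * norm.pdf(x), -np.inf, np.inf)
    # Conditional entropy H(Y_a | X) = E_X[ H(sigma(X - a)) ]
    h_cond, _ = quad(lambda x: binary_entropy(expit(x - a)) * norm.pdf(x),
                     -np.inf, np.inf)
    return binary_entropy(p1) - h_cond

print(mutual_information(0.0), mutual_information(2.0))
```

Evaluating this on a grid of $a$ values gives a curve symmetric around $a = 0$ that decays towards zero as $|a|$ grows, consistent with the facts collected further down.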

## My attempt at a solution

I am convinced that one should pick $a = 0$, i.e.:

$$I(X, Y_0) \ge I(X, Y_a) \quad \forall a \in \mathbf{R},$$

but I have not yet been able to prove it formally.

I started by rewriting the mutual information as follows:

$$
\begin{align}
I(X, Y_a) &= H(Y_a) - H(Y_a \mid X) \\
&= H \left[ \int_{-\infty}^{+\infty} \sigma(x - a) \, \mathcal{N}(x \mid 0, 1) \, dx \right]
- \int_{-\infty}^{+\infty} H[\sigma(x - a)] \, \mathcal{N}(x \mid 0, 1) \, dx
\end{align}
$$

and then I took the derivative w.r.t. $a$:

$$
\begin{align}
\frac{\partial}{\partial a} I(X, Y_a) =
&\mathrm{logit} \left[ \int_{-\infty}^{+\infty} \sigma(x - a) \, \mathcal{N}(x \mid 0, 1) \, dx \right]
\int_{-\infty}^{+\infty} \sigma'(x - a) \, \mathcal{N}(x \mid 0, 1) \, dx \\
&- \int_{-\infty}^{+\infty} \mathrm{logit}[\sigma(x - a)] \, \sigma'(x - a) \, \mathcal{N}(x \mid 0, 1) \, dx
\end{align}
$$

It is then easy to verify that $a = 0$ is a stationary point: in the first term, the integral inside the logit evaluates to $1/2$, so the logit is zero; in the second term, we take the integral of an odd function (for the logistic function, $\mathrm{logit}[\sigma(x)] = x$ and $\sigma'$ is even), which vanishes. So $a = 0$ is at least a local maximum.

I haven't managed to show that this is the *only* zero of the derivative (note that $I(X, Y_a)$ is clearly not concave in $a$). Numerical simulations indicate, however, that there is no other zero.
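To make the numerical claim reproducible, here is a sketch of such a simulation, evaluating the derivative expression above by quadrature (my assumptions: logistic $\sigma$, for which $\sigma' = \sigma(1 - \sigma)$ and $\mathrm{logit}[\sigma(x - a)] = x - a$, which simplifies the second term):

```python
# Sketch: evaluate d/da I(X, Y_a) numerically, assuming the logistic sigma.
import numpy as np
from scipy.integrate import quad
from scipy.special import expit, logit
from scipy.stats import norm

def d_mutual_information(a):
    # p(Y_a = 1): the integral inside the logit of the first term
    p1, _ = quad(lambda x: expit(x - a) * norm.pdf(x), -np.inf, np.inf)
    # integral of sigma'(x - a) = sigma(1 - sigma) against the Gaussian
    int_sp, _ = quad(lambda x: expit(x - a) * (1 - expit(x - a)) * norm.pdf(x),
                     -np.inf, np.inf)
    # second term, using logit[sigma(x - a)] = x - a for the logistic function
    second, _ = quad(lambda x: (x - a) * expit(x - a) * (1 - expit(x - a)) * norm.pdf(x),
                     -np.inf, np.inf)
    return logit(p1) * int_sp - second

# The derivative vanishes at a = 0 and, on a grid, is positive for a < 0
# and negative for a > 0 -- no second zero shows up.
for a in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(a, d_mutual_information(a))
```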

Some other “facts” that I collected and that might be helpful:

- $I(X, Y_a) \to 0$ as $|a| \to \infty$
- If we pick $\sigma(x - a) = \epsilon + (1-2\epsilon)\mathbf{1}_{\{x > a\}}$, i.e., a "noisy" indicator function, the problem becomes much easier. I can show that

$$I(X, Y_a) = H[\epsilon + (1-2\epsilon)\Phi(a)] - H(\epsilon),$$

where $\Phi(a)$ is the Gaussian CDF. Then, $a = 0$ is clearly the only maximizer.
- I know that $I(X, Y)$ is convex in $p_X$ and concave in $p_{Y \mid X}$. But that doesn't help here, I think, because my distributions are parametric (and don't form a convex set).
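Regarding the noisy-indicator case: a quick sketch confirming on a grid that the closed form is maximized at $a = 0$ (the choice of $\epsilon$ and the names are mine):

```python
# Closed form for the "noisy indicator" sigma:
#   I(X, Y_a) = H[eps + (1 - 2 eps) Phi(a)] - H(eps),
# which is maximal when the argument of H equals 1/2, i.e. at a = 0.
import numpy as np
from scipy.stats import norm

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def mi_noisy_indicator(a, eps=0.1):
    return binary_entropy(eps + (1 - 2 * eps) * norm.cdf(a)) - binary_entropy(eps)

grid = np.linspace(-3, 3, 601)
values = mi_noisy_indicator(grid)
print(grid[np.argmax(values)])  # maximizer on the grid
```

At $a = 0$ the marginal is exactly $1/2$, so the value is $\log 2 - H(\epsilon)$, the best achievable with this channel.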