#StackBounty: #r #regression #logistic #mathematical-statistics Proving that logistic regression on $I(X>c)$ by $X$ itself recovers …

Bounty: 100

Background

Suppose that $X \sim \mathcal{N}(0,\sigma^2)$, and define $C \equiv I(X>c)$ for a given constant (decision boundary) $c$.

Now assume we perform a logistic regression:

$$\mathrm{logit}\left(P(C=1)\right) \sim \beta_0 + \beta_1 X$$

Note that for logistic regression, the fitted $\displaystyle -\frac{\hat{\beta}_0}{\hat{\beta}_1}$ corresponds to the mean of the underlying logistic distribution, i.e. the value of $X$ at which the fitted probability equals $1/2$.
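To see why, note that the fitted probability crosses $1/2$ exactly where the linear predictor vanishes:

$$
P(C=1 \mid X=x) = \frac{1}{1+e^{-(\hat{\beta}_0+\hat{\beta}_1 x)}} = \frac{1}{2} \iff \hat{\beta}_0+\hat{\beta}_1 x = 0 \iff x = -\frac{\hat{\beta}_0}{\hat{\beta}_1}.
$$

Equivalently, for $\hat{\beta}_1 > 0$ the model says $C = I(X > Z)$ with $Z \sim \mathrm{Logistic}\left(-\hat{\beta}_0/\hat{\beta}_1,\; 1/\hat{\beta}_1\right)$, whose mean is exactly $-\hat{\beta}_0/\hat{\beta}_1$.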


Problem

My hypothesis is that this value should equal, or at least be close to, the cutoff $c$, i.e.

$$
c \approx -\frac{\hat{\beta}_0}{\hat{\beta}_1}
$$

I would like to prove or refute this claim.


Simulation

It is hard to analytically derive the distribution of $\displaystyle -\frac{\hat{\beta}_0}{\hat{\beta}_1}$, so I instead ran simulations in R over several pairs $(\sigma, c)$ to test the hypothesis. Suppose we set, for instance,

  • $\sigma \in \{5, 10, 15, 20\}$
  • $c \in \{-5, 4, 12\}$
N = 1000
for (sig in c(5, 10, 15, 20)) {
  for (c in c(-5, 4, 12)) {
    X = rnorm(N, sd = sig)            # X ~ N(0, sig^2)
    C = (X > c) * 1                   # C = I(X > c)
    DATA = data.frame(X = X, C = C)   # column names must match the formula below
    coef = summary(glm(C ~ X, data = DATA, family = binomial))$coefficients
    print(sprintf("True c: %.2f, Estimated c: %.2f", c, -coef[1, 1] / coef[2, 1]))
  }
}

Note that the true $c$ and the estimated $-\hat{\beta}_0\big/\hat{\beta}_1$ are similar, as seen in the following output:

[1] "True c: -5.00, Estimated c: -5.01"
[1] "True c: 4.00, Estimated c: 4.01"
[1] "True c: 12.00, Estimated c: 11.83"
[1] "True c: -5.00, Estimated c: -5.01"
[1] "True c: 4.00, Estimated c: 3.98"
[1] "True c: 12.00, Estimated c: 11.97"
[1] "True c: -5.00, Estimated c: -5.01"
[1] "True c: 4.00, Estimated c: 3.97"
[1] "True c: 12.00, Estimated c: 12.00"
[1] "True c: -5.00, Estimated c: -5.01"
[1] "True c: 4.00, Estimated c: 3.99"
[1] "True c: 12.00, Estimated c: 12.00"

Attempted proof

To compute the maximum likelihood estimates (MLE), we maximize the log-likelihood:

$$
\begin{aligned}
\widehat{(\beta_0, \beta_1)} &= \mathrm{argmax}_{(\beta_0, \beta_1)}\, \mathrm{LogLik}(\beta_0, \beta_1) \\[8pt]
&\approx \mathrm{argmax}_{(\beta_0, \beta_1)}\, \mathbb{E}_X\, \mathrm{LogLik}(\beta_0, \beta_1) \\[8pt]
&= \mathrm{argmax}_{(\beta_0, \beta_1)}\, \mathbb{E}_X \left[ C\cdot(\beta_0 + \beta_1 X) - \log\left(1 + \exp(\beta_0 + \beta_1 X)\right) \right] \\[8pt]
&= \mathrm{argmax}_{(\beta_0, \beta_1)}\, \mathbb{E}_X \left[ I(X > c) \cdot(\beta_0 + \beta_1 X) - \log\left(1 + \exp(\beta_0 + \beta_1 X)\right) \right]
\end{aligned}
$$
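
Here the "$\approx$" step is the law of large numbers applied to the average log-likelihood (dividing by $N$ does not change the argmax):

$$
\frac{1}{N}\,\mathrm{LogLik}(\beta_0,\beta_1) = \frac{1}{N}\sum_{i=1}^{N}\left[ C_i(\beta_0+\beta_1 X_i) - \log\left(1+e^{\beta_0+\beta_1 X_i}\right)\right] \xrightarrow{\;N\to\infty\;} \mathbb{E}_X\left[ C(\beta_0+\beta_1 X) - \log\left(1+e^{\beta_0+\beta_1 X}\right)\right].
$$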

Note that

  • $\displaystyle \mathbb{E}_X[I(X>c)] = P(X>c) = 1-\Phi(c/\sigma)$
  • $\displaystyle \mathbb{E}_X[X\,I(X>c)] = P(X>c)\,\mathbb{E}[X \mid X>c] = \sigma\,\phi(c/\sigma)$, using the truncated-normal mean $\mathbb{E}[X \mid X>c] = \sigma\,\dfrac{\phi(c/\sigma)}{1-\Phi(c/\sigma)}$ (Wiki - Truncated Normal Distribution); a quick numerical check of both identities follows below
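
Both identities can be verified numerically in R. A minimal sketch, where sigma = 5 and c0 = 4 are arbitrary illustrative choices:

sigma <- 5; c0 <- 4   # arbitrary illustrative values

# E[I(X > c0)]  vs  1 - Phi(c0/sigma)
integrate(function(x) dnorm(x, 0, sigma), lower = c0, upper = Inf)$value
1 - pnorm(c0 / sigma)

# E[X * I(X > c0)]  vs  sigma * phi(c0/sigma)
integrate(function(x) x * dnorm(x, 0, sigma), lower = c0, upper = Inf)$value
sigma * dnorm(c0 / sigma)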

I am currently trying to compute $\mathbb{E}_X \log\left(1+\exp(\beta_0 + \beta_1 X)\right)$. However, I am not sure this is a valid approach: for instance, if the resulting expectation were a linear function of $(\beta_0, \beta_1)$, then $\mathrm{argmax}_{(\beta_0, \beta_1)}$ would have no (finite) solution.
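
Lacking a closed form, one can also evaluate the population objective by numerical integration and maximize it directly. A minimal sketch, assuming sigma = 5 and c0 = 4 (the helper names log1pexp and expected_loglik are mine); since $C$ is a deterministic function of $X$, the classes are perfectly separated and the coefficients themselves can grow without bound, so optim simply stops once improvements become negligible, and the ratio $-\beta_0/\beta_1$ is the quantity to watch:

sigma <- 5; c0 <- 4

log1pexp <- function(z) ifelse(z > 35, z, log1p(exp(z)))  # stable log(1 + e^z)

# E_X[ I(X > c0)(b0 + b1 X) - log(1 + exp(b0 + b1 X)) ]
expected_loglik <- function(beta) {
  b0 <- beta[1]; b1 <- beta[2]
  term1 <- integrate(function(x) (b0 + b1 * x) * dnorm(x, 0, sigma),
                     lower = c0, upper = Inf)$value
  term2 <- integrate(function(x) log1pexp(b0 + b1 * x) * dnorm(x, 0, sigma),
                     lower = -Inf, upper = Inf)$value
  term1 - term2
}

fit <- optim(c(0, 1), function(b) -expected_loglik(b))
-fit$par[1] / fit$par[2]   # compare with the true cutoff c0 = 4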

Any help will be appreciated.

