#StackBounty: #r #logistic #simulation #accuracy Simulation of logistic regression's accuracy

Bounty: 50

I ran a Monte Carlo (MC) simulation of $10^5$ GLM regressions (logistic, logit link) in R. To do so, I assumed:

  • The outcomes ($y$) were repeatedly sampled from a Bernoulli distribution (sample size $N=1000$ per replication)
  • The single explanatory variable ($x$) was sampled independently of $y$ from a half-normal distribution ($x \ge 0$)
  • I then calculated the accuracy of the predictions (with cutoff 0.5) as (true positives + true negatives) / $N$
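A minimal sketch of the simulation described above, written in Python with only the standard library rather than R. It assumes a Bernoulli success probability of 0.5 (the question does not state one), draws the half-normal $x$ as $|Z|$ with $Z \sim N(0,1)$, fits the intercept-plus-slope logit model by Newton-Raphson instead of calling R's `glm`, scores accuracy on the same data used for the fit, and runs 200 replications instead of $10^5$ to keep it fast. The helper names `fit_logistic`, `in_sample_accuracy` and `simulate` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iters=25, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson (equivalent to IRLS)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector g = X^T (y - p) and information matrix H = X^T W X.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            r = yi - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system H * step = g for the Newton step.
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


def in_sample_accuracy(x, y, b0, b1, cutoff=0.5):
    """(true positives + true negatives) / N on the data used for fitting."""
    correct = 0
    for xi, yi in zip(x, y):
        pred = 1 if sigmoid(b0 + b1 * xi) > cutoff else 0
        correct += int(pred == yi)
    return correct / len(y)


def simulate(n_reps=200, n=1000, p_success=0.5, seed=1):
    # ASSUMPTION: p_success = 0.5; the question does not give the Bernoulli parameter.
    rng = random.Random(seed)
    accs = []
    for _ in range(n_reps):
        y = [1 if rng.random() < p_success else 0 for _ in range(n)]
        # Half-normal x drawn independently of y: |Z| with Z ~ N(0, 1).
        x = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
        b0, b1 = fit_logistic(x, y)
        accs.append(in_sample_accuracy(x, y, b0, b1))
    return accs


if __name__ == "__main__":
    accs = simulate()
    print(f"mean in-sample accuracy over {len(accs)} reps: {sum(accs) / len(accs):.4f}")
```

Calling `simulate(n_reps=..., p_success=...)` lets you vary the replication count or the assumed success probability. In R, the fitting step would typically be `glm(y ~ x, family = binomial)` followed by `predict(fit, type = "response")` and the same 0.5 cutoff.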

Naïvely, I was expecting a mean/median accuracy of 0.5, but that was not the case: the average accuracy was around 51.5%. Is there a good intuition or theoretical result for this?
