I ran a Monte Carlo simulation of $$10^5$$ logistic regressions (GLM with a logit link) in R. To do so, I assumed:

• The outcomes ($$y$$) were repeatedly sampled from a Bernoulli distribution ($$N=1000$$)
• The single explanatory variable ($$x$$) was sampled independently of $$y$$ from a half-normal distribution ($$x≥0$$)
• I then calculated the accuracy of the predictions (with cutoff 0.5) as (true positives + true negatives) divided by $$N$$
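
For concreteness, the setup above can be sketched roughly as follows. This is a minimal reconstruction rather than my exact script, and it makes a few assumptions that are not stated above: a Bernoulli success probability of 0.5, a half-normal predictor generated as `abs(rnorm(n))`, accuracy computed on the same data used to fit the model, and only 200 replications instead of $$10^5$$ so that it runs quickly. The names `simulate_accuracy`, `n_obs`, `n_reps` and `p` are purely illustrative.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_obs  <- 1000  # N from the setup above
n_reps <- 200   # assumption: far fewer than 10^5, to keep the run short
p      <- 0.5   # assumption: Bernoulli success probability

simulate_accuracy <- function(n, p) {
  y <- rbinom(n, size = 1, prob = p)  # Bernoulli outcomes
  x <- abs(rnorm(n))                  # half-normal predictor, independent of y
  fit <- glm(y ~ x, family = binomial(link = "logit"))
  y_hat <- as.integer(fitted(fit) > 0.5)  # classify with cutoff 0.5
  mean(y_hat == y)                    # (TP + TN) / N, assumed in-sample
}

# One accuracy value per simulated regression
acc <- replicate(n_reps, simulate_accuracy(n_obs, p))
c(mean = mean(acc), median = median(acc))
```

Raising `n_reps` toward $$10^5$$ should reproduce the setting described here more closely, at the cost of a much longer run.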

Naïvely, I was expecting a mean/median accuracy of 0.5, but that is not what I found: the average accuracy was around 0.515 (51.5%). Is there a good intuition or theoretical result for this?

