#StackBounty: #hypothesis-testing #modeling #covariate Hypothesis testing: Why is a null model that fits the data well better than one …

Bounty: 50

Let’s say we have two models: a null model, $M_0$, and an alternative model, $M_1$. The only difference between them is that in $M_0$ one parameter is fixed at $0$, whereas in $M_1$ that parameter is set to the value that maximizes the likelihood of $M_1$. This is a typical setup for a likelihood ratio test.
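For concreteness, the statistic in this setup can be written as below. The symbols $\Lambda$, $\hat\ell_0$ and $\hat\ell_1$ are notation introduced here for illustration (they do not appear in the original question); $\hat\ell_k$ denotes the maximized log-likelihood of model $M_k$.

$$\Lambda = 2\left(\hat\ell_1 - \hat\ell_0\right)$$

Under $M_0$ and the usual regularity conditions, $\Lambda$ is asymptotically $\chi^2_1$, since the two models differ by one free parameter. This is the quantity the simulation code computes via `logLik`.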

My intuition is that the better $M_0$ describes the data-generating process, and thus the less residual variation there is in the fitted model, the better. By “better”, I mean that for a given sample size, effect size, and false positive rate, I will have more power to reject the null.

That’s a bit hand-wavy, so here is a simulation with a linear regression model.


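# LR statistic for x when neither model includes the covariate z
lm_lrs_no_covar <- function(y, x) {
  as.numeric(2*(logLik(lm(formula = y ~ x)) - logLik(lm(formula = y ~ 1))))
}

# LR statistic for x when both models include the covariate z
lm_lrs_yes_covar <- function(y, x, z) {
  as.numeric(2*(logLik(lm(formula = y ~ x + z)) - logLik(lm(formula = y ~ z))))
}

n <- 1e2
num_sims <- 1e4

no_covar <- yes_covar <- rep(NA_real_, num_sims)

for (sim_idx in seq_len(num_sims)) {

  # x has a true effect of 0.2; z is a covariate with a true effect of 1
  x <- runif(n = n)
  z <- runif(n = n)
  y <- rnorm(n = n, mean = 0.2*x + z)

  yes_covar[sim_idx] <- lm_lrs_yes_covar(y = y, x = x, z = z)
  no_covar[sim_idx] <- lm_lrs_no_covar(y = y, x = x)
}

# Quantile-quantile comparison of the two simulated distributions,
# with the identity line for reference
plot(x = sort(no_covar),
     y = sort(yes_covar),
     type = 'l')
abline(a = 0, b = 1)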

In fact, this plot shows that the LR statistic from the model with the covariate has the same distribution as the LR statistic from the model without the covariate. Since the “with covariate” and “without covariate” tests have the same difference in degrees of freedom between null and alternative models (1), they both have the same power to reject the null.

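[Figure: sorted LR statistics without the covariate (x-axis) plotted against sorted LR statistics with the covariate (y-axis), with the identity line.]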
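One way to check the equal-power claim numerically, rather than reading it off the plot, is to compare how often each statistic exceeds the $\chi^2_1$ critical value. The following is a minimal sketch under stated assumptions: `rejection_rate` is a hypothetical helper introduced here, the level of 0.05 and the seed are arbitrary choices, and the number of replicates is reduced to keep the run short.

```r
# Hypothetical helper (not in the original post): share of simulated LR
# statistics that exceed the chi-squared(1) critical value at level alpha.
rejection_rate <- function(lrs, alpha = 0.05) {
  mean(lrs > qchisq(p = 1 - alpha, df = 1))
}

# Self-contained rerun of the simulation with fewer replicates.
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 100
num_sims <- 500
no_covar <- yes_covar <- numeric(num_sims)

for (sim_idx in seq_len(num_sims)) {
  x <- runif(n)
  z <- runif(n)
  y <- rnorm(n, mean = 0.2 * x + z)

  # LR statistic for x, with z in both models
  yes_covar[sim_idx] <- as.numeric(
    2 * (logLik(lm(y ~ x + z)) - logLik(lm(y ~ z))))
  # LR statistic for x, with z in neither model
  no_covar[sim_idx] <- as.numeric(
    2 * (logLik(lm(y ~ x)) - logLik(lm(y ~ 1))))
}

# Empirical rejection rates of the two tests at the 5% level
rates <- c(no_covar = rejection_rate(no_covar),
           yes_covar = rejection_rate(yes_covar))
print(rates)
```

Because the true coefficient on `x` is 0.2 rather than 0, these rates estimate power, not the false positive rate. With only 500 replicates they carry noticeable Monte Carlo error, so small differences between the two should be read with care.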

But in the context of Wald testing, it seems obvious that improving the model fit, and thus reducing the residual variance, must reduce the standard error of the estimate and therefore improve the power to reject the null at any given effect size.
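The Wald side of this comparison can be inspected on a single simulated data set. The sketch below assumes the same data-generating process as the simulation; the object names and the seed are illustrative choices, not part of the original code. It prints the coefficient-table row for `x` and the residual standard deviation from each fit.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 100
x <- runif(n)
z <- runif(n)
y <- rnorm(n, mean = 0.2 * x + z)

fit_no_covar <- lm(y ~ x)
fit_yes_covar <- lm(y ~ x + z)

# Row "x" of each coefficient table: estimate, standard error,
# t value (the Wald statistic), and p-value.
wald_no_covar <- summary(fit_no_covar)$coefficients["x", ]
wald_yes_covar <- summary(fit_yes_covar)$coefficients["x", ]

# Residual standard deviation of each fit
sigma_no_covar <- sigma(fit_no_covar)
sigma_yes_covar <- sigma(fit_yes_covar)

print(rbind(no_covar = wald_no_covar, yes_covar = wald_yes_covar))
print(c(no_covar = sigma_no_covar, yes_covar = sigma_yes_covar))
```

Comparing the two residual standard deviations shows how much of the variation in `y` the covariate `z` removes in this particular setup, which is the quantity the Wald argument depends on.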

Where have I gone wrong? Surely a model that fits the data well must outperform one that doesn’t? Surely any benefit reaped by a Wald test must somehow also be reaped by an LR test? (Neyman-Pearson lemma)
