*Bounty: 50*

Related question: Hypothesis testing: Why is a null model that fits the data well better than one that doesn't?

I simulated a response, `y`, that is influenced by a covariate but not by the “treatment” (the thing I care about), as follows:

```
n <- 1e3
covar_effect <- 1
trt_effect <- 0   # no true treatment effect
covar <- rnorm(n = n)
trt <- rnorm(n = n)
y <- rnorm(n = n, mean = covar*covar_effect + trt*trt_effect)
```

I computed the likelihood ratio statistic (LRS) between a null model that omits `trt` and an alternative model that includes `trt`, in two ways:

- both the null and alternative models omit `covar`
- both the null and alternative models include `covar`

In both cases, the distribution of the LRS was $\chi^2_1$, as you can see in this figure:

Here’s the gist showing how I ran this simulation in R.
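The gist is in R; as a rough Python sketch of the same computation (the `loglik` helper and the use of a profiled Gaussian log-likelihood are my choices, not taken from the gist):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
covar = rng.normal(size=n)
trt = rng.normal(size=n)
y = rng.normal(loc=1.0 * covar + 0.0 * trt, size=n)  # no true trt effect

def loglik(cols, y):
    """Gaussian linear-model log-likelihood at the MLE (sigma^2 profiled out)."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + cols)  # intercept plus predictors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

# LRS when both null and alternative omit the covariate
lrs_no_covar = 2 * (loglik([trt], y) - loglik([], y))
# LRS when both null and alternative include the covariate
lrs_with_covar = 2 * (loglik([covar, trt], y) - loglik([covar], y))
# each is one draw from (approximately) a chi-squared_1 distribution
```

Repeating this over many simulated data sets gives the $\chi^2_1$ histograms in the figure.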

Then, I repeated the process, but simulated a situation where there actually is a treatment effect:

```
n <- 1e3
covar_effect <- 1
trt_effect <- 0.1   # small true treatment effect
covar <- rnorm(n = n)
trt <- rnorm(n = n)
y <- rnorm(n = n, mean = covar*covar_effect + trt*trt_effect)
```

Consistent with intuition, the LRS was greater between the null and alternative that include the covariate than between the null and alternative that omit it:
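As a numerical check of that claim, here is a hedged Python sketch (the seed, replicate count, and `loglik` helper are my own choices, not from the original simulation):

```python
import numpy as np

rng = np.random.default_rng(1)

def loglik(cols, y):
    """Gaussian linear-model log-likelihood at the MLE (sigma^2 profiled out)."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

n, reps = 1000, 200
lrs_no, lrs_with = [], []
for _ in range(reps):
    covar = rng.normal(size=n)
    trt = rng.normal(size=n)
    y = rng.normal(loc=1.0 * covar + 0.1 * trt, size=n)  # true trt effect
    lrs_no.append(2 * (loglik([trt], y) - loglik([], y)))
    lrs_with.append(2 * (loglik([covar, trt], y) - loglik([covar], y)))

# the mean LRS is larger when the covariate is in both models
print(np.mean(lrs_no), np.mean(lrs_with))
```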

If I were thinking about things from a parameter estimation perspective and using a Wald test to test whether the effect of `trt` is zero, I could readily understand why including the covariate in the null and alternative models would increase my power to reject the null: the standard error of the effect estimate is $\frac{\hat{\sigma}}{\sqrt{n\,\widehat{V}(x)}}$, so anything that decreases $\hat{\sigma}$ will increase the precision of the estimate.
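That Wald-side intuition can be sketched numerically; in this hypothetical Python snippet (the `fit` helper is mine), the standard error of the `trt` coefficient shrinks once `covar` absorbs part of the residual variance:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
covar = rng.normal(size=n)
trt = rng.normal(size=n)
y = rng.normal(loc=covar + 0.1 * trt, size=n)

def fit(cols, y):
    """OLS fit: returns coefficients and their standard errors."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

beta1, se1 = fit([trt], y)          # trt coefficient is the last entry
beta2, se2 = fit([covar, trt], y)   # trt coefficient is the last entry
# se2[-1] < se1[-1]: modeling covar roughly halves the residual variance here
print(se1[-1], se2[-1])
```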

But I am not thinking about things from the perspective of a Wald test. I am thinking about a likelihood ratio. There are some likelihood relationships that are obvious to me (I hope this notation is clear):

The LRSs will be non-negative, since each alternative nests its null:

- L(y|trt) > L(y)
- L(y|trt, covar) > L(y|covar)

These comparisons are directly relevant, but the models are nested, so the inequalities seem obvious:

- L(y|covar) > L(y)
- L(y|trt, covar) > L(y|trt)

But the increased LRS with the covariate modeled doesn’t follow directly from any of that. It relates to this inequality:

(L(y|trt, covar) - L(y|covar)) > (L(y|trt) - L(y))

which I can rearrange into:

L(y|trt, covar) + L(y) > L(y|covar) + L(y|trt)

Now why would that be true? Can anyone offer a mathematical or conceptual explanation of why it’s beneficial to model a covariate in both the null and alternative models?
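One restatement that may make the inequality more concrete (my own rearrangement, using the profiled Gaussian log-likelihood $\ell = -\tfrac{n}{2}\log(\mathrm{RSS}/n) + \text{const}$, with the same constant for all four models):

$$\ell(y \mid trt, covar) - \ell(y \mid covar) > \ell(y \mid trt) - \ell(y) \iff \frac{\mathrm{RSS}(y \mid covar)}{\mathrm{RSS}(y \mid trt, covar)} > \frac{\mathrm{RSS}(y)}{\mathrm{RSS}(y \mid trt)}$$

In those terms, the question is why adding `trt` reduces the residual sum of squares by a larger *proportion* when `covar` is already in the model.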