Consider the simple regression model $y = a x + b + \varepsilon$, where $x$ is a covariate, $y$ is the observed response and $\varepsilon$ is the unobserved noise (with no distributional assumption). We can add to the model a covariate $\omega$ which is a realisation of Gaussian noise; this cannot decrease the $R^2$ coefficient of the fit, and almost surely increases it.
However, is the expected adjusted $R^2$ coefficient of the augmented model the same as the adjusted $R^2$ coefficient of the original model?
This would mean that the adjusted $R^2$ exactly removes, on average, the inflation of $R^2$ that is due to random noise in the covariates. The answer may be completely trivial. In any case, I’m interested in learning more about this, and in references on the subject (for a mathematically mature audience).
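For concreteness, here is the definition presumably in play (it is the standard one, and the one R reports as `adj.r.squared` for a model with an intercept). The symbols $n$, $p$ and $\bar R^2$ are notation introduced here for illustration: $n$ is the number of observations and $p$ the number of covariates, not counting the intercept.

$$\bar R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$$

Under this reading the original model has $p = 1$ and the augmented model has $p = 2$, and the question is whether $\mathbb{E}_\omega\big[\bar R^2_{\text{aug}}\big] = \bar R^2_{\text{orig}}$, with the expectation taken over $\omega$ only and $x$, $y$ held fixed.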
Remark.
An example seems to indicate that it may be the case. With R’s Ozone data, the adjusted $R^2$ coefficient of the model $\texttt{O3}$ ~ $\texttt{T12}$ is found as follows:
out = with(ozone, lm(O3 ~ T12))
summary(out)$adj.r.squared
## 0.2640621
For the model $\texttt{O3}$ ~ $\texttt{T12}$ + $\texttt{noise}$, where $\texttt{noise}$ is an instance of white noise, the average adjusted $R^2$ is almost the same:
r2.aug <- function() {
# draw a fresh Gaussian noise covariate, one value per observation
df = with(ozone, data.frame(O3, T12, noise=rnorm(nrow(ozone))))
summary(lm(O3 ~ T12 + noise, data=df))$adj.r.squared
}
set.seed(1)
mean(replicate(10000, r2.aug()))
## 0.264054
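Since the ozone data set is not part of base R, the same check can be sketched on synthetic data. Everything in this block is an assumption made for illustration (the sample size, the coefficients, the Student-$t$ noise, and the names `adj_r2_base` and `adj_r2_aug`); it is not the ozone data. It also reports the Monte Carlo standard error of the mean, which helps judge whether the remaining gap is within simulation noise.

```r
# Synthetic stand-in for the ozone data (hypothetical values, not the real data set)
set.seed(42)
n <- 50
x <- rnorm(n)
y <- 2 * x + 1 + rt(n, df = 5)  # non-Gaussian noise: no distributional assumption on epsilon

# Adjusted R^2 of the original model y ~ x
adj_r2_base <- summary(lm(y ~ x))$adj.r.squared

# Adjusted R^2 after adding one fresh Gaussian noise covariate (x and y stay fixed)
adj_r2_aug <- function() {
  noise <- rnorm(n)
  summary(lm(y ~ x + noise))$adj.r.squared
}

sims <- replicate(5000, adj_r2_aug())
mc_se <- sd(sims) / sqrt(length(sims))  # Monte Carlo standard error of the mean

print(c(base = adj_r2_base, mean_aug = mean(sims), mc_se = mc_se))
```

Comparing `mean_aug - base` with `mc_se` gives a rough sense of whether the two quantities differ by more than the simulation error.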