Suppose I have the following model: $$Y = \mu + \epsilon = X\beta + \epsilon,$$ where $Y$ is $n \times 1$, $X$ is $n \times p$, $\beta$ is $p \times 1$, and $\epsilon$ is $n \times 1$. I assume that the entries of $\epsilon$ are independent with mean $0$, so that $\operatorname{var}(\epsilon) = \sigma^2 I$.
In OLS, the fitted values are $\hat{Y} = HY$, where $H = X(X^TX)^{-1}X^T$ is the $n \times n$ hat matrix. I want to find the MSE of $\hat{Y}$.
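For concreteness, here is a minimal numpy sketch of the setup (the simulated $X$, $\beta$, $\sigma$, and the dimensions $n = 50$, $p = 3$ are just placeholders I made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                          # placeholder dimensions
sigma = 1.0

X = rng.normal(size=(n, p))           # design matrix, n x p
beta = np.array([2.0, -1.0, 0.5])     # true coefficients, p x 1
mu = X @ beta                         # mean vector mu = X beta
Y = mu + sigma * rng.normal(size=n)   # Y = mu + epsilon

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix, n x n
Y_hat = H @ Y                         # fitted values

# H is symmetric and idempotent: H = H^T and HH = H
print(np.allclose(H, H.T), np.allclose(H @ H, H))
print(Y_hat.shape)                    # (n,)
```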
By the bias-variance decomposition, I know that
\begin{align}
\operatorname{MSE}(\hat{Y}) &= \operatorname{bias}^2(\hat{Y}) + \operatorname{var}(\hat{Y}) \\
&= (E[HY] - \mu)^T(E[HY] - \mu) + \operatorname{var}(HY) \\
&= (H\mu - \mu)^T(H\mu - \mu) + \sigma^2 H \\
&= 0 + \sigma^2 H
\end{align}
I'm confused by the dimension in the last step. The $\operatorname{bias}^2$ term is a scalar. However, $\operatorname{var}(\hat{Y})$ is an $n \times n$ matrix. How can one add a scalar to an $n \times n$ matrix when $n \neq 1$?
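To show where my confusion comes from, here is a small Monte Carlo sketch (reusing `X`, `beta`, `mu`, `sigma`, and `H` from the snippet above) that estimates both terms and prints their shapes:

```python
# Monte Carlo estimate of the two terms in the decomposition.
reps = 20000
Y_hats = np.empty((reps, n))
for r in range(reps):
    Y_r = mu + sigma * rng.normal(size=n)  # fresh draw of Y
    Y_hats[r] = H @ Y_r                    # fitted values for this draw

bias_vec = Y_hats.mean(axis=0) - mu        # E[HY] - mu, an n-vector (~ 0 here)
bias_sq = bias_vec @ bias_vec              # scalar, as written in my derivation
cov_Y_hat = np.cov(Y_hats, rowvar=False)   # n x n matrix, approx sigma^2 * H

print(bias_sq)                             # a scalar near 0
print(cov_Y_hat.shape)                     # (n, n)
print(np.allclose(cov_Y_hat, sigma**2 * H, atol=0.05))
```

The squared-bias term really does come out as a single number, while the estimated variance of $\hat{Y}$ comes out as an $n \times n$ matrix, which is exactly the mismatch I am asking about.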