## #StackBounty: #regression #mathematical-statistics #multivariate-analysis #least-squares #covariance Robust Covariance in Multivariate …

### Bounty: 50

Assume we are in the OLS setting with $$y = X\beta + \epsilon$$, where $$y$$ is a response vector and $$X$$ is a matrix of covariates. We can get two types of covariance estimates:

The homoskedastic covariance
$$cov(\hat{\beta}) = (X'X)^{-1} (e'e),$$ and the robust (sandwich) covariance
$$cov(\hat{\beta}) = (X'X)^{-1} X' \, diag(e^2) \, X (X'X)^{-1}.$$
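
For concreteness, here is a minimal numpy sketch of both estimators for a single response vector, on synthetic data. Note that in practice the $$e'e$$ term is scaled by $$1/(n-p)$$, and the robust form shown is the unscaled HC0 variant:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

# Homoskedastic covariance: sigma^2 (X'X)^{-1}, with sigma^2 = e'e / (n - p)
cov_homosk = XtX_inv * (e @ e) / (n - p)

# HC0 robust (sandwich) covariance: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
meat = X.T @ (X * e[:, None] ** 2)   # X' diag(e^2) X without forming the n x n diagonal
cov_robust = XtX_inv @ meat @ XtX_inv

print(np.sqrt(np.diag(cov_homosk)))  # homoskedastic standard errors
print(np.sqrt(np.diag(cov_robust)))  # robust (HC0) standard errors
```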

I’m looking for help on how to derive these covariances when $$Y$$ is a response matrix, and $$E$$ is a residual matrix. There is a fairly detailed derivation on slide 49 here, but I think there are some steps missing.

For the homoskedastic case, each column of $$E$$ is assumed to have a covariance structure of $$\sigma_{kk} I$$, which is the usual structure for a single vector response. Each row of $$E$$ is also assumed to be i.i.d. with covariance $$\Sigma$$.

The derivation starts with collapsing the $$Y$$ and $$E$$ matrices back into vectors. In this structure $$Var(vec(E)) = I \otimes \Sigma$$ (with $$vec$$ stacking the rows of $$E$$; the column-stacking convention gives the equivalent $$\Sigma \otimes I$$).

First question: I understand the Kronecker product produces a block-diagonal matrix with $$\Sigma$$ on the block diagonal, but where did $$\sigma_{kk}$$ go? Is it intentional that the $$\sigma_{kk}$$ values are pooled together so that the covariance is constant on the diagonal, similar to the vector response case?

Using $$I \otimes \Sigma$$, the author gives a derivation for $$cov(\hat{\beta})$$ on slide 66:

\begin{align} cov(\hat{\beta}) &= ((X'X)^{-1} X' \otimes I) (I \otimes \Sigma) (X (X'X)^{-1} \otimes I) \\ &= (X'X)^{-1} \otimes \Sigma. \end{align}

The first line looks like a standard sandwich estimator. The second line is an elegant reduction because of the identity factors and the mixed-product property of the Kronecker product.
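
As a sanity check, this reduction can be confirmed numerically. The sketch below uses arbitrary illustrative dimensions and a random positive-definite $$\Sigma$$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 50, 3, 2                       # n observations, p covariates, q responses
X = rng.normal(size=(n, p))
A = rng.normal(size=(q, q))
Sigma = A @ A.T                          # an arbitrary q x q response covariance

XtX_inv = np.linalg.inv(X.T @ X)
bread = np.kron(XtX_inv @ X.T, np.eye(q))   # (X'X)^{-1} X' (kron) I
meat = np.kron(np.eye(n), Sigma)            # I (kron) Sigma
sandwich = bread @ meat @ bread.T

# The mixed-product property collapses the sandwich to (X'X)^{-1} (kron) Sigma
print(np.allclose(sandwich, np.kron(XtX_inv, Sigma)))   # True
```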

Second question: What is the extension for robust covariances?
I imagine we need to revisit the meat of the sandwich estimator, ($$I otimes Sigma$$), which comes from the homoskedastic assumption per response in the Y matrix. If we use robust covariances, we should say that each column of $$E$$ has variance $$diag(e_k^2)$$. We can retain the second assumption that rows in E are i.i.d. Since the different columns in $$E$$ no longer follow the pattern $$scalar * I$$, I don’t believe $$Var(Vec(E))$$ factors into a kronecker product as it did before. Perhaps $$Var(Vec(E))$$ is some diagonal matrix, $$D$$?

Revisiting the sandwich-like estimator, is the extension for robust covariance

\begin{align} cov(\hat{\beta}) &= ((X'X)^{-1} X' \otimes I) \, D \, (X (X'X)^{-1} \otimes I) \\ &= \,? \end{align}

This product doesn't seem to reduce; we cannot invoke the mixed-product property because $$D$$ does not itself factor as a Kronecker product.

The first question is connected to this second question. In the first question on homoskedastic variances, $$\sigma_{kk}$$ disappeared, allowing $$Var(vec(E))$$ to take the form $$I \otimes \Sigma$$. But if the diagonal of $$Var(vec(E))$$ were not constant, it would have the same structure as the robust covariance case ($$Var(vec(E))$$ is some diagonal matrix $$D$$). So, what allowed $$\sigma_{kk}$$ to disappear, and is there a similar trick for the robust case that would allow the $$D$$ matrix to factor?
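
A brute-force numerical sketch of this, assuming the diagonal $$D$$ suggested above (squared residuals, rows of $$E$$ stacked): the product indeed does not collapse to a single Kronecker product, but its diagonal blocks recover the usual per-response HC0 estimator, while its off-diagonal blocks are exactly zero because this diagonal $$D$$ discards cross-response covariances.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 50, 3, 2
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, q))
E_true = rng.normal(size=(n, q)) * (1 + np.abs(X[:, :1]))  # heteroskedastic rows
Y = X @ B + E_true

XtX_inv = np.linalg.inv(X.T @ X)
E = Y - X @ XtX_inv @ X.T @ Y                    # residual matrix, n x q

# D as suggested above: diagonal, squared residuals, rows of E stacked
D = np.diag(E.reshape(-1) ** 2)                  # nq x nq
bread = np.kron(XtX_inv @ X.T, np.eye(q))
cov_vec = bread @ D @ bread.T                    # pq x pq; no Kronecker factorization

# Its (k,k) diagonal block reproduces the per-column HC0 estimator ...
k = 0
hc0_k = XtX_inv @ (X.T @ (X * E[:, k][:, None] ** 2)) @ XtX_inv
block_kk = cov_vec[k::q, k::q]
print(np.allclose(block_kk, hc0_k))              # True

# ... while the off-diagonal (k,l) blocks vanish under this diagonal D
print(np.allclose(cov_vec[0::q, 1::q], 0.0))     # True
```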

Get this bounty!!!

## #StackBounty: #least-squares #measurement-error Measurement error in one indep variable in OLS with multiple regression

### Bounty: 50

Suppose I regress (with OLS) $$y$$ on $$x_1$$ and $$x_2$$. Suppose I have an i.i.d. sample of size $$n$$, and that $$x_1$$ is observed with error but $$y$$ and $$x_2$$ are observed without error. What is the probability limit of the estimated coefficient on $$x_1$$?

Let us suppose for tractability that the measurement error of $$x_1$$ is "classical". That is, the measurement error is normally distributed with mean 0 and is uncorrelated with $$x_2$$ or the error term.
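
While the post leaves the derivation open, a quick Monte Carlo sketch is consistent with the textbook attenuation result $$plim \, \hat{\beta}_1 = \beta_1 \cdot Var(r) / (Var(r) + \sigma_u^2)$$, where $$r$$ is the population residual from projecting the true $$x_1$$ on $$x_2$$. All parameter values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000                               # large n to approximate the probability limit
beta1, beta2, s_u = 1.0, 0.5, 0.8         # illustrative true values

x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)        # x1 correlated with x2; Var(r) = 1 here
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

x1_obs = x1 + s_u * rng.normal(size=n)    # classical measurement error on x1 only

X = np.column_stack([x1_obs, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Attenuation factor: lambda = Var(r) / (Var(r) + Var(u)), with Var(r) = 1 above
lam = 1.0 / (1.0 + s_u ** 2)
print(b[0], beta1 * lam)                  # estimate vs. predicted plim, close for large n
```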

Get this bounty!!!


## #StackBounty: #regression #multiple-regression #least-squares #mse What's the MSE of \$hat{Y}\$ in ordinary least squares using bias…

### Bounty: 100

Suppose I have the following model: $$Y = \mu + \epsilon = X\beta + \epsilon,$$ where $$Y$$ is $$n \times 1$$, $$X$$ is $$n \times p$$, $$\beta$$ is $$p \times 1$$, and $$\epsilon$$ is $$n \times 1$$. I assume that $$\epsilon$$ are independent with mean 0 and variance $$\sigma^2 I$$.

In OLS, the fitted values are $$\hat{Y} = HY$$, where $$H = X(X^TX)^{-1}X^T$$ is the $$n \times n$$ hat matrix. I want to find the MSE of $$\hat{Y}$$.

By the bias-variance decomposition, I know that

\begin{align} MSE(\hat{Y}) &= bias^2(\hat{Y}) + var(\hat{Y}) \\ &= (E[HY] - \mu)^T(E[HY] - \mu) + var(HY) \\ &= (H\mu - \mu)^T(H\mu - \mu) + \sigma^2 H \\ &= 0 + \sigma^2 H \end{align}

I'm confused by the dimension in the last step. The $$bias^2$$ term is a scalar. However, $$var(\hat{Y})$$ is an $$n \times n$$ matrix. How can one add a scalar to an $$n \times n$$ matrix when $$n \neq 1$$?
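
One common way to reconcile the dimensions is to read $$MSE(\hat{Y})$$ as the scalar $$E\|\hat{Y} - \mu\|^2$$, in which case the matrix $$var(\hat{Y})$$ enters through its trace. A quick numerical check of that reading (assuming $$X$$ has full column rank):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 100, 4, 2.0
X = rng.normal(size=(n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix, idempotent with trace = rank(X)

# Under the scalar reading, the variance term contributes
# tr(sigma^2 H) = sigma^2 * rank(X) = sigma^2 * p.
print(np.trace(sigma ** 2 * H), sigma ** 2 * p)   # both equal sigma^2 * p
```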

Get this bounty!!!
