*Bounty: 50*

Let's say I run the simple regression $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, and assume $\operatorname{cov}(\epsilon, x) = 0$.

This yields the formula people usually write in covariance form for the slope parameter:

$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})\, y_i}{\sum_i (x_i - \bar{x})^2}$

Plugging in the assumed true DGP for $y_i$, we get:

$\hat{\beta}_1 = \beta_1 + \frac{\sum_i (x_i - \bar{x})\, \epsilon_i}{\sum_i (x_i - \bar{x})^2}$
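As a sanity check, the decomposition above is an exact algebraic identity in any sample, which can be verified numerically. Here is a minimal sketch, assuming a simple Gaussian DGP with $\beta_0 = 1$, $\beta_1 = 2$ (all names and parameter values are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
beta0, beta1 = 1.0, 2.0

x = rng.normal(size=n)
eps = rng.normal(size=n)  # drawn independently of x, so cov(eps, x) = 0
y = beta0 + beta1 * x + eps

xc = x - x.mean()  # centered regressor

# OLS slope in covariance form: sum (x_i - xbar) y_i / sum (x_i - xbar)^2
beta1_hat = (xc @ y) / (xc @ xc)

# Decomposition after substituting the DGP: beta_1 + sample-cov term
decomposition = beta1 + (xc @ eps) / (xc @ xc)

# The two agree up to floating-point error, in every sample
print(beta1_hat, decomposition)
```

The agreement holds sample by sample because $\sum_i (x_i - \bar{x}) = 0$, so the intercept term drops out exactly.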

With this, I have a few questions.

1. Is this now a statement not about the population, but about the particular draw of $\epsilon_i$'s that happened to occur in this sample? That is, is the second term (numerator over denominator) the *sample* covariance between $\epsilon$ and $x$ divided by the sample variance of $x$? If so, can I think of each random sample as a given draw of the $\epsilon_i$'s, and that draw as what drives the sampling variability of the estimator?

2. Taking probability limits, $\operatorname{cov}(\epsilon, x) = 0$ seems to be sufficient for consistency of the estimator. However, is zero covariance alone not sufficient for unbiasedness? Is mean independence of $\epsilon$ from $x$, i.e. $E[\epsilon \mid x] = 0$, needed for the finite-sample properties?

3. Also a question about 'identification'. If I think of the model above as the causal model, and I can say my OLS is consistent, does that mean I have 'identified' the true $\beta_1$? Conversely, can I think of the model as not identified when $\operatorname{cov}(\epsilon, x) \neq 0$, in which case $\hat{\beta}_1$ converges in probability to the true $\beta_1$ plus some other term, so that I fail to isolate the underlying parameter?
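The contrast in questions 2 and 3 can be illustrated by simulation. Below is a hedged sketch (the DGP, the endogeneity mechanism `eps = rho * x + noise`, and all parameter values are my own illustrative choices): with $\operatorname{cov}(\epsilon, x) = 0$ the estimates concentrate around $\beta_1$ as $n$ grows, while with $\operatorname{cov}(\epsilon, x) \neq 0$ they concentrate around $\beta_1 + \operatorname{cov}(x, \epsilon)/\operatorname{var}(x)$ instead:

```python
import numpy as np

rng = np.random.default_rng(1)
beta1 = 2.0

def ols_slope(n, rho):
    """OLS slope for one sample of size n; rho controls cov(x, eps)."""
    x = rng.normal(size=n)
    # eps = rho * x + noise, so cov(x, eps) = rho * var(x)
    eps = rho * x + rng.normal(size=n)
    y = 1.0 + beta1 * x + eps
    xc = x - x.mean()
    return (xc @ y) / (xc @ xc)

# Exogenous case (rho = 0): estimates tighten around beta1 = 2 as n grows
print([round(ols_slope(n, rho=0.0), 3) for n in (100, 10_000, 1_000_000)])

# Endogenous case (rho = 0.5): estimates tighten around
# beta1 + cov(x, eps)/var(x) = 2 + 0.5 = 2.5, not around beta1
print([round(ols_slope(n, rho=0.5), 3) for n in (100, 10_000, 1_000_000)])
```

In the endogenous case the estimator is still converging, just to the wrong quantity, which is one way to see the failure of identification: no amount of data isolates $\beta_1$ from the bias term.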