*Bounty: 50*

Think of a linear regression model:

$$
Y = X\beta + \epsilon
$$

where $\epsilon \mid X \sim N\left(0,\sigma^{2}\right)$. The vector of parameters $\beta$ can be consistently estimated by OLS. I have one conceptual question here: first, $\epsilon$ has the units of $Y$. However, when we minimize the (unweighted) sum of squared residuals, we treat each observation the same. Conceptually, think of two different values of $X$. If the true conditional mean function is linear in the population, that is:

$$
\mathbb{E}\left(Y \mid X\right) = X\beta
$$

then the conditional mean of $Y$ changes with the value of $X$. If $\beta>0$, then the conditional mean grows with $X$. Under homoskedasticity, however, the variance of the error term is assumed to be constant. So for large values of the conditional mean, the **percentage deviation** of the disturbance decreases and tends to 0; in other words, the error dispersion becomes relatively more and more minuscule. If $X$ is non-stochastic, then the conditional variance of $Y$ is $\sigma^{2}$ as well. As such, the conditional coefficient of variation of $Y$ is:

$$
\frac{\sigma}{X\beta}
$$

which goes to $0$ as $X$ increases. I guess my question is: isn't the homoskedasticity assumption restrictive in its basic principle? In other words, even if heteroskedasticity is not related to the value of the regressors, isn't this conceptually incorrect, as we are imposing that the average **percentage** deviation of the error term from the conditional mean decreases without bound? Should the error term not also be *scaled* by the value of $Y$? Another way of stating it: an error of equal magnitude means different things for different values of the regressand/regressors. An error of 1 when $Y=10$ is much more than an error of 1 when $Y=100000$; but OLS treats these the same.
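To make the point concrete, here is a minimal numerical sketch of the effect I mean (the values of $\beta$, $\sigma$, and $X$ are made up purely for illustration):

```python
# Simulate Y = X*beta + eps with homoskedastic eps at a small and a large
# value of X, and compare absolute vs. relative dispersion of Y.
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = 2.0, 1.0  # illustrative values, not from any real data

for x in (5.0, 5000.0):
    eps = rng.normal(0.0, sigma, size=100_000)
    y = x * beta + eps
    cond_mean = x * beta
    # The absolute dispersion sd(Y) is the same at both values of x,
    # but the coefficient of variation sd(Y)/E(Y|X) collapses as X grows.
    print(f"x={x:>7}: sd(Y)={y.std():.3f}, CV(Y)={y.std() / cond_mean:.6f}")
```

The standard deviation of $Y$ is about $\sigma$ in both cases, while the coefficient of variation shrinks by three orders of magnitude; this is exactly the "percentage deviation going to 0" that strikes me as restrictive.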